Gather-scatter operators are index operations that produce their output by indexing into given data based on a set of indices and other optional parameters. Gather operators fetch the data at the indexed locations and return it as the output, while scatter operators update the output data at the indexed locations. Gather and scatter operators are often inverses of each other. Many gather and scatter operators are currently defined in various deep learning frameworks and standards, and they are used in a range of applications, including object detection and semantic segmentation. Despite their similarities, there are many functional as well as API-level differences among these operators. Sometimes the same operator has variations across standards, and differences in data layouts among frameworks also play a role. As a result, it is often confusing for a deep learning programmer or researcher to understand, implement, and use these operators. We aim to go over some popular gather/scatter operator definitions to develop an intuitive understanding of their APIs using visual aids, and thus to distinguish between them. The sections below cover the popular gather/scatter operators in various deep learning standards, and Tables 1 and 2 summarize their differences.
GatherElements
ONNX v11 API:

Tensor Output = GatherElements(Tensor Data, Tensor Index, int axis = 0)

This is an ONNX-specific operator that gathers individual elements along the specified axis of a given tensor, based on the index values. It forms an inverse operator pair with ScatterElements. The axis input is optional and has a default value of 0 (the outermost, or slowest-changing, dimension). The output tensor has the same shape as the index tensor, and the output at each location is determined as follows:
- out[i][j][k] = input[index[i][j][k]][j][k] if axis = 0
- out[i][j][k] = input[i][index[i][j][k]][k] if axis = 1
- out[i][j][k] = input[i][j][index[i][j][k]] if axis = 2

The data is gathered from the input at the same location as the output coordinate, but the axis coordinate is replaced with the corresponding index value. This is further illustrated with some examples below. The colors indicate the correspondence between the index value and the data value.
Example 1:
One exciting feature of this operator is that it accepts negative values for both the axis and the indices. Axis has an accepted range of [-r, r-1], where r is the input tensor's rank (or dimensionality), and a negative value simply wraps around the tensor dimensions. For example, axis = -1 indicates the innermost dimension, i.e., the (r-1)th dimension of the tensor. Similarly, each index value can range within [-s, s-1], where s is the size of the Data tensor along the axis, and negative values simply wrap around the axis, as illustrated in the example below.
Example 2:
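To make these semantics concrete, here is a minimal NumPy sketch of the GatherElements behavior described above (a hypothetical helper written for illustration, not the ONNX reference implementation); NumPy's native indexing handles the negative-value wrap-around for us:

```python
import numpy as np

def gather_elements(data, index, axis=0):
    # Illustrative sketch of ONNX GatherElements semantics.
    axis = axis % data.ndim                 # wrap a negative axis value
    out = np.empty(index.shape, dtype=data.dtype)
    for pos in np.ndindex(*index.shape):
        src = list(pos)
        src[axis] = index[pos]              # negative indices wrap along `axis`
        out[pos] = data[tuple(src)]
    return out

data = np.array([[1, 2], [3, 4]])
index = np.array([[0, -2], [1, 0]])
print(gather_elements(data, index, axis=1))   # [[1 1] [4 3]]
```

Note that the index value -2 wraps to 0 along the chosen axis, matching the behavior described above.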
ScatterElements
ONNX v11 API:

Tensor Output = ScatterElements(Tensor Data, Tensor Indices, Tensor Updates, int axis = 0)

As mentioned before, this operator is the inverse of the GatherElements operator. ScatterElements takes three inputs: data, indices, and updates. It also takes an optional axis input with a default value of 0 (the outermost, or slowest-changing, dimension). To obtain the output, first a copy of the data is created, so the output and input tensors have the same shape. Then the values specified by updates are written at the index positions specified by indices. There should be an index specified for each update value, so these two tensors have the same shape. For instance:
- output[indices[i][j][k]][j][k] = updates[i][j][k] if axis = 0
- output[i][indices[i][j][k]][k] = updates[i][j][k] if axis = 1
- output[i][j][indices[i][j][k]] = updates[i][j][k] if axis = 2

Thus, the given updates are scattered to the given locations of the output tensor, and the remaining locations retain the original input values. There is a one-to-one mapping between the indices tensor and the updates tensor, indicated by the colors in the example below.
Example:
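A matching NumPy sketch of ScatterElements (again a hypothetical helper written for illustration) makes the inverse relationship with GatherElements visible: the copy step preserves untouched locations, and each update lands where the index redirects the axis coordinate:

```python
import numpy as np

def scatter_elements(data, indices, updates, axis=0):
    # Illustrative sketch of ONNX ScatterElements semantics.
    axis = axis % data.ndim
    out = np.copy(data)                     # untouched locations keep the input
    for pos in np.ndindex(*indices.shape):
        dst = list(pos)
        dst[axis] = indices[pos]            # redirect the axis coordinate
        out[tuple(dst)] = updates[pos]
    return out

data = np.zeros((3, 3), dtype=int)
indices = np.array([[1, 0, 2], [0, 2, 1]])
updates = np.array([[10, 20, 30], [40, 50, 60]])
print(scatter_elements(data, indices, updates, axis=0))
# [[40 20  0]
#  [10  0 60]
#  [ 0 50 30]]
```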
GatherND
GatherND is a generalized version of the GatherElements operator described earlier. This operator gathers slices of data at the specified indices and stores them in the output. It is defined in various frameworks, including ONNX, TensorFlow, and MXNet.
ONNX v11 API:

Tensor Output = GatherND(Tensor Data, Tensor Indices)

Unlike the previous GatherElements operator, each tensor's rank and shape can be different, as given below:
- Data: tensor of rank r >= 1
- Indices: tensor of rank q >= 1
- Output: tensor of rank o = q + r - m - 1, where m = indices_shape[-1]
The output tensor rank equation becomes clear when we visualize the operator itself. First, let's note that the indices tensor is a multidimensional collection of index tuples, and each index tuple is specified along the indices tensor's innermost dimension. Thus, the length of each tuple is given by m = indices_shape[-1], and the indices tensor can be thought of as a (q-1)-dimensional tensor of m-dimensional indices. Depending on the data tensor's rank, these index tuples may refer to various slice shapes: elements, lines, sheets, cuboids, or even tesseracts. For example, if there are as many elements in the tuple as dimensions in the data tensor, i.e., if the data tensor rank r and the index tuple length m are equal, each tuple refers to a single element in the data. To take an explicit example, an index tuple with two elements indexing into a 2D tensor always refers to an element. Similarly, a two-element tuple indexing into a 3D tensor refers to a line, and so on. Thus, the slice shape can be identified from the data rank and the index tuple length as below:
- If r - m = 0, an element is gathered at each index tuple
- If r - m = 1, a line is gathered at each index tuple
- If r - m = 2, a sheet is gathered at each index tuple
- If r - m = 3, a cuboid is gathered at each index tuple
Now, the output tensor is a collection of such (r-m)-dimensional slices, and the total number of slices is given by the number of index tuples along the (q-1) leading dimensions (-1 because the index tuples occupy one dimension within the q-dimensional indices tensor). Thus, the overall output rank equals (r-m) + (q-1) = q + r - m - 1, as given above. The input and output tensor shapes are best understood with the examples given below.
Example 1:
- Data (3D): [[[1,2],[3,4]],[[5,6],[7,8]]]
- Indices (3D): [[[0,0]],[[1,0]]]
- Output (3D): [[[1,2]],[[5,6]]]
Here, r = 3, m = 2 => r - m = 1, so the gathered slices are lines.

Note that the index tuple is formed along the innermost dimension, and each tuple refers to the outermost dimension first and then moves inward. As in the GatherElements operator, the index values can be negative here as well and are expected to be within the bounds [-s, s-1] along an axis of size s. It is an error if any index value is out of bounds. A negative value wraps around and counts from the back, as before.
Example 2:
- Data (2D): [[1,2],[3,4]]
- Indices (2D): [[-2,0],[1,1]]
- Output (1D): [1, 4]
Here, r = 2, m = 2 => r - m = 0, so elements are gathered.
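The slice-gathering behavior in both examples can be reproduced with a short NumPy sketch (a hypothetical helper written for illustration, without batch support); negative index values wrap automatically through NumPy indexing:

```python
import numpy as np

def gather_nd(data, indices):
    # Illustrative sketch of ONNX GatherND semantics (no batch_dims).
    m = indices.shape[-1]                      # length of each index tuple
    tuples = indices.reshape(-1, m)            # flatten the (q-1) leading dims
    slices = [data[tuple(t)] for t in tuples]  # each slice has rank r - m
    return np.array(slices).reshape(indices.shape[:-1] + data.shape[m:])

data = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])  # r = 3
indices = np.array([[[0, 0]], [[1, 0]]])               # q = 3, m = 2
print(gather_nd(data, indices))    # [[[1 2]] [[5 6]]], as in Example 1
print(gather_nd(np.array([[1, 2], [3, 4]]),
                np.array([[-2, 0], [1, 1]])))          # [1 4], as in Example 2
```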

TensorFlow 2.1.0 API:

tf.gather_nd(params, indices, batch_dims=0, name=None)

TensorFlow's gather_nd works the same way as ONNX GatherND. Here, params corresponds to the data tensor, and the operator indexes into the first m dimensions of the rank-r params tensor based on each m-dimensional index tuple in the indices tensor. One crucial difference between the TensorFlow 2.1.0 API and the ONNX v11 API is the batch_dims argument, which specifies the number of leading (outermost) dimensions of the params (data) and indices tensors to treat as batch dimensions. batch_dims = b means the gather operation starts from the (b+1)th dimension of the data, so the output rank equation is adjusted to o = q + r - m - b - 1, since we skip over the outermost b dimensions. (Note: batch_dims is supported in ONNX from the v12 operator set onwards.) There are other minor differences as well, including that negative indices are not allowed. For out-of-bound indices, an error is returned on CPU, while on GPU a 0 is stored in the corresponding output value.
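The effect of batch_dims can be checked with a small runnable snippet (values chosen for illustration):

```python
import tensorflow as tf

params = tf.constant([[[0, 1], [2, 3], [4, 5]],
                      [[6, 7], [8, 9], [10, 11]]])  # r = 3
indices = tf.constant([[1], [0]])                   # q = 2, m = 1
# batch_dims=1: each of the two batches gathers one row from its own slice
out = tf.gather_nd(params, indices, batch_dims=1)
print(out.numpy())  # [[2 3] [6 7]]; rank o = q + r - m - b - 1 = 2
```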
MXNet v1.6 API:

mxnet.ndarray.gather_nd(data=None, indices=None, out=None, name=None, **kwargs)

The MXNet version is similar but differs from ONNX and TensorFlow in that the index coordinates are picked along the outermost dimension, also called the slowest-changing dimension (SCD). If we consider Example 2 above, the index tensor is read along the SCD to form the coordinate tuples. Hence, we get (0,1) and (0,1) as coordinates (reading -2 as wrapping to 0 for illustration) instead of (0,0) and (1,1). So, the output becomes [data(0,1), data(0,1)], that is, [2,2]. Like TensorFlow, MXNet does not support negative indices. One point of interest is that in all three frameworks, the first element of a tuple always refers to the outermost dimension, with subsequent elements moving inwards. The differences in the GatherND operator across frameworks are captured in Table 1.
ScatterND
This operator is a generalized version of the ScatterElements operator; it writes given slices of updates into the output at the specified indices. It is present in MXNet, ONNX, and TensorFlow.
ONNX v11 API:

Tensor Output = ScatterND(Tensor Data, Tensor Indices, Tensor Updates)

The output of ScatterND is obtained by creating a copy of data and then writing the values from updates at the output locations specified by indices. Similar to GatherND, the tensor ranks are as below:
- Data: tensor of rank r >= 1
- Indices: tensor of rank q >= 1
- Updates: tensor of rank q + r - m - 1, where m = indices_shape[-1]
- Output: tensor of rank r, with the same shape as the data tensor
Similar to GatherND, the slice shape is determined by (r-m) and can be an element, line, sheet, cuboid, etc., as illustrated below:
ScatterND element (r - m = 0)
ScatterND sheet (r - m = 2)
A numerical example is given below to illustrate the operation further.
Example:

The index order and data layout are the same as in GatherND. Also, indices must not contain duplicate entries; that is, having more than one update for the same location is not allowed.
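A compact NumPy sketch (a hypothetical helper written for illustration) captures this copy-then-overwrite semantics, here writing row slices (r - m = 1):

```python
import numpy as np

def scatter_nd(data, indices, updates):
    # Illustrative sketch of ONNX ScatterND semantics.
    out = np.copy(data)                      # untouched locations keep the input
    m = indices.shape[-1]
    upd = updates.reshape((-1,) + data.shape[m:])
    for t, u in zip(indices.reshape(-1, m), upd):
        out[tuple(t)] = u                    # write one (r - m)-dim slice
    return out

data = np.zeros((4, 2), dtype=int)
indices = np.array([[1], [3]])               # m = 1 -> row (line) updates
updates = np.array([[10, 20], [30, 40]])
print(scatter_nd(data, indices, updates))
# [[ 0  0]
#  [10 20]
#  [ 0  0]
#  [30 40]]
```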
TensorFlow 2.1.0 API:

tf.scatter_nd(indices, updates, shape, name=None)

In TensorFlow scatter_nd, the updates are scattered to the output locations specified by indices without making a copy of an input tensor; hence, there is no data argument, and the output shape is given by the shape argument instead. The locations not specified by indices are filled with zeros. Also, unlike ONNX ScatterND, indices are allowed to contain duplicates; when they do, the corresponding updates are accumulated (summed), as illustrated below.
Example 1 (element): Updates (2x2x2), Indices (2x2x2x2), Output (3x2)

Example 2 (sheet): Updates (2x2x2x2), Indices (1x2x2), Output (2x2x4)
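The accumulation behavior for duplicate indices can also be verified directly with a small snippet (values chosen for illustration):

```python
import tensorflow as tf

indices = tf.constant([[1], [1], [3]])   # index 1 appears twice
updates = tf.constant([10, 20, 30])
out = tf.scatter_nd(indices, updates, shape=[5])
print(out.numpy())  # [ 0 30  0 30  0]; the two updates at index 1 are summed
```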

MXNet v1.6 API:

scatter_nd(data, indices, shape)

MXNet scatter_nd works similarly to TensorFlow scatter_nd, but the coordinates are picked along the outermost dimension (SCD) instead of the innermost, fastest-changing dimension (FCD). Also, MXNet does not support duplicate indices: if indices contain duplicate entries, the output is non-deterministic because the updates overwrite one another. The following table compares GatherND and ScatterND across frameworks.
| | Gather ND (ONNX) | Gather ND (TF) | Gather ND (MXNet) | Scatter ND (ONNX) | Scatter ND (TF) | Scatter ND (MXNet) |
|---|---|---|---|---|---|---|
| Negative indices | Supported | Not supported | Not supported | Supported | Not supported | Not supported |
| Duplicate indices | Allowed | Allowed | Allowed | Error | Accumulated | Non-deterministic result |
| Locations not covered by indices | N/A | N/A | N/A | Same as data tensor | Zero | Zero |
| Out-of-bound indices | Error | Error (CPU) / Zero (GPU) | Error | Error | Zero | Error |
Table 1: Comparison of Gather ND/ Scatter ND across frameworks
Gather/Take
The Gather operator gathers (takes) entries along a specified axis based on index locations. This operator is present in ONNX, Caffe2, and TensorFlow, and is very similar to the Take operator in NumPy and MXNet.
ONNX v11 API:

Tensor Output = Gather(Tensor Data, Tensor Indices, int axis = 0)

This operator can also gather various slices from the data, but unlike GatherND, the indices are applied along the specified axis only. Here too, the output is a collection of such slices, and the output rank is given by o = r + q - 1, where r is the rank of the input and q is the rank of the indices. In general, the output is calculated as below. Let k = indices[i_0, ..., i_{q-1}]; then:
- output[i_0, ..., i_{q-1}, j_0, ..., j_{r-2}] = input[k, j_0, ..., j_{r-2}] if axis = 0
- output[j_0, i_0, ..., i_{q-1}, j_1, ..., j_{r-2}] = input[j_0, k, j_1, ..., j_{r-2}] if axis = 1
- output[j_0, j_1, i_0, ..., i_{q-1}, j_2, ..., j_{r-2}] = input[j_0, j_1, k, j_2, ..., j_{r-2}] if axis = 2, and so on.
Here, the output can be thought of as a tensor of the same shape as the indices tensor, but with each index value replaced by the slice it refers to in the input. Thus, the output shape can be specified as output_shape = data_shape[:axis] + indices_shape + data_shape[axis + 1:], i.e., the outermost dimensions up to axis have the same shape as the data tensor, followed by the shape of the indices tensor, and then the data tensor shape from axis + 1 to the innermost dimension. Let us understand this better with an example.

We first see that the output rank is given by o = r + q - 1 = 3. Here, the axis is 0, i.e., the outermost dimension. The output is calculated as output[0,0,0] = input[index[0,0],0] = input[0,0] = 20, output[0,0,1] = input[index[0,0],1] = input[0,1] = 30, and so on. Each index value represents a full row of the input, and as such, the output is a 2x2 collection of rows of size 2. Here, the index values are expected to be in the range [-s, s-1], with s being the size of the data tensor along the axis. Also, notably, the indices are applied along the specified axis only, so elements along different axes cannot be gathered within one Gather operation. In other words, this operator is less generic than GatherND but still more versatile than GatherElements, as it can gather different slices.
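NumPy's take (covered later in this article) implements essentially the same semantics, so the worked example above can be reproduced with it (values chosen to match the walkthrough):

```python
import numpy as np

a = np.array([[20, 30], [40, 50]])   # r = 2
idx = np.array([[0, 0], [1, 0]])     # q = 2
out = np.take(a, idx, axis=0)        # o = r + q - 1 = 3; shape (2, 2, 2)
print(out[0, 0])                     # [20 30]: the row picked by idx[0, 0] = 0
```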
TensorFlow v2.1.0 API:

tf.gather(params, indices, validate_indices=None, name=None, axis=None, batch_dims=0)

The TensorFlow version is similar to the ONNX version, but with certain significant differences:
- The axis input is a (scalar) tensor rather than a static attribute, so it can be supplied dynamically at runtime.
- Like TensorFlow GatherND, this operator supports a batch_dims parameter, which specifies the number of leading (outermost) dimensions of params and indices to treat as batch dimensions. The output tensor shape in this case is given by: output_shape = params_shape[:axis] + indices_shape[batch_dims:] + params_shape[axis + 1:]
In TensorFlow, a negative axis value is allowed, but negative index values are not. The validate_indices parameter is not in use and has been deprecated.
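A short snippet illustrating batch_dims with tf.gather (values chosen for illustration):

```python
import tensorflow as tf

params = tf.constant([[1, 2, 3], [4, 5, 6]])
idx = tf.constant([[2, 0], [1, 1]])
# batch_dims=1: row b of idx gathers within row b of params
out = tf.gather(params, idx, axis=1, batch_dims=1)
print(out.numpy())  # [[3 1] [5 5]]
```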
Caffe2 API:

Tensor Output = Gather(Tensor Data, Tensor Indices)

The Caffe2 version is similar to the ONNX version, but unlike ONNX, it does not allow gathering along an arbitrary axis. Here, the gather operation is always along axis = 0, i.e., the outermost dimension.
NumPy API:

numpy.take(a, indices, axis=None, out=None, mode='raise')

As mentioned before, the NumPy take operator is functionally similar to the ONNX Gather operator. It can be thought of as a superset of ONNX Gather, as it has additional features. The most notable is the mode input, which controls the behavior of the operator when indices are negative or out of bounds. The supported modes are listed below, followed by a short example:
- ‘raise’ – raise an error (default)
- ‘wrap’ – wrap around
- ‘clip’ – clip to the range
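For instance, with an axis of size 3, an out-of-bound index of 3 and a negative index of -1 behave differently under each mode:

```python
import numpy as np

a = np.array([10, 20, 30])
print(np.take(a, [3, -1], mode='wrap'))   # [10 30]: 3 wraps to 0, -1 to 2
print(np.take(a, [3, -1], mode='clip'))   # [30 10]: 3 clips to 2, -1 to 0
# np.take(a, [3], mode='raise') would raise an IndexError (the default)
```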
MXNet v1.6 API:

mxnet.ndarray.take(a=None, indices=None, axis=_Null, mode=_Null, out=None, name=None, **kwargs)

The MXNet take operator is similar to the NumPy take operator mentioned above and likewise supports the various modes for out-of-bound indices. The table below highlights the differences and similarities in the Gather/Take operator across standards.
| | ONNX Gather | TF Gather | Caffe2 Gather | NumPy Take | MXNet Take |
|---|---|---|---|---|---|
| If axis is not specified | Defaults to the outermost axis | Defaults to the first non-batch dimension | N/A; axis is always the outermost axis | Input is flattened | Defaults to the outermost axis |
| Negative axis | Supported | Supported | N/A | Supported | Supported |
| Negative indices | Supported (always wrap around) | Not supported | Not supported | Supported, with modes (raise, wrap, clip) | Supported, with modes (raise, wrap, clip) |
| Out-of-bound indices | Error | Zero on GPU / Error on CPU | Error | Controlled by mode (raise, wrap, clip) | Controlled by mode (raise, wrap, clip) |
Table 2: Comparison of Gather/Take across frameworks
Conclusion
Gather-scatter operators are used in deep learning applications for various indexing operations. The backpropagation through a gather layer is implemented using the corresponding scatter operator, and vice versa. We have described several of these gather/scatter operators in detail here. As can be seen, there are several similarities among these operators across frameworks. For example, an index tuple {a, b, c} means coordinate a is along the outermost dimension, b along the second outermost dimension, and c along the innermost dimension, for all the operators and frameworks. Similarly, axis 0 refers to the outermost dimension in all frameworks. However, there are considerable functional and API-level differences among these operators and frameworks. We have summarized the functionality of these operators with visual examples and compared and contrasted them across frameworks. We hope this helps deep learning programmers and researchers better understand and use these operators.
References
- https://github.com/onnx/onnx/blob/master/docs/Operators.md#GatherElements
- https://github.com/onnx/onnx/blob/master/docs/Operators.md#ScatterElements
- https://github.com/onnx/onnx/blob/master/docs/Operators.md#GatherND
- https://www.tensorflow.org/api_docs/python/tf/gather_nd
- http://beta.mxnet.io/r/api/mx.nd.gather.nd.html