Let's say we have an ordered (sorted) 1D tensor t of ints/longs.
I want to produce a new tensor a of size max(t) + 1, where each entry a[i] contains the index of the first occurrence of the value i in t.
We could easily compute the tensor a with a plain Python loop, but that would be too slow for large inputs such as the one I'm using.
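For reference, the slow baseline I have in mind looks roughly like the sketch below. The function name `first_occurrence_baseline` and the `missing=-1` placeholder for values that never appear are just assumptions for illustration. It also assumes the values are non-negative and counted from 0, so the result has max(t) + 1 entries. It works on a plain list, so a tensor would first be converted with `t.tolist()`.

```python
def first_occurrence_baseline(values, missing=-1):
    """Return a list a where a[i] is the index of the first i in values.

    Assumes values is a sequence of non-negative ints. Entries for values
    that never occur are filled with `missing` (an illustrative choice).
    """
    if len(values) == 0:
        return []
    a = [missing] * (max(values) + 1)
    for index, value in enumerate(values):
        # Only record the first time each value is seen.
        if a[value] == missing:
            a[value] = index
    return a


t = [0, 0, 1, 1, 1, 3, 3]
a = first_occurrence_baseline(t)
print(a)  # [0, 2, -1, 5]
```

The loop visits every element once in the Python interpreter, which is what makes it too slow on large tensors.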
I'm looking for a fast solution, ideally one that can run on the GPU using the CUDA backend of PyTorch, but any fast solution would do.