Why does sorting twice produce a rank vector

sorting

This question came up after I read this post .

There's a function numpy.argsort() that returns the indexes of the original array that would yield a sorted array. By applying this function twice you would get the rank of the original away.

Someone commented that "The first argsort returns a permutation (which if applied to the data would sort it). When argsort is applied to (this or any) permutation, it returns the inverse permutation (that if the 2 permutations are applied to each other in either order the result is the Identity). The second permutation if applied to a sorted data array would produce the unsorted data array, i.e. it is the rank."

But is there any mathematical way to explain it? I think there should be linear algebra formula to explain it.

Best Answer

Here is a cleaner way to explain it in simple mathematics. Let's denote the original array as $A$, the one after argsort() once as $A'$, and the final outcome after argsort() twice as $A''$.

Let's first be clear about the following statement: if $A'[i] = j$, then it means $A[j]$ is the i-th smallest element in $A$, by the definition of argsort(). Then similarly, if $A''[k] = q$, it means $A'[q]$ is the k-th smallest element in $A'$. Recall that, $A'$ actually contains the indices of $A$, therefore, the elements range from $1$ to $|A|$ (allow me to count from $1$ instead of $0$ for the sake of simplicity).

!BOOM!: That is to say the k-th smallest element in $A'$ exactly equals to $k$, i.e. $A'[q] = k$. Consequently, by the statement at the beginning, $A[k]$ is the q-th smallest element in $A$, which is saying the rank of $A[k]$ is $q$!

P.S., another more efficient way is to do it this way, to avoid sorting twice.