[Math] Complexity of inversion counting

algorithmscomputational complexitycomputer scienceinformation theory

.

Let $X \in \mathbb{Z}^n$. I need to find the number of pairs $ (i, j)$ with $ 0 < i < j \le n$ such that $ X_i>X_j$ as well as the number of pairs $ (i, j),\ 0 < i < j \le n$, such that $ X_i=X_j$.

I know how to calculate in $O(n \log(n))$ time complexity.
How to prove that a comparison-based algorithm with better complexity does not exist (check "update" section on the bottom)?


An algorithm with $O(n log(n))$ time complexity (one of the known methods):

We store:

1) 2 accumulators for the first and for the second value. (both of them initially 0)

2) Balanced search tree (AVL, for example) with the size of subtree at each vertex (operation "add element" is $O(\log \text{size})$ cost). Elements in this tree are 2-tuples $((-\infty\cup\mathbb{Z}\cup\infty), \mathbb{Z} )$, comparison is done by first element, if first elements are equal – by the second element. Initially this tree is empty.

Subprocedure – find the number of elements in the tree that lay in the interval $[X, Y]$ – with the time cost $O(\log n)$ (trivial)

Call this procedure as count$(X, Y)$

So, for $i$ from $1$ to $n$:

  1. add to first accumulator count( ($-\infty,1$),($X_i -1, i$))
  2. add to second accumulator count (($X_i -1, -1$), ($X_i -1, i-1$))
  3. add to balanced tree ($X_i, i$)

So, $n$ steps with $O (\log{n})$ cost

Update:
It cannot be calculated faster than $O(n)$ complexity:
The second count can be calculated with $O(n)$ time complexity (with the simple hash tables), so I need to prove the time complexity lower bound only for counting the pairs $ (i, j),\ 0 < i < j \le n$ such that $X_i>X_j$

Best Answer

I found a proof

Get the subcase of this task - set what all $X_i$ are distinct.

Let exist an algorithm F with time complexity O(f(n)), $n \le f(n) \le nlog(n)$

Now we create a new array Y of pairs $(X_i, i)$ and cast with algo to Y.
At any comparizon in F put into log pair (i, j), where $X_i<X_j$ and F was compare this values.

Where are O(f(n)) values in the log. See it as a graph G with n vertices, where pairs are directed edges. G contains no cycles

Lemma 1. The time complexity of topological sorting of graph G with n vertices and m edges is no more $O(n+m)$

Lemma 2. If graph G contains two different topological orders,then G contains two different topological orders, difference between them is permutation of two adjanced elements.

Both lemmas are proved by the trivial algorithm. http://codepad.org/fLDdQjpt - where is a simple java implementation

So, fix orders $Y_1$ and $Y_2$ Create by them two start arrays $X_1$ and $X_2$, difference beetween them is permutation of 2 elements => $X_1$ and $X_2$ have different inversions count, but F must works equally.

So, we can sort X with $O(n) + O(f(n)) + O(f(n) +n) = O(f(n))$ time complexity (third is a time complexity of the topological sorting of G)

So, $n*log(n) < f(n)$ asymptotically => $f(n) = O(n*log(n))$

Related Question