[Math] What algorithm can sort the first sqrt(n) elements of an array in O(n) time

algorithmsasymptotics

I want to partially sort an array of $n$ elements to get the first $\sqrt{n}$ elements sorted, and it has to be done in $O(n)$ time.

The complexity $O(n)$ seems to imply that it is necessary to go through the entire array, but when using a sorting algorithm in the first step of building a tree/heap by using heap sort, this and other $O(nlogn)$ algorithms are the fastest they can be when processing the entire $n$ elements.

Does this mean that I don't have to process all the numbers, and what kind of strategy can I use?

Best Answer

I'm not sure it can be done.

First of all, you have to look at all $n$ items.

Then, you have to keep the top $\sqrt{n}$ items. To find where an item that is in the top $\sqrt{n}$ places, you have to do $O(\log(\sqrt{n})) =O(\log(n)) $ comparisons. If the items are badly ordered, you will have to do this $n$ times.

This gives a worst case time of $O(n \log n)$.

If the items are will distributed, the placing in the top $\sqrt{n}$ might only be done $\sqrt{n}$ times. In this case, the placing would only take $O(\sqrt{n}\log(n))$ operations, so this would be $O(n)$.

Related Solutions

[Math] How to one grab a random node from a binary tree without flattening it

Do you know how many nodes are in each node's child subtrees? If you do, you can just decide that you want, say, the $k$-th node from the left and then descend the tree to find that node:

Let $n$ be the total number of nodes in the tree. Choose $k$ to be a random integer between $0$ and $n-1$ inclusive. Let $A$ initially be the root node of the tree.
Let $m$ be the number of nodes in the left subtree of $A$. (If $A$ is a leaf or has only right children, let $m = 0$.)
If $k = m$, choose $A$ as the node we want and stop.
Otherwise, if $k < m$, replace $A$ with its left child node and repeat from step 2.
Otherwise (i.e. if $k > m$), subtract $m+1$ from $k$, replace $A$ with its right child node and repeat from step 2.

This algorithm is much more efficient than traversing the entire tree; its running time is bounded by the depth of the tree, which for (approximately) balanced trees is proportional to the logarithm of the total number of nodes.

[Math] How to efficiently create balanced KD-Trees from a static set of points

If you want to make use of pre-sorted lists, the the idea is to use the fact that the median will be located at $i = floor(\frac{N}{2})$, where $N$ is the length of the list you're looking for.

The nice thing about splitting the list is that when we recurse, we can simply use the sub-lists from $[0, i)$ and $[i, N)$ to find the median again.

However, as you noticed, we need to be smart about splitting the two lists in the K-D tree. The general idea is if we pick a point $P = (x_0, y_0)$ to split with, we need to split both the $X$-sorted list with respect to $x_0$ and the $Y$ sorted list with respect to $x_0$. (We split both according to $x_0$, not one with $x_0$ and one with $y_0$)

In the given example,

0 = (4, 7)
1 = (2, 9)
2 = (5, 4)
3 = (3, 6)
4 = (2, 1)

When we split with the point 4 = (2, 1), the

$X$ list splits as: (it splits at $x_0 = 2$)

1 = (2, 9)       
4 = (2, 1)
----------  
3 = (3, 6)
0 = (4, 7) 
2 = (5, 4)

Clearly, the two sub-lists are still sorted by $x$

the $Y$ list splits at $x_0 = 2$

4 = (2, 1) 
1 = (2, 9)
----------
2 = (5, 4)
3 = (3, 6)
0 = (4, 7)

Notice that the two subsections of the $Y$ are still sorted with respect to the y coordinate, but they were split along the x coordinate.

Implementing this is exactly like implementing the reverse of the merge operation from merge sort.

Now that you have the $X$ and $Y$ sublists for the recursion, you can continue building the K-D tree.

Best Answer

Related Solutions

[Math] How to one grab a random node from a binary tree without flattening it

[Math] How to efficiently create balanced KD-Trees from a static set of points

Related Question