[Math] $O(n\log^2(n))$ algorithm for finding closest pair of points between two sets.

algorithmsdiscrete mathematics

For example, say that we are given a set of points where each point is labeled White or Black. How can we compute the pair of White, Black points with the smallest manhattan distance in $O(n\log^2(n))$?


So yeah, straight forward problem.. So brute-force would give us $O(n^2)$.. but how would one do this in just $O(n\log^2(n))$?


$O(n\log^2(n))$
Any thoughts or ideas would be great. I still have a ton to learn in this field!

Best Answer

First, consider the simplified problem: find a black point $x_b, y_b$ and a white point $x_w, y_w$ such that:

  1. $x_b \le x_w$
  2. $y_b \le y_w$
  3. The manhattan distance between the two points are minimum.

That is, we only consider pairs of point such that the white point is oriented on the top right of the black point. Because of properties (1) and (2), the manhattan distance between the two points become

$$|x_b - x_w| + |y_b - y_w| = x_w - x_b + y_w - y_b = (x_w + y_w) - (x_b + y_b)$$

This problem can be solved by sorting the points and sweeping them from left to right. Details is at the bottom of this explanation.

Thus, in order to find the closest pair of black and white point, we do the following $4$ times:

  1. Calculate the closest pair of black and white points according to the metric above
  2. Rotate the plane 90 degrees clockwise.

Sweeping details

Sort the points by their $x$ coordinates (if they are tied, then black point is sorted before white point). For each point $(x, y)$, we associate the value $(x+y)$ with it. Now, we sort the points according to its $x$ coordinates, and sweep these points from left to right. Now, initialize key-value data structure, and for each white point $(x, y)$, store $(y, (x+y))$ in this data structure (here $y$ is the key, and $(x+y)$ is the value).

Now, iterate the list of sorted points. Whenever we encounter a white point $(x, y)$, we remove it $(y, (x+y))$ from the data structure. Whenever we encounter a black point $(x_b, y_b)$, we have that the $x$ coordinates of all white points in the data structure is greater than $x_b$. Thus, in order to satisfy condition (2), we need to have $y_w \ge x_w$. Since we want to minimize the manhattan distance, i.e., $(x_w + y_w) - (x_b - y_b)$, we want to minimize $x_w + y_w$. This corresponds exactly to querying our key-value data structure for the minimum value such that the key is at least $y_w$.

If we use a data structure that supports the above operations in $O(\log n)$ time, we have an $O(n \log n)$ algorithm. An example of such data structure is the balanced binary tree.