The only difference is in the "merging" part of the recursion. Let the two sets created by the dividing line be called $L$ and $R$ (for the left and right sets). Via recursion, we have a temporary current closest pair -- say it is at distance $M$. As in the original algorithm, we keep only the points within distance $M$ of the dividing line, sort them by their $y$ coordinates, and sweep them from top to bottom. For each point $(x', y')$ in $L$, consider the square of size $2M \times 2M$ centered at this point. At most how many points of $R$ within distance $M$ of the dividing line can lie inside this square? At most $6$ (I'm being ultra conservative here), and those are exactly the points of $R$ whose $y$ coordinate lies between $y' - M$ and $y' + M$. Thus a similar reasoning holds here as in the original algorithm.
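As a hedged sketch of this merge step (not the full divide and conquer), assume `strip_l` and `strip_r` hold the points of $L$ and $R$ already filtered to within $M$ of the dividing line; the names and the Euclidean metric here are my own illustration:

```python
import bisect
import math

def strip_closest(strip_l, strip_r, m):
    """Merge-step sketch: strip_l / strip_r are the points of L and R lying
    within distance m of the dividing line, as (x, y) tuples.  Returns the
    best distance among cross pairs, never worse than the incoming m."""
    strip_r = sorted(strip_r, key=lambda p: p[1])  # sort right strip by y
    ys = [p[1] for p in strip_r]
    best = m
    for (x, y) in strip_l:
        # Only right-strip points with y in [y - best, y + best] can improve
        # the answer; by the packing argument there are O(1) of them.
        lo = bisect.bisect_left(ys, y - best)
        hi = bisect.bisect_right(ys, y + best)
        for (rx, ry) in strip_r[lo:hi]:
            best = min(best, math.hypot(rx - x, ry - y))
    return best
```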
First, consider the simplified problem: find a black point $(x_b, y_b)$ and a white point $(x_w, y_w)$ such that:
- $x_b \le x_w$
- $y_b \le y_w$
- The Manhattan distance between the two points is minimized.
That is, we only consider pairs of points in which the white point lies to the top right of the black point. Because of conditions (1) and (2), the Manhattan distance between the two points becomes
$$|x_b - x_w| + |y_b - y_w| = x_w - x_b + y_w - y_b = (x_w + y_w) - (x_b + y_b)$$
This problem can be solved by sorting the points and sweeping them from left to right. Details are at the bottom of this explanation.
Thus, in order to find the closest pair consisting of a black and a white point, we do the following $4$ times:
- Calculate the closest pair of black and white points according to the metric above
- Rotate the plane 90 degrees clockwise.
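The two steps above can be sketched as follows; `closest_upper_right` is a brute-force stand-in for the $O(n \log n)$ sweep described below, included only so the rotation loop is runnable:

```python
def closest_upper_right(blacks, whites):
    # Brute-force stand-in for the sweep: restricted problem where the
    # white point is to the top right of the black point.
    best = float('inf')
    for (xb, yb) in blacks:
        for (xw, yw) in whites:
            if xb <= xw and yb <= yw:
                best = min(best, (xw + yw) - (xb + yb))
    return best

def closest_bichromatic_manhattan(blacks, whites):
    """Solve the restricted problem 4 times, rotating the plane in between."""
    best = float('inf')
    for _ in range(4):
        best = min(best, closest_upper_right(blacks, whites))
        # Rotate the plane 90 degrees clockwise: (x, y) -> (y, -x).
        blacks = [(y, -x) for (x, y) in blacks]
        whites = [(y, -x) for (x, y) in whites]
    return best
```

Every black-white pair has the white point to the top right of the black point in at least one of the four rotations (boundary cases are covered by the $\le$ signs), so the minimum over all four passes is the true answer.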
Sweeping details
Sort the points by their $x$ coordinates (if they are tied, a black point is sorted before a white point). With each point $(x, y)$, we associate the value $x+y$. Now, initialize a key-value data structure, and for each white point $(x, y)$, store the pair $(y, x+y)$ in this data structure (here $y$ is the key, and $x+y$ is the value).
Now, iterate the list of sorted points.
Whenever we encounter a white point $(x, y)$, we remove the pair $(y, x+y)$ from the data structure. Whenever we encounter a black point $(x_b, y_b)$, the $x$ coordinates of all white points still in the data structure are at least $x_b$. Thus, in order to satisfy condition (2), we need $y_w \ge y_b$. Since we want to minimize the Manhattan distance, i.e., $(x_w + y_w) - (x_b + y_b)$, we want to minimize $x_w + y_w$. This corresponds exactly to querying our key-value data structure for the minimum value among keys at least $y_b$.
If we use a data structure that supports the above operations in $O(\log n)$ time, we obtain an $O(n \log n)$ algorithm. An example of such a data structure is a balanced binary search tree.
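As a concrete (hedged) sketch, the sweep can also be implemented with a min segment tree over the compressed $y$ coordinates in place of the balanced BST. To keep the tree insertion-only, this version sweeps right to left, inserting each white point as it is passed; this is equivalent to the left-to-right sweep with deletions described above:

```python
import bisect

def closest_upper_right_sweep(blacks, whites):
    """Min of (xw+yw)-(xb+yb) over pairs with xb <= xw and yb <= yw,
    or infinity if no such pair exists.  Runs in O(n log n)."""
    INF = float('inf')
    ys = sorted({y for (_, y) in whites})       # compressed y coordinates
    pos = {y: i for i, y in enumerate(ys)}
    size = 1
    while size < max(1, len(ys)):
        size *= 2
    tree = [INF] * (2 * size)                   # min segment tree

    def insert(i, v):                           # point update, keep minimum
        i += size
        tree[i] = min(tree[i], v)
        i //= 2
        while i:
            tree[i] = min(tree[2 * i], tree[2 * i + 1])
            i //= 2

    def query(l, r):                            # min over slots [l, r)
        res, l, r = INF, l + size, r + size
        while l < r:
            if l & 1:
                res = min(res, tree[l]); l += 1
            if r & 1:
                r -= 1; res = min(res, tree[r])
            l //= 2; r //= 2
        return res

    # Sweep by decreasing x; at equal x, whites first so x_w = x_b counts.
    events = [(x, y, 'w') for (x, y) in whites] + \
             [(x, y, 'b') for (x, y) in blacks]
    events.sort(key=lambda e: (-e[0], 0 if e[2] == 'w' else 1))

    best = INF
    for x, y, c in events:
        if c == 'w':
            insert(pos[y], x + y)               # store value x_w + y_w at key y_w
        else:
            i = bisect.bisect_left(ys, y)       # first key with y_w >= y_b
            m = query(i, len(ys))
            if m < INF:
                best = min(best, m - (x + y))
    return best
```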
Best Answer
Your guess is indeed correct! If you're interested in proving correctness, use a proof by contradiction and apply the triangle inequality.
Now let's come up with an algorithm that will find the black point $(x_0, y_0)$ in the unordered set $B$ of all black points that is closest to $(f, g)$. This is basically a linear search. In pseudocode (Python), we can do something like:
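A minimal sketch of that linear search (assuming points are stored as tuples and using the Euclidean metric; swap in absolute differences if the Manhattan metric is intended):

```python
import math

def closest_black(B, f, g):
    """Return the point in the iterable B closest to (f, g) by linear search."""
    best_point, best_dist = None, float('inf')
    for (x, y) in B:
        d = math.hypot(x - f, y - g)            # Euclidean distance to (f, g)
        if d < best_dist:
            best_point, best_dist = (x, y), d
    return best_point
```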
Use a similar function to find the closest white point. Then since all $n$ points are scanned exactly once, the total running time will be $O(n)$, as desired.