[Math] Finding the closest point in a set to another point in n-dimensional space: efficiently

euclidean-geometry, general-topology, geometry, linear-algebra

I'm a programmer and am working on writing an efficient algorithm that, given a point P in n-dimensional space, can find the closest point from a set of points. For my problem, each of the n axes in the n-dimensional space is constrained by a >= 0 and a <= 100, where a is the coordinate along that axis. There are n points in the set, each one placed at 100 on one axis and 0 on all of the others. For example, if n=7 (7-dimensional), this is the set of points:

set = {(100,0,0,0,0,0,0), (0,100,0,0,0,0,0), (0,0,100,0,0,0,0), (0,0,0,100,0,0,0), (0,0,0,0,100,0,0), (0,0,0,0,0,100,0), (0,0,0,0,0,0,100)}

For the purpose of example, let's say the given point is: P = (0,91,16,0,0,70,0)

I need to find the point in set that is closest to P. Of course, I could use the n-dimensional Euclidean distance formula between P and each point in set; however, as a programmer, I'm very concerned with efficiency.
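For reference, here is the brute-force version I'd like to avoid (a rough sketch in Python; the function name is mine, not anything standard):

```python
import math

def closest_point_brute_force(P, points):
    """Return the point in `points` with the smallest Euclidean distance to P.

    Each distance costs O(n), and there are n candidate points,
    so this is O(n^2) overall for the set described above.
    """
    best, best_dist = None, math.inf
    for q in points:
        d = math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(P, q)))
        if d < best_dist:
            best, best_dist = q, d
    return best
```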

Here's my hypothesis:

$\forall a \in P$: if $a = 0 \wedge set[positionOf(a)][positionOf(a)] = 100$, then $set[positionOf(a)]$ is not the closest point to $P$.

Note: $positionOf(x)$ means the index of $x$ in the parent array, and $array[x]$ means the element in $array$ at position (index) $x$. Also note that, in order for this to work, set must be ordered as it is above.

In other words: let's say there's a zero at position 4 in our P (as in our example: (0,91,16,0,0,70,0)); then the point within set where the number at position 4 equals 100 (and all other numbers are zero) cannot be the closest point to P.
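In code, the pruning I have in mind would be something like this (a rough sketch in Python; `candidate_indices` is just a name I made up, and it assumes set is ordered as above, so that candidate $i$ has its 100 at index $i$):

```python
def candidate_indices(P):
    """Indices i such that set[i] (the point with 100 at index i) could still
    be the closest point, per the hypothesis: skip i whenever P[i] == 0."""
    return [i for i, a in enumerate(P) if a != 0]
```

For P = (0,91,16,0,0,70,0) this keeps only indices 1, 2 and 5, so only three distances would actually need to be computed.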

I realize that P cannot equal the origin (all zeros) in order for this hypothesis to hold up. How could I prove this? Does this hold true for all cases?

Sorry if I did not explain well, I realize it's rather abstract. In my actual algorithm, there are many more dimensions than 7, and many of the numbers within the coordinates are equal to zero—thus I could eliminate them from my computation if this hypothesis is correct.

Best Answer

Your hypothesis is correct (except at the origin). I doubt, in general, that this will drastically improve your algorithm's efficiency, though. I could be wrong...

Here's an improved algorithm:

Let $i$ be the index for which $P_i$ is maximal. Then $e_i$ (the $i$th vector in your set, ordered the way you wrote it) is the closest point.

Proof: the squared Euclidean distance from $P$ to $100\,e_i$ is $\sum_j P_j^2 - 200\,P_i + 100^2$; only the $-200\,P_i$ term depends on $i$, so the distance is minimized exactly when $P_i$ is maximal.

There. That's linear time in the number of dimensions. Better than the $O(n^2)$ needed to compute all the Euclidean distances.
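Here's what that looks like in Python (a sketch; the function name is mine):

```python
def closest_axis_point(P):
    """Index i of the closest set point (100 at index i, zeros elsewhere):
    simply the index of the largest coordinate of P.  O(n) time."""
    return max(range(len(P)), key=lambda i: P[i])

# Example from the question: the largest coordinate of P is 91, at index 1,
# so (0,100,0,0,0,0,0) is the closest point in the set.
P = (0, 91, 16, 0, 0, 70, 0)
print(closest_axis_point(P))  # -> 1
```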

By the way, this all assumes that the query point $P$ also satisfies $0 \le P_i \le 100$ for all $i$, the same way the points in your "set" do. I wasn't quite able to parse your question in a way that made me sure that this was what you were asking.
