I'm a programmer working on an efficient algorithm that, given a point $P$ in $n$-dimensional space, finds the closest point from a set of points. For my problem, each coordinate $a$ along any of the $n$ axes is constrained by $a \ge 0$ and $a \le 100$. There are $n$ points in the set, each one placed at 100 on one axis and 0 on all of the others. For example, if $n=7$ (7-dimensional), this is the set of points:
set = {(100,0,0,0,0,0,0), (0,100,0,0,0,0,0), (0,0,100,0,0,0,0), (0,0,0,100,0,0,0), (0,0,0,0,100,0,0), (0,0,0,0,0,100,0), (0,0,0,0,0,0,100)}
For the purpose of example, let's say the given point is $P = (0,91,16,0,0,70,0)$. I need to find the point in $set$ that is closest to $P$. Of course, I could use the $n$-dimensional Euclidean distance formula between $P$ and each point in $set$; however, as a programmer, I'm very concerned with efficiency.
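For concreteness, the brute-force baseline looks something like this (a Python sketch; `closest_point` is just a name I picked):

```python
import math

def closest_point(P, points):
    """Brute force: compare P against every point using math.dist (Euclidean)."""
    return min(points, key=lambda q: math.dist(P, q))

n = 7
# the set from above: 100 on one axis, 0 on all others
points = [tuple(100 if i == j else 0 for j in range(n)) for i in range(n)]
P = (0, 91, 16, 0, 0, 70, 0)
print(closest_point(P, points))  # (0, 100, 0, 0, 0, 0, 0)
```

This does $O(n)$ work per point over $n$ points, which is exactly the cost I'd like to avoid.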
Here's my hypothesis:
$\forall a \in P$, if $a = 0 \wedge set[positionOf(a)][positionOf(a)] = 100$, then $set[positionOf(a)]$ is not the closest point to $P$.
Note: $positionOf(x)$ means the index of $x$ in the parent array, and $array[x]$ means the element in $array$ at position (index) $x$. Also note that, in order for this to work, $set$ must be ordered as it is above.
In other words: let's say there's a zero at position 4 in our $P$ (like in our example: $(0,91,16,0,0,70,0)$); then the point within $set$ where the number at position 4 equals 100 (and all other numbers are zero) cannot be the closest point to $P$.
I realize that $P$ cannot equal the origin (all zeros) in order for this hypothesis to hold up. How could I prove this? Does this hold true for all cases?
Sorry if I did not explain well; I realize it's rather abstract. In my actual algorithm there are many more dimensions than 7, and many of the numbers within the coordinates are equal to zero, so I could eliminate those points from my computation if this hypothesis is correct.
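For what it's worth, here is a quick empirical check of the hypothesis on random points (a Python sketch, not a proof; `hypothesis_holds` and the sampling scheme are my own scaffolding):

```python
import math
import random

def basis_points(n, scale=100):
    # the n set points: `scale` on one axis, 0 on all others
    return [tuple(scale if i == j else 0 for j in range(n)) for i in range(n)]

def hypothesis_holds(P):
    """True iff no point eliminated by the hypothesis (an index i with
    P[i] == 0) attains the minimum distance to P."""
    dists = [math.dist(P, q) for q in basis_points(len(P))]
    best = min(dists)
    return all(dists[i] > best for i in range(len(P)) if P[i] == 0)

random.seed(1)
for _ in range(1000):
    P = tuple(random.choice((0, random.randint(1, 100))) for _ in range(8))
    if any(P):  # the hypothesis excludes the origin
        assert hypothesis_holds(P)
```

At the origin itself the check fails, as expected, since every point in the set is then equidistant from $P$.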
Best Answer
Your hypothesis is correct (except at the origin). I doubt, in general, that this will drastically improve your algorithm's efficiency, though. I could be wrong...
Here's an improved algorithm:
Let $i$ be an index for which $P_i$ is maximal. Then $100\,e_i$ (the $i$th vector in your set, ordered the way you wrote it) is the closest point.
Proof: the squared Euclidean distance from $P$ to $100\,e_i$ is $\sum_j P_j^2 - 200 P_i + 100^2$. Only the $-200 P_i$ term depends on $i$, so the distance is smallest exactly where $P_i$ is largest, and you're done.
There. That's linear time in the number of dimensions, better than the $O(n^2)$ work needed to compute all $n$ Euclidean distances.
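In code, the whole search collapses into a single argmax pass; a minimal sketch (the function name is mine):

```python
def closest_basis_index(P):
    """Index i of the closest set point 100*e_i.  The squared distance to
    100*e_i is sum(P_j**2) - 200*P_i + 100**2, so only the -200*P_i term
    varies with i and the maximum coordinate wins."""
    return max(range(len(P)), key=lambda j: P[j])

P = (0, 91, 16, 0, 0, 70, 0)
print(closest_basis_index(P))  # 1, i.e. the point (0,100,0,0,0,0,0)
```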
By the way, this all assumes that the query point $P$ also satisfies $0 \le P_i \le 100$ for all $i$, the same way the points in your set do. I wasn't quite able to parse your question in a way that made me sure that this was what you were asking.