Is nearest point to $\mu$ also nearest to all other points

algorithms, analysis, optimization, real-analysis, statistics

Suppose we have a set of points $\mathcal{X} = \{ x_1, \dots, x_n\} \subset \mathbb{R}^p$. Define

$$\mu = \frac1n \sum_i x_i$$

We know that $\mu$ is the point which minimizes (over $\mathbb{R}^p$) the sum of squared distances between it and the points of $\mathcal{X}$.

Is it true that the point in $\mathcal{X}$ that is closest to $\mu$ is also the point in $\mathcal{X}$ that is closest to all other points in $\mathcal{X}$? In other words, what I am asking is whether the following equality holds:

$$\arg\min_{x_i \in \mathcal{X}} \| \mu - x_i \|_2^2 = \arg\min_{x_j \in \mathcal{X}} \sum_i \| x_j - x_i \|_2^2$$

How can this be shown? I'm finding it difficult to write a proof because $\mathcal{X}$ is a finite set of points (non-convex, disconnected, countable, etc.).

If yes, does this still hold when we use other $p$-norms in place of $\ell_2$ to compute distance?

Best Answer

I will switch to $x^1, \dots, x^n$ as the notation for $\mathcal X$, so that I can use subscripts for components of a vector: $x^i = (x^i_1, x^i_2, \dots, x^i_p)$.

Let's begin by understanding the function $$ f(x) = \sum_{i=1}^n \|x - x^i\|_2^2 = \sum_{i=1}^n \sum_{j=1}^p (x_j - x^i_j)^2. $$ Switching the order of summation and taking the $j^{\text{th}}$ term of the sum over $j$, we get $\sum_{i=1}^n (x_j - x^i_j)^2$. This is a quadratic function of $x_j$ in which the coefficient of $x_j^2$ is $n$, so it can be written as $n(x_j - h_j)^2 + k_j$ for some $h_j, k_j \in \mathbb R$. Summing these together again, we get $$ f(x) = \sum_{j=1}^p n(x_j - h_j)^2 + k_j = n \|x - h\|_2^2 + \sum_{j=1}^p k_j. $$ In other words, $f(x)$ is $n$ times the squared distance to some point $h$, plus a constant.

By the characterization of $\mu$ as the point minimizing $f(x)$, we know that actually $h = \mu$. This tells us what the additive constant must be: it is $f(\mu)$ (the sum of squared distances from $\mu$ to the points of $\mathcal X$). We conclude that $$ f(x) = n\|x - \mu\|_2^2 + f(\mu). $$ The point $x^i \in \mathcal X$ closest to all other points in $\mathcal X$ is the point $x^i$ for which $f(x^i)$ is smallest. We see now that it must be the point for which $\|x^i - \mu\|_2$ is smallest, therefore your equality holds.
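The identity $f(x) = n\|x - \mu\|_2^2 + f(\mu)$, and the resulting agreement of the two argmins, can be checked numerically. A minimal sketch (the point set is randomly generated, purely for illustration):

```python
import random

random.seed(0)

# a random point set X of n points in R^p
n, p = 12, 3
X = [[random.uniform(-5, 5) for _ in range(p)] for _ in range(n)]

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def f(x):
    # sum of squared l2 distances from x to every point of X
    return sum(sq_dist(x, xi) for xi in X)

mu = [sum(col) / n for col in zip(*X)]

# identity f(x) = n * ||x - mu||^2 + f(mu), checked at a random point
x = [random.uniform(-5, 5) for _ in range(p)]
assert abs(f(x) - (n * sq_dist(x, mu) + f(mu))) < 1e-8

# the two argmins over the data set coincide
closest_to_mu = min(range(n), key=lambda i: sq_dist(X[i], mu))
medoid = min(range(n), key=lambda i: f(X[i]))
assert closest_to_mu == medoid
```

Since $f(x^i)$ is a strictly increasing function of $\|x^i - \mu\|_2$, the two minimizations order the data points identically, so the argmins agree for every point set.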


Regarding the possibility of the same thing happening with other norms: the key thing that makes this work for $\ell_2$ is the fact that the sum of squared distance functions is a multiple of a single squared distance function. (Plus a constant.) In the simplest case, for two points $y, z$, we have $$ \|x - y\|_2^2 + \|x - z\|_2^2 = 2\|x - \tfrac{y+z}{2}\|_2^2 + \frac12\|y-z\|_2^2. $$ This lets us rewrite the function $f(x)$ above in terms of distance from the mean.
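The two-point identity above is a direct consequence of the parallelogram law, and a quick numerical check confirms it (the particular random vectors here are arbitrary):

```python
import random

random.seed(1)
p = 4
y = [random.uniform(-3, 3) for _ in range(p)]
z = [random.uniform(-3, 3) for _ in range(p)]
x = [random.uniform(-3, 3) for _ in range(p)]

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

mid = [(yi + zi) / 2 for yi, zi in zip(y, z)]

# ||x-y||^2 + ||x-z||^2  ==  2||x - (y+z)/2||^2 + ||y-z||^2 / 2
lhs = sq_dist(x, y) + sq_dist(x, z)
rhs = 2 * sq_dist(x, mid) + 0.5 * sq_dist(y, z)
assert abs(lhs - rhs) < 1e-9
```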

It is easy to check that the same "coincidence" does not occur for other norms, so when we work with a different norm, the function $f(x)$ will have a more complicated shape: it will not solely depend on the distance from some center point.

When $\mathcal X$ is very large, adding a new point or two will not change the shape of $f(x)$ very much. So we can add two new points $x^{n+1}, x^{n+2}$ with the following properties:

  • $x^{n+1}$ is the new closest point to the center point we computed.
  • When we draw two contour lines through $x^{n+1}$ (the contour line of $f(x)$, and the contour line of distance to the center point), $x^{n+2}$ is inside the first and outside the second.

Then the first argmin in the question will be $x^{n+1}$, and the second will be $x^{n+2}$; this outlines how to come up with a counterexample for other norms.
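As a concrete illustration for the $\ell_1$ norm (using unsquared $\ell_1$ distances, where the minimizer of $\sum_i \|x - x^i\|_1$ over $\mathbb{R}^p$ is the coordinate-wise median), here is a hand-built configuration of my own, not from the answer, in which the two argmins differ. The $x$-coordinates are spread out while the $y$-coordinates are tightly packed, so moving away from the center is much cheaper for $f$ in the $x$-direction than in the $y$-direction:

```python
# Counterexample for the l1 norm: f(x) = sum of l1 distances to the data.
# Over R^p this f is minimized at the coordinate-wise median c, but the
# data point nearest to c need not minimize f over the data set.
X = [(-10, 0.1), (10, -0.1), (-5, 0.2), (5, -0.2),
     (-2, 0.15), (3, 0.0), (0, 2.0)]
n = len(X)

def l1(a, b):
    return sum(abs(ai - bi) for ai, bi in zip(a, b))

def f(x):
    return sum(l1(x, xi) for xi in X)

# coordinate-wise median; n is odd, so it is unique
c = tuple(sorted(col)[n // 2] for col in zip(*X))  # c = (0, 0.1)

nearest_to_c = min(range(n), key=lambda i: l1(X[i], c))
medoid = min(range(n), key=lambda i: f(X[i]))

assert X[nearest_to_c] == (0, 2.0)     # l1 distance 1.9 to c
assert X[medoid] == (-2, 0.15)         # but smallest total l1 distance
assert nearest_to_c != medoid
```

The point $(0, 2)$ is closest to the center $c = (0, 0.1)$, yet the point $(-2, 0.15)$ has the smaller sum of $\ell_1$ distances, because the contours of $f$ are stretched in the $x$-direction, exactly the mismatch between the two contour lines described above.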
