Solved – Efficient/fast Mahalanobis distance computation

computational-statistics, distance

Suppose I have $n$ data points $x_1,\dots,x_n$, each of which is $p$-dimensional. Let $\Sigma$ be the (non-singular) population covariance of these samples. With respect to $\Sigma$, what is the most efficient known way to compute the vector of squared Mahalanobis distances (from $\vec 0$) of the $n$ data points?

That is, we would like to compute the vector $(x_1^T\Sigma^{-1}x_1,\dots,x_n^T\Sigma^{-1}x_n)$.

Computing the inverse $\Sigma^{-1}$ seems to be quite slow for large matrices. Is there a faster way?

Best Answer

  1. Let $x$ be one of your data points.

  2. Compute the Cholesky decomposition $\Sigma=LL^\top$ (this factorization is computed only once and reused for all $n$ points).

  3. Define $y=L^{-1}x$.

  4. Compute $y$ by forward-substitution in $Ly=x$.

  5. The squared Mahalanobis distance to the origin is then the squared Euclidean norm of $y$:

$$
\begin{align}
x^\top\Sigma^{-1}x &= x^\top(LL^\top)^{-1}x \\
&= x^\top(L^\top)^{-1}L^{-1}x \\
&= x^\top(L^{-1})^\top L^{-1}x \\
&= (L^{-1}x)^\top(L^{-1}x) \\
&= \|y\|^2.
\end{align}
$$
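The factorization costs $O(p^3)$ once, and each forward substitution costs $O(p^2)$, so all $n$ squared distances come out at $O(p^3 + np^2)$ with no explicit inverse ever formed. Below is a minimal NumPy/SciPy sketch of this recipe; the function name `mahalanobis_sq` is just an illustrative choice, not a library routine.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def mahalanobis_sq(X, Sigma):
    """Squared Mahalanobis distances from the origin.

    X     : (n, p) array, one data point per row
    Sigma : (p, p) symmetric positive-definite covariance matrix
    """
    # Factor Sigma = L L^T once; O(p^3).
    L = cholesky(Sigma, lower=True)
    # Forward-substitute L Y = X^T for all points at once; O(n p^2).
    Y = solve_triangular(L, X.T, lower=True)
    # Column i of Y is y_i = L^{-1} x_i; return its squared Euclidean norm.
    return np.sum(Y**2, axis=0)

# Example: 1000 five-dimensional points.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 5))
Sigma = np.cov(X, rowvar=False)
d2 = mahalanobis_sq(X, Sigma)  # vector of n squared distances
```

Here `solve_triangular` carries out the forward substitution of step 4 for all $n$ right-hand sides at once, which is typically much faster than looping over the points one at a time.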