Suppose I have $n$ data points $x_1,\dots,x_n$, each of which is $p$-dimensional. Let $\Sigma$ be the (non-singular) population covariance of these samples. With respect to $\Sigma$, what is the most efficient known way to compute the vector of squared Mahalanobis distances (from $\vec 0$) of the $n$ data points?
That is, we would like to compute the vector $(x_1^T\Sigma^{-1}x_1,\dots,x_n^T\Sigma^{-1}x_n)$.
Computing the inverse $\Sigma^{-1}$ seems to be quite slow for large matrices. Is there a faster way?
Best Answer
Let $x$ be one of your data points.
Compute the Cholesky decomposition $\Sigma=LL^\top$ (once; the same factor is reused for every data point).
Define $y=L^{-1}x$.
Compute $y$ by forward-substitution in $Ly=x$.
The squared Mahalanobis distance to the origin is the squared Euclidean norm of $y$:
$$ \begin{align} x^\top\Sigma^{-1}x &= x^\top(LL^\top)^{-1}x \\ &= x^\top(L^\top)^{-1}L^{-1}x \\ &= x^\top(L^{-1})^\top L^{-1}x \\ &= (L^{-1}x)^\top(L^{-1}x) \\ &= \|y\|^2. \end{align} $$
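The steps above can be sketched in NumPy/SciPy. This is a minimal illustration with made-up data (`rng`, `n`, `p`, and `Sigma` are assumptions for the example): factor $\Sigma$ once with Cholesky, then do one batched forward substitution for all $n$ points.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

# Hypothetical example data: n points in p dimensions.
rng = np.random.default_rng(0)
n, p = 1000, 5
X = rng.standard_normal((n, p))          # rows are the data points x_i

# A symmetric positive-definite covariance (in practice, this is given).
A = rng.standard_normal((p, p))
Sigma = A @ A.T + p * np.eye(p)

# Cholesky factor Sigma = L L^T, with L lower triangular (computed once).
L = cholesky(Sigma, lower=True)

# Forward-substitute L y_i = x_i for all points at once:
# column i of Y is y_i = L^{-1} x_i.
Y = solve_triangular(L, X.T, lower=True)

# Squared Mahalanobis distances ||y_i||^2.
d2 = np.sum(Y**2, axis=0)

# Sanity check against the direct (slower) formula x_i^T Sigma^{-1} x_i.
d2_direct = np.einsum('ij,jk,ik->i', X, np.linalg.inv(Sigma), X)
assert np.allclose(d2, d2_direct)
```

The factorization costs $O(p^3)$ once, and each triangular solve costs $O(p^2)$, so the whole vector is computed in $O(p^3 + np^2)$ without ever forming $\Sigma^{-1}$ explicitly.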