[Math] Derivation of Mahalanobis Distance

linear-algebra multivariable-calculus

I was recently reading up on the Mahalanobis distance and understood how it generalizes distance measures for multivariate data, such as the Euclidean distance. What got me wondering is how one derives the formula constructively. I understand how the general form applies in particular cases, but is there a way to construct it from 'first principles'?

In particular, could someone explain the significance of the inverse covariance matrix (especially in the non-trivial case where it is not diagonal)?

Best Answer

Rather than "derive", I would ask "why". For some explanations have a look at my answers here and here. For a direct application, consider for example the $p$-variate normal distribution

$$f(x|\theta, \Sigma)=\frac{1}{(2\pi)^\frac{p}{2}|\Sigma|^\frac{1}{2}} \exp\left( -\frac{1}{2}\langle (x-\theta),\Sigma^{-1}(x-\theta)\rangle \right);$$

the exponent is (up to a factor $-\frac{1}{2}$) the squared Mahalanobis distance of $x$ from the mean $\theta$. This is an example of a (Gaussian) kernel, which is widely used in density estimation. In the bivariate case, the level curves / density contours

$$\langle (x-\theta),\Sigma^{-1}(x-\theta)\rangle = K $$

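are ellipses centered at $\theta$, with the usual statistical / mathematical interpretation: the axes point along the eigenvectors of $\Sigma$, and the semi-axis lengths are $\sqrt{K\lambda_i}$, where $\lambda_i$ are the eigenvalues of $\Sigma$.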
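One way to see what $\Sigma^{-1}$ is doing, including when $\Sigma$ is not diagonal, is to factor $\Sigma = LL^T$ (Cholesky) and change variables to $z = L^{-1}(x-\theta)$. Then $\langle (x-\theta),\Sigma^{-1}(x-\theta)\rangle = \langle z, z\rangle$, so the Mahalanobis distance is just the Euclidean distance after decorrelating and rescaling the coordinates. The following is a minimal NumPy sketch of that identity; the function names `mahalanobis_sq` and `whiten` and the numerical values are illustrative assumptions, not part of the original answer.

```python
import numpy as np

def mahalanobis_sq(x, theta, sigma):
    """Squared Mahalanobis distance <x - theta, Sigma^{-1} (x - theta)>."""
    d = np.asarray(x, dtype=float) - np.asarray(theta, dtype=float)
    # Solve Sigma w = d rather than forming the inverse explicitly.
    return float(d @ np.linalg.solve(sigma, d))

def whiten(x, theta, sigma):
    """Map x to z = L^{-1} (x - theta), where Sigma = L L^T (Cholesky)."""
    d = np.asarray(x, dtype=float) - np.asarray(theta, dtype=float)
    L = np.linalg.cholesky(sigma)  # lower-triangular factor
    return np.linalg.solve(L, d)

# Illustrative values (assumed for this example only).
theta = np.array([1.0, 2.0])
sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])  # non-diagonal, positive definite
x = np.array([2.5, 1.0])

d2 = mahalanobis_sq(x, theta, sigma)
z = whiten(x, theta, sigma)
d2_whitened = float(z @ z)  # plain squared Euclidean norm of z

print(d2, d2_whitened)  # the two values agree up to rounding
```

In the whitened coordinates the level sets $\langle z, z\rangle = K$ are circles; mapping back through $x = \theta + Lz$ stretches and rotates them into the ellipses described above.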

In more mathematical terms, the squared Mahalanobis distance is (up to a factor of $\frac{1}{2}$) the Bregman divergence generated by the convex function $F(x)=\frac{1}{2}\langle x,\Sigma^{-1}x\rangle$. In the regression context it is also related to leverage; see specialized texts for more details.
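To make the Bregman connection explicit, here is a short worked computation. It assumes the standard definition $D_F(x,y)=F(x)-F(y)-\langle \nabla F(y),\,x-y\rangle$, which is not stated in the answer itself, and uses $\nabla F(y)=\Sigma^{-1}y$ (valid because $\Sigma^{-1}$ is symmetric):

$$\begin{aligned}
D_F(x,y) &= \tfrac{1}{2}\langle x,\Sigma^{-1}x\rangle - \tfrac{1}{2}\langle y,\Sigma^{-1}y\rangle - \langle \Sigma^{-1}y,\,x-y\rangle \\
&= \tfrac{1}{2}\langle x,\Sigma^{-1}x\rangle - \langle y,\Sigma^{-1}x\rangle + \tfrac{1}{2}\langle y,\Sigma^{-1}y\rangle \\
&= \tfrac{1}{2}\langle (x-y),\Sigma^{-1}(x-y)\rangle .
\end{aligned}$$

With this normalization of $F$ the divergence is half the squared Mahalanobis distance between $x$ and $y$; taking $F(x)=\langle x,\Sigma^{-1}x\rangle$ instead would give the squared distance exactly.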

  • References

On Bregman divergences and Mahalanobis distance (with applications in topology): http://arxiv.org/pdf/0709.2196v1.pdf

On the geometry induced by the divergence with generator given by $F(x)=\frac{1}{2}\langle x,\Sigma^{-1}x\rangle$ (pp. 8-9 in particular): http://bulletin.pan.pl/(58-1)183.pdf

This second reference shows that the Mahalanobis distance induces a Riemannian geometry structure on a certain manifold with curvature tensor induced by the positive definite matrix $\Sigma^{-1}$. This is nice.