I was just introduced to the Mahalanobis distance between two vectors $\mathrm{\mathbf{X}}$ and $\mathrm{\mathbf{Y}}$ of random variables:
$$|| \mathrm{\mathbf{X}} - \mathrm{\mathbf{Y}}||_{\Sigma} = ((\mathrm{\mathbf{X}} - \mathrm{\mathbf{Y}})^T \Sigma^{-1}(\mathrm{\mathbf{X}} - \mathrm{\mathbf{Y}}))^{1/2},$$
where $\Sigma$ is the covariance matrix.
As I understand it, the 4 properties that a function $d(x,y)$ must satisfy in order to be a metric are as follows:
- $d(x, y) \ge 0$
- $d(x, y) = 0 \Longleftrightarrow x = y$
- $d(x, y) = d(y, x)$
- $d(x, z) \le d(x, y) + d(y, z)$
I only have an introductory-level knowledge of statistics, so I'm wondering how it is that the Mahalanobis distance satisfies property 1. Ignoring the square root, why is it that $(\mathrm{\mathbf{X}} - \mathrm{\mathbf{Y}})^T \Sigma^{-1}(\mathrm{\mathbf{X}} - \mathrm{\mathbf{Y}})$ can't be negative?
I would greatly appreciate it if people could please take the time to clarify this.
Best Answer
This is because $\Sigma^{-1}$ (the inverse of the covariance matrix) is symmetric positive definite. A covariance matrix $\Sigma$ is always symmetric and positive semi-definite, and whenever it is invertible it is in fact positive definite; the inverse of a symmetric positive definite matrix is again symmetric positive definite.
Once you have a symmetric positive definite (SPD) matrix $S$, it is easy to define a norm from it: by definition of positive definiteness, $v^T S v > 0$ for every nonzero vector $v$, and $v^T S v = 0$ exactly when $v = 0$, so $\|v\|_S = (v^T S v)^{1/2}$ is well defined and nonnegative. Taking $S = \Sigma^{-1}$ and $v = \mathrm{\mathbf{X}} - \mathrm{\mathbf{Y}}$ shows that the quantity under the square root in the Mahalanobis distance can never be negative.
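As a quick numerical sanity check (not part of the original answer, just an illustrative sketch using NumPy), you can build an SPD covariance matrix and verify that the quadratic form $(\mathbf{x}-\mathbf{y})^T \Sigma^{-1} (\mathbf{x}-\mathbf{y})$ stays nonnegative for many random vector pairs:

```python
import numpy as np

rng = np.random.default_rng(0)

# Construct a random SPD covariance matrix: A @ A.T is symmetric positive
# semi-definite, and adding a small multiple of I makes it strictly definite.
A = rng.standard_normal((3, 3))
Sigma = A @ A.T + 1e-6 * np.eye(3)
Sigma_inv = np.linalg.inv(Sigma)

def mahalanobis(x, y, Sigma_inv):
    """Mahalanobis distance ((x - y)^T Sigma^{-1} (x - y))^{1/2}."""
    d = x - y
    return np.sqrt(d @ Sigma_inv @ d)

# The quadratic form under the square root is never negative.
for _ in range(1000):
    x = rng.standard_normal(3)
    y = rng.standard_normal(3)
    d = x - y
    q = d @ Sigma_inv @ d
    assert q >= 0.0
```

The same check with a non-SPD matrix in place of `Sigma_inv` (e.g. one with a negative eigenvalue) would fail, which is why positive definiteness is exactly the property needed here.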