Given the formula for the Mahalanobis distance:
$D^2_M = (\mathbf{x} - \boldsymbol{\mu})^T \mathbf{S}^{-1} (\mathbf{x} - \boldsymbol{\mu})$
If I simplify the above expression using the eigenvalue decomposition (EVD) of the covariance matrix:
$\mathbf{S} = \mathbf{P} \Lambda \mathbf{P}^T$
Then,
$D^2_M = (\mathbf{x} - \boldsymbol{\mu})^T \mathbf{P} \Lambda^{-1} \mathbf{P}^T (\mathbf{x} - \boldsymbol{\mu})$
Let the projections of $(\mathbf{x} - \boldsymbol{\mu})$ onto the eigenvectors in $\mathbf{P}$ be $\mathbf{b}$; then:
$\mathbf{b} = \mathbf{P}^T(\mathbf{x} - \boldsymbol{\mu})$
And,
$D^2_M = \mathbf{b}^T \Lambda^{-1} \mathbf{b}$
$D^2_M = \sum_i{\frac{b^2_i}{\lambda_i}}$
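As a sanity check of the derivation above, here is a minimal NumPy sketch (data and variable names are my own) comparing the two expressions when $\mathbf{S}$ has full rank; the two values agree:

```python
import numpy as np

rng = np.random.default_rng(0)

# Full-rank case: many more observations than variables.
X = rng.normal(size=(100, 3))          # 100 observations, 3 variables
mu = X.mean(axis=0)
S = np.cov(X, rowvar=False)

x = rng.normal(size=3)
d = x - mu

# Original expression: (x - mu)^T S^{-1} (x - mu)
d2_direct = d @ np.linalg.inv(S) @ d

# Simplified expression via EVD: sum_i b_i^2 / lambda_i
lam, P = np.linalg.eigh(S)             # S = P diag(lam) P^T
b = P.T @ d                            # projections onto the eigenvectors
d2_evd = np.sum(b**2 / lam)
```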
The problem that I am facing right now is as follows:
The covariance matrix $\mathbf{S}$ is calculated on a dataset in which the number of observations is less than the number of variables. This produces some zero-valued eigenvalues in the EVD of $\mathbf{S}$.
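A quick NumPy sketch (dimensions chosen purely for illustration) shows why: with $n$ observations of $p > n$ variables, the sample covariance matrix has rank at most $n - 1$, so at least $p - n + 1$ eigenvalues are zero:

```python
import numpy as np

rng = np.random.default_rng(1)

# Fewer observations than variables: 4 observations of 6 variables.
X = rng.normal(size=(4, 6))
S = np.cov(X, rowvar=False)       # 6x6 sample covariance matrix

# Because the sample mean is subtracted, rank(S) is at most n - 1 = 3,
# so at least 3 of the 6 eigenvalues are (numerically) zero.
lam = np.linalg.eigvalsh(S)
rank = np.linalg.matrix_rank(S)
n_zero = np.sum(np.isclose(lam, 0.0))
```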
In these cases the simplified expression does not yield the same Mahalanobis distance as the original expression, i.e.:
$(\mathbf{x} - \boldsymbol{\mu})^T \mathbf{S}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \neq \sum_i{\frac{b^2_i}{\lambda_i}}$ (summing over the non-zero $\lambda_i$ only)
My question is: Does the simplified expression still functionally represent the Mahalanobis Distance?
P.S.: My motivation for using the simplified expression of the Mahalanobis distance is to calculate its gradient with respect to $\mathbf{b}$.
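Regarding that gradient: since $D^2_M = \mathbf{b}^T \Lambda^{-1} \mathbf{b}$, differentiating the quadratic form gives $\nabla_{\mathbf{b}} D^2_M = 2\Lambda^{-1}\mathbf{b}$. A small finite-difference sketch (with made-up eigenvalues, assuming they are all non-zero) confirms this:

```python
import numpy as np

# Made-up non-zero eigenvalues and projection vector, purely for illustration.
lam = np.array([2.0, 0.5, 1.5])
b = np.array([1.0, -2.0, 0.3])

def d2(b):
    """D^2_M = sum_i b_i^2 / lambda_i."""
    return np.sum(b**2 / lam)

# Analytic gradient of the quadratic form: 2 * Lambda^{-1} b
grad_analytic = 2 * b / lam

# Central finite-difference approximation of the same gradient.
eps = 1e-6
grad_fd = np.array([(d2(b + eps * e) - d2(b - eps * e)) / (2 * eps)
                    for e in np.eye(3)])
```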
Best Answer
As indicated in Erick's comment, the real issue is not that the two calculations yield different results; it is that $\mathbf{S}$ is singular (and hence not invertible) when some of the eigenvalues are zero, so neither calculation is well-defined. This is a conceptual problem, not a computational one: the Mahalanobis distance is simply not well-defined in this case.
This paper suggests what it calls a regularized Mahalanobis distance to deal with this problem.
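To illustrate how one might make the distance well-defined in practice (a generic sketch of two common workarounds, not necessarily the paper's exact formulation): either shrink $\mathbf{S}$ toward the identity so it becomes invertible, or use the Moore-Penrose pseudoinverse, which restricts the sum $\sum_i b_i^2/\lambda_i$ to the non-zero eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(2)

# Singular case: 4 observations of 6 variables.
X = rng.normal(size=(4, 6))
mu = X.mean(axis=0)
S = np.cov(X, rowvar=False)
d = rng.normal(size=6) - mu

# Option 1: shrink S toward the identity so it becomes invertible.
# (The paper's regularization may take a different exact form.)
eps = 1e-3
d2_reg = d @ np.linalg.inv(S + eps * np.eye(6)) @ d

# Option 2: Moore-Penrose pseudoinverse, equivalent to summing
# b_i^2 / lambda_i over the non-zero eigenvalues only.
d2_pinv = d @ np.linalg.pinv(S) @ d
```

Note that the two options generally give different values; which one is appropriate depends on whether deviations outside the span of the data should be penalized.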