Multivariate Gaussian Distance – Ways to Measure Using Mahalanobis Distance

clustering, covariance, covariance-matrix, matrix inverse, multivariate analysis

I have a cluster of $p$-dimensional points and, given a new $p$-dimensional point $x$, I want to determine whether or not it is likely to belong to this cluster.

The cluster is made up of $n$ $p$-dimensional points. I am assuming that these points are drawn from a multivariate Gaussian distribution, with sample mean $\hat{\mu}_X$ and sample covariance matrix $\hat{\sigma}_X$.

Given a new point $x$, I decide whether it is likely to belong to this cluster by thresholding its Mahalanobis distance. I treat the point as not belonging to the cluster when:
$$\frac{n}{(n-1)^2}\left(X_i-\hat{\mu}_X\right)'\hat{\sigma}_X^{-1}\left(X_i-\hat{\mu}_X\right)>B_{0.95}\left(\frac{p}{2},\frac{n-p-1}{2}\right)$$
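
For concreteness, here is a minimal sketch of this test in Python, assuming NumPy and SciPy are available and reading $B_{0.95}(a, b)$ as the 0.95 quantile of a Beta$(a, b)$ distribution. The function names `scaled_mahalanobis` and `is_outlier` are made up for illustration, and the data are simulated.

```python
import numpy as np
from scipy.stats import beta


def scaled_mahalanobis(X, x):
    """Return n/(n-1)^2 * (x - mean)' S^{-1} (x - mean) for an n x p sample X."""
    n, _ = X.shape
    mu = X.mean(axis=0)
    S = np.cov(X, rowvar=False)            # sample covariance (divisor n - 1)
    diff = x - mu
    d2 = diff @ np.linalg.solve(S, diff)   # solve instead of forming S^{-1}
    return n / (n - 1) ** 2 * d2


def is_outlier(X, x, level=0.95):
    """True when the scaled distance exceeds the Beta quantile in the inequality."""
    n, p = X.shape
    threshold = beta.ppf(level, p / 2, (n - p - 1) / 2)
    return bool(scaled_mahalanobis(X, x) > threshold)


# Simulated cluster: n = 200 points in p = 3 dimensions (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))

print(is_outlier(X, X.mean(axis=0)))     # a point at the cluster centre
print(is_outlier(X, np.full(3, 10.0)))   # a point far away from the cluster
```

This only works when the sample covariance matrix is invertible, which is exactly what fails for the small clusters described next.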

However, some clusters have very few sample points (in particular $n \le p$), in which case $\hat{\sigma}_X$ is singular and the inverse covariance matrix $\hat{\sigma}_X^{-1}$ cannot be computed.

Is there an equivalent or more appropriate measure I can use in this case?

Best Answer

You can compute the pseudo-Mahalanobis distance by using the pseudo-inverse:

$$W_i'\hat{\sigma}_W^{-1}W_i$$

where

$$W^*=\left(X_i-\hat{\mu}_X\right)$$

and

$$W=W^*V_{W^*}$$

where

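$V_{W^*}$ is the $V$ matrix (the right singular vectors) of the singular value decomposition of $W^*$, keeping only the columns associated with non-zero singular values so that $\hat{\sigma}_W$ is invertible.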

Then simply compute the Mahalanobis distances on the matrix $W$ (instead of $X$).

Note that you need to replace $p$ on the right-hand side of your inequality with $p^*$ (the rank of $V_{W^*}$, i.e. the number of non-zero singular values of $W^*$).
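
The steps above can be sketched in Python roughly as follows. This is an illustration under assumptions rather than a definitive implementation: it assumes NumPy, the function name `pseudo_mahalanobis` and the tolerance `tol` are hypothetical choices, and the data are simulated with $n = 4$ points in $p = 6$ dimensions.

```python
import numpy as np


def pseudo_mahalanobis(X, tol=1e-10):
    """Squared Mahalanobis distances of the rows of X, computed on the
    scores W = W* V instead of on X itself. Returns (d2, p_star)."""
    mu = X.mean(axis=0)
    W_star = X - mu                                   # centred data, n x p
    _, s, Vt = np.linalg.svd(W_star, full_matrices=False)
    keep = s > tol * s.max()                          # non-zero singular values
    V = Vt[keep].T                                    # p x p_star
    W = W_star @ V                                    # n x p_star
    S_W = np.atleast_2d(np.cov(W, rowvar=False))      # full-rank covariance of W
    # Row-wise quadratic form W_i' S_W^{-1} W_i
    d2 = np.einsum("ij,jk,ik->i", W, np.linalg.inv(S_W), W)
    return d2, V.shape[1]


# Simulated small cluster: fewer points than dimensions (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 6))

d2, p_star = pseudo_mahalanobis(X)
print(p_star)   # number of retained directions
print(d2)       # one squared distance per row of X
```

A new point $x$ would be handled the same way, by projecting $x - \hat{\mu}_X$ onto the retained columns of $V_{W^*}$ before computing the quadratic form; any component of $x$ outside that subspace is ignored by this distance. Also note that when $p^* = n - 1$, which is what happens for points in general position with $n \le p$, the second Beta parameter $(n - p^* - 1)/2$ is zero, so the threshold in the inequality is not defined in that case.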
