I have a cluster of $p$-dimensional points, and given a new $p$-dimensional point $x$, I want to determine whether it is likely to belong to this cluster.
The cluster is made up of $n$ $p$-dimensional points. I assume that these points are drawn from a multivariate Gaussian distribution, whose mean and covariance I estimate by the sample mean $\hat{\mu}_X$ and the sample covariance matrix $\hat{\sigma}_X$.
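Given a new point $x$, I decide whether it is likely to belong to this cluster by comparing a scaled squared Mahalanobis distance with $B_{0.95}(a,b)$, the 0.95 quantile of the $\mathrm{Beta}(a,b)$ distribution; the point is judged unlikely to belong to the cluster when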
$$\frac{n}{(n-1)^2}\left(X_i-\hat{\mu}_X\right)'\hat{\sigma}_X^{-1}\left(X_i-\hat{\mu}_X\right)>B_{0.95}\left(\frac{p}{2},\frac{n-p-1}{2}\right)$$
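For reference, here is a minimal NumPy sketch of the left-hand side of this test for the case $n > p$. It assumes $\hat{\sigma}_X$ is the unbiased sample covariance (divisor $n-1$, which is what `np.cov` computes); the function name `scaled_mahalanobis_sq` is hypothetical.

```python
import numpy as np


def scaled_mahalanobis_sq(X, points):
    """Return n/(n-1)^2 * (x - mu)' S^{-1} (x - mu) for each row x of `points`.

    X      : (n, p) array of the cluster's sample points.
    points : (m, p) array (or a single p-vector) of points to test.
    S is the unbiased sample covariance (divisor n - 1); it must be invertible,
    which requires n > p.
    """
    X = np.asarray(X, dtype=float)
    points = np.atleast_2d(np.asarray(points, dtype=float))
    n = X.shape[0]
    mu = X.mean(axis=0)
    S = np.atleast_2d(np.cov(X, rowvar=False))  # p x p, divisor n - 1
    diff = points - mu
    # Solve S z = diff' rather than forming S^{-1} explicitly
    z = np.linalg.solve(S, diff.T)
    d2 = np.einsum("ij,ji->i", diff, z)  # squared Mahalanobis distances
    return n / (n - 1) ** 2 * d2
```

If SciPy is available, the cutoff on the right-hand side can be computed as `scipy.stats.beta.ppf(0.95, p / 2, (n - p - 1) / 2)`.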
However, some clusters have very few sample points. When $n \le p$, the centered data matrix has rank at most $n-1 < p$, so $\hat{\sigma}_X$ is singular and its inverse $\hat{\sigma}_X^{-1}$ does not exist.
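A quick NumPy illustration of why the inversion fails, using hypothetical sizes $n=4$, $p=6$ and random Gaussian data: the $p\times p$ sample covariance matrix has rank $n-1$, so it cannot be inverted.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 4, 6                            # hypothetical sizes with n <= p
X = rng.normal(size=(n, p))            # n random points in p dimensions
S = np.cov(X, rowvar=False)            # p x p sample covariance (divisor n - 1)
rank = np.linalg.matrix_rank(S)        # numerical rank of S
# For data in general position the rank is n - 1 < p, so S is singular
print(f"rank of S = {rank}, p = {p}")
```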
Is there an equivalent or more appropriate measure I can use in this case?
Best Answer
You can compute the pseudo-Mahalanobis distance by using the pseudo-inverse:
$$W_i'\hat{\sigma}_W^{-1}W_i$$
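where $W^*$ is the $n\times p$ matrix of centered observations:
$$W^*=\begin{pmatrix}\left(X_1-\hat{\mu}_X\right)'\\ \vdots\\ \left(X_n-\hat{\mu}_X\right)'\end{pmatrix}$$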
and
$$W=W^*V_{W^*}$$
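where $V_{W^*}$ is the $p\times p^*$ matrix whose columns are the right singular vectors of $W^*$ (the $V$ in its singular value decomposition $W^*=UDV'$) associated with its $p^*$ nonzero singular values.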
Then compute ordinary Mahalanobis distances on the rows of $W$ (instead of $X$), where $\hat{\sigma}_W$ is the sample covariance matrix of $W$.
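A short derivation of why this amounts to using the pseudo-inverse, as a sketch assuming $\hat{\sigma}_X$ has divisor $n-1$. Write the thin SVD of the centered data matrix as $W^* = U D V_{W^*}'$, where $D=\operatorname{diag}(s_1,\dots,s_{p^*})$ holds the $p^*=\operatorname{rank}(W^*)$ nonzero singular values, $U'U=I$ and $V_{W^*}'V_{W^*}=I$. The columns of $W^*$ have mean zero, so the columns of $W = W^*V_{W^*} = UD$ do too, and the $i$-th row of $W$, written as a column, is $W_i = V_{W^*}'(X_i-\hat{\mu}_X)$:

$$
\begin{aligned}
\hat{\sigma}_X &= \frac{1}{n-1}W^{*\prime}W^* = \frac{1}{n-1}V_{W^*}D^2V_{W^*}',\\
\hat{\sigma}_W &= \frac{1}{n-1}W'W = \frac{1}{n-1}D\,U'U\,D = \frac{1}{n-1}D^2,\\
W_i'\hat{\sigma}_W^{-1}W_i &= \left(X_i-\hat{\mu}_X\right)'V_{W^*}\,(n-1)D^{-2}\,V_{W^*}'\left(X_i-\hat{\mu}_X\right) = \left(X_i-\hat{\mu}_X\right)'\hat{\sigma}_X^{+}\left(X_i-\hat{\mu}_X\right),
\end{aligned}
$$

where $\hat{\sigma}_X^{+}=(n-1)V_{W^*}D^{-2}V_{W^*}'$ is the Moore-Penrose pseudo-inverse of $\hat{\sigma}_X$. When $p^*=p$, $V_{W^*}$ is square and orthogonal, and this reduces to the ordinary Mahalanobis distance.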
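Note that you need to replace $p$ on the right-hand side of your inequality with $p^*$, the rank of $W^*$ (equivalently, the number of columns of $V_{W^*}$), so that the threshold becomes $B_{0.95}\left(\frac{p^*}{2},\frac{n-p^*-1}{2}\right)$.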
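A minimal NumPy sketch of the recipe, assuming $\hat{\sigma}_X$ uses divisor $n-1$ and that singular values below a relative tolerance count as zero. The function name `pseudo_mahalanobis_sq` and its `rtol` parameter are illustrative, not part of any library. For a new point $x$, it projects $x-\hat{\mu}_X$ onto the retained singular vectors, so any component of $x-\hat{\mu}_X$ orthogonal to the span of the cluster's centered points is ignored.

```python
import numpy as np


def pseudo_mahalanobis_sq(X, points, rtol=None):
    """Squared pseudo-Mahalanobis distance of each row of `points` to cluster X.

    Projects onto the right singular vectors of W* = X - mean that have
    nonzero singular values, then computes an ordinary Mahalanobis distance
    in that p*-dimensional subspace. Returns (d2, p_star).
    """
    X = np.asarray(X, dtype=float)
    points = np.atleast_2d(np.asarray(points, dtype=float))
    n, p = X.shape
    mu = X.mean(axis=0)
    W_star = X - mu                                  # n x p centered data
    _, s, Vt = np.linalg.svd(W_star, full_matrices=False)
    if rtol is None:
        rtol = max(n, p) * np.finfo(float).eps       # same default as matrix_rank
    keep = s > rtol * s.max()                        # nonzero singular values
    p_star = int(keep.sum())                         # numerical rank of W*
    V = Vt[keep].T                                   # p x p_star, i.e. V_{W*}
    W = (points - mu) @ V                            # coordinates in the subspace
    var = s[keep] ** 2 / (n - 1)                     # diagonal of sigma_W
    d2 = np.sum(W ** 2 / var, axis=1)                # W_i' sigma_W^{-1} W_i
    return d2, p_star
```

To apply the test, compare `n / (n - 1) ** 2 * d2` with the Beta cutoff, for example `scipy.stats.beta.ppf(0.95, p_star / 2, (n - p_star - 1) / 2)` if SciPy is available; that quantile is only defined when $n - p^* - 1 > 0$. Because $\hat{\sigma}_W$ is diagonal here, the sketch divides by the variances directly instead of inverting a matrix.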