Solved – Why use the Mahalanobis distance

distancedistance-functions

I understand in theory why the Mahalanobis distance is a good measure for mutlivariate outlier detection. However, everything I tend to read warns against calculating the inverse/pseudoinverse of a covariance matrix, which is needed to compute the mahalanobis distance.

So, if nobody wants to compute the inverse, what distance measure should be used?

Best Answer

One of the issues is that all the variables could be on different scales. Suppose that two of your variables are income and gender, the former in dollars and the latter as a 0-1 indicator variable. Which is further away: being off by 1 unit in income or 1 unit in gender? Being just a dollar away is pretty good, but having the wrong sex is as far away as you can get. You need to normalize how far these distances are; standard Euclidean distance doesn't do this. The variance-covariance matrix rescales the variables to make distances more comparable.

Related Question