Outlier Detection – Best Methods to Identify Outliers in Multivariate Data

multivariate analysis, outliers

Suppose I have a large set of multivariate data with at least three variables. How can I find the outliers? Pairwise scatterplots won't work, because a point can be an outlier in three dimensions without being an outlier in any of the two-dimensional subspaces.

I am not thinking of a regression problem, but of true multivariate data. So answers involving robust regression or computing leverage are not helpful.

One possibility would be to compute the principal component scores and look for an outlier in the bivariate scatterplot of the first two scores. Would that be guaranteed to work? Are there better approaches?

Best Answer

Have a look at the mvoutlier package for R, which relies on ordered robust Mahalanobis distances, as suggested by @drknexus.
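To make the idea of ordered robust Mahalanobis distances concrete, here is a rough Python/NumPy sketch. It is not the mvoutlier implementation: the helper names (`sq_mahalanobis`, `robust_sq_distances`), the synthetic data, and the simplified concentration-step fit (in the spirit of the minimum covariance determinant estimator, without its consistency correction) are all assumptions made for illustration. The planted point lies well off the plane that the rest of the data follow, yet looks ordinary in every pair of variables.

```python
from itertools import combinations

import numpy as np


def sq_mahalanobis(X, center, cov):
    """Squared Mahalanobis distance of each row of X from (center, cov)."""
    diff = X - center
    return np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)


def robust_sq_distances(X, h_frac=0.75, n_starts=10, n_iter=30, seed=0):
    """Squared distances from a crude MCD-style fit (hypothetical helper).

    From several random starts, repeatedly refit the mean and covariance on
    the h points closest to the current fit, and keep the fit whose
    covariance has the smallest determinant. The covariance is not rescaled
    for consistency, so use the distances for ranking, not chi-square tests.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    h = int(h_frac * n)
    best_det, best_fit = np.inf, None
    for _ in range(n_starts):
        idx = np.sort(rng.choice(n, size=h, replace=False))
        for _ in range(n_iter):
            center = X[idx].mean(axis=0)
            cov = np.cov(X[idx], rowvar=False)
            d2 = sq_mahalanobis(X, center, cov)
            new_idx = np.sort(np.argsort(d2)[:h])
            if np.array_equal(new_idx, idx):
                break  # subset is stable, fit has converged
            idx = new_idx
        det = np.linalg.det(cov)
        if det < best_det:
            best_det, best_fit = det, (center, cov)
    return sq_mahalanobis(X, *best_fit)


# Synthetic data (an assumption): points scattered close to the plane
# x + y + z = 0, plus one planted point in row 0 that sits off the plane.
rng = np.random.default_rng(42)
n = 500
x = rng.normal(size=n)
y = rng.normal(size=n)
z = -(x + y) + rng.normal(scale=0.1, size=n)
X = np.column_stack([x, y, z])
X[0] = [0.5, 0.5, 1.0]

# Rank of the planted point (1 = most extreme) using all three variables.
d2 = robust_sq_distances(X)
rank_3d = int((d2 > d2[0]).sum()) + 1

# The same rank computed within each two-dimensional subspace.
pair_ranks = {}
for i, j in combinations(range(3), 2):
    d2_pair = robust_sq_distances(X[:, [i, j]])
    pair_ranks[(i, j)] = int((d2_pair > d2_pair[0]).sum()) + 1

print("rank using all three variables:", rank_3d)
print("rank within each pair of variables:", pair_ranks)
```

Sorting the full-dimensional distances and inspecting the largest ones is the "ordered" part of the approach. A production analysis would more likely use a tested estimator, such as the one in mvoutlier itself, than this hand-rolled fit.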
