Suppose I have a large set of multivariate data with at least three variables. How can I find the outliers? Pairwise scatterplots won't work, because a point can be an outlier in three dimensions without being an outlier in any of the two-dimensional subspaces.
I am not thinking of a regression problem, but of true multivariate data. So answers involving robust regression or computing leverage are not helpful.
One possibility would be to compute the principal component scores and look for an outlier in the bivariate scatterplot of the first two scores. Would that be guaranteed to work? Are there better approaches?
Best Answer
Have a look at the mvoutlier package for R, which relies on ordered robust Mahalanobis distances, as suggested by @drknexus.
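To make the idea concrete, here is a minimal Python sketch of outlier detection with robust Mahalanobis distances. It is not the mvoutlier package itself: it swaps in scikit-learn's `MinCovDet` (a minimum covariance determinant estimator) for the robust location and scatter estimates. The simulated data, the planted point, the helper name `robust_outliers`, and the 0.975 chi-square quantile used as a cutoff are all assumptions made for illustration.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(0)

# Simulated data (assumption for illustration): x3 is nearly x1 + x2.
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + rng.normal(scale=0.1, size=n)
X = np.column_stack([x1, x2, x3])

# Planted point: each coordinate is unremarkable, but x3 is far from x1 + x2.
X = np.vstack([X, [1.0, 1.0, -0.5]])


def robust_outliers(data, quantile=0.975, seed=0):
    """Hypothetical helper: squared robust distances and outlier flags."""
    mcd = MinCovDet(random_state=seed).fit(data)
    d2 = mcd.mahalanobis(data)  # squared Mahalanobis distances
    cutoff = chi2.ppf(quantile, df=data.shape[1])
    return d2, d2 > cutoff


# All three variables together: is the planted point (last row) flagged?
d2_all, flag_all = robust_outliers(X)
print("flagged in 3D:", bool(flag_all[-1]))

# Each pair of variables separately, mimicking pairwise scatterplots.
pair_flags = []
for i, j in [(0, 1), (0, 2), (1, 2)]:
    _, flag_pair = robust_outliers(X[:, [i, j]])
    pair_flags.append(bool(flag_pair[-1]))
    print(f"flagged in variables ({i}, {j}):", pair_flags[-1])
```

The planted point sits well inside the cloud in every two-variable view, so only the distance computed on all three variables should pick it up, which is the situation described in the question. The chi-square cutoff is a common rule of thumb rather than a tuned threshold, so treat the flags as candidates for inspection rather than a verdict.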