Solved – What to do with many multivariate outliers

multivariate analysis, outliers, regression

I'm doing a multiple regression with 5 continuous predictors and 1 continuous outcome variable. I've already removed a small handful of univariate outliers (n = 5), leaving my total sample size at N = 95.

However, when I run my regression, I identify many multivariate outliers that exceed the Mahalanobis distance criterion. Specifically, 11 cases have a Mahalanobis distance above the cut-off of 11.07 (the chi-square critical value for 5 predictors at α = .05). I've gone through the data and can't see any errors, nor do any of the cases deviate severely on any single variable. What should I do? Surely I can't delete over 10% of my data?
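For readers who want to reproduce this kind of check, here is a minimal sketch of the squared Mahalanobis distance computed on the predictors and compared with the 11.07 cut-off quoted above. The data are synthetic stand-ins (an assumption, not the asker's data), and `mahalanobis_sq` is a hypothetical helper name. Note that with a cut-off set at α = .05, roughly 5% of cases would be expected to exceed it even when nothing is wrong with the data.

```python
import numpy as np

def mahalanobis_sq(X):
    """Squared Mahalanobis distance of each row of X from the sample mean."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    cov = np.cov(X, rowvar=False)      # sample covariance matrix (ddof=1)
    cov_inv = np.linalg.inv(cov)
    # d_i^2 = (x_i - mean)^T S^{-1} (x_i - mean), evaluated for every row i
    return np.einsum("ij,jk,ik->i", centered, cov_inv, centered)

# Synthetic stand-in for the 95 x 5 predictor matrix (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(95, 5))

CUTOFF = 11.07                         # value quoted in the question (df = 5, alpha = .05)
d2 = mahalanobis_sq(X)
flagged = np.flatnonzero(d2 > CUTOFF)  # row indices above the cut-off

print(f"{flagged.size} of {len(d2)} cases exceed {CUTOFF}")
```

The distances are squared here because the chi-square cut-off applies to the squared distance; software that reports the unsquared distance needs the square root of the cut-off instead.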

Best Answer

You probably shouldn't have deleted any observations, and certainly not simply because they were outliers. Instead, you can either use a method that is robust to outliers (e.g. quantile regression, robust regression, tree models) or transform the variables (if that is sensible in your case).
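As an illustration of the robust-regression option, here is a minimal sketch of Huber-type M-estimation fitted by iteratively reweighted least squares, using only NumPy. It is one possible robust method, not necessarily the one the answer had in mind. The function name `huber_irls`, the synthetic data, and the contamination pattern (11 shifted outcomes out of 95) are all assumptions made for the example; 1.345 is the usual Huber tuning constant. Keep in mind that this estimator downweights cases with large residuals, not cases that are merely unusual in the predictors.

```python
import numpy as np

def huber_irls(X, y, delta=1.345, n_iter=50, tol=1e-8):
    """Huber M-estimate of regression coefficients (intercept first) via IRLS."""
    X1 = np.column_stack([np.ones(len(y)), X])       # add intercept column
    beta = np.linalg.lstsq(X1, y, rcond=None)[0]     # start from the OLS fit
    for _ in range(n_iter):
        resid = y - X1 @ beta
        # Robust residual scale: normalised median absolute deviation.
        scale = np.median(np.abs(resid - np.median(resid))) / 0.6745
        if scale == 0:
            break
        u = np.abs(resid) / scale
        w = delta / np.maximum(u, delta)             # Huber weights: min(1, delta / u)
        sw = np.sqrt(w)
        beta_new = np.linalg.lstsq(X1 * sw[:, None], y * sw, rcond=None)[0]
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

# Synthetic data: 95 cases, 5 predictors, 11 contaminated outcomes (assumption).
rng = np.random.default_rng(1)
n, p = 95, 5
X = rng.normal(size=(n, p))
true_beta = np.array([1.0, 0.5, -0.5, 2.0, 0.0, 1.5])  # intercept + 5 slopes
y = true_beta[0] + X @ true_beta[1:] + rng.normal(size=n)
y[:11] += 20.0                                          # shift 11 outcomes upward

X1 = np.column_stack([np.ones(n), X])
beta_ols = np.linalg.lstsq(X1, y, rcond=None)[0]
beta_huber = huber_irls(X, y)

print("OLS error:  ", np.linalg.norm(beta_ols - true_beta))
print("Huber error:", np.linalg.norm(beta_huber - true_beta))
```

All 95 cases stay in the fit; the unusual ones simply receive smaller weights, which avoids having to decide which observations to delete.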
