Solved – Outlier removal for univariate and multivariate analysis

multivariate analysis, outliers, univariate

I have a biological data set on which I would like to do both univariate and multivariate analyses, to find which features correlate with a response. Should I remove univariate outliers before the univariate analysis and, separately, remove multivariate outliers before the multivariate analysis? Or should I remove both univariate and multivariate outliers first and then run both analyses on the remaining data?

Best Answer

As a first approach, I usually follow the steps described in Zuur et al. (2010), "A protocol for data exploration to avoid common statistical problems". This will help you identify outliers for both univariate and multivariate analyses.

To answer your question: in my experience, an observation that is an outlier in a univariate analysis is usually also an outlier in a multivariate analysis. However, the assumptions of multivariate analyses tend to be more relaxed than those of univariate analyses. For example, for a redundancy analysis (RDA) you mainly need to check, before fitting, that your explanatory variables are not highly correlated (multicollinearity), and then that the fitted RDA model meets the homogeneity of dispersion assumption. As a result, the effect of an outlier might not be as pronounced in a multivariate analysis.
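To make the univariate/multivariate distinction concrete, here is a minimal sketch that flags outliers both ways on the same data. It is an illustration under assumptions, not part of the Zuur et al. protocol: Tukey's 1.5 × IQR fences stand in for "univariate outlier", a squared Mahalanobis distance with a chi-square cutoff stands in for "multivariate outlier", and the data and the names `iqr_outliers`, `mahalanobis_sq`, `feature_a` and `feature_b` are made up. The chi-square cutoff assumes roughly bivariate normal data and is only approximate for small samples.

```python
import math
import statistics


def iqr_outliers(values, k=1.5):
    """Indices of values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


def mahalanobis_sq(xs, ys):
    """Squared Mahalanobis distance of each (x, y) pair from the sample mean."""
    n = len(xs)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = statistics.variance(xs)
    syy = statistics.variance(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2  # determinant of the 2x2 covariance matrix
    # Closed-form inverse of a 2x2 covariance matrix applied to each deviation.
    return [
        (syy * (x - mx) ** 2
         - 2 * sxy * (x - mx) * (y - my)
         + sxx * (y - my) ** 2) / det
        for x, y in zip(xs, ys)
    ]


# Made-up illustrative data: the last observation is extreme in feature_a only.
feature_a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 30]
feature_b = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0, 6.9, 8.2, 8.8, 10.1, 9.5]

# Univariate check, one feature at a time.
uni_a = iqr_outliers(feature_a)
uni_b = iqr_outliers(feature_b)

# Multivariate check. For 2 degrees of freedom the chi-square quantile
# has the closed form -2 * ln(1 - p); p = 0.975 is an arbitrary choice.
cutoff = -2 * math.log(1 - 0.975)
d2 = mahalanobis_sq(feature_a, feature_b)
multi = [i for i, d in enumerate(d2) if d > cutoff]

print("univariate, feature_a:", uni_a)
print("univariate, feature_b:", uni_b)
print("multivariate:", multi)
```

In this toy data the last observation is flagged on `feature_a` and by the distance check, but not on `feature_b`, which is the usual pattern described above. The flags only tell you which observations to look at more closely; they are not a reason to delete anything on their own.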

In any analysis, the decision to remove data should be made only after you have run the analysis on the full data set and found that the assumptions are not met because of the outlier(s).
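That workflow (fit on everything first, check, and only then consider removal) can be sketched as follows. This is a hedged illustration: a simple least-squares line stands in for "the analysis", the largest absolute residual stands in for a proper assumption check (in practice you would look at residual plots, leverage and so on), and the data and the names `fit_line`, `residuals`, `feature` and `response` are invented for the example.

```python
import statistics


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def residuals(xs, ys, slope, intercept):
    """Observed minus fitted response for each observation."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Made-up illustrative data: the last response value is suspicious.
feature = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
response = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.9, 16.2, 17.8, 40.0]

# Step 1: run the analysis on the full data.
slope_full, intercept_full = fit_line(feature, response)

# Step 2: inspect the fit; here, find the observation with the largest residual.
res = residuals(feature, response, slope_full, intercept_full)
suspect = max(range(len(res)), key=lambda i: abs(res[i]))

# Step 3: only now refit without that observation and compare the results.
kept = [i for i in range(len(feature)) if i != suspect]
slope_reduced, intercept_reduced = fit_line(
    [feature[i] for i in kept], [response[i] for i in kept]
)

print("suspect observation index:", suspect)
print("slope with all data:", round(slope_full, 2))
print("slope without suspect:", round(slope_reduced, 2))
```

If the two fits tell the same story, there is little reason to drop the observation. If they differ a lot, as in this toy example, report both and justify whichever one you keep.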