Solved – Dealing with outliers when comparing variances with Bartlett’s test

anova, heteroscedasticity, outliers, robust, winsorizing

I have four different groups (with unequal sample sizes of 100 to 120) and want to test whether their variances differ.
For the ANOVAs I used the Winsorized mean to get a more robust estimate, and I am wondering whether it is advisable to also use the Winsorized data (all values more extreme than the 5th or 95th percentile are set to that percentile) when testing whether the variances are equal, e.g. with a Levene test.
Or would you use a different kind of outlier treatment?
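
For concreteness, the procedure described above could be sketched roughly as follows. This is only an illustration: the data are simulated, and the helper name `winsorize_percentile`, the group sizes and the seed are assumptions made for the example, not part of the original analysis.

```python
import numpy as np
from scipy import stats

def winsorize_percentile(x, lower=5.0, upper=95.0):
    """Set values below the `lower` or above the `upper` percentile to that percentile."""
    x = np.asarray(x, dtype=float)
    lo, hi = np.percentile(x, [lower, upper])
    return np.clip(x, lo, hi)

rng = np.random.default_rng(0)
# Hypothetical data: four groups with unequal sizes between 100 and 120.
sizes = [100, 105, 112, 120]
groups = [rng.normal(loc=0.0, scale=1.0, size=n) for n in sizes]
winsorized = [winsorize_percentile(g) for g in groups]

# Levene's test; scipy centers on the median by default (the Brown-Forsythe variant).
stat_raw, p_raw = stats.levene(*groups)
stat_win, p_win = stats.levene(*winsorized)
print(f"raw:        W = {stat_raw:.3f}, p = {p_raw:.3f}")
print(f"Winsorized: W = {stat_win:.3f}, p = {p_win:.3f}")
```

Swapping `stats.levene` for `stats.bartlett` would give the Bartlett version of the same comparison.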

I checked the normality assumption for each group and found that the results stay approximately the same (some slight violations of normality) whether I test the original data or the Winsorized data.

Best Answer

There is a great deal of disagreement over good statistical style here, and indeed almost everywhere else.

But this strikes me as a mishmash of quite different procedures.

No tests for differing variances will work as designed if you Winsorize the data first. Perhaps someone has worked on this -- you might find literature references with modified tests -- but otherwise you are using a combination procedure with unknown properties. This is like doping a horse or a cyclist with something that boosts speed; you can't change the performance and be clear how much difference was yielded by the dope. You can't Winsorize and expect the tests to perform about the same. In your case, Winsorizing 5% in each tail is major surgery!
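
One way to get a feel for how the properties of a combined procedure change is a small simulation. The sketch below estimates how often Bartlett's test rejects at the 5% level when all four groups really do come from the same normal distribution, with and without Winsorizing first. Everything in it (the helper names, group sizes, number of simulations, seed) is an assumption chosen for illustration. It only covers this one null scenario, so it says nothing about power or about non-normal data.

```python
import numpy as np
from scipy import stats

def winsorize_percentile(x, lower=5.0, upper=95.0):
    """Set values below the `lower` or above the `upper` percentile to that percentile."""
    x = np.asarray(x, dtype=float)
    lo, hi = np.percentile(x, [lower, upper])
    return np.clip(x, lo, hi)

def rejection_rates(n_sims=2000, sizes=(100, 105, 112, 120), alpha=0.05, seed=1):
    """Estimate Bartlett rejection rates under equal variances, raw vs. Winsorized."""
    rng = np.random.default_rng(seed)
    reject_raw = 0
    reject_win = 0
    for _ in range(n_sims):
        # All groups share one standard normal distribution, so the null is true.
        groups = [rng.normal(size=n) for n in sizes]
        _, p_raw = stats.bartlett(*groups)
        _, p_win = stats.bartlett(*[winsorize_percentile(g) for g in groups])
        reject_raw += int(p_raw < alpha)
        reject_win += int(p_win < alpha)
    return reject_raw / n_sims, reject_win / n_sims

raw_rate, win_rate = rejection_rates()
print(f"Bartlett rejection rate, raw data:        {raw_rate:.3f}")
print(f"Bartlett rejection rate, Winsorized data: {win_rate:.3f}")
```

If the two rates differ noticeably, the nominal 5% level no longer describes what the Winsorize-then-test procedure is doing.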

I can't speak for any statistical people but myself, but removing outliers just because extreme points are awkward strikes me as very poor practice.

More generally, it is now 60 years since the recently departed George Box showed that these preliminary tests on variances are more fragile than tests comparing means, which is presumably your main focus. I doubt I am the only one who prefers a more informal approach.

  1. Plot the data and consider summary statistics.

  2. If the variances appear very different, consider working on a transformed scale. Logs or roots often improve the approximation to conditional normal distributions too.

  3. Proceed to ANOVA, or if desired a generalised linear model with appropriate link function.

  4. Apply some sensitivity analysis, e.g. ANOVA on raw data and on transformed data, to see how much difference that makes. Set aside the idea that there is one correct analysis to be identified which some oracle will reveal.
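
The steps above might look something like this in practice. This is a minimal sketch on simulated data, assuming strictly positive, right-skewed measurements so that a log transform is defined. The group sizes, distribution parameters and seed are made up for the example, and the plotting part of step 1 is left out.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Hypothetical positive, right-skewed data for four groups of unequal size.
sizes = [100, 105, 112, 120]
log_means = [0.0, 0.1, 0.2, 0.3]
groups = [rng.lognormal(mean=m, sigma=0.5, size=n) for m, n in zip(log_means, sizes)]

# Step 1: summary statistics per group (box plots or similar would go here too).
for i, g in enumerate(groups, start=1):
    print(f"group {i}: n = {g.size}, mean = {g.mean():.3f}, "
          f"sd = {g.std(ddof=1):.3f}, median = {np.median(g):.3f}")

# Step 2: work on a transformed scale; logs need strictly positive data.
logged = [np.log(g) for g in groups]
for i, g in enumerate(logged, start=1):
    print(f"group {i} (log scale): sd = {g.std(ddof=1):.3f}")

# Steps 3 and 4: one-way ANOVA on both scales as a simple sensitivity check.
f_raw, p_raw = stats.f_oneway(*groups)
f_log, p_log = stats.f_oneway(*logged)
print(f"raw scale: F = {f_raw:.3f}, p = {p_raw:.4f}")
print(f"log scale: F = {f_log:.3f}, p = {p_log:.4f}")
```

If the two analyses lead to the same substantive conclusion, the choice of scale matters little; if they disagree, that disagreement is itself worth reporting.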
