Solved – Using percentiles and inter-quartile-range for outlier detection in skewed data

distributions, mathematical-statistics, outliers, probability

I am analyzing the ages of a certain group of people, and I want to use percentiles and the inter-quartile range of the data to flag possible outliers. I have computed Q1 (the 25th percentile), Q3 (the 75th percentile), and the inter-quartile range, IQR = Q3 - Q1. Following the convention used when reading a boxplot, I am going to use Q1 - 1.5 × IQR and Q3 + 1.5 × IQR as the thresholds for outliers.
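For concreteness, the thresholds described above can be sketched in Python as follows. This is only a minimal illustration: the function names and the ages are made up, and the quartile convention (`method="inclusive"` in the standard library's `statistics.quantiles`) is an assumption, since different conventions give slightly different quartiles.

```python
from statistics import quantiles


def tukey_fences(values, k=1.5):
    """Return (lower, upper) thresholds: Q1 - k*IQR and Q3 + k*IQR."""
    # Assumption: "inclusive" quartiles; other conventions shift Q1/Q3 slightly.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_for_review(values, k=1.5):
    """Return the values that fall outside the two thresholds."""
    lower, upper = tukey_fences(values, k)
    return [v for v in values if v < lower or v > upper]


# Hypothetical ages, invented purely for illustration.
ages = [22, 25, 27, 29, 31, 33, 35, 38, 41, 95]
print(tukey_fences(ages))     # (12.875, 51.875)
print(flag_for_review(ages))  # [95]
```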

My question is: if age does not follow a normal distribution and is skewed, is using percentiles and the IQR to detect outliers still meaningful?

If not, what would be a better approach?

Best Answer

The boxplot "rule" (or rather rules, since Tukey had two) was, at least in a sense, 'calibrated' to the normal distribution. However, it was not intended as a method for explicitly identifying outliers, but rather as a way of identifying points for further investigation. See Nick Cox's comment here.
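One way to get a feel for that calibration is to compute how much of an exactly normal population falls outside the thresholds. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name is made up, and the multiplier k = 3 for the second rule is an assumption based on the commonly cited "outer fence" value. With k = 1.5, roughly 0.7% of a normal population lies outside the thresholds, so in a large normal sample a few flagged points are expected even when nothing is wrong.

```python
from statistics import NormalDist


def normal_fraction_outside(k):
    """Fraction of a normal population below Q1 - k*IQR or above Q3 + k*IQR."""
    z = NormalDist()  # standard normal; the fraction is the same for any mean/sd
    q1, q3 = z.inv_cdf(0.25), z.inv_cdf(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return z.cdf(lower) + (1.0 - z.cdf(upper))


print(f"k = 1.5: {normal_fraction_outside(1.5):.4%}")
print(f"k = 3.0: {normal_fraction_outside(3.0):.6%}")  # assumed second rule
```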

With skewed distributions, you should expect more observations to be marked at one end than at the other.
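A small simulation can illustrate this asymmetry. The sketch below draws a right-skewed sample from a lognormal distribution, which is an arbitrary stand-in chosen for illustration rather than a model of real ages, and counts how many points the 1.5 × IQR thresholds mark at each end. The seed, sample size, and quartile convention are all assumptions.

```python
import random
from statistics import quantiles

random.seed(0)  # fixed seed so the run is repeatable

# Simulated right-skewed data; purely illustrative, not real ages.
sample = [random.lognormvariate(0.0, 1.0) for _ in range(2000)]

# Assumption: "inclusive" quartile convention.
q1, _median, q3 = quantiles(sample, n=4, method="inclusive")
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

n_low = sum(v < lower for v in sample)   # points marked at the short-tail end
n_high = sum(v > upper for v in sample)  # points marked at the long-tail end

print(f"thresholds: ({lower:.2f}, {upper:.2f})")
print(f"marked below: {n_low}, marked above: {n_high}")
```

In this setup the lower threshold is negative while every simulated value is positive, so nothing can be marked at the low end and all marked points sit in the long right tail.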

In some cases it may not even make much sense to worry about outliers at one end of a distribution.

Ultimately, what is meaningful to do depends on why you're marking the points. Why identify outliers? What's the impact of an unusually large outlier? What's the impact of an unusually small one?