A robust (non-parametric) measure like Coefficient of Variation: IQR/median, or alternative


For a given set of data, spread is often calculated either as the standard deviation or as the IQR (inter-quartile range).

Whereas a standard deviation is normalised (z-scores, etc.) and so can be used to compare the spread of two different populations, this is not the case with the IQR, since samples from two different populations could have values on quite different scales,

 e.g. 
 Pop A:  100, 67, 89, 75, 120, ...
 Pop B:  19, 22, 43, 8, 12, ...

What I'm after is a robust (non-parametric) measure that I can use to compare the variation within different populations.

Choice 1:
IQR / Median. This would be by analogy to the coefficient of variation, i.e. to $\frac{\sigma}{\mu}$.

Choice 2:
Range / IQR

Question: Which is the more meaningful measure for comparing variation between populations? And if it is Choice 1, is Choice 2 useful for anything / meaningful, or is it a fundamentally flawed measure?

Best Answer

The question implies that the standard deviation (SD) is somehow normalized and so can be used to compare the variability of two different populations. Not so. As Peter and John said, that normalization is what you get when you calculate the coefficient of variation (CV), which equals SD/Mean. The SD is in the same units as the original data. In contrast, the CV is a unitless ratio.
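To make the units point concrete, here is a minimal sketch using made-up lengths (the data and variable names are illustrative assumptions, not taken from the question). Converting metres to centimetres multiplies the SD by 100, while the CV stays the same.

```python
import statistics

# Hypothetical lengths in metres (illustrative data only).
metres = [1.0, 2.0, 3.0, 4.0, 5.0]
# The same lengths expressed in centimetres.
centimetres = [x * 100 for x in metres]

# The sample SD carries the units of the data, so it scales with them.
sd_m = statistics.stdev(metres)
sd_cm = statistics.stdev(centimetres)

# The CV (SD / mean) is a unitless ratio, so the rescaling cancels out.
cv_m = sd_m / statistics.mean(metres)
cv_cm = sd_cm / statistics.mean(centimetres)

print(f"SD: {sd_m:.4f} m vs {sd_cm:.4f} cm")
print(f"CV: {cv_m:.4f} vs {cv_cm:.4f}")
```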

Your choice 1 (IQR/Median) is analogous to the CV. Like the CV, it only makes sense when the data are ratio data. This means that zero is really zero. A weight of zero is no weight. A length of zero is no length. As a counterexample, it would not make sense for temperature in C or F, as zero degrees (C or F) does not mean there is no temperature. The conversion between C and F adds an offset, which shifts the mean and median but not the SD or IQR in the same proportion, so simply switching between the two scales would give you a different value for the CV or for the ratio IQR/Median. That makes both ratios meaningless for such data.
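The ratio-scale requirement can be checked with a short sketch. The readings below are invented for illustration, and `iqr_over_median` is a hypothetical helper name. It uses `statistics.quantiles` with the `"inclusive"` method, which is only one of several quartile conventions.

```python
import statistics

def iqr_over_median(values):
    """Robust analogue of the CV: interquartile range divided by the median."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (q3 - q1) / statistics.median(values)

# Hypothetical temperatures in degrees Celsius (interval scale, zero is arbitrary).
celsius = [10.0, 15.0, 20.0, 25.0, 30.0]
fahrenheit = [c * 9 / 5 + 32 for c in celsius]

# Hypothetical weights in kilograms (ratio scale, zero means no weight).
kilograms = [50.0, 60.0, 70.0, 80.0, 90.0]
pounds = [kg * 2.20462 for kg in kilograms]

ratio_c = iqr_over_median(celsius)
ratio_f = iqr_over_median(fahrenheit)
ratio_kg = iqr_over_median(kilograms)
ratio_lb = iqr_over_median(pounds)

# Same temperatures, different answers: the offset of 32 changes the ratio.
print(f"IQR/median, Celsius:    {ratio_c:.4f}")
print(f"IQR/median, Fahrenheit: {ratio_f:.4f}")
# Same weights, same answer: a pure rescaling cancels in the ratio.
print(f"IQR/median, kilograms:  {ratio_kg:.4f}")
print(f"IQR/median, pounds:     {ratio_lb:.4f}")
```

With these numbers the temperature ratio changes between scales while the weight ratio does not, which is the behaviour described above.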

I agree with Peter and John that your second idea (Range/IQR) would not be very robust to outliers, so probably wouldn't be useful.
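A quick way to see the robustness problem is to change one value in a sample and recompute both candidates. This is a sketch on invented data, with hypothetical helper names and the `"inclusive"` quartile convention assumed. Note also that Range/IQR is already unitless, so it says something about how far the extremes sit relative to the middle of the data rather than about spread relative to scale.

```python
import statistics

def iqr(values):
    """Interquartile range using the 'inclusive' quartile convention."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

def range_over_iqr(values):
    """Choice 2: full range divided by the IQR."""
    return (max(values) - min(values)) / iqr(values)

def iqr_over_median(values):
    """Choice 1: IQR divided by the median."""
    return iqr(values) / statistics.median(values)

# Hypothetical sample, and the same sample with one gross outlier.
clean = [10, 12, 14, 16, 18, 20, 22, 24, 26]
contaminated = [10, 12, 14, 16, 18, 20, 22, 24, 260]

for name, data in [("clean", clean), ("contaminated", contaminated)]:
    print(f"{name:>12}: Range/IQR = {range_over_iqr(data):.2f}, "
          f"IQR/median = {iqr_over_median(data):.3f}")
```

In this example the single outlier moves Range/IQR by more than an order of magnitude, while IQR/median is unchanged because neither the quartiles nor the median depend on the extreme value.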
