Statistical Analysis – Why Use the Mean If It Is So Sensitive?


It is well known that the median is resistant to outliers. If that is the case, when and why would we use the mean in the first place?

One use I can think of is detecting outliers: if the median is far from the mean, the distribution is probably skewed, and the data may need to be examined to decide what to do with the outliers. Are there any other uses?
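The diagnostic described in the question can be sketched in a few lines. This is a minimal illustration with made-up numbers; the helper `location_summary` is hypothetical, and scaling the mean-median gap by the sample standard deviation (Pearson's second skewness coefficient, $3(\bar{x} - \tilde{x})/s$) is one classical convention, not something the question itself specifies.

```python
import statistics

# Made-up toy data: a symmetric sample, and the same sample with its
# largest value replaced by an outlier.
symmetric = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 100]


def location_summary(xs):
    """Return (mean, median, 3 * (mean - median) / s), where s is the sample SD."""
    mean = statistics.mean(xs)
    median = statistics.median(xs)
    s = statistics.stdev(xs)  # sample standard deviation (n - 1 denominator)
    return mean, median, 3 * (mean - median) / s


for name, xs in [("symmetric", symmetric), ("with_outlier", with_outlier)]:
    mean, median, skew = location_summary(xs)
    print(f"{name}: mean={mean}, median={median}, Pearson skew={skew:.2f}")
```

The single outlier pulls the mean from 3 to 22 while the median stays at 3, which is exactly the gap the question proposes to look for.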

Best Answer

In a sense, the mean is used precisely because it is sensitive to the data. If the distribution happens to be symmetric and its tails are about like those of the normal distribution, the mean is a very efficient summary of central tendency. The median, while robust and well defined for any continuous distribution, is only $\frac{2}{\pi} \approx 0.64$ as efficient as the mean (in large samples) if the data happen to come from a normal distribution. It is this relative inefficiency of the median that keeps us from using it even more than we do. Because the standard errors of both estimators shrink in proportion to $1/\sqrt{n}$, the relative inefficiency translates into only a minor absolute inefficiency as the sample size gets large, so for large $n$ we can be more guilt-free about using the median.
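The $\frac{2}{\pi}$ figure follows from the large-sample variance of the sample median, $\pi\sigma^2/(2n)$ for normal data, compared with $\sigma^2/n$ for the mean. A quick Monte Carlo check is sketched below; the function name `relative_efficiency` and the choices of sample size, number of replications, and seed are arbitrary assumptions for illustration, and at a finite $n$ the estimate only approximates the asymptotic value.

```python
import math
import random
import statistics


def relative_efficiency(n=101, reps=5_000, seed=1):
    """Monte Carlo estimate of Var(sample mean) / Var(sample median) for N(0, 1) data."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    # A ratio below 1 means the median is the noisier estimator, i.e. it
    # needs more data to reach the same precision as the mean.
    return statistics.variance(means) / statistics.variance(medians)


print(f"simulated efficiency of the median: {relative_efficiency():.3f}")
print(f"asymptotic value 2/pi:              {2 / math.pi:.3f}")
```

Increasing `n` and `reps` should move the simulated ratio closer to $2/\pi \approx 0.637$.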

It is interesting to note that for a measure of variation (spread, dispersion) there is a very robust estimator that is about 0.98 as efficient as the standard deviation under normality, namely Gini's mean difference. This is the mean absolute difference between two observations, averaged over all pairs of distinct observations. [You have to multiply the sample standard deviation by a constant to estimate the same quantity estimated by Gini's mean difference; for normal data that constant is $2/\sqrt{\pi} \approx 1.128$.] An efficient measure of central tendency is the Hodges-Lehmann estimator, i.e., the median of all pairwise means, which is about $\frac{3}{\pi} \approx 0.955$ as efficient as the mean under normality. We would use it more if its interpretation were simpler.
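Both estimators are simple to compute directly from their definitions. The sketch below is a naive $O(n^2)$ version meant for clarity, not speed; the function names are hypothetical, and taking the pairwise means over all $i \le j$ (the so-called Walsh averages) is one common convention for the Hodges-Lehmann estimator, while some texts use only $i < j$.

```python
import itertools
import math
import statistics


def gini_mean_difference(xs):
    """Average of |x_i - x_j| over all pairs i < j of distinct observations."""
    n = len(xs)
    total = sum(abs(a - b) for a, b in itertools.combinations(xs, 2))
    return total / (n * (n - 1) / 2)


def hodges_lehmann(xs):
    """Median of the pairwise means (x_i + x_j) / 2 taken over all i <= j."""
    walsh = [(a + b) / 2 for a, b in itertools.combinations_with_replacement(xs, 2)]
    return statistics.median(walsh)


def sd_on_gmd_scale(xs):
    """Sample SD times 2/sqrt(pi): for normal data this estimates the same quantity as Gini's mean difference."""
    return statistics.stdev(xs) * 2 / math.sqrt(math.pi)


data = [1, 2, 3, 4, 100]  # made-up sample with one outlier
print(statistics.mean(data), statistics.median(data), hodges_lehmann(data))  # 22 3 3.0
```

On this toy sample the Hodges-Lehmann estimate stays at 3 like the median, whereas the mean is dragged to 22.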