Distributions – How to Understand the Relationship Between Mean and Median in Left Skewed Data?

Tags: distributions, mean, median, self-study, skewness

I think the median $\leq$ mean.

Is this the case?

Best Answer

It's a nontrivial question (surely not as trivial as the people asking the question appear to think).

The difficulty is ultimately caused by the fact that we don't really know what we mean by 'skewness' - a lot of the time it's kind of obvious, but sometimes it really isn't. Given the difficulty in pinning down what we mean by 'location' and 'spread' in nontrivial cases (for example, the mean isn't always what we mean when we talk about location), it should be no great surprise that a more subtle concept like skewness is at least as slippery. So this leads us to try various algebraic definitions of what we mean, and they don't always agree with each other.

If you measure skewness by the second Pearson skewness coefficient, then for left-skewed data the mean ($\mu$) will be less than the median ($\stackrel{\sim}{\mu}$); i.e. in this case you have it backwards.

The (population) second Pearson skewness is $$\frac{3(\mu-\stackrel{\sim}{\mu})}{\sigma}\,,$$ and will be negative ("left skew") when $\mu<\stackrel{\sim}{\mu}$.

The sample versions of these statistics work similarly.

The relationship between mean and median is forced in this case simply because that's how the skewness measure is defined.
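As a rough sketch of that definition in Python: the function name `pearson_second_skewness` and the sample `data` are illustrative assumptions, not part of the original answer, and the population standard deviation is used to match the formula above.

```python
from statistics import mean, median, pstdev

def pearson_second_skewness(values):
    """3 * (mean - median) / sd, treating the values as a whole population."""
    return 3 * (mean(values) - median(values)) / pstdev(values)

# Hypothetical data with a long tail to the left (not from the original answer).
data = [1, 7, 8, 9, 9, 10]

skew = pearson_second_skewness(data)
# The sign of this measure is the sign of (mean - median), so the two
# comparisons below always agree; for this data both are True.
print(mean(data) < median(data), skew < 0)
```

Since the standard deviation is positive, the sign of the coefficient can only come from the numerator, which is exactly the comparison between mean and median.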

Here's a left-skewed density (by both the second Pearson measure and the more common third-moment measure discussed below):

[Figure: a left-skewed density, with the median and mean marked in the lower margin]

The median is marked in the lower margin in green, the mean in red.

So I expect the answer they want you to give is that the mean is less than the median. It's usually the case with the sorts of distributions we tend to give names to.

(But read on, and see why that's not actually correct as a general statement.)

If you measure it by the more usual standardized third moment, then it is often, but by no means always, the case that the mean will be less than the median.

That is, it's possible to construct examples where the opposite is true, or where one skewness measure is zero while the other is non-zero.

Which is to say, there's no necessary relationship between the locations of the mean, median and the moment-skewness.
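For reference, the standardized third moment is usually written as below (the symbol $\gamma_1$ is a notational assumption here; $\mu$ and $\sigma$ are as above):

$$\gamma_1=\frac{E\left[(X-\mu)^3\right]}{\sigma^3}\,.$$

Its sign is the sign of the third central moment. The median does not appear in it anywhere, which is why it places no direct constraint on where the median sits relative to the mean.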

Consider, for example, the following sample (the same example can be constructed as a discrete probability distribution):

  2.7 15.0 15.0 15.0 30.0 30.0

mean: 17.95
median: 15

The mean is larger than the median, yet the third-moment skewness coefficient is negative (i.e. by its lights, we have left-skew data) since the sum of the cubes of the deviations from the mean is negative.

So in that sense, left-skew, but mean>median.

(On the other hand, if you change 2.7 in the above example to 3, then you have an example where the moment-skewness is zero, yet the mean exceeds the median. If you make it 3.3, then the moment-skewness is positive, and the mean exceeds the median - i.e. is finally in the 'anticipated' direction.)
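The three variants of this sample can be checked with a short script. This is only a sketch: the helper names `cubed_deviation_sum` and `sign` are made up for illustration, and exact fractions are used so that the zero case is not blurred by floating-point rounding.

```python
from fractions import Fraction
from statistics import median

def cubed_deviation_sum(values):
    """Sum of cubed deviations from the mean; its sign is the sign of the moment-skewness."""
    m = sum(values) / len(values)
    return sum((v - m) ** 3 for v in values)

def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)

results = {}
for first in ("2.7", "3", "3.3"):
    # Only the smallest value changes between the three variants.
    sample = [Fraction(s) for s in (first, "15", "15", "15", "30", "30")]
    m = sum(sample) / len(sample)
    results[first] = (float(m), float(median(sample)), sign(cubed_deviation_sum(sample)))
    print(first, results[first])
```

In all three cases the mean exceeds the median of 15, while the sign of the cubed-deviation sum goes from negative (2.7) through zero (3) to positive (3.3).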

If you use the first Pearson skewness instead of either of the above definitions, you run into the same issue as with the moment-based measure: the direction of the skewness does not pin down the relation between mean and median in general.
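For reference, the first Pearson (mode) skewness is usually defined as

$$\frac{\mu-\text{mode}}{\sigma}\,,$$

so its sign compares the mean with the mode, not with the median.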

Edit: in answer to a question in comments -- an example where the mean and median are equal, but the moment-skewness is negative. Consider the following data (as before, it also counts as an example for a discrete population; consider writing the numbers on the faces of a die).

 1  5  6  6  8 10

The mean and the median are both 6, but the sum of cubes of deviations from the mean is negative, so the third-moment skewness is negative.
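A minimal sketch of the same check in Python, computing the full coefficient this time. The function name `moment_skewness` is an illustrative choice, and it uses the population form (dividing by $n$), which is an assumption; other sample conventions rescale the value but do not change its sign.

```python
from statistics import mean, median

def moment_skewness(values):
    """Standardized third moment, using population (divide-by-n) moments."""
    n = len(values)
    m = mean(values)
    m2 = sum((v - m) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - m) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

die = [1, 5, 6, 6, 8, 10]

# Mean and median coincide at 6, yet the coefficient comes out negative.
print(mean(die), median(die), moment_skewness(die))
```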
