[Math] Maximise difference between mean and median

algorithmsaverageelementary-set-theorymeansmedian

I have a set of numbers and I want to maximise the difference between the mean and the median by removing a given number of elements.

For instance I have the following set:

0 5 10 45
Mean = 15
Median = 7.5
Mean - Median = 7.5

If we choose to remove 1 element I want to obtain the following

0 5 45
Mean = 16.66
Median = 5
Mean - Median = 11.66 => MAXIMISED

We can assume that the set is sorted.

Is there an algorithm for doing this?

Best Answer

This is NOT a complete answer, but I try to sketch my idea.

Suppose we have a sorted set, say, in an increasing order. If we want to maximise the difference between mean and median, it is necessary (but not sufficient) to keep as many outliers as possible (because mean is very much affected by these outliers, whereas median is not). The problems here are: how many is "many"? and how to control the difference?

Suppose $x_1 < x_2 < ... < x_n$. We can calculate its mean and median. My alogorithm (which I've not yet been able to prove so far) is as follows:

  1. If mean $>$ meadian: we delete the median or the closest number which is BIGGER than median, that is $x_{(n+1)/2}$, if $n $ is odd or $x_{n/2+1}$, if $n$ is even. The effect of doing this is: after each step, the mean (not always) gets bigger and bigger but the median (not always) gets smaller and smaller (and of course, mean is always bigger than median in every step). But overall, the difference gets bigger.
  2. If mean $<$ median: we delete the median or the closest number which is SMALLER than median, that is $x_{(n+1)/2}$, if $n $ is odd or $x_{n/2-1}$, if $n$ is even. The effect of doing this is: after each step, the mean (not always) gets smaller and smaller but the median (not always) gets bigger and bigger (and of course, mean is always smaller than median in every step). But overall, the difference gets bigger.

It turns out that, finally we will only have $3$ numbers left (can't be $2$, otherwise mean $=$ median): the biggest, the smallest, and the second smallest (if mean $>$ median) or the second biggest (if mean $<$ median). These $3$ numbers will have the maximum difference between mean and median.