I have a set of numbers and I want to maximise the difference between the mean and the median by removing a given number of elements.
For instance I have the following set:
0 5 10 45
Mean = 15
Median = 7.5
Mean - Median = 7.5
If we choose to remove 1 element I want to obtain the following
0 5 45
Mean = 16.66
Median = 5
Mean - Median = 11.66 => MAXIMISED
We can assume that the set is sorted.
Is there an algorithm for doing this?
Best Answer
This is NOT a complete answer, but I try to sketch my idea.
Suppose we have a sorted set, say, in an increasing order. If we want to maximise the difference between mean and median, it is necessary (but not sufficient) to keep as many outliers as possible (because mean is very much affected by these outliers, whereas median is not). The problems here are: how many is "many"? and how to control the difference?
Suppose $x_1 < x_2 < ... < x_n$. We can calculate its mean and median. My alogorithm (which I've not yet been able to prove so far) is as follows:
It turns out that, finally we will only have $3$ numbers left (can't be $2$, otherwise mean $=$ median): the biggest, the smallest, and the second smallest (if mean $>$ median) or the second biggest (if mean $<$ median). These $3$ numbers will have the maximum difference between mean and median.