Non-Normal Distribution – How Statisticians Make Better Guesses than Using Mean Alone

Tags: games, mean, sufficient-statistics

Let's say we have a game with two players. Both of them know that five samples are drawn from some (non-normal) distribution, but neither knows the parameters used to generate the data. The goal of the game is to estimate the mean of the distribution. The player who comes closer to the true mean wins \$1 (the objective function is the absolute difference between the estimate and the true mean). If the distribution's mean diverges to $+\infty$, the player guessing the larger number wins; for $-\infty$, the one guessing the smaller number wins.

While the first player is given all five samples, the second one is given just the sum of the samples (and they know there were five of them).

What are some examples of distributions where this isn't a fair game and the first player has an advantage? I guess the normal distribution isn't one of them since the sample mean is a sufficient statistic for the true mean.

Note: I asked a similar question here: Mean is not a sufficient statistic for the normal distribution when variance is not known? about the normal distribution and it was suggested I ask a new one for non-normal ones.


EDIT: Two answers so far use a uniform distribution. I would love to hear about more examples if people know of any.

Best Answer

For a uniform distribution between $0$ and $2\mu$, the player who guesses the sample mean does worse than one who guesses $\frac{3}{5} \max_i(x_i)$ (the sample maximum is a sufficient statistic for the mean of a uniform distribution lower-bounded by $0$).
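To see where the factor $\frac{3}{5}$ comes from: for $n$ samples from $U(0, 2\mu)$, the expected maximum is

$$\mathbb{E}\left[\max_i x_i\right] = \frac{n}{n+1} \cdot 2\mu = \frac{5}{6} \cdot 2\mu = \frac{5}{3}\mu \quad (n = 5),$$

so $\frac{3}{5}\max_i x_i$ is an unbiased estimator of $\mu$.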

In this particular case, it can be verified numerically. Without loss of generality, we set $\mu = 0.5$ in the simulation. It turns out that about two-thirds of the time, the $\frac{3}{5}\max$ estimator does better.

Here is a Python simulation demonstrating this.

import numpy as np

Ntrials = 1_000_000
# Five Uniform(0, 1) samples per trial, so the true mean is 0.5.
xs = np.random.random((5, Ntrials))
sample_mean_error = np.abs(xs.mean(axis=0) - 0.5)
better_estimator_error = np.abs(0.6 * xs.max(axis=0) - 0.5)
# Fraction of trials in which the 3/5 * max estimator beats the sample mean
print((sample_mean_error > better_estimator_error).mean())
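As a cross-check, here is a variant of the simulation that also reports each estimator's mean absolute error, not just how often it wins (same Uniform(0, 1) setup; the seed and trial count are arbitrary choices for reproducibility):

```python
import numpy as np

# Simulate many games: 5 draws from Uniform(0, 1), so the true mean is 0.5.
rng = np.random.default_rng(0)
n_trials = 200_000
xs = rng.random((5, n_trials))

mean_err = np.abs(xs.mean(axis=0) - 0.5)      # player using the sample mean
max_err = np.abs(0.6 * xs.max(axis=0) - 0.5)  # player using (3/5) * max

# How often the max-based estimator is strictly closer to the true mean
win_frac = (mean_err > max_err).mean()
print(f"max-based estimator wins {win_frac:.3f} of games")
print(f"MAE sample mean: {mean_err.mean():.4f}, MAE 3/5 max: {max_err.mean():.4f}")
```

The max-based estimator wins roughly two-thirds of the games, and its mean absolute error is noticeably smaller as well.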