[Math] Why do we assume that the sample mean equals the population mean?

statistical-inference statistics

Why?

When calculating the mean of the sampling distribution, we end up with the mean of each identically distributed sample being simply $\mu$.

But why is the mean of each sample equal to $\mu$, the population mean, in the first place? The mean of a sample drawn from the population has only a very small chance of being equal to the population mean; take a sample of size 1, for instance.

What justifies this assumption? When trying to estimate population parameters, we usually say that the mean of the sampling distribution is a good estimator because its expected value is equal to the population mean. It is equal to the population mean because the mean of the sampling distribution is defined as follows:

$Y=\frac{1}{n} \sum Y_i$ where $Y_i$'s are the means associated with each sample and $n$ is the number of samples we've obtained.

Best Answer

We do not assume this, because it's not true. What is true is that as the sample gets larger, the sample mean becomes increasingly likely to be close to the population mean. A simple way to justify this is as follows. If we have

$$Y = \frac{1}{n} \sum_{i=1}^n Y_i$$

where the $Y_i$ are iid samples from some distribution, then we can compute not only that the expected value $\mathbb{E}(Y) = \mathbb{E}(Y_i) = \mu$ matches up, but also that

$$\text{Var}(Y) = \frac{1}{n} \text{Var}(Y_i) = \frac{\sigma^2}{n}$$

so that the variance of $Y$ decreases as $n$ gets large; here $\sigma$ is the population standard deviation. Chebyshev's inequality now implies that

$$\mathbb{P}(|Y - \mu| \ge r \sigma) \le \frac{1}{r^2 n}.$$

Hence, for example, setting $r = 1$ gives that the probability that $Y$ is outside $[\mu - \sigma, \mu + \sigma]$ (that is, not within one population standard deviation of the population mean) is at most $\frac{1}{n}$. So if we wanted to get within one standard deviation with probability at least 95%, it would suffice to use a sample of size $n = 20$, since $\frac{1}{20} = 0.05$. Note that this bound does not depend on the size of the population, which we're effectively assuming is infinite.
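To make the two facts above concrete, here is a minimal simulation sketch. It assumes an Exponential(1) population (so $\mu = \sigma = 1$), which is purely an illustrative choice; the function names `sample_means` and `summarize`, the trial count, and the seed are likewise hypothetical. It compares the simulated variance of $Y$ with $\sigma^2 / n$ and the simulated frequency of $|Y - \mu| \ge r\sigma$ with the Chebyshev bound $\frac{1}{r^2 n}$.

```python
import random
import statistics

MU = 1.0     # mean of Exponential(1), the assumed stand-in population
SIGMA = 1.0  # standard deviation of Exponential(1)

def sample_means(n, trials=20000, seed=0):
    """Draw `trials` independent sample means, each from n iid Exponential(1) values."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(trials)]

def summarize(n, r=1.0):
    """Compare the simulated behaviour of Y with sigma^2/n and the Chebyshev bound."""
    means = sample_means(n)
    # Fraction of simulated sample means at least r*sigma away from mu.
    miss_rate = sum(abs(y - MU) >= r * SIGMA for y in means) / len(means)
    return {
        "avg": statistics.fmean(means),        # should be close to mu
        "var": statistics.pvariance(means),    # should be close to sigma^2 / n
        "var_theory": SIGMA**2 / n,
        "miss_rate": miss_rate,                # should not exceed the bound
        "chebyshev": min(1.0, 1.0 / (r * r * n)),
    }

results = {n: summarize(n) for n in (1, 5, 20)}
for n, s in results.items():
    print(n, s)
```

In a run like this, the average of the sample means should stay near $\mu$ for every $n$, while the variance shrinks roughly like $\sigma^2 / n$. The observed miss rate is typically far below the Chebyshev bound, which reflects how conservative that bound is.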

Actually we expect much better decay than this, which can be made precise using e.g. Chernoff bounds (under additional assumptions on the distribution, such as bounded $Y_i$), although the central limit theorem is maybe a more intuitive way to think about these things.
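As a rough illustration of how much faster the decay can be, the sketch below compares the Chebyshev bound with the tail probability obtained by treating $Y$ as exactly normal with mean $\mu$ and variance $\sigma^2 / n$, which is what the central limit theorem suggests for large $n$. The normal figure is an approximation, not a guaranteed bound, and can be off for small $n$ or heavily skewed populations; the function names are hypothetical.

```python
import math

def chebyshev_bound(n, r=1.0):
    """Chebyshev upper bound on P(|Y - mu| >= r*sigma), capped at 1."""
    return min(1.0, 1.0 / (r * r * n))

def normal_tail(n, r=1.0):
    """Approximate P(|Y - mu| >= r*sigma), assuming Y ~ Normal(mu, sigma^2 / n)."""
    z = r * math.sqrt(n)                 # deviation measured in standard errors
    return math.erfc(z / math.sqrt(2))   # two-sided tail P(|Z| >= z) for standard normal Z

for n in (1, 5, 20, 50):
    print(f"n={n:>2}  Chebyshev bound={chebyshev_bound(n):.4f}  "
          f"normal approximation={normal_tail(n):.3e}")
```

The Chebyshev bound decays like $\frac{1}{n}$, whereas the normal approximation decays exponentially in $n$, so the gap widens quickly as the sample grows.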