[Math] How the Mean of Sample Means Equals the Population Mean

central limit theorem, probability, probability distributions, probability theory, sampling

I did try searching here before posting, but could not find a satisfactory answer. Below is my attempt to prove that the mean of the sample means equals the population mean, along with the point where I am stuck.

Note: Rewritten as per Aaron's feedback (not fully yet, due to the pending questions described below).

Step 1:

Population Mean


Let $Y$ denote a random variable drawn from a population of size $T$ with mean $\mu$ and standard deviation $\sigma$.

So,

$
\text{Population Mean,} \ \ \
\mu = E[Y] = \dfrac{\sum\limits_{i=1}^T y_i}{T} \tag{1}
$


Step 2:

Sample Mean


Let $\widehat{Y_k}$ denote a sample set of size $n$, and let the number of samples drawn (experiments) be $N$.
For the $k$th sample set $\widehat{Y_k}$, with $k = 1, 2, 3, \dots, N$:

$
\text{Mean of sample set, } \ \
\overline{\widehat{Y_k}} = \dfrac {\sum\limits_{j=1}^n \widehat{y_{kj}}}{n} \tag{2}
$
$\overline{\widehat{Y_k}}$ is a random variable, not a constant.


Step 3:

Mean of Sample Means

Suppose we compute the sample mean $N$ times, once per sample set. In the resulting sampling distribution, the mean of the sample means can be calculated as follows:


$
\text{Mean of Sample Means,} \ \ \
\mu_{\overline{\widehat{Y_k}}}
= E\Bigg[\overline{\widehat{Y}}\Bigg]
= \overline{\overline{\widehat{Y}}}
= \sum\limits_{k=1}^N
\dfrac{1}{N}\overline{\widehat{Y_k}}
$

Using eq. $2$
$
E\Bigg[\overline{\widehat{Y}}\Bigg]
= \sum\limits_{k=1}^N
\dfrac{1}{N}
\sum\limits_{j=1}^n \dfrac{\widehat{y_{kj}}}{n} \tag{3}
$


Main questions:

  1. How do I prove that equation $3$ equals the population mean $\mu$? This is where I got stuck.
  2. How do we similarly derive the variance of the sampling distribution of sample means, $\sigma_{\overline{\widehat{Y_k}}}^2 = \dfrac{\sigma^2}{n}$?

Aaron's Lemma:

Aaron Montgomery has done a great job explaining the issue with my semantics in his answer below; however, there are a few areas I still don't understand.
$$
\mathbb E \left[ \overline{ \widehat{Y_k} } \right] = \sum_{i \in \bigstar} \frac{1}{\binom T n} \sum_{j=1}^n \frac{\widehat{y_{ij}}}{n}
\tag {4}
$$
where $\bigstar$ is taken over all possible samples of size $n$ that can be drawn from the population.
Eq $4$ expands further as,

$$
\frac{1}{\binom T n \cdot n} \sum_{i=1}^T \binom{T-1}{n-1} y_i = \frac{\frac{(T-1)!}{(n-1)!(T-n)!}}{\frac{T!}{n!(T-n)!} n} \sum_{i=1}^T y_i = \frac 1 T \sum_{i=1}^T y_i = \mu. \qquad \qquad \tag {5}
$$

My questions about his lemma:

  1. How do I equate equation $3$ to equation $4$?
  2. Equation $5$ follows because Aaron says "In the double summation, $y$ will appear in exactly $\binom{T-1}{n-1}$ of the $\binom{T}{n}$ terms". How?

Best Answer

One important distinction is between the expected value, which represents a theoretical long-run average of sorts, and the observed sample average, which is the sum of your observed sample elements divided by the size of the sample. This distinction is key. The sample average is indeed a random variable, because it depends upon the sample that was collected. But the expected value is not a random variable; it's a constant. The symbols $E$ and $\mu$ should only be used when discussing the theoretical, long-run averages -- not the observed sample results. The following equalities are both fine: $$\mu_{\overline{\widehat{Y_k}}} = \mathbb E \left[\overline{\widehat{Y_k}} \right]$$ $$\overline{\widehat{Y_k}} = \frac{\sum_{j=1}^n \widehat{y_{kj}}}{n}$$ but the two quantities are not equal to one another. I think this disconnect lies at the heart of the issue in your post. In particular, it is why (2) is incorrect, as is the last line of (3).
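To make that distinction concrete (a small illustrative example of my own, not from the original question): for a single roll of a fair six-sided die, the expected value is the fixed constant
$$\mathbb E[Y] = \sum_{i=1}^6 i \cdot \tfrac16 = 3.5,$$
whereas the sample average of three observed rolls, say $4, 1, 5$, is $\overline{\widehat{Y}} = \tfrac{4+1+5}{3} \approx 3.33$, and a different sample would generally give a different value.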

Next: I would encourage you to revisit the link you posted, armed with some additional knowledge about the notation. Traditionally, in probability and statistics we use capital letters for random variables (such as the individual items selected in a sample), Greek letters for parameters, and Roman letters for other constants (such as a sample size). There are no rules about this, of course, but these are good conventions to adopt if for no other reason than that they will help you read existing literature on the subject. In the article you linked, $X_1$ is in fact not a constant; it is a random variable, and its mean is essentially defined to be $\mu$. Notice the remark at the top of the page; each $X_i$ is an individual observation drawn from an underlying population with mean $\mu$ and variance $\sigma^2$.

Now, I'll try to repair the calculations you listed in your question by sticking to the ideas you've presented as closely as possible. I'll also use your notation for the sake of this answer, but I should again stress that it's worth trying to adopt the standard conventions instead. We first prove a lemma:

Lemma 1: For each $k$, $\mathbb E \left[ \overline{\widehat{Y_k}} \right] = \mu$.

Proof: To compute this expected value, we average over all the possibilities: $$\mathbb E \left[ \overline{ \widehat{Y_k} } \right] = \sum_{i \in \bigstar} \frac{1}{\binom T n} \sum_{j=1}^n \frac{\widehat{y_{ij}}}{n}$$ where $\bigstar$ is taken over all possible samples of size $n$ that can be drawn from the population. Since this is a random sample, we note that each sample of size $n$ is equally likely, which is why we weight each one with a factor of $1 / \binom T n$.
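To make the index set $\bigstar$ concrete, here is a small worked case I'm adding (assuming a toy population with $T = 3$ and sample size $n = 2$). The $\binom 3 2 = 3$ equally likely samples are $\{y_1, y_2\}, \{y_1, y_3\}, \{y_2, y_3\}$, so
$$\mathbb E \left[ \overline{ \widehat{Y_k} } \right] = \frac13 \left( \frac{y_1 + y_2}{2} + \frac{y_1 + y_3}{2} + \frac{y_2 + y_3}{2} \right) = \frac{2(y_1 + y_2 + y_3)}{6} = \frac{y_1 + y_2 + y_3}{3} = \mu.$$
Notice that every $y_i$ appears in exactly two of the three samples, which previews the counting argument below.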

Expanding that idea further: The fundamental question is how many ways there are to draw a sample of size $n$ from a population of size $T$ that contain the distinguished element $y$. This, like all probability, comes down to a counting argument. If we demand that a sample must contain $y$, then we can fill the other $n-1$ spots with any of the available $T-1$ choices. Hence, there are $\binom{T-1}{n-1}$ ways to accomplish this.

Fix any $y$ in the original population. In the double summation, $y$ will appear in exactly $\binom{T-1}{n-1}$ of the $\binom T n$ terms.
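As a tiny numerical check of that count (my own hypothetical numbers, $T = 4$ and $n = 2$): fixing $y_1$, the samples containing it are
$$\{y_1, y_2\},\ \{y_1, y_3\},\ \{y_1, y_4\}, \qquad \text{i.e. } \binom{T-1}{n-1} = \binom{3}{1} = 3 \ \text{ of the } \ \binom{T}{n} = \binom{4}{2} = 6 \ \text{ possible samples.}$$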

Hence, the expectation is equal to $$\frac{1}{\binom T n \cdot n} \sum_{i=1}^T \binom{T-1}{n-1} y_i = \frac{\frac{(T-1)!}{(n-1)!(T-n)!}}{\frac{T!}{n!(T-n)!} n} \sum_{i=1}^T y_i = \frac 1 T \sum_{i=1}^T y_i = \mu. \qquad \qquad \square$$
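As a quick sanity check on that cancellation (same hypothetical $T = 4$, $n = 2$ as above), the coefficient in front of the sum works out to
$$\frac{\binom{T-1}{n-1}}{\binom{T}{n} \cdot n} = \frac{\binom{3}{1}}{\binom{4}{2} \cdot 2} = \frac{3}{12} = \frac{1}{4} = \frac{1}{T},$$
exactly as the general factorial cancellation predicts.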

Important remark: This is not the best way to prove this! It is complicated, messy, and convoluted. The link you attached has the best way to prove this result.

Using that lemma, I think you can repair (3). I think you may have neglected a $\frac 1 N$ somewhere in (3), but it's hard to tell because I can't interpret the meaning of $\mu_{\overline { \widehat Y}}$. Perhaps Lemma 1 above was what you were trying to show there?
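If it helps, here is one way the repair might go, assuming $\mu_{\overline{\widehat Y}}$ is meant to be the expectation of the average of the $N$ sample means (my reading, not necessarily the original intent): by linearity of expectation and Lemma 1,
$$\mathbb E \left[ \frac{1}{N} \sum_{k=1}^N \overline{\widehat{Y_k}} \right] = \frac{1}{N} \sum_{k=1}^N \mathbb E \left[ \overline{\widehat{Y_k}} \right] = \frac{1}{N} \cdot N \mu = \mu.$$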

For main question 2: Since $\overline{\widehat{Y_k}} = \frac 1 n \sum_{j=1}^{n} \widehat{y_{kj}}$ and the $\widehat{y_{kj}}$ terms are independent of one another, it follows from a classical theorem I referenced in my other post that $$\operatorname{Var} \left(\overline{\widehat{Y_k}} \right) = \left( \frac 1 n \right)^2 \sum_{j=1}^n \operatorname{Var}(\widehat{y_{kj}}) = \frac 1 n \cdot \sigma^2.$$ This is exactly the desired result, $\sigma_{\overline{\widehat{Y_k}}}^2 = \frac{\sigma^2}{n}$; taking square roots of both sides gives the standard deviation version, $\sigma_{\overline{\widehat{Y_k}}} = \frac{\sigma}{\sqrt n}$.
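For reference, the classical theorem being invoked there is (in my paraphrase) the variance rule for linear combinations of independent random variables:
$$\operatorname{Var}\left( \sum_{j=1}^n a_j X_j \right) = \sum_{j=1}^n a_j^2 \operatorname{Var}(X_j) \qquad \text{for independent } X_1, \dots, X_n,$$
applied above with $a_j = \frac 1 n$ and $\operatorname{Var}(\widehat{y_{kj}}) = \sigma^2$ for every $j$.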
