Mathematical Statistics – Understanding the Unbiased Variance Estimator for Finite Population

Tags: estimators, finite-population, mathematical-statistics, survey-sampling, unbiased-estimator

Going through Sharon L. Lohr's Sampling: Design and Analysis (2nd edition), I had no trouble with the content until the proof in chapter 2 that, under simple random sampling without replacement (SRSWOR), $E[s^2] = S^2$, where $S^2$ is defined as:
$$
S^2 = \frac{1}{N-1}\sum_{i=1}^{N} (y_i - \bar{y}_{U})^2
$$
And the sample variance estimator is:
$$
s^2 = \frac{1}{n-1}\sum_{i\in\mathcal{S}}(y_i-\bar{y})^2
$$
where $U$ is the index set of the finite population:
$$
U = \{1,2,\dotsc,N\}
$$
And $\mathcal{S}$ is the particular sample chosen, a subset consisting of $n$ of the units in $U$.
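A small sketch of the two definitions, assuming a hypothetical population of $N = 5$ values: $S^2$ and $s^2$ are the same computation (sum of squared deviations from the mean, divided by the count minus one), applied to the whole index set $U$ and to the sample $\mathcal{S}$ respectively. The helper name `var_minus_one`, the population values, and the chosen sample are made up for illustration; `fractions.Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

def var_minus_one(values):
    """Sum of squared deviations from the mean, divided by (count - 1).

    Applied to all N population values this gives S^2; applied to the
    n values in a sample it gives s^2.
    """
    values = [Fraction(v) for v in values]
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

# Hypothetical population indexed by U = {1, ..., N} with N = 5
y = {1: 2, 2: 4, 3: 4, 4: 5, 5: 10}
sample = {1, 3, 5}  # one possible sample S, with n = 3 units

S2 = var_minus_one(y.values())            # population variance S^2
s2 = var_minus_one(y[i] for i in sample)  # sample variance s^2

print(S2, s2)  # 9 52/3
```

A single sample's $s^2$ (here $52/3$) need not equal $S^2$ (here $9$); the claim being proved is only about the average of $s^2$ over all possible samples.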

It says:

and then find the multiplicative constant that will give the unbiasedness:
$$
\begin{align}
E\left[\sum_{i\in\mathcal{S}}(y_i-\bar{y})^2\right] & = E\left[\sum_{i\in\mathcal{S}}((y_i-\bar{y}_U) - (\bar{y}-\bar{y}_U))^2\right]\\
& = E\left[\sum_{i\in\mathcal{S}}(y_i-\bar{y}_U)^2 - n(\bar{y}-\bar{y}_U)^2\right]\\
& = E\left[\sum_{i=1}^NZ_i(y_i-\bar{y}_U)^2\right] - n\textrm{Var}(\bar{y})\\
& = \frac{n}{N}\sum_{i=1}^N(y_i-\bar{y}_U)^2-\left(1-\frac{n}{N}\right)S^2\\
& = \frac{n(N-1)}{N}S^2 - \frac{N-n}{N}S^2\\
& = (n-1)S^2
\end{align}
$$
Thus,
$$
E\left[\frac{1}{n-1}\sum_{i\in\mathcal{S}}(y_i-\bar{y})^2\right] = E[s^2] = S^2
$$
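As a sanity check on the result (this check is not from Lohr's book), one can enumerate every possible SRSWOR sample from a tiny population, since each of the $\binom{N}{n}$ samples is equally likely. The sketch below assumes a hypothetical population `y = [2, 4, 4, 5, 10]`, uses 0-based unit indices in place of $U = \{1,\dotsc,N\}$, and uses exact `fractions.Fraction` arithmetic so that the facts used in the derivation, $P(Z_i = 1) = n/N$, $n\,\textrm{Var}(\bar{y}) = (1 - n/N)S^2$, and $E[s^2] = S^2$, can be compared with `==` rather than approximately. The helper names are illustrative only.

```python
from fractions import Fraction
from itertools import combinations

def var_minus_one(values):
    """Squared deviations from the mean, divided by (count - 1)."""
    values = [Fraction(v) for v in values]
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def srswor_moments(y, n):
    """Exact moments over all C(N, n) equally likely samples (0-based units)."""
    N = len(y)
    samples = list(combinations(range(N), n))
    k = len(samples)
    ybar_U = Fraction(sum(y), N)
    ybars = [Fraction(sum(y[i] for i in s), n) for s in samples]
    return {
        "S2": var_minus_one(y),
        "E_s2": sum(var_minus_one([y[i] for i in s]) for s in samples) / k,
        "E_ybar": sum(ybars) / k,
        # Var(ybar) = E[(ybar - ybar_U)^2] because ybar is unbiased for ybar_U
        "var_ybar": sum((b - ybar_U) ** 2 for b in ybars) / k,
        # P(Z_1 = 1): share of samples that contain the first unit
        "pi_1": Fraction(sum(1 for s in samples if 0 in s), k),
        "ybar_U": ybar_U,
    }

y = [2, 4, 4, 5, 10]  # hypothetical population, N = 5
n = 3
m = srswor_moments(y, n)
f = Fraction(n, len(y))  # sampling fraction n/N

assert m["E_ybar"] == m["ybar_U"]                   # E[ybar] = ybar_U
assert m["pi_1"] == f                               # E[Z_i] = n/N
assert n * m["var_ybar"] == (1 - f) * m["S2"]       # n Var(ybar) = (1 - n/N) S^2
assert m["E_s2"] == m["S2"]                         # E[s^2] = S^2
print(m["S2"], m["E_s2"])  # 9 9
```

Enumeration only scales to very small $N$, but it checks each line of the derivation exactly instead of relying on simulation noise.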

I follow most of the derivation except the step from the first line to the second.
I assume there must be an intermediate step such as:
$$
\begin{align}
E\left[\sum_{i\in\mathcal{S}}(y_i-\bar{y})^2\right] & = E\left[\sum_{i\in\mathcal{S}}((y_i-\bar{y}_U) - (\bar{y}-\bar{y}_U))^2\right]\\
& = E\left[\sum_{i\in\mathcal{S}}(y_i-\bar{y}_U)^2 - \sum_{i\in\mathcal{S}}(\bar{y}-\bar{y}_U)^2\right]\\
& = E\left[\sum_{i\in\mathcal{S}}(y_i-\bar{y}_U)^2 - n(\bar{y}-\bar{y}_U)^2\right]\\
\end{align}
$$
But I still can't see how to get from the first line to the second.
My guess is that I am confused about what the terms mean, the notation, or the summation, or else I have made an obvious mistake.

Either way, any help would be greatly appreciated.

Best Answer

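Expand the square in the first line of your intermediate step:

$$
\begin{align}
\mathbb E\left[\sum_{i\in\mathcal{S}}(y_i-\bar{y})^2\right] & = \mathbb E\left[\sum_{i\in\mathcal{S}}((y_i-\bar{y}_U) - (\bar{y}-\bar{y}_U))^2\right]\\
&= \mathbb E\left[\sum_{i\in\mathcal{S}}(y_i-\bar{y}_U)^2 +n(\bar{y}-\bar{y}_U)^2-2(\bar{y}-\bar{y}_U)\underbrace{\sum_{i\in\mathcal{S}}(y_i-\bar{y}_U)}_{= n(\bar{y}-\bar{y}_U)}\right].
\end{align}
$$

The underbraced sum equals $n(\bar{y}-\bar{y}_U)$ because $\sum_{i\in\mathcal{S}} y_i = n\bar{y}$, so $\sum_{i\in\mathcal{S}}(y_i-\bar{y}_U) = n\bar{y} - n\bar{y}_U$. The cross term is therefore $-2n(\bar{y}-\bar{y}_U)^2$, and adding $+n(\bar{y}-\bar{y}_U)^2$ to it leaves
$$
\mathbb E\left[\sum_{i\in\mathcal{S}}(y_i-\bar{y}_U)^2 - n(\bar{y}-\bar{y}_U)^2\right],
$$
which is the second line of the derivation.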
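The expansion can also be checked numerically. The sketch below is not from the book; the population `[2, 4, 4, 5, 10]` and the function name `check_decomposition` are hypothetical. It runs over every sample of size $n$ and confirms, with exact fractions, that the underbraced sum equals $n(\bar{y}-\bar{y}_U)$ and that $\sum_{i\in\mathcal{S}}(y_i-\bar{y})^2 = \sum_{i\in\mathcal{S}}(y_i-\bar{y}_U)^2 - n(\bar{y}-\bar{y}_U)^2$ holds for each sample individually, so the expectation plays no role in this step.

```python
from fractions import Fraction
from itertools import combinations

def check_decomposition(y, n):
    """Verify the expansion on every size-n sample of y; return samples checked."""
    N = len(y)
    ybar_U = Fraction(sum(y), N)
    checked = 0
    for s in combinations(range(N), n):  # 0-based unit indices
        ys = [Fraction(y[i]) for i in s]
        ybar = sum(ys) / n
        d = ybar - ybar_U                   # (ybar - ybar_U)
        a = [v - ybar_U for v in ys]        # (y_i - ybar_U) for i in S
        lhs = sum((v - ybar) ** 2 for v in ys)
        # Expanded square: sum a_i^2 + n d^2 - 2 d sum a_i
        expanded = sum(x ** 2 for x in a) + n * d ** 2 - 2 * d * sum(a)
        assert lhs == expanded
        assert sum(a) == n * d              # the underbraced term
        assert lhs == sum(x ** 2 for x in a) - n * d ** 2
        checked += 1
    return checked

print(check_decomposition([2, 4, 4, 5, 10], 3))  # 10 samples, C(5, 3)
```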
