To help make the ideas clear, I will use capital letters for random variables.
Everything follows from the restriction $\sum W_i=n,$ because that implies this sum has zero variance. Since each $W_i$ is a Bernoulli variable with $\Pr(W_i=1)=n/N,$
$$\operatorname{Var}(W_i) = \frac{n}{N}\left(1-\frac{n}{N}\right) = \frac{n(N-n)}{N^2}.$$
Computing the variance of the sum and assuming, as is the case with simple random sampling, that for $i\ne j$ $\operatorname{Cov}(W_i,W_j)$ does not depend on $i$ or $j,$ we find
$$\begin{aligned}
0 &= \operatorname{Var}\left(\sum_{i=1}^N W_i\right) \\
&=\sum_{i=1}^N \operatorname{Var}\left(W_i\right) + \sum_{i\ne j}^N \operatorname{Cov}(W_i,W_j) \\
&= N\frac{n(N-n)}{N^2} + N(N-1)\operatorname{Cov}\left(W_1,W_2\right),
\end{aligned}$$
enabling us to solve for the covariance as
$$\operatorname{Cov}\left(W_i,W_j\right) = \operatorname{Cov}\left(W_1,W_2\right) = -\frac{n(N-n)}{N^2(N-1)}.$$
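Not part of the derivation, but these indicator moments are easy to confirm by enumerating all $\binom{N}{n}$ equally likely samples for a small case and computing them directly; the values of `N` and `n` below are arbitrary illustrative choices.

```
# Enumerate every sample of size n from 1..N; column k of W is the vector of
# membership indicators (W_1, ..., W_N) for the k-th sample.
N <- 6; n <- 2
samples <- combn(1:N, n)
W <- apply(samples, 2, function(s) as.integer(1:N %in% s))
# Moments over all equally likely samples.
var.W1  <- mean(W[1, ]^2) - mean(W[1, ])^2
cov.W12 <- mean(W[1, ] * W[2, ]) - mean(W[1, ]) * mean(W[2, ])
c(var.W1,  n * (N - n) / N^2)                # both equal Var(W_i)
c(cov.W12, -n * (N - n) / (N^2 * (N - 1)))   # both equal Cov(W_i, W_j)
```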
Consequently, assuming $N\gt 1,$ for fixed coefficients $(x_i)$ and $(y_i)$ and writing $\bar x = \sum x_i/N,$ $\bar y = \sum y_i/N,$ and $\overline{xy}=\sum_{i}x_iy_i/N$ (so that $\sum_{i\ne j} x_iy_j = N^2\bar x\bar y - N\,\overline{xy}$), we find
$$\begin{aligned}
\operatorname{Cov}\left(\sum_{i=1}^N x_iW_i, \sum_{j=1}^N y_jW_j\right) &= \sum_{i=1}^N x_iy_i \operatorname{Var}\left(W_i\right) + \sum_{i\ne j}^N x_iy_j\operatorname{Cov}\left(W_i,W_j\right) \\
&= \frac{n(N-n)}{N^2}\sum_{i=1}^Nx_iy_i - \frac{n(N-n)}{N^2(N-1)}\sum_{i\ne j}^N x_iy_j \\
&= \frac{n(N-n)}{N}\overline{xy} - \frac{n(N-n)}{N-1} \bar{x}\bar{y} + \frac{n(N-n)}{N(N-1)}\overline{xy}\\
&= \frac{n(N-n)}{N-1}\left(\overline{xy} - \bar{x}\bar{y} \right).
\end{aligned}$$
(When $N=1$ the double sum doesn't appear and the result easily reduces to $0.$)
If we draw one of the $(x_i,y_i)$ randomly and equiprobably from all $N$ of these paired values, we obtain a bivariate random variable $(X,Y)$ whose covariance is $\operatorname{Cov}(X,Y)=\operatorname{E}(XY)-\operatorname{E}(X)\operatorname{E}(Y)=\overline{xy}-\bar{x}\bar{y},$ enabling the result to be written
$$\operatorname{Cov}\left(\sum_{i=1}^N x_iW_i, \sum_{j=1}^N y_jW_j\right) = \frac{n(N-n)}{N-1} \operatorname{Cov}(X,Y).$$
I was tempted to check this result with simulation, but elected to use an exhaustive enumeration of all the possible samples instead, of which there are $\binom{N}{n}.$ For small $N$ this is feasible and gives exact results. The output computes the covariance of the weighted sums in three ways: using the formula in terms of $\overline{xy}-\bar{x}\bar{y},$ the formula in terms of $\operatorname{Cov}(X,Y),$ and, as the verification, the population covariance of all possible sample sums.
An example of its output for $N=20,$ $n=15$ is

```
Direct Formula  Covariance formula  Exhaustive
     -3.035239           -3.035239   -3.035239
```

demonstrating agreement in this case.
Here's the R code.

```
#
# Create *any* bivariate population you like.
#
N <- 20
# set.seed(17)
x <- rnorm(N)
y <- rnorm(N) - x
#
# Specify the sample size.
#
n <- 15
if(choose(N, n) > 1e6) stop("Are you sure you want to do this?", call.=FALSE)
#
# Compute the distribution of the sample sum.
#
W <- combn(1:N, n)
wx <- apply(W, 2, function(w) sum(x[w]))
wy <- apply(W, 2, function(w) sum(y[w]))
#
# Compare various formulae.
#
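# Note: R's `cov` divides by (length - 1).  The "Covariance formula" therefore
# uses the factor n(N-n)/N rather than n(N-n)/(N-1), and the exhaustive sample
# covariance over all choose(N,n) sums is rescaled by (1 - 1/choose(N,n)) to a
# population covariance, putting all three numbers on the same footing.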
c(`Direct Formula` = n * (N-n) / (N-1) * (mean(x*y) - mean(x)*mean(y)),
`Covariance formula`=n * (N-n) / N * cov(x, y),
Exhaustive = cov(wx, wy)*(1 - 1/length(wx)))
# plot(wx, wy) # Can be interesting...
```
Best Answer
The author's derivation is pretty strange. The fact that $\bar{y}$ is an unbiased estimate of the population mean $\mu$ when sampling without replacement is true due to linearity of expectation alone, since each sampled $y_i$ is marginally an equiprobable draw from the population and therefore has $\text{E}(y_i)=\mu$:
$$\begin{aligned}
\text{E}(\bar{y}) &= \text{E}\left( \frac{1}{n} \sum_{i=1}^{n} y_i \right) \\
&= \frac{1}{n} \sum_{i=1}^{n} \text{E}(y_i) \\
&= \frac{1}{n} \sum_{i=1}^{n} \mu \\
&= \mu.
\end{aligned}$$
There's no reason to bother with combinatorics.
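For anyone who nevertheless wants a numerical confirmation, the same exhaustive enumeration used above does the job; the values of `N`, `n`, and the population `y` below are arbitrary illustrative choices.

```
# Average the sample mean over every possible sample of size n.
N <- 8; n <- 3
y <- rnorm(N)
samples <- combn(1:N, n)
ybar <- apply(samples, 2, function(s) mean(y[s]))
c(`E(ybar)` = mean(ybar), mu = mean(y))  # the two entries coincide
```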