Confidence Interval – Computing Standard Errors Using Stratified Sampling: A Step-by-Step Process

I am trying to understand the following formula for the standard error of the population mean as estimated through stratified sampling. On the CRAN site, the formula given is $$
S_{\bar{x}_{\textit{str}}} = \sqrt{
\sum_h
\left(1 - \frac{n_h}{N_h}\right)
\left( \frac{N_h}{N} \right)^2
\left( \frac{S_h^2}{n_h} \right)
}
$$

where

  • $N$ is the total population size
  • $N_h$ is the number of units (in the population) that belong to stratum $h$
  • $n_h$ is the number of units sampled that belong to stratum $h$
  • $S_h^2$ is the sample variance for the sampled units that belong to stratum $h$.
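
A minimal Python sketch of the quoted formula follows. It assumes each stratum is described by its population size, its sample size and its sample variance. The function name `stratified_se` and the example figures (two strata of 60 and 40 units, 10% sampled in each, variances 4 and 9) are hypothetical and chosen only for illustration.

```python
import math
from typing import Sequence


def stratified_se(N_h: Sequence[int], n_h: Sequence[int], s2_h: Sequence[float]) -> float:
    """Standard error of the stratified mean, transcribing the quoted formula.

    N_h  -- population size of each stratum
    n_h  -- sample size drawn from each stratum
    s2_h -- sample variance S_h^2 within each stratum
    """
    N = sum(N_h)  # total population size
    var = 0.0
    for Nh, nh, s2 in zip(N_h, n_h, s2_h):
        fpc = 1 - nh / Nh   # finite population correction (1 - n_h/N_h)
        weight = Nh / N     # stratum weight N_h/N
        var += fpc * weight**2 * s2 / nh
    return math.sqrt(var)


# Hypothetical example: two strata of 60 and 40 units, 10% sampled in each.
se = stratified_se(N_h=[60, 40], n_h=[6, 4], s2_h=[4.0, 9.0])
print(round(se, 4))  # 0.7348
```

If every unit of every stratum is sampled ($n_h = N_h$), each finite population correction is zero and the function returns a standard error of 0, as expected for a census.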

I am puzzled by the factor $\left(1 - \frac{n_h}{N_h}\right)\left(\frac{N_h}{N}\right)$, since I would have expected the formula to be the square root of the stratum-weighted sum of squared standard errors:
$$
S_{\bar{x}_{\textit{wrong}}} = \sqrt{
\sum_h
\left( \frac{N_h}{N} \right)
\left( \frac{S_h^2}{n_h} \right)
}.
$$

How does $\left(1 - \frac{n_h}{N_h}\right)\left(\frac{N_h}{N}\right)$ enter the picture?
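
A quick numeric comparison of the two candidate formulas, sketched in Python with hypothetical figures (two strata of 60 and 40 units, samples of 6 and 4, sample variances 4 and 9). Term by term, each stratum's contribution under the quoted formula equals its contribution under the expected formula multiplied by exactly the factor $\left(1 - \frac{n_h}{N_h}\right)\left(\frac{N_h}{N}\right)$.

```python
import math

# Hypothetical two-stratum design: population sizes, sample sizes, sample variances.
N_h = [60, 40]
n_h = [6, 4]
s2_h = [4.0, 9.0]
N = sum(N_h)

# Per-stratum variance contributions under each formula.
quoted_terms = [(1 - n / Nh) * (Nh / N) ** 2 * s2 / n for Nh, n, s2 in zip(N_h, n_h, s2_h)]
expected_terms = [(Nh / N) * s2 / n for Nh, n, s2 in zip(N_h, n_h, s2_h)]

# The ratio of the two contributions is the puzzling factor (1 - n_h/N_h) * (N_h/N).
ratios = [q / e for q, e in zip(quoted_terms, expected_terms)]
factors = [(1 - n / Nh) * (Nh / N) for Nh, n in zip(N_h, n_h)]

print(f"quoted SE:   {math.sqrt(sum(quoted_terms)):.4f}")    # 0.7348
print(f"expected SE: {math.sqrt(sum(expected_terms)):.4f}")  # 1.1402
```

Since $N_h/N \le 1$ and $1 - n_h/N_h \le 1$, that factor never exceeds 1, so the quoted formula can never give a larger standard error than the expected one.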

Best Answer

I'm assuming you are familiar with the basic theory behind the variance formulas for simple random samples drawn from a finite population. Under simple random sampling without replacement from a population of size $N$, the variance of the sample mean is

$V(\bar{Y}_{SRS})=\frac{1-f}{n}S^{2}$

where $f = n/N$ is the sampling fraction and $S^2$ is the population variance (computed with divisor $N-1$). The factor $1-f$ is the finite population correction.
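
The finite-population result can be checked by brute force on a tiny hypothetical population. Under simple random sampling without replacement every subset of size $n$ is equally likely, so the exact variance of the sample mean is the variance of the mean over all $\binom{N}{n}$ subsets. The sketch below takes $f = n/N$ and computes $S^2$ with divisor $N-1$ via Python's `statistics.variance`; the population values are illustrative only.

```python
from itertools import combinations
from statistics import pvariance, variance

# Hypothetical tiny population, small enough to enumerate every sample.
population = [2, 3, 5, 7, 11, 13]
N, n = len(population), 3
f = n / N                   # sampling fraction f = n/N

S2 = variance(population)   # S^2 with divisor N - 1

# Every size-n subset is equally likely, so the exact variance of the sample mean
# is the plain (divide-by-count) variance of the mean over all subsets.
sample_means = [sum(s) / n for s in combinations(population, n)]
exact_var = pvariance(sample_means)

formula_var = (1 - f) / n * S2
print(exact_var, formula_var)  # equal up to floating-point rounding
```

Replacing $(1-f)$ with 1 in `formula_var` would overstate the exact variance, which is why the correction appears in every stratum of the stratified formula.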

Under stratified sampling the estimated mean is $\sum_{h=1}^{L}\frac{N_{h}}{N}\bar{Y}_{h}$, a weighted sum of the individual stratum means with weights $N_h/N$. Because sampling is performed independently in each stratum, the stratum means are independent, so every covariance term vanishes and the variance of the weighted sum is the sum of the variances of its terms. Each weight is a constant, so it comes out of the variance squared, and the simple-random-sampling result is then applied within each stratum, with $f_h = n_h/N_h$:

\begin{eqnarray*} V\left[\sum_{h=1}^{L}\frac{N_{h}}{N}\bar{Y}_{h}\right] & = & \sum_{h=1}^{L}V\left(\frac{N_{h}}{N}\bar{Y}_{h}\right)\\ & = & \sum_{h=1}^{L}\left(\frac{N_{h}}{N}\right)^{2}V\left(\bar{Y}_{h}\right)\\ & = & \sum_{h=1}^{L}\left(\frac{N_{h}}{N}\right)^{2}\frac{1-f_{h}}{n_{h}}S_{h}^{2}\\ & = & \sum_{h=1}^{L}\left(\frac{N_{h}}{N}\right)^{2}\left(1-f_{h}\right)\frac{S_{h}^{2}}{n_{h}}\\ & = & \sum_{h=1}^{L}\left(\frac{N_{h}}{N}\right)^{2}\left(1-\frac{n_{h}}{N_{h}}\right)\frac{S_{h}^{2}}{n_{h}}\\ & = & \sum_{h=1}^{L}\left(1-\frac{n_{h}}{N_{h}}\right)\left(\frac{N_{h}}{N}\right)^{2}\frac{S_{h}^{2}}{n_{h}}\blacksquare \end{eqnarray*}
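
The derivation can likewise be checked by exhaustive enumeration on a hypothetical two-stratum population small enough to list every stratified sample. This sketch (the data and the helper name `var_formula` are illustrative assumptions) checks three things: the covariance between the two stratum means is zero, the exact variance of the stratified mean matches the last line of the derivation with the population variances $S_h^2$, and plugging in the within-sample variances instead, as the quoted formula does, gives an estimate whose average over all samples equals that exact variance. The last point relies on the standard result that the sample variance is unbiased for $S_h^2$ under simple random sampling without replacement.

```python
from itertools import combinations, product
from statistics import pvariance, variance

# Hypothetical population split into two strata, with per-stratum sample sizes n_h.
strata = [[1, 4, 6, 9], [10, 12, 15, 20, 23]]
n_h = [2, 3]
N = sum(len(s) for s in strata)
W = [len(s) / N for s in strata]  # stratum weights N_h / N


def var_formula(s2_h):
    """Last line of the derivation: sum_h (1 - n_h/N_h) (N_h/N)^2 S_h^2 / n_h."""
    return sum(
        (1 - n / len(s)) * w**2 * s2 / n
        for s, n, w, s2 in zip(strata, n_h, W, s2_h)
    )


# Independent SRS in each stratum: every combination of within-stratum samples
# is an equally likely stratified sample, so take the Cartesian product.
samples = list(product(*[list(combinations(s, n)) for s, n in zip(strata, n_h)]))
m1 = [sum(a) / len(a) for a, _ in samples]  # stratum-1 sample means
m2 = [sum(b) / len(b) for _, b in samples]  # stratum-2 sample means
strat_means = [W[0] * x + W[1] * y for x, y in zip(m1, m2)]

# Covariance between the two stratum means over all samples: zero by independence.
mu1, mu2 = sum(m1) / len(m1), sum(m2) / len(m2)
cov12 = sum((x - mu1) * (y - mu2) for x, y in zip(m1, m2)) / len(samples)

exact_var = pvariance(strat_means)                     # exact variance of the stratified mean
true_var = var_formula([variance(s) for s in strata])  # formula with population S_h^2

# Plug-in estimate using each sample's own variances, averaged over all samples.
estimates = [var_formula([variance(a), variance(b)]) for a, b in samples]
mean_estimate = sum(estimates) / len(estimates)

print(cov12, exact_var, true_var, mean_estimate)
```

Taking the square root of `true_var` gives the standard error $S_{\bar{x}_{\textit{str}}}$ in the question's notation.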
