Possible catch-22 with Neyman’s optimal allocation

sample-size, sampling, stratification

I am using stratified sampling with Neyman's optimal allocation to compute the best sample size for each stratum. Neyman's optimal allocation is given by the formula

$$n_h = n \frac{N_h S_h}{\sum_i N_i S_i}$$

where $$\sum_h n_h = n, \qquad \sum_i N_i = N,$$

and $n$ is the total sample size, $n_h$ is the sample size for stratum $h$, $S_h$ is the standard deviation of the study variable within stratum $h$, $N_h$ is the population size of stratum $h$, and $N$ is the total population size.

My concern is: isn't there a catch-22 here? I need $S_h$ to compute $n_h$, but to estimate $S_h$ I would already have had to draw a sample, which means I would already have chosen some $n_h$.

If anybody out there has done this type of sample size estimation, please shed some light on what is done in the real world.

Best Answer

Neyman allocation (or a modification of it) is often used in practice. You are right: the true $S_h$ is never known at the time the sample allocation is computed. But we can estimate $S_h$ or use an approximation of it.

Write $S_h(y)$ for the stratum standard deviation of the study variable $\textbf{y}$. If $\textbf{y}$ was observed in a previous survey, $S_h(y)$ can be estimated from that survey.

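There may also be another variable $\textbf{z}$ that is correlated with $\textbf{y}$ and is available from another survey or from an auxiliary data source (a register or a census). Then you can use $S_h(z)$, or an estimate $s_h(z)$, as an approximation of $S_h(y)$. Because the formula depends only on the relative sizes of $N_h S_h$ across strata, the proxy does not have to be on the same scale as $\textbf{y}$.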
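As a minimal sketch of how the allocation might be computed once proxy standard deviations are in hand: the stratum sizes, the proxy values `s_z`, the total sample size, and the function name `neyman_allocation` below are all made up for illustration and do not come from any real survey or library.

```python
import numpy as np

def neyman_allocation(N_h, S_h, n):
    """Split a total sample size n across strata proportionally to N_h * S_h."""
    N_h = np.asarray(N_h, dtype=float)
    S_h = np.asarray(S_h, dtype=float)
    weights = N_h * S_h
    return n * weights / weights.sum()

# Hypothetical strata: population sizes and proxy standard deviations s_h(z)
N_h = [1000, 3000, 6000]
s_z = [20.0, 10.0, 5.0]

alloc = neyman_allocation(N_h, s_z, n=400)
print(alloc)  # per-stratum sample sizes, in stratum order
```

The result is generally fractional, so in practice it still has to be rounded, and any $n_h$ that exceeds $N_h$ has to be capped at $N_h$ with the remainder reallocated. This sketch does neither.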

It may also be possible to guess $S_h(y)$ directly, for example when $\textbf{y}$ is binary.
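To spell out the binary case: if $\textbf{y}$ takes values 0 and 1 and $p_h$ denotes the proportion of ones in stratum $h$ (a symbol introduced here for illustration), then

$$S_h^2(y) = \frac{N_h}{N_h - 1}\, p_h (1 - p_h) \approx p_h (1 - p_h), \qquad S_h(y) \approx \sqrt{p_h (1 - p_h)} \le \frac{1}{2}.$$

So a rough guess of $p_h$ is enough. For example, guessing $p_h = 0.2$ gives $S_h(y) \approx 0.4$, and with no information at all, $p_h = 0.5$ gives the worst-case value $S_h(y) \approx 0.5$.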

Keep in mind that you will never achieve exactly the optimal allocation in practice, so an allocation close to the optimum can be good enough.
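One way to get a feel for "close to optimal" is to compare the variance of the stratified mean under the allocation built from the true $S_h$ with the variance under an allocation built from guessed values. The sketch below assumes simple random sampling without replacement within each stratum and uses the standard variance formula $\sum_h W_h^2 \frac{S_h^2}{n_h}\left(1 - \frac{n_h}{N_h}\right)$ with $W_h = N_h / N$. All numbers and function names are hypothetical.

```python
import numpy as np

def neyman_allocation(N_h, S_h, n):
    """Split a total sample size n across strata proportionally to N_h * S_h."""
    N_h = np.asarray(N_h, dtype=float)
    S_h = np.asarray(S_h, dtype=float)
    weights = N_h * S_h
    return n * weights / weights.sum()

def stratified_mean_variance(N_h, S_h, n_h):
    """Variance of the stratified sample mean (SRS without replacement in each stratum)."""
    N_h = np.asarray(N_h, dtype=float)
    S_h = np.asarray(S_h, dtype=float)
    n_h = np.asarray(n_h, dtype=float)
    W_h = N_h / N_h.sum()
    return np.sum(W_h**2 * S_h**2 / n_h * (1.0 - n_h / N_h))

# Hypothetical population: true stratum SDs versus the values guessed beforehand
N_h = [1000, 3000, 6000]
S_true = [20.0, 10.0, 5.0]
S_guess = [15.0, 10.0, 8.0]
n = 400

n_opt = neyman_allocation(N_h, S_true, n)     # allocation if S_h were known
n_guess = neyman_allocation(N_h, S_guess, n)  # allocation actually used

# Both variances are evaluated with the true S_h
var_opt = stratified_mean_variance(N_h, S_true, n_opt)
var_guess = stratified_mean_variance(N_h, S_true, n_guess)
ratio = var_guess / var_opt
print(ratio)  # how much variance the imperfect guess costs, as a multiple of the optimum
```

The ratio is never below 1 for a fixed total $n$. How far above 1 it lands depends on how badly the relative sizes of the guessed $S_h$ are off, which is the quantity worth checking for a few plausible scenarios before fieldwork.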
