# Solved – How to calculate sample size for stratified sampling

When determining the size of the sample for each stratum, how should this be calculated? Do we compute the sample size based on the total population and then split it according to each stratum's share of the population, or do we treat each stratum as a separate population and calculate its sample size individually?

The comments reveal that you might have a different problem from what the question asks, but I'm going to answer the question you originally asked anyway in case someone else finds themself with the same query.

The sample size required for a design-based sampling scheme is a function of the size of the population, the variability of responses in the population, and the intended accuracy of the estimate. Stratification is meant to minimise sample size by grouping units into similar groups, decreasing the variability of responses within each group, and thus decreasing the required sample size across all groups.
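To make that dependence concrete, here is the textbook case of estimating a mean from a simple random sample without replacement. The symbols are my own notation, not something from the question: $N$ is the population size, $S^2$ the population variance of the response, and $V$ the target variance of the estimate.

$$
\operatorname{Var}(\bar{y}) = \left(1 - \frac{n}{N}\right)\frac{S^2}{n} = V
\quad\Longrightarrow\quad
n = \frac{n_0}{1 + n_0/N}, \qquad n_0 = \frac{S^2}{V}.
$$

A smaller $S^2$ gives a smaller $n$ for the same $V$, which is exactly what stratification tries to exploit within each stratum.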

Selection is then done independently within each stratum, and the stratum estimates are combined to produce the estimate for the whole population. So stratification effectively involves designing a separate survey for each stratum, which means the strata should be defined first. However, the allocation of sample between strata does have dependencies (you have a finite amount of money across all strata, a target standard error for your combined estimate, and so on), so you need to optimise all the sample sizes together.
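As a rough illustration of how the stratum estimates are combined, here is a minimal Python sketch for a stratified mean, assuming simple random sampling without replacement inside each stratum. The function name and argument names are mine, purely for illustration.

```python
import math

def stratified_mean(N_h, n_h, ybar_h, s2_h):
    """Combine per-stratum results into a population mean and its standard error.

    N_h: stratum population sizes, n_h: stratum sample sizes,
    ybar_h: stratum sample means, s2_h: stratum sample variances.
    """
    N = sum(N_h)
    weights = [Nh / N for Nh in N_h]
    # Weighted sum of the stratum means
    mean = sum(w * y for w, y in zip(weights, ybar_h))
    # Strata are sampled independently, so their variances simply add
    var = sum(
        w**2 * (1 - n / Nh) * s2 / n
        for w, Nh, n, s2 in zip(weights, N_h, n_h, s2_h)
    )
    return mean, math.sqrt(var)

mean, se = stratified_mean([600, 400], [60, 40], [10.0, 20.0], [4.0, 9.0])
```

Every $n_h$ appears in the single combined standard error, which is why the stratum sample sizes can't be chosen in isolation.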

The standard way to do this is to choose what you'd like to optimise (minimal sample size for a fixed standard error? minimal standard error for a fixed cost?), and then apply the optimal allocation formula (or another allocation rule) across your strata. This formula requires a total sample size $n$. However, if you have a target estimator variance, you can substitute the $n_h$ given by the optimal allocation formula into the variance formula for your estimator and solve for $n$ as a function of the target variance. Then you can find all your $n_h$s.
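Here is a minimal sketch of that substitution, assuming "optimal allocation" means Neyman allocation with equal per-unit costs ($n_h \propto N_h S_h$) and that the estimator is a stratified mean. With $W_h = N_h/N$, substituting the allocation into the variance formula and solving gives $n = \left(\sum_h W_h S_h\right)^2 / \left(V + \tfrac{1}{N}\sum_h W_h S_h^2\right)$. The function names are hypothetical, and the stratum standard deviations $S_h$ would have to come from a pilot study or earlier survey.

```python
def neyman_allocation(N_h, S_h, n):
    """Split a total sample size n across strata in proportion to N_h * S_h."""
    products = [Nh * Sh for Nh, Sh in zip(N_h, S_h)]
    total = sum(products)
    return [n * p / total for p in products]

def total_n_for_target_variance(N_h, S_h, target_var):
    """Total n giving the stratified mean a variance of target_var
    under Neyman allocation (finite population correction included)."""
    N = sum(N_h)
    W_h = [Nh / N for Nh in N_h]
    numerator = sum(W * S for W, S in zip(W_h, S_h)) ** 2
    fpc_term = sum(W * S**2 for W, S in zip(W_h, S_h)) / N
    return numerator / (target_var + fpc_term)

N_h = [600, 400]   # stratum population sizes (made-up numbers)
S_h = [2.0, 3.0]   # assumed stratum standard deviations
n = total_n_for_target_variance(N_h, S_h, target_var=0.01)  # about 360
n_h = neyman_allocation(N_h, S_h, n)                        # about 180 each
```

In practice you would round each $n_h$ up to a whole number, which makes the achieved variance slightly smaller than the target.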

After applying the optimal allocation, if any strata are overallocated (i.e. allocated more units than the stratum contains) or underallocated (i.e. allocated too few units to estimate the variance within the stratum), you fix their sample sizes and re-run the allocation for the remaining strata. Usually, you fix overallocated strata first, re-run the allocation, then if any strata are still underallocated you bump up their sample sizes until they're good enough.
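A minimal sketch of that adjustment loop, again assuming Neyman allocation ($n_h \propto N_h S_h$) with all $S_h > 0$. The function name and the floor of 2 units per stratum (the least needed to estimate a within-stratum variance) are my assumptions, not fixed rules.

```python
import math

def allocate_with_bounds(N_h, S_h, n, min_n=2):
    """Neyman allocation with overallocated strata capped at N_h
    and underallocated strata raised to min_n."""
    H = len(N_h)
    fixed = {}  # stratum index -> sample size fixed at the stratum population
    while True:
        free = [h for h in range(H) if h not in fixed]
        remaining = n - sum(fixed.values())
        total = sum(N_h[h] * S_h[h] for h in free)
        alloc = dict(fixed)
        for h in free:
            alloc[h] = remaining * N_h[h] * S_h[h] / total
        over = [h for h in free if alloc[h] > N_h[h]]
        if not over:
            break
        for h in over:
            fixed[h] = N_h[h]  # take every unit in this stratum, then re-run
    # Round up, then bump any stratum still below the floor
    return [min(N_h[h], max(min_n, math.ceil(alloc[h]))) for h in range(H)]

# Made-up example: stratum 1 is small but very variable, stratum 2 is tiny
result = allocate_with_bounds([1000, 50, 20], [1.0, 40.0, 0.1], n=200)
# result == [150, 50, 2]
```

Note that rounding up and bumping underallocated strata can push the final total slightly above the $n$ you started with (202 rather than 200 here).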

So, choose strata first, then find total sample size using the allocation formula and your constraints, then allocate this sample size and adjust.