When determining the sample size for each stratum, how should it be calculated? Do we compute the sample size from the total population and then split it across strata in proportion to each stratum's share, or do we treat each stratum as a separate population and calculate its sample size individually?

# Solved – How to calculate sample size for stratified sampling

#### Related Solutions

A minimum sample size is useful once you consider that some units might not respond. However, if you make it too large, your design will be less efficient. Aiming for about $5$ responding units should be enough. To allow for nonresponse, inflate this by an assumed response rate $rr$, giving $5/rr$. $30$ is more than enough, but it will probably mean you are putting sample where it isn't needed for a good estimate of the total.
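As a rough sketch of the $5/rr$ adjustment, the snippet below rounds the inflated minimum up to a whole unit. The function name `min_sample_size` and the example response rates are illustrative assumptions, not part of the original answer.

```python
import math


def min_sample_size(target_respondents: int, response_rate: float) -> int:
    """Units to select so that about `target_respondents` are expected to respond.

    `response_rate` is an assumed proportion in (0, 1], e.g. taken from past surveys.
    """
    if not 0 < response_rate <= 1:
        raise ValueError("response_rate must be in (0, 1]")
    # Inflate the target by the assumed response rate and round up to a whole unit.
    return math.ceil(target_respondents / response_rate)


half_respond = min_sample_size(5, 0.5)      # select 10 units to expect 5 respondents
fewer_respond = min_sample_size(5, 0.4)     # 12.5 rounds up to 13 units
```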

If you really *have to* divide the population into 5 strata, you need to make those strata mutually exclusive. You can achieve that by assigning each visitor to one department only, namely the one they visited the most. Consider the following fake data set, in which we are asked to separate a population of 4 visitors who visited 2 departments into 2 strata, while the departments overlap in terms of their visitors.

```
visitor | department | visitedTime
----------------------------------
1 | a | 5
1 | b | 4
2 | a | 0
2 | b | 2
3 | a | 9
3 | b | 1
4 | a | 0
4 | b | 5
```

We see that visitors 2 and 4 visited only one department, while visitors 1 and 3 visited both, so the departments cannot serve as strata as they stand. We can collapse this data set to the most visited department per visitor, and get

```
visitor | mostVisited
---------------------
1 | a
2 | b
3 | a
4 | b
```

Departments are non-overlapping in the second table. From an interpretation standpoint, I think it makes sense to ask visitors about the department they visit the most, so this strategy is easy to justify.
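The collapse to one department per visitor can be sketched in a few lines of Python using the fake data above. The function name `most_visited` and the tie-breaking rule (the first department listed wins) are assumptions made for illustration.

```python
# (visitor, department, visitedTime) rows from the fake data set
visits = [
    (1, "a", 5), (1, "b", 4),
    (2, "a", 0), (2, "b", 2),
    (3, "a", 9), (3, "b", 1),
    (4, "a", 0), (4, "b", 5),
]


def most_visited(rows):
    """Map each visitor to the single department with the largest visitedTime."""
    best = {}  # visitor -> (department, visitedTime)
    for visitor, department, visited_time in rows:
        # Strict '>' means that on a tie the department seen first is kept.
        if visitor not in best or visited_time > best[visitor][1]:
            best[visitor] = (department, visited_time)
    return {visitor: department for visitor, (department, _) in best.items()}


strata = most_visited(visits)  # one stratum label per visitor
```

Each visitor now carries exactly one label, so the labels define mutually exclusive strata.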

Now, you may have a highly unbalanced picture in which one department is always (or never) the most visited. Think about the reception: everybody has to pass through it on every visit, which makes it the most visited department. If you can omit such a department, you should, to make your life easier. If you can't, you can keep this department outside of the stratification, i.e. sample a certain number of visitors from it first, and then apply the "collapsed stratification" I just discussed to the remaining departments.

## Best Answer

The comments reveal that you might have a different problem from what the question asks, but I'm going to answer the question you originally asked anyway in case someone else finds themself with the same query.

The sample size required for a design-based sampling scheme is a function of the size of the population, the variability of responses in the population, and the intended accuracy of the estimate. Stratification is meant to minimise sample size by grouping similar units together, which decreases the variability of responses within each group and thus the required sample size across all groups.

Selection is then done independently within each stratum, and the estimates for each stratum are combined to produce the estimate for the whole population. So stratification effectively involves designing a separate survey for each stratum, which means the strata should be designed first. However, the allocation of sample between strata does have dependencies (you have a finite amount of money across all strata, a target standard error for your combined estimate, etc.), so you need to optimise all sample sizes together.

The standard way to do this is to choose what you'd like to optimise (minimal sample size for a fixed standard error? minimal standard error for a fixed cost?), and then apply the optimal allocation formula (or another allocation rule) across your strata. This formula requires a total sample size $n$. If you have a target estimator variance instead, you can substitute the $n_h$ given by the optimal formula into the variance formula for your estimator and solve for $n$ as a function of the target variance. Then you can find all your $n_h$.
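As a minimal sketch of that substitution, assume the equal-cost special case of optimal allocation (Neyman allocation, $n_h \propto N_h S_h$), an estimator of the population mean, and simple random sampling without replacement within each stratum. The stratum sizes `N_h`, the guessed stratum standard deviations `S_h`, and all function names below are illustrative assumptions. Other estimators or unequal costs would need different formulas.

```python
def neyman_allocation(N_h, S_h, n):
    """Split a total sample n across strata in proportion to N_h * S_h."""
    weights = [N * S for N, S in zip(N_h, S_h)]
    total = sum(weights)
    return [n * w / total for w in weights]


def stratified_mean_variance(N_h, S_h, n_h):
    """Variance of the stratified mean, with finite population correction."""
    N = sum(N_h)
    return sum(
        (Nh / N) ** 2 * Sh ** 2 / nh * (1 - nh / Nh)
        for Nh, Sh, nh in zip(N_h, S_h, n_h)
    )


def total_n_for_target_variance(N_h, S_h, target_var):
    """Total n such that Neyman allocation attains target_var for the mean.

    Obtained by substituting n_h = n * N_h S_h / sum(N_k S_k) into the
    variance formula above and solving for n.
    """
    N = sum(N_h)
    W_h = [Nh / N for Nh in N_h]  # stratum population shares
    numerator = sum(W * S for W, S in zip(W_h, S_h)) ** 2
    denominator = target_var + sum(W * S ** 2 for W, S in zip(W_h, S_h)) / N
    return numerator / denominator


# Made-up example: two strata, target variance 0.25 for the estimated mean.
N_h = [1000, 3000]
S_h = [10.0, 2.0]
n = total_n_for_target_variance(N_h, S_h, target_var=0.25)
n_h = neyman_allocation(N_h, S_h, n)
```

With these made-up inputs `n` comes out at about 62.3. The values are left unrounded here; in practice each $n_h$ would be rounded up to a whole unit.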

After applying the optimal allocation, if any strata are overallocated (i.e. you would select more units than the stratum population contains) or underallocated (i.e. you don't select enough units to be able to estimate the variance within the stratum), you fix their sample sizes and re-run the allocation. Usually, you fix overallocated strata first and re-run the allocation; then, if any strata are still underallocated, you bump up their sample size until it's good enough.
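One way to sketch this fix-and-rerun loop is shown below, again assuming allocation proportional to $N_h S_h$. The function name `allocate_with_bounds` and the minimum of 2 units per stratum (`n_min`) are illustrative assumptions; the right minimum depends on your design and expected nonresponse.

```python
def allocate_with_bounds(N_h, S_h, n, n_min=2):
    """Allocate n proportionally to N_h * S_h, then repair over/underallocation.

    Overallocated strata are fixed at their population size (take-all) and the
    remaining sample is re-allocated among the other strata. Afterwards any
    stratum below n_min is bumped up, so the final total may exceed n.
    """
    H = len(N_h)
    fixed = {}  # stratum index -> sample size fixed at the stratum population
    while True:
        free = [h for h in range(H) if h not in fixed]
        remaining = n - sum(fixed.values())
        total = sum(N_h[h] * S_h[h] for h in free)
        alloc = {
            h: (remaining * N_h[h] * S_h[h] / total if total > 0 else 0.0)
            for h in free
        }
        over = [h for h in free if alloc[h] > N_h[h]]
        if not over:
            break
        for h in over:
            fixed[h] = N_h[h]  # cannot sample more units than exist
    alloc.update(fixed)
    # Bump underallocated strata up to the minimum (capped at the stratum size).
    return [max(alloc[h], min(n_min, N_h[h])) for h in range(H)]


# Made-up example: a small, highly variable stratum and a large, homogeneous one.
result = allocate_with_bounds(N_h=[20, 500, 100], S_h=[100.0, 5.0, 0.5], n=100)
```

In this example the first stratum becomes a take-all stratum of 20 units, the remaining 80 units are re-allocated between the other two strata, and the last stratum is then raised to the minimum of 2.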

So, choose strata first, then find total sample size using the allocation formula and your constraints, then allocate this sample size and adjust.