Survey Sampling – How to Calculate Confidence Levels of a Stratified Sample with Missing Units

non-response, sampling, survey, survey-sampling, survey-weights

I'm conducting a stratified survey among health institutions; 700 units were allocated across 30 strata designed to reflect the proportions in the population. However, it has been difficult to obtain even 450 questionnaires, even after extending the project period, and I need to end it ASAP.
A professor asked me whether the survey will remain statistically valid if we end it with 450 interviews. The survey was planned to have a 3% margin of error at a 90% confidence level, with $\hat{p}=0.5$ for the unknown proportion. The overall response rate is 72%; most strata have returned about 80%, but 6 strata have a response rate just above 40%.
At this stage, we wonder what the confidence level would be for each stratum. Is it possible to compute? Is there a procedure for assessing the confidence level of a stratified survey that would help us decide how far we can go with this? Also, is it right to reweight/calibrate the actual sample to reflect the data, or should we keep the full sample (700) with all the missing units (250)?

Best Answer

It looks like you've done all you could. The strata with the high non-response rates apparently do not constitute a large portion of the population. In retrospect, I would have suggested a smaller sample, with pilot tests, and more time devoted to follow-up of a random sample of non-responders.

Here are my thoughts on what you should do:

  1. Reweighted analysis

You should reweight to correct for non-response. Define

$$N_h = \text{number of institutions in stratum }h $$

and

$$ m_h = \text{number of responding institutions in stratum }h $$

Then you can remove non-response bias related to stratum membership by running a survey program with weight defined for each institution in stratum $h$ as

$$ w_h = \dfrac{N_h}{m_h}$$

This will not remove non-response biases related to other, within-stratum, factors, but it's better than doing nothing.
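As a rough illustration of the weight definition above, here is a minimal Python sketch. The stratum labels and counts are made up for the example, and `nonresponse_weights` is a hypothetical helper, not part of any survey package.

```python
def nonresponse_weights(population_counts, respondent_counts):
    """Return w_h = N_h / m_h for every stratum h.

    population_counts: dict mapping stratum -> N_h (institutions in the population)
    respondent_counts: dict mapping stratum -> m_h (responding institutions)
    """
    weights = {}
    for stratum, N_h in population_counts.items():
        m_h = respondent_counts.get(stratum, 0)
        if m_h == 0:
            # No respondents: the weight is undefined, so the stratum
            # would have to be collapsed with a neighbouring one.
            raise ValueError(f"stratum {stratum!r} has no respondents")
        weights[stratum] = N_h / m_h
    return weights


# Hypothetical counts for three strata (not the survey's real figures).
population_counts = {"A": 120, "B": 60, "C": 20}
respondent_counts = {"A": 24, "B": 10, "C": 8}

weights = nonresponse_weights(population_counts, respondent_counts)

# Sanity check: the weights of all respondents add up to the population size.
total = sum(weights[h] * respondent_counts[h] for h in weights)
```

Each respondent in a stratum carries the same weight, so a stratum with a low response rate simply gets a larger $w_h$.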

  2. Do a stratified analysis with a survey program

The original sample size calculation apparently was the one appropriate for a simple random sample, not for a stratified random sample; i.e. it assumed that the estimated proportion would be the overall sample proportion.

Instead, you should use a survey analysis program that accepts stratum and weight information. Stata and SAS contain such programs. They will compute a stratified proportion, with an estimated standard error that will be smaller than that of the ordinary sample proportion. You won't know exactly what the bound on error will be until you do the calculation.
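To give a feel for what such a program computes, here is a minimal Python sketch of the textbook stratified estimator, $\hat{p}_{st} = \sum_h W_h \hat{p}_h$ with $W_h = N_h/N$, and its estimated variance $\sum_h W_h^2 (1 - m_h/N_h)\,\hat{p}_h(1-\hat{p}_h)/(m_h - 1)$. It assumes the respondents in each stratum behave like a simple random sample of that stratum, which is exactly the assumption the reweighting makes. The function name and the numbers are illustrative only; a real analysis should use a survey package.

```python
import math


def stratified_proportion(strata, z=1.645):
    """Stratified estimate of a proportion with its standard error.

    strata: list of (N_h, m_h, p_h) tuples, where N_h is the stratum
            population size, m_h the number of respondents (at least 2)
            and p_h the observed proportion among respondents.
    z:      normal quantile; 1.645 corresponds to 90% confidence.
    Returns (estimate, standard_error, margin_of_error).
    """
    N = sum(N_h for N_h, _, _ in strata)
    estimate = 0.0
    variance = 0.0
    for N_h, m_h, p_h in strata:
        W_h = N_h / N                  # stratum share of the population
        fpc = 1.0 - m_h / N_h          # finite population correction
        estimate += W_h * p_h
        variance += W_h ** 2 * fpc * p_h * (1.0 - p_h) / (m_h - 1)
    se = math.sqrt(variance)
    return estimate, se, z * se


# Hypothetical strata: (N_h, m_h, p_h). Not the survey's real figures.
example = [(120, 24, 0.5), (60, 10, 0.3), (20, 8, 0.75)]
p_st, se, moe = stratified_proportion(example)
```

The margin of error returned here is the half-width of a normal-approximation interval, so it only becomes known once the stratum proportions have actually been observed.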

  3. You can estimate a confidence interval for every stratum, but be aware that the relevant sample size is the number of responding observations ($m_h$) in the stratum. The average of these is 450/30 = 15, so some intervals will be very wide.
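To see how wide "very wide" is, the following Python sketch computes the half-width of a simple normal-approximation interval, $z\sqrt{p(1-p)/m}$, at the planning value $p = 0.5$ and 90% confidence. It ignores the finite population correction and the normal approximation is crude for samples this small, so treat the figures as orders of magnitude rather than exact bounds. `half_width` is a hypothetical helper.

```python
import math


def half_width(m, p=0.5, z=1.645):
    """Half-width of a normal-approximation interval for a proportion.

    m: number of responding units; p: assumed proportion;
    z: normal quantile (1.645 for 90% confidence).
    """
    return z * math.sqrt(p * (1.0 - p) / m)


per_stratum = half_width(15)    # average stratum with 15 respondents
whole_sample = half_width(450)  # all 450 respondents treated as one simple random sample
```

With 15 respondents the half-width comes out at roughly 0.21, i.e. about 21 percentage points either side of the estimate, compared with roughly 0.04 for the 450 respondents taken together.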

You can, of course, consider subsets of the population, including groups of strata or subsets defined by characteristics measured during the survey. Such subpopulation standard errors require a special formula, but every package with survey capabilities (e.g. Stata, SAS, the survey package in R) will use it.

Added: To answer your question about which sample to keep: the analysis will be based on the 450 responding institutions in the sample, but you will need to add information about the numbers of institutions in the population. You should keep the 250 non-responding institutions in the data set. They won't affect the analysis because the values of all measured variables will be missing (you can also set the weight variable to zero or missing), but you need them to make a table describing response rates by stratum.
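The response-rate table is easy to build once the non-responders stay in the file. A minimal Python sketch, assuming each sampled unit is recorded as a (stratum, responded) pair; the records and the `response_rates` helper are invented for illustration:

```python
from collections import defaultdict


def response_rates(units):
    """Tabulate response by stratum.

    units: iterable of (stratum, responded) pairs covering every sampled
           unit, responders and non-responders alike.
    Returns a dict mapping stratum -> (sampled, responded, rate).
    """
    sampled = defaultdict(int)
    responded = defaultdict(int)
    for stratum, did_respond in units:
        sampled[stratum] += 1
        responded[stratum] += int(did_respond)
    return {
        h: (sampled[h], responded[h], responded[h] / sampled[h])
        for h in sorted(sampled)
    }


# Hypothetical records: stratum label and whether the unit responded.
units = [
    ("A", True), ("A", True), ("A", False),
    ("B", True), ("B", False), ("B", False), ("B", False), ("B", True),
]
table = response_rates(units)
```

Dropping the non-responding rows would make the `sampled` counts, and therefore the rates, impossible to reconstruct from the data set alone.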
