Stratified sampling question


Suppose that I am conducting a questionnaire study that aims to measure subjects' level of awareness of a programming language and to relate those levels of awareness to working conditions, methods, etc.

To improve precision, I decided to use stratified sampling. If I stratify on a single criterion, such as geographical distribution (to make sure I don't over-represent subjects from areas that have fewer programmers), I end up with 6 distinct strata (the country's provinces).

I know how to analyse these to find the margin of error, standard error, etc., but I realised that this is not good enough and that I need to introduce more stratification criteria, such as level of education (so I don't over-represent educational groups that are uncommon among programmers), level of seniority, etc.

I have the population proportions (in %) for each of these criteria, but how do I go about sampling when I have more than one criterion?

Best Answer

There isn't going to be one best answer for this kind of sampling. It depends on the observable covariates in your sampling frame, the variables you expect to be important determinants of survey response, and the analysis you want to run once the survey is complete.

With that said, there are a couple of general principles that can help guide your sampling strategy.

For descriptive surveys, you generally want your sample to resemble the population of interest as closely as possible, on as many characteristics as possible. This keeps your sampling weights close to equal, which maximizes your effective sample size.
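As a rough illustration of why even weights matter, here is a minimal Python sketch using Kish's approximation for effective sample size, n_eff = (Σw)² / Σw². The weight values are hypothetical; the point is only that the same 1,000 respondents carry less information once their weights become unequal.

```python
def kish_effective_sample_size(weights):
    """Kish's approximation: n_eff = (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    sum_sq = sum(w * w for w in weights)
    return total * total / sum_sq


if __name__ == "__main__":
    n = 1000
    even = [1.0] * n
    # Hypothetical uneven weights: half the sample weighted 0.5, the other half 1.5
    uneven = [0.5] * (n // 2) + [1.5] * (n // 2)
    print(kish_effective_sample_size(even))    # 1000.0
    print(kish_effective_sample_size(uneven))  # 800.0
```

With equal weights n_eff equals the actual sample size; with the uneven weights above, 1,000 interviews behave roughly like 800 for variance purposes.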

If you intend to do multivariate analysis, you may want to stratify on important variables of interest. This increases the variance of your independent and dependent variables (IVs and DVs), which can increase your statistical power in later analysis. This is why some studies oversample minority populations: race and ethnicity are important IVs in many analyses. Case-control studies follow a similar logic, stratifying on the dependent variable.
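To make the oversampling trade-off concrete, here is a small sketch with hypothetical numbers: a group making up 5% of the population, a total sample of 1,000, and the worst-case proportion p = 0.5. It compares the standard error of an estimate within that group under proportional allocation versus oversampling the group to 20% of the sample, and shows the relative weights that would then be needed for population-level estimates.

```python
import math


def subgroup_standard_error(p, n_group):
    """SE of an estimated proportion p from a simple random sample of n_group units."""
    return math.sqrt(p * (1 - p) / n_group)


def design_weight(population_share, sample_share):
    """Relative weight that restores a group's population share after disproportionate sampling."""
    return population_share / sample_share


if __name__ == "__main__":
    n_total = 1000
    minority_share = 0.05  # hypothetical population share of the small group
    oversample_share = 0.20  # hypothetical share of the sample given to that group
    p = 0.5  # worst-case proportion, which gives the largest SE

    n_prop = round(n_total * minority_share)  # proportional allocation: 50 people
    n_over = round(n_total * oversample_share)  # oversampled allocation: 200 people

    print(f"proportional: n={n_prop}, SE={subgroup_standard_error(p, n_prop):.4f}")
    print(f"oversampled:  n={n_over}, SE={subgroup_standard_error(p, n_over):.4f}")
    # Weights needed afterwards for descriptive (population-level) estimates
    print(f"weight, small group:   {design_weight(minority_share, oversample_share):.4f}")
    print(f"weight, everyone else: {design_weight(1 - minority_share, 1 - oversample_share):.4f}")
```

Quadrupling the group's sample size halves its standard error, but the resulting unequal weights reduce the effective sample size for whole-population estimates, which is the tension described in the next paragraph.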

If you intend to do both description and analysis, these goals will be partly at odds. No matter what, you need to follow the basic principle of probability sampling: every individual in the population must have a known, non-zero chance of being selected into the sample. Advanced topics worth looking up in this area include propensity scores and sample weighting via raking.
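The question mentions having percentages for each criterion separately (region, education, seniority) rather than the full cross-tabulation, and raking is one way to use such margins at the weighting stage. Below is a plain-Python sketch of raking (iterative proportional fitting), assuming that every category listed in the margins appears at least once in the sample and that every respondent's categories are listed in the margins. The variable names, categories and shares are hypothetical. If you know each respondent's selection probability π, passing 1/π as `base_weights` makes raking adjust the design weights rather than replace them.

```python
from collections import defaultdict


def rake(records, margins, base_weights=None, max_iter=1000, tol=1e-9):
    """Raking (iterative proportional fitting) of weights to known marginal shares.

    records: list of dicts, one per respondent, e.g. {"region": "North", "education": "degree"}
    margins: {variable: {category: population_share}}; shares for each variable sum to 1
    base_weights: starting weights (e.g. design weights 1/pi); defaults to 1.0 each
    Returns weights rescaled to sum to len(records).
    """
    n = len(records)
    weights = list(base_weights) if base_weights is not None else [1.0] * n
    for _ in range(max_iter):
        max_change = 0.0
        for var, targets in margins.items():
            total = sum(weights)
            # Current weighted total for each category of this variable
            current = defaultdict(float)
            for w, rec in zip(weights, records):
                current[rec[var]] += w
            # Scale each category so its weighted share matches the target share
            for i, rec in enumerate(records):
                factor = targets[rec[var]] * total / current[rec[var]]
                max_change = max(max_change, abs(factor - 1.0))
                weights[i] *= factor
        if max_change < tol:  # all margins already matched: stop
            break
    scale = n / sum(weights)
    return [w * scale for w in weights]


def weighted_share(records, weights, var, category):
    """Weighted proportion of records whose `var` equals `category`."""
    hit = sum(w for w, rec in zip(weights, records) if rec[var] == category)
    return hit / sum(weights)


if __name__ == "__main__":
    # Hypothetical respondents; in practice, one dict per completed questionnaire
    sample = [
        {"region": "North", "education": "degree"},
        {"region": "North", "education": "degree"},
        {"region": "North", "education": "degree"},
        {"region": "North", "education": "no_degree"},
        {"region": "North", "education": "no_degree"},
        {"region": "South", "education": "degree"},
        {"region": "South", "education": "no_degree"},
    ]
    population_margins = {
        "region": {"North": 0.5, "South": 0.5},
        "education": {"degree": 0.4, "no_degree": 0.6},
    }
    w = rake(sample, population_margins)
    for var, targets in population_margins.items():
        for cat in targets:
            print(var, cat, round(weighted_share(sample, w, var, cat), 4))
```

The printed weighted shares should match the population margins to within the tolerance. Established survey software (for example the `rake()` function in R's survey package) also handles variance estimation for raked weights, which this sketch does not attempt.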

Closing thought: these are general principles for sampling design and sample weighting. You don't say much about your specific application, but I'm guessing that most of this is overkill. The main reason to stratify a sample is if you have reason to believe a simple random sample will miss out on some important group of interest (geographic, demographic, or otherwise). That is, the number of sampled members of that group would be too small for useful analysis. If you don't have that problem, then you don't need to worry too much about stratification.
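One way to check whether you have that problem before committing to stratification is to compute how likely a simple random sample is to contain too few members of a group. This sketch uses a binomial approximation, which is reasonable when the population is much larger than the sample; the group share, sample size and minimum usable count are hypothetical.

```python
from math import comb


def prob_fewer_than(k, n, p):
    """P(X < k) for X ~ Binomial(n, p).

    Approximates the chance that a simple random sample of size n contains fewer
    than k members of a group that makes up share p of a (large) population.
    """
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k))


if __name__ == "__main__":
    n = 400  # hypothetical total sample size
    share = 0.03  # hypothetical population share of the group
    minimum = 30  # hypothetical minimum count needed for subgroup analysis
    risk = prob_fewer_than(minimum, n, share)
    print(f"expected count: {n * share:.1f}")
    print(f"P(fewer than {minimum} group members): {risk:.3f}")
```

If this probability is high, a simple random sample is likely to under-deliver for that group, and stratifying (or oversampling) on it is worth the extra weighting work.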
