Solved – Addressing Non-response in a Convenience Sample

bootstrap, modeling, resampling, sampling, survey

I am studying customer satisfaction in a large hierarchical organization. I plan to administer a voluntary survey to customers across the organization, and need to address non-response in my analysis.

I know:

  • How many customers there are (population size)
  • Some demographic/market segment data for all customers
  • The products/services they used
  • The account manager and department that handles their account
  • 50% response rate expected

I want to draw conclusions about departments, account managers and products. It'd also be nice to be able to break things out by customer demographics, but that's secondary.

The naive approach would assume that non-response is random and treat the data as a simple random sample of the population. My concern is that non-response is probably not completely random.

So here are my questions:

  • Is the 50% response rate sufficiently high that the naive approach isn't terrible?
  • If not, is resampling or post-stratification a sufficient remedy?
  • If not, does it make sense to try to model the sampling probability mechanism using the known customer demographic information?
  • If none of these are sufficient, would I be better off designing a smaller stratified sample and aggressively pursuing non-responders?


EDIT 4/22/2014:
EDIT 4/22/2014:
Reviewing @Steve Samuels's answer and doing additional independent research, I think what I'm dealing with is a census with non-response. The population (all current customers) is sufficiently small and well-known to serve as the sampling frame, so by definition we have a census. Given the platform we're using to deliver the questionnaire, it's actually less feasible to survey a random sub-population than the whole population.

My plan is to execute the census, then study the differences between the responding and non-responding groups in a variety of ways. Follow-up efforts will then be adjusted to address the specific problems found in the non-response analysis.
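Since the frame variables (segment, department, products) are known for every customer, responders and non-responders can be compared directly on them. A minimal sketch in Python; the segment names and counts below are hypothetical:

```python
from collections import Counter

# Hypothetical frame: segment is known for every customer (responder or not),
# and `responded` records who returned the questionnaire.
frame = (
    [{"segment": "enterprise", "responded": True}] * 40
    + [{"segment": "enterprise", "responded": False}] * 10
    + [{"segment": "smb", "responded": True}] * 20
    + [{"segment": "smb", "responded": False}] * 30
)

def response_rate_by(frame, key):
    """Response rate within each level of a fully observed frame variable."""
    totals, responders = Counter(), Counter()
    for row in frame:
        totals[row[key]] += 1
        responders[row[key]] += row["responded"]  # True counts as 1
    return {level: responders[level] / totals[level] for level in totals}

rates = response_rate_by(frame, "segment")
# enterprise responds at 0.8, smb at 0.4 -- a gap this large is evidence
# that non-response is not ignorable, and follow-up should target smb
```

Repeating this for each frame variable (and, where counts permit, their crossings) shows exactly where the respondent pool diverges from the frame.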


Best Answer

I think 50% response is acceptable only if your supervisor really doesn't care what the numbers are. To the ordinary bounds of error you would need to add plausible bounds related to response bias. The extreme tabulation is one mandated for certain satisfaction surveys conducted by managed health care plans (related by a friend in that business): every non-respondent is assigned to a "dissatisfied" category.
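That worst-case rule is one end of a pair of bounds that can be reported alongside the naive estimate; a small sketch with made-up counts:

```python
def satisfaction_bounds(invited, responded, satisfied):
    """Naive respondent-only estimate plus the bounds obtained by assigning
    every non-respondent to 'dissatisfied' (lower) or 'satisfied' (upper)."""
    non_respondents = invited - responded
    naive = satisfied / responded
    lower = satisfied / invited                      # all missing dissatisfied
    upper = (satisfied + non_respondents) / invited  # all missing satisfied
    return naive, lower, upper

naive, lo, hi = satisfaction_bounds(invited=1000, responded=500, satisfied=400)
# naive = 0.8, but response bias could put the truth anywhere in [0.4, 0.9]
```

With a 50% response rate the interval is wide enough to swamp the usual sampling error, which is the point of the exercise.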

So my first suggestion is to devote effort to bringing the response rate up. I would run a series of pilot experiments to see how much different combinations of approaches, incentives, and questions improve the response rate. If the survey will be repeated, these experiments can yield continual improvement over time. Running surveys throughout the year will also provide more timely data than a once-a-year survey, and because the sample size at any one time is smaller, more effort can go into each contact, so quality will be higher.

You say you need to use a "convenience" sample, but you then allude to the possibility of a stratified sample; without random sampling, that is quota sampling. I urge you to experiment with a random sampling approach. Don't try to follow up aggressively with all non-responders; that would take too much effort (50% of the frame!). Instead, select a random sub-sample of the non-responders; after a 1-in-k sub-sample, you can weight each follow-up respondent by k.
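The 1-in-k follow-up idea corresponds to a two-phase (double sampling) estimator. A sketch with hypothetical counts, assuming the follow-up sub-sample achieves essentially complete response:

```python
def double_sampling_mean(respondent_vals, subsample_vals, k):
    """Two-phase estimator for the population mean: each of the 1-in-k
    follow-up respondents stands in for k non-responders (weight k)."""
    n = len(respondent_vals) + k * len(subsample_vals)
    total = sum(respondent_vals) + k * sum(subsample_vals)
    return total / n

respondents = [1] * 400 + [0] * 100  # 500 initial respondents, 80% satisfied
follow_up = [1] * 20 + [0] * 30      # 1-in-10 sub-sample of 500 non-responders
est = double_sampling_mean(respondents, follow_up, k=10)
# est = 0.6, versus the naive 0.8 computed from the initial respondents alone
```

The gap between the naive and two-phase estimates is itself a direct measure of how badly response bias was distorting the original numbers.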

About the analytic remedies

I'm not sure what you mean by re-sampling as a remedy for non-response. That topic is not mentioned on the page you link to.

I take it that by post-stratification you mean any technique that re-weights the data so that sample estimates of multiple characteristics closely match known population quantities. Three standard techniques are survey raking via iterative proportional fitting (IPF), calibration, and generalized regression (GREG); see Little (2007) and Sarndal (2007). None of these can correct the portion of non-response that is related only to characteristics observed solely for the selected sample. For those factors, if any, you can model the probability of response and then apply inverse probability weighting.
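As an illustration only (not a production implementation), raking via IPF can be sketched in a few lines; the cell labels and population margins below are hypothetical:

```python
def rake(weights, rows, cols, row_targets, col_targets, iters=50):
    """Iterative proportional fitting: rescale unit weights one margin at a
    time until weighted totals match both sets of population targets."""
    w = list(weights)
    for _ in range(iters):
        for targets, labels in ((row_targets, rows), (col_targets, cols)):
            current = {lvl: 0.0 for lvl in targets}
            for wi, lvl in zip(w, labels):
                current[lvl] += wi
            factors = {lvl: targets[lvl] / current[lvl] for lvl in targets}
            w = [wi * factors[lvl] for wi, lvl in zip(w, labels)]
    return w

# Four sample units (A/B crossed with X/Y), all starting at weight 1,
# raked to hypothetical population margins:
w = rake([1.0] * 4, ["A", "A", "B", "B"], ["X", "Y", "X", "Y"],
         row_targets={"A": 60, "B": 40}, col_targets={"X": 70, "Y": 30})
# w = [42, 18, 28, 12]; weighted margins now match both sets of targets
```

Production raking (e.g. via survey software) adds convergence checks and weight trimming, but the mechanics are exactly this alternating rescaling.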

The major problem with post-stratification techniques is that good matching of sample totals to the population totals can result in more bias for subgroups whose definitions were not part of the post-stratification. So include managers, departments, and products, if possible, among the post-stratification variables.
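The response-propensity modeling mentioned above can be illustrated in its simplest, cell-based form: estimate Pr(respond) within cells of a fully observed frame variable and weight each respondent by the inverse. The frame below is hypothetical:

```python
from collections import Counter

# Hypothetical frame with one fully observed adjustment-cell variable.
frame = (
    [{"cell": "A", "responded": True}] * 2
    + [{"cell": "A", "responded": False}] * 2
    + [{"cell": "B", "responded": True}] * 2
)

def ipw_weights(frame, key):
    """Estimate Pr(respond) as the observed response rate within each cell,
    then weight each respondent by the inverse of its cell's rate."""
    totals, resp = Counter(), Counter()
    for row in frame:
        totals[row[key]] += 1
        resp[row[key]] += row["responded"]
    p_hat = {lvl: resp[lvl] / totals[lvl] for lvl in totals}
    return [1.0 / p_hat[row[key]] for row in frame if row["responded"]]

weights = ipw_weights(frame, "cell")
# the four respondents get weights [2, 2, 1, 1], which sum back to the frame
# size of 6 -- each cell-A respondent stands in for one non-respondent
```

With continuous or many-valued covariates, the same idea is usually implemented with a logistic regression for the response propensity rather than raw cells.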

References

Little, RJA. 2007. Should we use the survey weights to weight? JPSM Distinguished Lecture http://www.jpsm.umd.edu/jpsm/archived/specialevents/little_lecture/weights407.pdf

Sarndal, C.E. 2007. The calibration approach in survey theory and practice. Survey Methodology 33, no. 2: 99-119. http://www.statcan.gc.ca/pub/12-001-x/2007002/article/10488-eng.pdf.
