Solved – Does SurveyMonkey ignore the fact that you get a non-random sample?

confidence-interval, sample, sample-size, survey

SurveyMonkey provides steps and a chart for figuring out what sample size you need for a given margin of error and confidence level, based on your population size.

[Screenshot: SurveyMonkey sample size chart]
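
For reference, calculators like this one presumably apply the textbook margin-of-error formula for a proportion with a finite population correction; here is a minimal sketch in Python (my reconstruction, not SurveyMonkey's published method):

```python
import math
import statistics

def required_sample_size(population: int, margin: float = 0.05,
                         confidence: float = 0.95, p: float = 0.5) -> int:
    """Classic sample-size formula for estimating a proportion.

    Uses the normal approximation n0 = z^2 * p * (1 - p) / margin^2,
    then applies the finite population correction. Note that the whole
    formula assumes a simple random sample, which is exactly the
    assumption at issue in this question.
    """
    # z-score for a two-sided confidence level (1.96 for 95%).
    z = statistics.NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = z**2 * p * (1 - p) / margin**2
    # Finite population correction shrinks n for small populations.
    return math.ceil(n0 / (1 + (n0 - 1) / population))

print(required_sample_size(10_000))  # 370, matching typical published charts
```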

Does this chart simply ignore the fact that you will not get a random sample, since you only get the people who bother to respond to the survey?

I'm getting warned as I type this that the question appears subjective, so maybe I'm not asking it correctly. It's not really about SurveyMonkey, though; the more general question is whether you can actually calculate confidence intervals from voluntary response data using some advanced techniques that I don't know about.

In exit polls or national surveys, they obviously must deal with this problem. My education didn't cover survey sampling techniques in depth, but I assume it involves collecting demographic data and using that to gauge how representative a sample you have.

But aside from that, for a simple online survey, are they just assuming that the people who bother to respond are a random sample of the population?

Best Answer

The short answer is yes: SurveyMonkey ignores exactly how you obtained your sample. Its calculator has no way of knowing that what you gathered is a convenience sample, yet virtually every SurveyMonkey survey is one. This creates a discrepancy between what you intend to estimate and what you actually estimate, and no amount of additional sampling can or will eliminate it. On one hand, you could define a population (and the associations within it) that you would measure with a simple random sample (SRS). On the other, you could define the population induced by your non-random sampling; the associations in that population you can estimate, and the usual power and sample-size rules hold for those values. It's up to you as a researcher to discuss the discrepancy and let the reader decide how well the non-random sample can approximate a real trend.
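
To make the "no amount of sampling will eliminate it" point concrete, here is a small simulation (my illustration, with assumed numbers, not part of the original answer): if willingness to respond correlates with the quantity being measured, the respondents' mean settles on the mean of the volunteer population, not the target population, no matter how many people you invite.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical target population: a score on a roughly 0-10 scale.
N = 1_000_000
score = rng.normal(5.0, 2.0, size=N)

# Assumed selection mechanism: higher scores are more likely to respond.
p_respond = 1 / (1 + np.exp(-(score - 5.0)))

for n_invited in [100, 10_000, 1_000_000]:
    idx = rng.choice(N, size=n_invited, replace=False)
    volunteers = score[idx][rng.random(n_invited) < p_respond[idx]]
    print(f"invited {n_invited:>9,}: respondents' mean = {volunteers.mean():.3f}")

print(f"true population mean = {score.mean():.3f}")
# The respondents' mean stabilizes well above 5.0 and stays there:
# a larger convenience sample just estimates the wrong quantity
# more precisely.
```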

As an aside, there are inconsistent uses of the term bias. In probability theory, the bias of an estimator is defined by $\mbox{Bias}_n = E[\hat{\theta}_n] - \theta$. An estimator can be biased but consistent, so that the bias "vanishes" in large samples and $\hat{\theta}_n \rightarrow_p \theta$; the maximum likelihood estimate of the standard deviation of normally distributed RVs is a standard example. Estimators whose bias does not vanish (i.e. $\hat{\theta}_n \not\to_p \theta$) are called inconsistent in probability theory. Study design experts (like epidemiologists) have picked up a bad habit of calling inconsistency "bias"; in this case, it's selection bias or volunteer bias. It is certainly a form of bias, but inconsistency means that no amount of sampling will ever correct the issue.
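
To illustrate biased-but-consistent concretely (my sketch, not part of the original answer), here is the maximum likelihood estimate of a normal standard deviation, which divides by $n$ rather than $n-1$ and is therefore biased downward, with a bias that shrinks as $n$ grows:

```python
import numpy as np

rng = np.random.default_rng(0)
true_sd = 2.0
reps = 2_000  # Monte Carlo replications per sample size

for n in [5, 50, 500, 5_000]:
    draws = rng.normal(0.0, true_sd, size=(reps, n))
    sd_mle = draws.std(axis=1, ddof=0)  # MLE: divides by n, not n - 1
    print(f"n={n:>5}  mean estimate={sd_mle.mean():.4f}  "
          f"bias={sd_mle.mean() - true_sd:+.4f}")
# The bias is clearly negative at n=5 and approaches 0 as n grows:
# biased, but consistent. Volunteer bias, by contrast, does not shrink.
```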

To estimate population-level associations from convenience sample data, you would have to correctly identify the sampling probability mechanism and use inverse probability weighting in all of your estimates. Only in very rare situations does this make sense, because identifying such a mechanism is next to impossible in practice. One setting where it can be done is a cohort of individuals with previously collected information who are approached to fill out a survey. The nonresponse probability can be estimated as a function of that previous information (e.g. age, sex, SES, ...), and weighting gives you a chance to extrapolate what the results would have been in the non-responder population. The Census is a good example of analyses that rely on inverse probability weighting in this way.
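
A minimal sketch of that cohort scenario (assumed data and a hypothetical response model, not a real Census workflow): fit a response model on the baseline covariates, then weight each respondent by the inverse of their estimated response probability.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Hypothetical cohort with a known baseline covariate (age).
n = 50_000
age = rng.uniform(20, 80, size=n)
outcome = 0.1 * age + rng.normal(0.0, 5.0, size=n)  # outcome depends on age

# Assumed mechanism: older members are more likely to return the survey.
responded = rng.random(n) < 1 / (1 + np.exp(-(age - 50) / 10))

# Step 1: estimate the response probability from the baseline covariate.
X = age.reshape(-1, 1)
p_hat = LogisticRegression().fit(X, responded).predict_proba(X)[:, 1]

# Step 2: weight respondents by 1 / p_hat so under-represented
# (younger) members count proportionally more.
ipw_mean = np.average(outcome[responded], weights=1 / p_hat[responded])

print(f"true cohort mean      : {outcome.mean():.3f}")
print(f"naive respondent mean : {outcome[responded].mean():.3f}")
print(f"IPW-adjusted mean     : {ipw_mean:.3f}")
# The naive mean is pulled toward older respondents; the inverse
# probability weighted mean approximately recovers the cohort mean.
```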
