Statistical Sampling – How Statistical Sampling Works for Political Surveys Like Gallup

sample-size, sampling

Polls out there (say, Gallup) sample some absurdly low number of people compared to the population size (e.g. maybe a thousand people out of hundreds of millions).

Now, to me, sampling a population as a means for estimating the population's statistics makes sense when you have a strong reason to believe the samples are representative of the population (or, similarly, of other samples).

For example, sampling obviously makes sense for medical studies, because we know a priori that humans all have quite similar genomes and that this factor makes their bodies behave similarly.
Note that this isn't some kind of loose coupling — genome is a pretty damn strong determining factor.

However, I just don't understand what justifies using low sample sizes for things like political polls.

I could buy that maybe 80-90% of the people in any given neighborhood vote similarly for the president (due to similar socioeconomic/education backgrounds), but this hardly seems to justify the absurdly low number of samples. There is literally no compelling reason (at least to me) why 1000 random voters should behave like the 200 million other voters.

To me, you'd need at least like (say) 100× that amount. Why? I can think of a bunch of reasons, e.g.:

  1. There are ~22,000 precincts just in California. People grow up so differently in their economic and educational backgrounds that a poll of size 1000 seems laughably small. How can you summarize entire precincts with < 1 person on average?

  2. People can't generally change their bodies' responses to medicine, but they can change their opinions about politics just by thinking about it. The way I see it, there's no forcing factor akin to DNA in medicine when you're dealing with politics. At best I'd imagine there should be small pockets of correlation.

Yet somehow, polls like this seem to… work anyway? Or at least people seem to think they do?
But why should they? Maybe I just fundamentally don't understand sampling? Can someone explain?
I just can't take any of the polls I see seriously, but I feel like I'm more or less alone in this…

Best Answer

It seems like you're imagining a very simple sampling model.

The simplest model for sampling is aptly called simple random sampling. You select a subset of the population (e.g., by dialing phone numbers at random) and ask whoever answers how they're voting. If 487 say Clinton, 463 say Trump, and the remainder give you some wacky answer, then the polling firm would report that 49% of voters prefer Clinton, while 46% prefer Trump.

However, polling firms do a lot more than this. A simple random sample gives equal weight to every data point. Suppose your sample contains, by chance, 600 men and 400 women, which clearly isn't representative of the population as a whole. If men as a group lean one way while women lean the other, this will bias your result. Since we have pretty good demographic statistics, you can weight* the responses, counting the women's responses a bit more and the men's a bit less, so that the weighted result represents the population better. Polling organizations use more complicated weighting models that can make a non-representative sample resemble a more representative one.
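The reweighting step above can be sketched in a few lines. All the numbers here are invented for illustration: men are assumed to support the candidate at 35%, women at 65%, and the sample happens to be 600/400 while the population is 50/50.

```python
import random

random.seed(0)

# Hypothetical data: (gender, supports-candidate) pairs.
# The sample over-represents men (600 vs 400) relative to a 50/50 population.
sample = [("M", random.random() < 0.35) for _ in range(600)] + \
         [("F", random.random() < 0.65) for _ in range(400)]

# Unweighted estimate -- what a plain simple random sample would report.
raw = sum(vote for _, vote in sample) / len(sample)

# Post-stratification: reweight so the gender shares in the sample
# match the (assumed known) population shares.
pop_share = {"M": 0.5, "F": 0.5}
sample_share = {g: sum(1 for s, _ in sample if s == g) / len(sample)
                for g in pop_share}
weights = {g: pop_share[g] / sample_share[g] for g in pop_share}

weighted = (sum(weights[g] * vote for g, vote in sample)
            / sum(weights[g] for g, _ in sample))

print(f"raw estimate: {raw:.3f}, weighted estimate: {weighted:.3f}")
```

With these made-up rates the true population support is 50%; the raw estimate is pulled toward the men's rate, and the weighting (men count 5/6 each, women 5/4 each) pulls it back.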

The idea of weighting the sampled responses is on pretty firm statistical ground, but there is some flexibility in choosing what factors contribute to the weights. Most pollsters do reweight based on demographic factors like gender, age, and race. Given this, you might think that party identification (Democratic, Republican, etc) should also be included, but it turns out that most polling firms do not use it in their weights: party (self)-identification is tangled up with the voter's choice in a way that makes it less useful.

Many polling outfits also report their results among "likely voters". In these polls, respondents are either selected or weighted by the likelihood that they'll actually turn up at the polls. This turnout model is data-driven too, but the precise choice of factors again allows for some flexibility. For example, including an interaction between the candidate's and the voter's race (or gender) wasn't even sensible before 2008 or 2016, but I suspect it has some predictive power now.
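A likely-voter adjustment can be sketched as weighting each respondent's preference by an estimated turnout probability. All numbers here are invented:

```python
# Each respondent: (preferred candidate, estimated turnout probability).
respondents = [
    ("A", 0.9), ("B", 0.9), ("A", 0.8),
    ("B", 0.3), ("B", 0.2), ("A", 0.6),
]

def share(candidate, data):
    """Turnout-weighted share of respondents preferring `candidate`."""
    total = sum(p for _, p in data)
    return sum(p for c, p in data if c == candidate) / total

# Unweighted, the race is tied 3-3; among likely voters, A leads,
# because A's supporters are more likely to show up.
print(f"A among likely voters: {share('A', respondents):.3f}")
```

The flexibility the answer mentions lives in how those turnout probabilities are estimated (past voting history, stated intention, demographics, etc.).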

In theory, you could include all sorts of things as weighting factors: musical preference, eye color, etc. However, demographic factors are popular choices for weighting factors because:

  • Empirically, they correlate well with voter behavior. Obviously, there is no iron-clad law that 'forces' white men to lean Republican, but over the last fifty years, they have tended to.

  • The population values are well known (e.g., from the census or vital records).

However, pollsters also see the same news everyone else does, and can adjust the weighting variables if necessary.

There are also some "fudge factors" that are sometimes invoked to explain poll results. For example, respondents are sometimes reluctant to give "socially undesirable" answers. The Bradley Effect posits that white voters sometimes downplay their support for white candidates running against a minority candidate to avoid appearing racist. It is named after Tom Bradley, an African-American gubernatorial candidate who narrowly lost his election despite leading comfortably in the polls.

Finally, you're completely correct that the very act of asking someone's opinion can change it. Polling firms try to write their questions in a neutral way. To avoid issues with the order of possible responses, the candidates' names might be listed in random order. Multiple versions of a question are also sometimes tested against each other. This effect can also be exploited for nefarious ends in a push poll, where the interviewer isn't actually interested in collecting responses but in influencing them. For example, a push poll might ask "Would you vote for [Candidate A] even if it was reported that he was a child molester?".
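The response-order randomization mentioned above is mechanically simple. A sketch, with hypothetical candidate names and question wording:

```python
import random

# Randomize the order in which candidate names are read to each
# respondent, so that any order effect averages out across the sample.
candidates = ["Clinton", "Trump", "Johnson", "Stein"]

def question_for_respondent(rng=random):
    order = candidates[:]   # copy, so the master list is untouched
    rng.shuffle(order)
    return ("If the election were held today, would you vote for "
            + ", ".join(order) + "?")

print(question_for_respondent())
```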


* You might also set explicit targets for your sample, like including 500 men and 500 women. This is called stratified sampling: the population is divided into groups (strata), and each group is then sampled at random. In practice, this isn't done very often for polls, because you'd need to stratify the population into a huge number of narrow, mutually exclusive groups (e.g., college-educated men aged 18-24 in urban Texas).
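For contrast with the weighting approach, stratified sampling fixes the group counts up front. A minimal sketch with invented strata and support rates:

```python
import random

random.seed(1)

# Invented population: two equal-sized strata with different support rates.
population = {
    "men":   [random.random() < 0.45 for _ in range(50_000)],
    "women": [random.random() < 0.55 for _ in range(50_000)],
}

n = 1000
# Draw exactly n/2 respondents at random from each stratum.
sample = {group: random.sample(people, n // 2)
          for group, people in population.items()}

# Because the strata are equal-sized here, a plain average works;
# otherwise each stratum's mean would be weighted by its population share.
estimate = sum(sum(s) for s in sample.values()) / n
print(f"stratified estimate: {estimate:.3f}")
```

Unlike post-stratification weighting, there is no chance of ending up with a lopsided 600/400 split, which is exactly why the exhaustive-groups requirement makes it impractical at fine granularity.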
