Solved – Difference of Two Proportions Hypothesis Test with Weighted Sample Data

hypothesis testing, survey, survey-weights

I have survey data which contain respondents' answers to several questions. As the survey contained a disproportionate number of people from certain demographic groups, the survey results are weighted by race, sex and age.

I have responses to the same questions for two years (e.g. 2016 and 2017), and am trying to find out whether the proportion of people who responded "yes" to a particular question has fallen. That is, I have calculated the weighted agreement rate for the question in 2016 (p1) and in 2017 (p2), and I want to test whether p1 – p2 = 0.

The simple hypothesis test for a difference between two proportions is well known (described at https://onlinecourses.science.psu.edu/stat414/node/268). However, I have two questions:

1) Does weighting the samples by demographic variables change the standard error, so that a more complicated hypothesis test formula is required? If so, what is this formula?

2) Are there other methods, besides applying this possibly more complicated formula, to rigorously test for a difference between the two proportions? For instance, are there nonparametric hypothesis tests that can be used?

Thank you for your time.

Best Answer

Your tests must account for the complex survey design, including strata, clusters, unequal weights, and calibration (whichever of these apply). Survey inference looks at the variation between the statistics that your sampling design could have produced, in conjunction with whatever weight adjustment and estimation procedures you use. In particular, if you adjust by gender, say, so that every sample you draw yields a weighted proportion of exactly 50% female, then the sampling variability of that statistic is zero, and your package should show that.
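The zero-variability point can be checked with a small simulation. The following is only a sketch under stated assumptions: the function names, the 65% raw female share, and the 50/50 targets are all hypothetical, and simple post-stratification stands in for whatever adjustment your survey actually used.

```python
import random


def poststratify(groups, targets):
    """Return weights that make each group's weighted share equal its target.

    Assumes every group in `targets` appears at least once in the sample.
    """
    n = len(groups)
    counts = {}
    for g in groups:
        counts[g] = counts.get(g, 0) + 1
    # Weight = target share / observed share, constant within a group.
    return [targets[g] / (counts[g] / n) for g in groups]


def weighted_share(groups, weights, group):
    """Weighted proportion of respondents belonging to `group`."""
    total = sum(weights)
    return sum(w for g, w in zip(groups, weights) if g == group) / total


def calibrated_shares(n_samples=5, n=200, p_female=0.65, seed=1):
    """Draw several samples; return (raw, calibrated) female share for each."""
    rng = random.Random(seed)
    targets = {"F": 0.5, "M": 0.5}  # hypothetical population targets
    shares = []
    for _ in range(n_samples):
        groups = ["F" if rng.random() < p_female else "M" for _ in range(n)]
        weights = poststratify(groups, targets)
        raw = groups.count("F") / n
        shares.append((raw, weighted_share(groups, weights, "F")))
    return shares
```

The raw female share moves from sample to sample, while the calibrated share is pinned to the target in every draw. A variance estimator that ignores the calibration step would still report a positive standard error for it.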

Of the existing packages, @ThomasLumley's survey package for R (library(survey)) does the best job of accounting for calibration to demographics. Some combination of a design declaration with survey::svydesign(), estimation with survey::svymean(), and a difference computed with survey::svycontrast() should get you there.
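To show how unequal weights enter the standard error, here is a minimal Python sketch of a design-based z-test for the difference. It is not a substitute for the survey package. It assumes the two years are independent samples, a single-stage design with no strata or clusters, and weights treated as fixed, so it does not account for calibration (which would typically reduce the variance). The function names are hypothetical. The variance is the usual Taylor-linearized, with-replacement estimator for a ratio of weighted totals.

```python
import math


def weighted_proportion(y, w):
    """Weighted proportion of y == 1 and its linearized variance.

    y: list of 0/1 responses; w: list of positive weights.
    """
    n = len(y)
    total_w = sum(w)
    p = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    # Linearized residuals of the ratio estimator; they sum to zero.
    z = [wi * (yi - p) / total_w for wi, yi in zip(w, y)]
    var = n / (n - 1) * sum(zi * zi for zi in z)
    return p, var


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def diff_test(y1, w1, y2, w2):
    """z-test of H0: p2 - p1 = 0 for two independent weighted samples."""
    p1, v1 = weighted_proportion(y1, w1)
    p2, v2 = weighted_proportion(y2, w2)
    se = math.sqrt(v1 + v2)  # independence between years is assumed
    z = (p2 - p1) / se
    return {
        "p1": p1,
        "p2": p2,
        "diff": p2 - p1,
        "se": se,
        "z": z,
        # One-sided p-value for the alternative "the rate has fallen".
        "p_fallen": normal_cdf(z),
        "p_two_sided": 2.0 * (1.0 - normal_cdf(abs(z))),
    }
```

With equal weights the variance reduces to p(1 − p)/(n − 1), which is essentially the textbook formula. With unequal weights it is larger, which is why the simple test tends to overstate significance. Calling `diff_test(y_2016, w_2016, y_2017, w_2017)` on the 0/1 responses and weights for each year returns both estimates, the standard error of the difference, and the p-values.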
