Solved – Survey design chi square

chi-squared-test, r, survey

Does anyone know a method for comparing two variables with a chi-square test when the variables come from different surveys with different svydesign() calls? I want to test whether the distribution of a variable differs across two waves of a survey, but svychisq() accepts only one design object.

Is it legitimate to stack the two variables in a new data.frame, create a new svydesign statement with the collective weights and then run the test?

Best Answer

If you go down the path of stacking the data sets, you should define super-strata corresponding to the two data sets/waves, so that svydesign() knows that the two samples are independent. Your new svydesign() call will then have strata equal to the cross of year and the original strata, and will reuse the PSUs and the weights from the original designs.
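The stacking approach above could be sketched roughly as follows. This is only an illustration on simulated data: the data frames `wave1` and `wave2`, the helper `make_wave()`, and the column names `stratum`, `psu`, `wt` and `answer` are all hypothetical stand-ins for whatever your two original designs use. It assumes both waves are stratified cluster samples whose PSU labels are only unique within strata, hence `nest = TRUE`.

```r
library(survey)

set.seed(1)

# Hypothetical toy data: each wave has its own strata, PSUs and weights.
make_wave <- function(year, n = 400) {
  data.frame(
    year    = year,
    stratum = sample(1:4, n, replace = TRUE),
    psu     = sample(1:10, n, replace = TRUE),
    wt      = runif(n, 50, 150),
    answer  = factor(sample(c("no", "unsure", "yes"), n, replace = TRUE))
  )
}
wave1 <- make_wave(2010)
wave2 <- make_wave(2012)

# Stack the two waves and keep the wave indicator as a factor.
stacked <- rbind(wave1, wave2)
stacked$year <- factor(stacked$year)

# Super-strata: the original strata crossed with the wave.
stacked$superstratum <- interaction(stacked$year, stacked$stratum, drop = TRUE)

# One design for the stacked data: original PSUs and weights, crossed strata.
# nest = TRUE relabels PSUs within strata, so PSU 1 in 2010 and PSU 1 in 2012
# are treated as different clusters.
des <- svydesign(ids = ~psu, strata = ~superstratum, weights = ~wt,
                 data = stacked, nest = TRUE)

# Test whether the distribution of `answer` differs between waves.
# statistic = "F" is the Rao-Scott second-order correction.
res <- svychisq(~answer + year, design = des, statistic = "F")
print(res)
```

Crossing year with the original strata is what tells svydesign() that no PSU is shared between waves. If the same PSUs were actually revisited in both waves, the waves are not independent and this sketch would not be appropriate as written.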

As I suggested in the comment, other ways of combining estimates and tests have been proposed in the literature. Wu (2004) uses empirical likelihood based on common variables between the two data sets.

For continuous variables, ideally you would want to use the Kolmogorov-Smirnov test, which is what you would use with "flat" data, but I don't know whether any extensions of it work for survey data; I doubt it. So you may have to convert your continuous variables to ordinal ones, using, say, $[\log_2(n)]$ percentile groups or equal-width bins over the range of the variable (this function of the sample size $n$ is a commonly used number of bins for a histogram), and then apply the Rao-Scott $\chi^2$ test to the binned variables.
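A minimal sketch of the binning idea, again on simulated data with hypothetical column names (`income`, `stratum`, `psu`, `wt`). It reads $[\log_2(n)]$ as the integer part of $\log_2(n)$, with $n$ taken as the stacked sample size, and it uses unweighted pooled quantiles as cut points purely for simplicity; both are assumptions on my part, and weighted quantiles or equal-width bins would be equally valid ways to define the groups.

```r
library(survey)

set.seed(2)

# Hypothetical stacked data: two waves of 500 respondents each.
n_per_wave <- 500
n <- 2 * n_per_wave
dat <- data.frame(
  year    = factor(rep(c(2010, 2012), each = n_per_wave)),
  stratum = sample(1:3, n, replace = TRUE),
  psu     = sample(1:8, n, replace = TRUE),
  wt      = runif(n, 50, 150),
  income  = rlnorm(n, meanlog = 10, sdlog = 0.6)
)

# Number of groups: integer part of log2(n).
k <- floor(log2(n))

# Cut the continuous variable into k percentile groups (pooled, unweighted).
breaks <- quantile(dat$income, probs = seq(0, 1, length.out = k + 1))
dat$income_grp <- cut(dat$income, breaks = breaks, include.lowest = TRUE)

# Same stacked design as before: year crossed with the original strata.
dat$superstratum <- interaction(dat$year, dat$stratum, drop = TRUE)
des <- svydesign(ids = ~psu, strata = ~superstratum, weights = ~wt,
                 data = dat, nest = TRUE)

# Rao-Scott test of whether the binned distribution differs between waves.
res <- svychisq(~income_grp + year, design = des, statistic = "F")
print(res)
```

Percentile groups keep the cell counts roughly balanced, which tends to be safer than equal-width bins for skewed variables, where some bins can end up nearly empty.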