How to run a chi-squared test on imputed data

chi-squared-test, data-imputation

I have a survey data set with missing values, and I generated 10 multiply imputed data sets in which the missing values are filled in. The data contain several categorical variables, and I'd like to test for associations among them using a chi-squared test.

However, I have searched for a way to run this analysis in SAS or Stata and have not found one. In Stata, mi estimate does not seem to support svy: tab. In SAS, I can run a chi-squared test on each imputed data set with proc surveyfreq, but I don't know how to combine the results. Can anyone please guide me through the steps for testing associations between categorical variables?

Thank you in advance!
Jin

Best Answer

The procedure is described by Rubin (1988, p. 87) and Li et al. (1991).

First, take the mean of the $\chi^2$ statistics across the $m$ multiply imputed data sets, where $\chi^2_l$ is the statistic from data set $l$:

$$\chi^2=\frac{1}{m}\sum _{l=1}^{m}\chi^2_l$$

Next, estimate the relative increase in variance:

$$r=(1+\frac{1}{m})\frac{1}{m-1}\sum _{l=1}^{m}(\sqrt{\chi^2_l}-\sqrt{\chi^2})^2$$

Then compute the test statistic:

$$D_x=\frac{\frac{\chi^2}{k}-\frac{m+1}{m-1}r}{1+r}$$

where $k$ is the degrees of freedom of the $\chi^2$ test. $D_x$ is approximately F-distributed with $k$ and $v$ degrees of freedom, so the p-value is:

$$P_x = Pr(F_{k,v} \gt D_x)$$

where

$$v=k^{-3/m}(m-1)(1+\frac{1}{r})^2$$

$P_x$ is the p-value of the chi-squared test pooled across the multiply imputed data sets.
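The pooling steps above can be sketched in Python. This is a minimal illustration rather than a reference implementation: the names `pool_chi2`, `pooled_p_value` and `PooledChi2` are made up for this example, the per-imputation $\chi^2_l$ values are assumed to have been exported already (for instance from proc surveyfreq), and SciPy is assumed to be installed for the F-distribution tail probability. The code follows the formulas exactly as written above. The case $r=0$ (all statistics identical) is handled separately, because $1/r$ is then undefined; treating $v$ as infinite there is a choice made for this sketch.

```python
import math
from typing import NamedTuple, Sequence


class PooledChi2(NamedTuple):
    d_x: float  # pooled test statistic D_x
    r: float    # relative increase in variance
    v: float    # denominator degrees of freedom
    k: int      # numerator degrees of freedom (df of each chi-squared test)


def pool_chi2(chi2_values: Sequence[float], k: int) -> PooledChi2:
    """Pool per-imputation chi-squared statistics using the formulas above."""
    m = len(chi2_values)
    if m < 2:
        raise ValueError("need at least two imputed data sets")

    # Mean chi-squared across the m imputed data sets.
    mean_chi2 = sum(chi2_values) / m

    # Relative increase in variance, based on square roots of the statistics.
    root_mean = math.sqrt(mean_chi2)
    ss = sum((math.sqrt(c) - root_mean) ** 2 for c in chi2_values)
    r = (1 + 1 / m) * ss / (m - 1)

    # Pooled test statistic D_x.
    d_x = (mean_chi2 / k - (m + 1) / (m - 1) * r) / (1 + r)

    # Denominator degrees of freedom; infinite when r is exactly zero
    # (assumption of this sketch, to avoid dividing by zero).
    v = math.inf if r == 0 else k ** (-3 / m) * (m - 1) * (1 + 1 / r) ** 2

    return PooledChi2(d_x, r, v, k)


def pooled_p_value(res: PooledChi2) -> float:
    """Upper-tail probability P_x = Pr(F_{k,v} > D_x)."""
    # Imported here so the pooling arithmetic itself needs only the stdlib.
    from scipy import stats

    if math.isinf(res.v):
        # As v grows without bound, k * F_{k,v} approaches a chi-squared
        # distribution with k degrees of freedom.
        return float(stats.chi2.sf(res.k * res.d_x, res.k))
    return float(stats.f.sf(res.d_x, res.k, res.v))
```

As a made-up example, `pool_chi2([4.0, 9.0, 16.0], k=2)` gives $r \approx 1.357$, $D_x \approx 0.899$ and $v \approx 3.017$. Passing that result to `pooled_p_value` returns the pooled p-value.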

Li, K.-H., Meng, X.-L., Raghunathan, T. E., & Rubin, D. B. (1991). Significance levels from repeated p-values with multiply-imputed data. Statistica Sinica, 1(1), 65–92.

Rubin, D. B. (1988). Multiple Imputation for Nonresponse in Surveys. John Wiley & Sons.
