Dealing with missing data when both outcome and covariates are missing

chi-squared-test, missing-data

I have two categorical variables. Both have missing values, and the missing values do not necessarily occur in the same rows.

For example, the data might look like this, where "-" marks a missing value:

ID  Rating  Status
1   Good    Good
2   -       Good
3   OK      Bad
4   OK      -
5   -       -
6   Bad     OK

I would like to test whether there is an association between Rating and Status (with a chi-squared test). What can I do about the missing values?
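To see what is at stake: a chi-squared test of association works on the Rating-by-Status cross-tabulation, and a row can only be placed in a cell if both values are observed. The minimal Python sketch below (stdlib only) shows how much of the example survives that listwise deletion. Encoding a blank as `None` and the helper name `complete_case_table` are illustrative assumptions, not part of the question.

```python
from collections import Counter

# Example data from the question; None marks a missing value.
rows = [
    (1, "Good", "Good"),
    (2, None, "Good"),
    (3, "OK", "Bad"),
    (4, "OK", None),
    (5, None, None),
    (6, "Bad", "OK"),
]


def complete_case_table(rows):
    """Cross-tabulate Rating x Status using only rows where both are observed."""
    complete = [(rating, status) for _, rating, status in rows
                if rating is not None and status is not None]
    return Counter(complete), len(complete)


table, n_used = complete_case_table(rows)
print(f"{n_used} of {len(rows)} rows enter the complete-case table")
for (rating, status), count in sorted(table.items()):
    print(f"Rating={rating:<5} Status={status:<5} n={count}")
```

Only IDs 1, 3 and 6 are usable here, which is why it matters whether the discarded rows differ systematically from the kept ones.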

Thanks in advance.

Best Answer

To assess the type of missingness, I would partition your data into four groups: complete cases, first answer blank, second answer blank, and both answers blank. Call the first variable (Rating) $X_1$ and the second (Status) $X_2$.
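A minimal Python sketch of that partition, assuming the data are held as `(id, rating, status)` tuples with `None` marking a blank. The helper names `missingness_pattern` and `partition` are hypothetical, not from any library.

```python
from collections import defaultdict

# Example data from the question; None marks a blank answer.
rows = [
    (1, "Good", "Good"),
    (2, None, "Good"),
    (3, "OK", "Bad"),
    (4, "OK", None),
    (5, None, None),
    (6, "Bad", "OK"),
]


def missingness_pattern(rating, status):
    """Label a row by which of its two answers are blank."""
    if rating is not None and status is not None:
        return "complete"
    if rating is None and status is not None:
        return "first blank"
    if rating is not None and status is None:
        return "second blank"
    return "both blank"


def partition(rows):
    """Group row IDs by missingness pattern."""
    groups = defaultdict(list)
    for row_id, rating, status in rows:
        groups[missingness_pattern(rating, status)].append(row_id)
    return dict(groups)


groups = partition(rows)
for name in ("complete", "first blank", "second blank", "both blank"):
    print(f"{name:>12}: {groups.get(name, [])}")
```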

In the complete-case set, look at the marginal distributions of $X_1$ and $X_2$ separately. Then compare the marginal distribution of $X_1$ in the complete cases with the marginal distribution of $X_1$ in the cases where the second answer is blank, and see whether they are similar. (If you want to make the decision more methodically, you could run a test, for example a chi-squared test comparing the category counts of $X_1$ in the two groups.)
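One way to run that test, sketched in stdlib-only Python under the assumption that a Pearson chi-squared test of homogeneity on the two sets of category counts is what you want. The names `chi2_sf` and `chi2_homogeneity` are hypothetical, and the counts at the bottom are invented purely to show the mechanics (the six example rows are far too few for any test). If SciPy is available, `scipy.stats.chi2_contingency` on the 2-by-k table should give the same statistic when there are at least three categories; with only two categories it applies Yates' continuity correction by default.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi2_df > x) for an integer df >= 1.

    Uses closed forms of the regularized upper incomplete gamma
    function Q(df/2, x/2) for integer and half-integer shapes.
    """
    y = x / 2.0
    if df % 2 == 0:
        # Even df = 2m: Q(m, y) = exp(-y) * sum_{i=0}^{m-1} y**i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd df: Q(1/2, y) = erfc(sqrt(y)), then step up with
    # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    total = math.erfc(math.sqrt(y))
    for i in range(1, (df - 1) // 2 + 1):
        total += y ** (i - 0.5) * math.exp(-y) / math.gamma(i + 0.5)
    return total


def chi2_homogeneity(counts_a, counts_b):
    """Pearson chi-squared test that two samples share one categorical distribution.

    counts_a, counts_b: dicts mapping category -> count.
    Returns (statistic, degrees_of_freedom, p_value).
    """
    if not sum(counts_a.values()) or not sum(counts_b.values()):
        raise ValueError("both samples need at least one observation")
    # Keep only categories seen in at least one sample (avoids zero expected counts).
    cats = sorted(c for c in set(counts_a) | set(counts_b)
                  if counts_a.get(c, 0) + counts_b.get(c, 0) > 0)
    if len(cats) < 2:
        raise ValueError("need at least two observed categories")
    table = [[counts_a.get(c, 0) for c in cats], [counts_b.get(c, 0) for c in cats]]
    row_tot = [sum(row) for row in table]
    col_tot = [table[0][j] + table[1][j] for j in range(len(cats))]
    n = sum(row_tot)
    stat = 0.0
    for i in range(2):
        for j in range(len(cats)):
            expected = row_tot[i] * col_tot[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    df = len(cats) - 1
    return stat, df, chi2_sf(stat, df)


# Hypothetical counts of X1 (Rating): complete cases vs. rows with X2 (Status) blank.
complete_x1 = {"Good": 40, "OK": 35, "Bad": 25}
x2_blank_x1 = {"Good": 4, "OK": 6, "Bad": 10}

stat, df, p = chi2_homogeneity(complete_x1, x2_blank_x1)
print(f"chi2 = {stat:.2f}, df = {df}, p = {p:.3f}")
```

The closed-form tail keeps the sketch dependency-free. When some expected counts are small (a common rule of thumb is below 5), Fisher's exact test or pooling rare categories is usually preferred over the chi-squared approximation.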

Repeat for $X_2$, this time comparing the complete cases with the cases where the first answer is blank. If the distributions are similar in both comparisons, that is evidence consistent with your data being MCAR (missing completely at random), since missingness does not appear to affect the marginal distributions. This will help you decide which missing-data method (likely imputation) to use.
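Both comparisons can be sketched together in Python. This assumes rows are `(rating, status)` tuples with `None` for a blank, and the helper names `proportions` and `marginal_comparison` are hypothetical. It only reports proportions, which you would then inspect or pass to a test such as a chi-squared test of homogeneity. With the question's six rows each partially missing group holds a single row, so this illustrates the mechanics, not a meaningful check.

```python
from collections import Counter

# Example data from the question; None marks a blank answer.
rows = [
    ("Good", "Good"),
    (None, "Good"),
    ("OK", "Bad"),
    ("OK", None),
    (None, None),
    ("Bad", "OK"),
]


def proportions(values):
    """Relative frequency of each category in a list of observed values."""
    counts = Counter(values)
    total = sum(counts.values())
    return {k: counts[k] / total for k in sorted(counts)} if total else {}


def marginal_comparison(rows, var, other):
    """Distribution of column `var` in complete cases vs. rows where only `other` is blank."""
    complete = [r[var] for r in rows if r[var] is not None and r[other] is not None]
    other_blank = [r[var] for r in rows if r[var] is not None and r[other] is None]
    return proportions(complete), proportions(other_blank)


# X1 = Rating (column 0): complete cases vs. "second answer blank" rows.
x1_complete, x1_when_x2_missing = marginal_comparison(rows, 0, 1)
# X2 = Status (column 1): complete cases vs. "first answer blank" rows.
x2_complete, x2_when_x1_missing = marginal_comparison(rows, 1, 0)

print("Rating | complete:", x1_complete, "| Status blank:", x1_when_x2_missing)
print("Status | complete:", x2_complete, "| Rating blank:", x2_when_x1_missing)
```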

Hope this helps! Let me know if this is unclear.