Can you perform multiple imputation on data that are missing not at random (MNAR)?

data-imputation, regression

Is there a way to identify if your data is MNAR, MAR, or MCAR?

And when performing multiple imputation, should you include all predictor variables even if only 1 or 2 variables have missing values? (SPSS)

Some context:

I am running a multiple logistic regression where one of my variables has almost 20% missing values. I ran a missing pattern analysis through SPSS and found that my data satisfy the monotonicity assumption. However, I don't know whether my data are MNAR, MAR, or MCAR. I suspect MNAR or MAR because it is survey data where participants were allowed to refuse to answer. I could take a wild guess, but I want to know if there is a statistical test or process I can use that will tell me whether my data are MNAR or MAR.

Also once I run my MI and build my logistic model, how do I decide if it is better to go with a model that excludes all missing values through list-wise deletion or with my imputed model? Do I look at changes in the Beta coefficients, standard errors, model fit?

Best Answer

Is there a way to identify if your data is MNAR, MAR, or MCAR?

There is Little's MCAR test, which can evaluate whether your missing values are MCAR. More information can be found here on page 12. As far as I know, there is no test available that differentiates between MAR and MNAR, since that distinction depends on the unobserved values themselves. In practice, I would say that many people just assume MAR, since the treatment of MNAR is very difficult. However, some information about appropriate methods for MNAR can be found here.
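As a rough illustration of what a check against MCAR looks like, the Python sketch below compares a fully observed covariate between respondents who answered and respondents who refused. This is not Little's test, only a simpler per-variable comparison, and the data and the names `welch_t`, `age`, and `income` are made up for illustration. A large absolute t statistic suggests that missingness depends on the covariate, which speaks against MCAR. It says nothing about MAR versus MNAR.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(observed_group, missing_group):
    """Welch t statistic for a fully observed covariate, comparing rows
    where the target is missing against rows where it is observed."""
    n_obs, n_mis = len(observed_group), len(missing_group)
    se = sqrt(variance(observed_group) / n_obs + variance(missing_group) / n_mis)
    return (mean(missing_group) - mean(observed_group)) / se


# Hypothetical survey data: None marks a refused income question.
age = [23, 25, 31, 35, 38, 41, 47, 52, 58, 63]
income = [31, None, 40, 42, None, 55, None, None, 61, None]

# Split the observed covariate (age) by whether income is missing.
age_when_observed = [a for a, y in zip(age, income) if y is not None]
age_when_missing = [a for a, y in zip(age, income) if y is None]

t_stat = welch_t(age_when_observed, age_when_missing)
print(f"Welch t for age (missing vs. observed income): {t_stat:.2f}")
```

With real data you would repeat this for each fully observed variable (or regress the missingness indicator on all of them at once) and compare the statistic with a t distribution.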

And when performing multiple imputation, should you include all predictor variables even if only 1 or 2 variables have missing values?

That depends strongly on your specific data. For data consisting of only a few variables, it is often a good approach to use all of them. With larger data sets, you should usually do a variable selection, mainly to keep the computation manageable and to exclude noisy predictors (see IWS' comment below). You can find some guidelines here on page 128. There are 3 groups of variables that should be included in imputation models: variables that are used in later analyses of the imputed data, variables that are related to the missingness structure, and variables that are strong predictors of the variable you want to impute.
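The three groups above could be turned into a simple screening rule along the following lines. This is only a sketch under assumptions: the function name `select_imputation_predictors`, the column names, the toy data, and the correlation cutoff of 0.3 are all hypothetical, and it assumes every candidate predictor is itself fully observed.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def select_imputation_predictors(data, target, analysis_vars, threshold=0.3):
    """Pick predictors for imputing `target` (threshold is an arbitrary cutoff)."""
    y = data[target]
    missing = [1.0 if v is None else 0.0 for v in y]
    seen = [i for i, v in enumerate(y) if v is not None]
    y_seen = [y[i] for i in seen]

    # Group 1: variables used in the later analysis are always kept.
    chosen = set(analysis_vars) - {target}
    for name, col in data.items():
        if name == target:
            continue
        # Group 2: related to whether the target is missing.
        related_to_missingness = abs(pearson(col, missing)) >= threshold
        # Group 3: strong predictor of the target among complete cases.
        predicts_target = abs(pearson([col[i] for i in seen], y_seen)) >= threshold
        if related_to_missingness or predicts_target:
            chosen.add(name)
    return sorted(chosen)


# Hypothetical data: income has refusals, the other columns are complete.
data = {
    "income": [31, None, 40, 42, None, 55, None, None, 61, None],
    "age": [23, 25, 31, 35, 38, 41, 47, 52, 58, 63],
    "smoker": [0, 1, 0, 0, 1, 0, 1, 0, 1, 0],
    "noise": [2, 2, 1, 1, 1, 1, 1, 1, 2, 2],
}
predictors = select_imputation_predictors(data, "income", ["income", "smoker"])
print(predictors)
```

Here `smoker` is kept because it is part of the later analysis, `age` because it predicts `income` among the complete cases, and `noise` is dropped because it is related to neither the target nor its missingness.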

Also once I run my MI and build my logistic model, how do I decide if it is better to go with a model that excludes all missing values through list-wise deletion or with my imputed model?

If done right, using the imputed data should generally be better: you keep the full sample instead of discarding incomplete cases, and you can reduce the bias that results from the missingness.
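When you compare the two models, keep in mind that the coefficient and standard error of the imputed model come from fitting the logistic regression on each imputed data set and pooling the results with Rubin's rules. A minimal Python sketch of that pooling step follows. The function name `pool_rubin` and the numbers are made up for illustration and are not output from any real model.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one coefficient across m imputed data sets with Rubin's rules."""
    m = len(estimates)
    pooled = mean(estimates)                       # pooled point estimate
    within = mean([se ** 2 for se in std_errors])  # average sampling variance
    between = variance(estimates)                  # variance across imputations
    total = within + (1 + 1 / m) * between         # total variance
    return pooled, sqrt(total)


# Hypothetical log-odds coefficient from m = 5 imputed data sets.
betas = [0.42, 0.47, 0.39, 0.45, 0.44]
ses = [0.11, 0.12, 0.11, 0.12, 0.11]

beta_pooled, se_pooled = pool_rubin(betas, ses)
print(f"pooled beta = {beta_pooled:.3f}, pooled SE = {se_pooled:.3f}")
```

The pooled standard error includes the between-imputation variance, so it reflects the uncertainty about the missing values. That is one reason a smaller standard error from list-wise deletion is not, on its own, evidence that the deletion model is better.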
