Why are MNAR missing values harder to impute than MCAR or MAR ones?

data-imputation, missing data

In the papers I have read on imputing missing values in -omics data, imputation algorithms were systematically less accurate on values missing not at random (MNAR) than on values missing completely at random (MCAR).
My intuition is the following: missing values are classified as MNAR when there is a process behind the generation of the data that influences which values go missing. To impute MNAR values, then, it is not enough to find relationships between features; it is more important to understand the process that produced the missing values.

Is my intuition right? Am I missing other essential points?

Best Answer

There can be "a process behind the generation of the data that influence the missing values" while the data are nevertheless "missing at random" (MAR) in the technical sense (and thus suitable for multiple imputation). What's required for data to be MAR is that "the missingness can be explained by variables on which you have full information".
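A minimal Python sketch of the MAR case, under made-up assumptions (a toy linear model `y = 2*x + noise`, a fully observed covariate `x`, and missingness in `y` that depends only on `x`; the function names are hypothetical). Because the missingness is fully explained by `x`, a regression of `y` on `x` fitted to the complete cases is unbiased, so filling each gap with its prediction roughly recovers the overall mean, while simply dropping incomplete rows does not. It uses single regression imputation rather than proper multiple imputation, which is enough to show the bias but understates uncertainty.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b


def mar_example(n=20000, seed=0):
    """Hypothetical toy data: x is fully observed, y = 2*x + noise has gaps."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [2 * x + rng.gauss(0, 1) for x in xs]
    # MAR: the chance that y is missing depends only on the observed x
    # (80% missing when x > 0, 10% otherwise), never on y itself.
    observed = [rng.random() >= (0.8 if x > 0 else 0.1) for x in xs]
    obs_x = [x for x, o in zip(xs, observed) if o]
    obs_y = [y for y, o in zip(ys, observed) if o]
    # Fit y ~ x on the complete cases, then fill each gap with its prediction.
    a, b = fit_line(obs_x, obs_y)
    filled = [y if o else a + b * x for x, y, o in zip(xs, ys, observed)]
    return {
        "true_mean": statistics.fmean(ys),
        "complete_case_mean": statistics.fmean(obs_y),
        "imputed_mean": statistics.fmean(filled),
    }


print(mar_example())
```

The complete-case mean is pulled well below the true mean (most dropped rows have large `x`, hence large `y`), whereas the imputed mean lands close to it.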

The problem with data "missing not at random" (MNAR) is that your data by themselves do not contain adequate information about the missingness. Data MNAR could be due to a relation between the probability of missingness and the "true" value itself, but they could also be due to a relation of missingness to some other variable that was not included in the data. That's also why it's impossible to prove that data are MAR; you never know about possible unknown unknowns.
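A minimal Python sketch of the two MNAR routes described above, under hypothetical assumptions: a toy model `y = 2*x + z + noise` in which `z` is never recorded, and made-up function names. With `"self"`, `y` is lost when its own value falls below an invented detection limit of 0; with `"hidden"`, it is lost whenever the unrecorded `z` is negative. In both cases, regression imputation from the recorded `x` stays biased, because nothing in the observed data tells it how the missing values differ from the observed ones.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b


def mnar_example(mechanism, n=20000, seed=0):
    """Hypothetical toy data: y = 2*x + z + noise, where z is never recorded."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    zs = [rng.gauss(0, 1) for _ in range(n)]  # unobserved variable
    ys = [2 * x + z + rng.gauss(0, 0.5) for x, z in zip(xs, zs)]
    if mechanism == "self":
        # MNAR via the value itself: y is lost below a made-up detection limit of 0.
        observed = [y >= 0 for y in ys]
    elif mechanism == "hidden":
        # MNAR via a variable outside the data set: y is lost whenever z < 0.
        observed = [z >= 0 for z in zs]
    else:
        raise ValueError(f"unknown mechanism: {mechanism!r}")
    obs_x = [x for x, o in zip(xs, observed) if o]
    obs_y = [y for y, o in zip(ys, observed) if o]
    # Regression imputation from x, the only recorded covariate.
    a, b = fit_line(obs_x, obs_y)
    filled = [y if o else a + b * x for x, y, o in zip(xs, ys, observed)]
    return {
        "true_mean": statistics.fmean(ys),
        "complete_case_mean": statistics.fmean(obs_y),
        "imputed_mean": statistics.fmean(filled),
    }


for mechanism in ("self", "hidden"):
    print(mechanism, mnar_example(mechanism))
```

In the `"hidden"` case the imputed mean ends up close to the complete-case mean: `x` carries no information about `z`, so conditioning on it cannot correct the selection.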
