Solved – MCAR test for a large number of variables and a small sample size

data preprocessing, mcar, missing data, multiple-imputation, r

I have a dataset with 101 observations and 402 columns (the columns comprise several multiple-item questionnaires). Of those 402 columns, 10 are categorical and the rest are continuous. There are 82 observations without missing values, which means 19 of the 101 observations (about 19% of the sample) have one or more missing values. There are 14 missing-data patterns, 10 of which contain only one observation.

I would like to examine whether the missing data mechanism in this dataset is MCAR. I have tried LittleMCAR ("BaylorEdPsych" package) and TestMCARNormality ("MissMech" package) to test for MCAR in RStudio (version 1.1.442). However, LittleMCAR only allows a maximum of 50 variables, and TestMCARNormality did not work either. The code for TestMCARNormality is shown below:

## df[1:10] is excluded because those columns are categorical. del.lesscases is set to 1 because with the default value of 6 no missing-data pattern would be left in the test.

TestMCARNormality(df[11:402], del.lesscases = 1)

After I ran this code, it showed the following error:

Error in solve.default(sigoo) :
system is computationally singular: reciprocal condition number = 1.37693e-17

I am not sure why this happens or what it means. Does anyone know how I can address this issue and how I can test for MCAR in my dataset? Thank you in advance for your time; sincerely appreciated!

Best Answer

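MissMech imputes missing data in order to do its MCAR test. Having more variables (columns) than observations (rows) can cause problems for data imputation, which I would guess is what is happening here: with 392 columns and only 101 rows, the estimated covariance matrix is singular, so it cannot be inverted, and that is what the solve.default error is reporting.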
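To illustrate why more columns than rows leads to a "computationally singular" error, here is a minimal sketch in base R. It uses simulated normal data with the same dimensions as in the question (101 rows, 392 columns); it does not reproduce MissMech's internals, and the variable names are made up for the example.

```r
# Simulated stand-in for the real data: 101 rows, 392 continuous columns.
set.seed(1)
n <- 101   # observations, as in the question
p <- 392   # continuous columns passed to TestMCARNormality

x <- matrix(rnorm(n * p), nrow = n, ncol = p)

# p x p sample covariance matrix estimated from only n rows.
s <- cov(x)

# The rank of a sample covariance matrix cannot exceed n - 1,
# so with p > n - 1 the matrix is rank-deficient (singular).
rank_s <- qr(s)$rank

# Reciprocal condition number; values near zero indicate singularity.
rcond_s <- rcond(s)

# solve() is expected to fail on such a matrix; record whether it did.
inverted <- tryCatch({ solve(s); TRUE }, error = function(e) FALSE)

print(c(rank = rank_s, columns = p))
print(rcond_s)
print(inverted)
```

The same thing would happen with any data of this shape, so the error says more about the ratio of columns to rows than about the missing values themselves.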

Also, when you do an MCAR test, MCAR is the null hypothesis: you are assuming that the missing data mechanism is completely random, and the p-value tells you how consistent the observed missing data pattern is with that assumption.

You have to use your knowledge of the subject at hand to justify that assumption. For example, you could have theoretical reasons to believe the missing data is MCAR, MAR or MNAR.

In the absence of any strong theoretical reasons, you could take a more exploratory approach to missing data mechanisms; see Tierney et al. (2014).
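As a rough idea of what an exploratory look at missingness could involve, here is a small base R sketch. It is not the method from the cited paper; it simply tabulates missing-data patterns and checks whether a fully observed variable predicts missingness using a logistic regression. The data are simulated, and the column names (age, item1, item2) are hypothetical.

```r
# Simulated stand-in for the real data; all names here are hypothetical.
set.seed(42)
n <- 101
df <- data.frame(
  age   = rnorm(n, mean = 40, sd = 10),
  item1 = rnorm(n),
  item2 = rnorm(n)
)

# Make item2 missing more often for older respondents,
# so the data are not MCAR by construction.
p_miss <- plogis((df$age - 50) / 5)
df$item2[runif(n) < p_miss] <- NA

# One string per row encoding which columns are missing (1 = missing).
pattern <- apply(is.na(df), 1, function(r) paste(as.integer(r), collapse = ""))
pattern_counts <- table(pattern)
print(pattern_counts)

# Indicator for "this row has at least one missing value".
any_missing <- !complete.cases(df)

# Does a fully observed variable predict missingness?
fit <- glm(any_missing ~ age, data = df, family = binomial)
print(summary(fit)$coefficients)
```

If observed variables clearly predict missingness, that is evidence against MCAR; if they do not, that is still not proof of MCAR, since missingness could depend on values that were never observed.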