Missing Data Methods – Should Missing Values be Handled by Imputation or Deletion?

data-imputation, missing data

I have 60,000 data points, and around 45% of them are missing; the missing values occur at random. Can I simply use listwise or pairwise deletion, or do I have to use imputation? If imputation is recommended, which imputation method is best?

Best Answer

It depends on

  1. Amount of missing data (what percentage of data is missing)
  2. Type of missing data (MAR, MCAR, NMAR)

According to this nice article (Tsikriktsis, "A review of techniques for treating missing data in OM survey research", 2005), if more than 10% of the data is missing, the best solution is:

  1. Maximum likelihood imputation if data are NMAR (not missing at random)
  2. Maximum likelihood and hot-deck if data are MAR (missing at random)
  3. Pairwise deletion, hot-deck or regression if data are MCAR (missing completely at random)
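
To make the deletion and hot-deck options concrete, here is a minimal sketch in Python with pandas and NumPy. It is not taken from the article: the toy data frame, the column names `x` and `y`, and the `hot_deck` helper are all made up for illustration, and the hot-deck shown is only the simplest random variant (each gap is filled with a randomly drawn observed value from the same column). Maximum likelihood methods are not shown, since they depend on the model you fit.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two numeric columns with gaps.
df = pd.DataFrame({
    "x": [1.0, 2.0, np.nan, 4.0, 5.0, np.nan],
    "y": [2.0, np.nan, 6.0, 8.0, np.nan, 12.0],
})

# Amount of missing data: overall share of missing cells, and share per column.
missing_share = df.isna().to_numpy().mean()
missing_per_column = df.isna().mean()

# Listwise deletion: keep only the rows that have no missing value at all.
listwise = df.dropna()

# Pairwise deletion: each statistic uses every row where both columns of the
# pair are observed. DataFrame.corr excludes missing values pair by pair.
pairwise_corr = df.corr()


def hot_deck(frame, seed=0):
    """Simple random hot-deck (illustrative helper, not a library function).

    Fills each missing cell with a value drawn at random from the observed
    values of the same column.
    """
    rng = np.random.default_rng(seed)
    out = frame.copy()
    for col in out.columns:
        observed = out[col].dropna().to_numpy()
        mask = out[col].isna()
        if observed.size == 0 or not mask.any():
            continue  # nothing to draw from, or nothing to fill
        out.loc[mask, col] = rng.choice(observed, size=int(mask.sum()))
    return out


imputed = hot_deck(df, seed=0)
```

Note how much listwise deletion costs here: a third of the cells are missing, but only two of the six rows survive. With 45% of the values missing, the same effect is likely to be much stronger, which is one reason to look at pairwise deletion or imputation first.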