Solved – Binary classification when many binary features are missing

Tags: classification, missing-data, semi-supervised-learning

I'm working on a binary classification problem with about 1000 binary features in total. The catch is that for each datapoint I only know the values of a small subset of the features (around 10-50), and which features fall into that subset is pretty much random.

What's a good way to deal with the missing features? Is there a particular classification algorithm that handles missing features well? (Naive Bayes should work, but is there anything else?) I'm guessing I don't want to do some kind of variable imputation, since so many of the features are missing.

Best Answer

Assuming the data are missing completely at random (cf. @whuber's comment), an ensemble-learning technique described in the following paper may be worth trying:

Polikar, R. et al. (2010). Learn++.MF: A random subspace approach for the missing feature problem. Pattern Recognition, 43(11), 3817-3832.

The general idea is to train multiple classifiers, each on a random subset of the features (as in Random Forests), but at prediction time to build the classification rule using only those classifiers whose features are all observed for the given instance. Be sure to check what the authors call the "distributed redundancy" assumption (p. 3 in the preprint linked above): the discriminative information must be spread roughly evenly across the feature set, so that many small subsets of features are each predictive on their own.
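To make the idea concrete, here is a minimal numpy-only sketch of the random-subspace scheme (not the authors' implementation; Learn++.MF uses MLP base learners and weighted feature sampling, whereas this toy uses Bernoulli naive Bayes base learners and a much smaller synthetic dataset for speed). Each classifier is trained on a small random feature subset, and a test point is voted on only by the classifiers whose subsets are fully observed:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data (shrunk from the 1000 features in the question):
# every binary feature carries a little signal about the label, so the
# "distributed redundancy" assumption holds -- many small subsets are
# predictive on their own.
n, d = 400, 20
w = rng.choice([-1.0, 1.0], size=d)          # each feature weakly informative
X = rng.integers(0, 2, size=(n, d))
s = X @ w
y = (s > np.median(s)).astype(int)

def train_nb(Xs, y, eps=1.0):
    """Bernoulli naive Bayes on one feature subspace (Laplace-smoothed)."""
    p = np.vstack([(Xs[y == c].sum(0) + eps) / ((y == c).sum() + 2 * eps)
                   for c in (0, 1)])         # p[c, j] = P(x_j = 1 | y = c)
    prior = np.array([(y == 0).mean(), (y == 1).mean()])
    return p, prior

def nb_log_odds(model, x):
    p, prior = model
    ll = np.log(p) @ x + np.log(1 - p) @ (1 - x) + np.log(prior)
    return ll[1] - ll[0]                     # > 0 favors class 1

# Random-subspace ensemble: many classifiers on small random feature subsets.
k, T = 3, 200
subspaces = [rng.choice(d, size=k, replace=False) for _ in range(T)]
models = [train_nb(X[:, S], y) for S in subspaces]

def predict(x_obs):
    """x_obs: {feature_index: 0/1} for the observed features only."""
    observed = set(x_obs)
    # Vote with only those classifiers whose features are all observed.
    votes = [np.sign(nb_log_odds(m, np.array([x_obs[j] for j in S])))
             for S, m in zip(subspaces, models) if set(S) <= observed]
    if not votes:
        return None                          # no usable classifier at all
    return int(np.mean(votes) > 0)

# Classify a point with only 10 of the 20 features observed.
partial = {j: X[0, j] for j in rng.choice(d, size=10, replace=False)}
print(predict(partial))
```

With small subspaces (here k = 3) and many classifiers, the chance that at least a few classifiers remain usable for a given pattern of missingness stays high; the paper's analysis of this trade-off between subspace size and robustness to missing features is worth reading before picking these parameters for real data.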
