Solved – Imputation in R: How to impute univariate data in R

data-imputation, missing data, r

I am trying to do two-class classification (Red vs. Green) using Random Forest. My dataset contains 1 numeric attribute (called X) and 51 binary attributes, which are used to classify a document as Red or Green. However, 40% of the data points (observations) have no value for the numeric attribute X, i.e. X is missing, so I am trying to impute X. I tried MI, MICE, and other variants (Hmisc, impute), but I could not get them to work.

Is it possible to impute when X is missing for 40% of the data points? How can one impute an attribute using only the data points of the same class? Hmisc allows imputing with the median, min, max, etc., but that is not a class-specific median: it fills the NAs with the column-wise median.
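For reference, a class-specific median fill of the kind asked about can be sketched in base R. This is only a minimal illustration on toy data; the data frame and the column names `class`, `X` and `X_imputed` are made up for the example. Note also that imputing a predictor from the class label only works where the label is known, so it cannot be applied as-is to unlabeled documents at prediction time.

```r
# Toy data (hypothetical): two classes, X partly missing
df <- data.frame(
  class = c("Red", "Red", "Red", "Green", "Green", "Green"),
  X     = c(1, 3, NA, 10, NA, 20)
)

# Per-class median of X (ignoring NAs), repeated so it lines up with the rows
class_median <- ave(df$X, df$class, FUN = function(v) median(v, na.rm = TRUE))

# Replace only the missing entries with the median of their own class
df$X_imputed <- ifelse(is.na(df$X), class_median, df$X)

print(df)
```

Here `ave()` computes the statistic within each group and returns a vector the same length as `X`, which keeps the fill to a single `ifelse()` call.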

Best Answer

If your dataset has a time series character you can have a look at this paper comparing methods for univariate time series imputation in R: http://arxiv.org/abs/1510.03924

But actually I think your title is misleading: one usually speaks of univariate data when there is just one attribute.

As I understand it, you have 52 attributes (1 numeric, 51 binary), so you do not need special algorithms for univariate imputation. The MICE package should be fine for this task, even with 40% missing data. Perhaps you can post your MICE code so that we can see what is going wrong.
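Since no MICE code was posted, here is a rough sketch of what a basic call could look like. Everything about the data is an assumption: it is simulated, with only two binary attributes (`b1`, `b2`) standing in for the 51, and X is removed at random for 40% of the rows. Only `mice()` and `complete()` come from the mice package.

```r
library(mice)

# Simulated stand-in for the real data (all names are hypothetical)
set.seed(1)
n <- 200
df <- data.frame(
  class = factor(sample(c("Red", "Green"), n, replace = TRUE)),
  b1    = rbinom(n, 1, 0.5),
  b2    = rbinom(n, 1, 0.3)
)
df$X <- 5 + 2 * df$b1 + rnorm(n)
df$X[sample(n, 0.4 * n)] <- NA   # make 40% of X missing

# Multiple imputation with 5 imputed data sets; by default mice uses
# predictive mean matching for numeric columns and all other columns
# (including class) as predictors
imp <- mice(df, m = 5, seed = 1, printFlag = FALSE)

# Extract the first completed data set
completed <- complete(imp, 1)
summary(completed$X)
```

With multiple imputation one would normally fit the model on each of the `m` completed data sets rather than on only the first; taking `complete(imp, 1)` here is just to show the filled-in column.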
