Solved – R caret and NAs

caret, data-imputation, missing-data, r

I very much prefer caret for its parameter tuning ability and uniform interface, but I have observed that it always requires complete datasets (i.e. without NAs), even if the underlying "naked" model allows NAs. That is very bothersome, given that one then has to apply laborious imputation methods which are not necessary in the first place. How can one avoid the imputation and still use caret's advantages?

Best Answer

You can pass na.action = na.pass to caret's train function and leave preProcess at its default value, NULL. The NA values are then handed unmodified to the underlying model function. Models that do not support missing values will fail in that case; for those, specify preProcess to impute the missing values before the model is fitted. For example:

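library(caret)  # method = "C5.0" also needs the C50 package installed

# `formula` and `dataset` stand for your own model formula and data frame
model <- train(formula,
               data = dataset,
               method = "C5.0",
               na.action = na.pass)  # hand NAs through to the model unchanged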

In this case, C5.0 will handle missing values by itself.
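
To make both cases from the answer concrete, here is a minimal sketch. It rests on assumptions that are not in the original post: iris with a few artificially blanked values as the data, rpart as a model that tolerates NAs on its own, and knn with "medianImpute" as a model that needs imputation through preProcess. The object names (dat, ctrl, fit_tree, fit_knn) are illustrative only.

```r
library(caret)  # assumes the rpart package is installed as well

set.seed(42)

# Illustrative data (assumption): iris with some predictor values set to NA
dat <- iris
dat$Sepal.Length[sample(nrow(dat), 15)] <- NA
dat$Petal.Width[sample(nrow(dat), 15)] <- NA

ctrl <- trainControl(method = "cv", number = 5)

# Case 1: the model copes with NAs itself, so pass them through untouched
fit_tree <- train(Species ~ ., data = dat,
                  method = "rpart",
                  trControl = ctrl,
                  na.action = na.pass)

# Case 2: the model cannot cope with NAs, so still pass them through,
# but let preProcess impute them before the model is fitted
fit_knn <- train(Species ~ ., data = dat,
                 method = "knn",
                 preProcess = "medianImpute",
                 trControl = ctrl,
                 na.action = na.pass)

# predict() drops incomplete rows by default; na.pass keeps them
pred <- predict(fit_tree, newdata = dat, na.action = na.pass)
```

Note that na.action = na.pass is needed in the second case too. Otherwise rows with NAs would be rejected or dropped before preProcess ever gets the chance to impute them.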