I very much prefer caret for its parameter tuning and uniform interface, but I have noticed that it always requires complete datasets (i.e. without NAs), even when the underlying "naked" model accepts NAs. This is bothersome, because it forces laborious imputation that would not otherwise be necessary. How can one avoid imputation and still use caret's advantages?
Solved – R caret and NAs
caret, data-imputation, missing data, r
Best Answer
You can pass na.action = na.pass to caret's train function and leave preProcess at its default value of NULL, so that no preprocessing is applied. The NA values are then passed unmodified to the underlying model. Models that do not support missing values will fail in that case; for those, you would need to specify preProcess to impute the missing values before the model is called. For example:
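A minimal sketch of such a call, assuming the caret and C50 packages are installed. The iris data, the artificially injected NAs, the 3-fold cross-validation and the single-row tuning grid are illustrative assumptions chosen to keep the run short; they are not part of the original answer.

```r
library(caret)  # the C50 package must also be installed for method = "C5.0"

set.seed(1)
a <- iris
# Assumption for illustration: blank out ten values in one predictor
a[sample(nrow(a), 10), "Sepal.Length"] <- NA

fit <- train(
  Species ~ ., data = a,
  method    = "C5.0",
  na.action = na.pass,  # hand the NAs through to the model untouched
  # preProcess is left at its default (NULL), so nothing is imputed
  trControl = trainControl(method = "cv", number = 3),
  tuneGrid  = data.frame(trials = 1, model = "tree", winnow = FALSE)
)

# predict() on a train object drops incomplete rows by default (na.omit),
# so na.pass is needed here as well to get one prediction per row
preds <- predict(fit, newdata = a, na.action = na.pass)
```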
In this case, C5.0 will handle missing values by itself.
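For a model that cannot cope with NAs on its own, the preProcess route mentioned above might look like the following sketch. It assumes k-nearest neighbours (method = "knn") as the NA-intolerant model and median imputation as the preprocessing step; both choices, like the iris data and the injected NAs, are only illustrative.

```r
library(caret)

set.seed(1)
a <- iris
# Assumption for illustration: blank out ten values in one predictor
a[sample(nrow(a), 10), "Sepal.Length"] <- NA

fit_knn <- train(
  Species ~ ., data = a,
  method     = "knn",
  preProcess = "medianImpute",  # impute inside each resample
  na.action  = na.pass,         # keep the incomplete rows so they can be imputed
  trControl  = trainControl(method = "cv", number = 3),
  tuneGrid   = data.frame(k = 5)
)

# The stored preprocessing is re-applied to newdata before predicting
preds_knn <- predict(fit_knn, newdata = a, na.action = na.pass)
```

na.action = na.pass is still required here: without it, the formula interface would reject or drop the incomplete rows before the imputation step ever sees them.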