Solved – how does rpart handle missing values in predictors

r, rpart

From the ?rpart documentation:

na.action: the default action deletes all observations for which y is missing, but keeps those in which one or more predictors are missing.

How does it impute missing values in predictors?

Best Answer

rpart does not impute the missing values; this is where surrogate variables come in. For each split, observations in which the split variable is missing are sent left or right based on the best surrogate variable. If that is also missing, the next best surrogate is used, and so on. This is detailed in:

  • Therneau, Terry M. & Atkinson, Elizabeth J. (March 28, 2014). An Introduction to Recursive Partitioning Using the RPART Routines, Mayo Foundation, section 5.

The document ships with rpart as the "longintro" vignette (PDF) and is accessible through the package help.
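
To see the surrogate mechanism in practice, here is a minimal sketch. The choice of the built-in airquality data set, the formula, and the made-up new_obs row are assumptions for illustration and are not from the original question; maxsurrogate and usesurrogate are set to their documented defaults only to make them visible.

```r
library(rpart)

# airquality ships with R; Ozone (used here as the response) and
# Solar.R (a predictor) both contain missing values.
data(airquality)

fit <- rpart(
  Ozone ~ Solar.R + Wind + Temp,
  data    = airquality,
  control = rpart.control(maxsurrogate = 5, usesurrogate = 2)
)

# Rows with a missing response are dropped by the default na.action;
# rows with a missing predictor are kept. The root node count shows this.
n_used <- fit$frame$n[1]
n_with_response <- sum(!is.na(airquality$Ozone))

# For each node, summary() lists the primary split followed by its
# surrogate splits, if any were found.
summary(fit)

# Hypothetical new observation with two predictors missing. At a split
# on a missing variable, rpart falls back to the surrogates in order.
new_obs <- data.frame(Solar.R = NA_real_, Wind = 8, Temp = NA_real_)
pred <- predict(fit, new_obs)
pred
```

With usesurrogate = 2, an observation for which the split variable and all surrogates are missing is sent in the majority direction, so a prediction is still returned.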