Solved – How is splitting done on numerical predictors in the randomForest package in R?

r, random forest

I understand that within each individual tree, node impurity is measured with a least-squares criterion: the candidate splits of the data at a node are evaluated, and the best one is selected.

What I don't understand yet (since I couldn't find an answer in the documentation) is how candidate splits are found in the first place, i.e., given numerical predictors (not nominal or ordinal), how are the split points found for those numerical predictors in the randomForest package?

Aside: I am also wondering whether ordinal predictors and dependent variables are supported in randomForest now?

Best Answer

It is the same as with ordinal variables: the algorithm walks through the values of the attribute that are present in the data, from the minimum to the maximum, treats each as a candidate threshold, and selects the best one. With presorting, this scan can be elegantly sped up to linear time per attribute.
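To make the scan concrete, here is a minimal sketch in base R of a presorted threshold search with a least-squares criterion. It is an illustration under stated assumptions, not the package's actual code: `best_numeric_split` is a hypothetical helper, and placing the threshold at the midpoint between two neighbouring values is an assumption made for this example.

```r
# Hypothetical helper, not part of randomForest: best least-squares split
# of a numeric predictor x for a numeric response y.
best_numeric_split <- function(x, y) {
  ord <- order(x)                  # presort once
  xs <- x[ord]
  ys <- y[ord]
  n <- length(ys)
  if (n < 2) return(NULL)

  left_n <- seq_len(n - 1)         # left-child size for each cut position
  left_sum <- cumsum(ys)[left_n]   # running sums make the scan linear
  right_n <- n - left_n
  right_sum <- sum(ys) - left_sum

  # Minimising the residual sum of squares of the two children is
  # equivalent to maximising sum_left^2 / n_left + sum_right^2 / n_right.
  score <- left_sum^2 / left_n + right_sum^2 / right_n

  # A cut is only possible between two distinct neighbouring values.
  valid <- xs[-n] < xs[-1]
  if (!any(valid)) return(NULL)
  score[!valid] <- -Inf

  i <- which.max(score)
  # Assumption: the threshold is the midpoint of the two neighbours.
  list(threshold = (xs[i] + xs[i + 1]) / 2, score = score[i])
}

x <- c(1, 2, 3, 10, 11, 12)
y <- c(1.0, 1.2, 0.9, 5.1, 4.8, 5.0)
best_numeric_split(x, y)$threshold  # 6.5
```

Only the values actually observed in the data generate candidates, so a predictor with n distinct values yields at most n - 1 cut positions.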

Because of that, randomForest simply converts ordered factors to numerical values when they are predictors, and treats them as categorical data when they are the response (the decision).
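As a small illustration of what treating an ordered factor as numeric means, the base R snippet below maps the levels to their integer codes. The variable names are made up for the example, and the package's internal conversion may differ in detail; this only shows the idea.

```r
# Illustrative only: an ordered factor and its underlying integer codes.
size <- factor(c("small", "large", "medium", "small"),
               levels = c("small", "medium", "large"),
               ordered = TRUE)

codes <- as.integer(size)  # level order defines the codes
codes                      # 1 3 2 1
```

Once a predictor is in this form, a threshold such as `codes <= 1.5` separates "small" from "medium" and "large", exactly as a split on a numeric predictor would.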