Solved – R gbm – handling of missing values

boosting, r

I am trying to understand how gbm handles missing values.

I have seen this thread on the topic:

https://stackoverflow.com/questions/14718648/r-gbm-handling-of-missing-values

But that thread focuses on how to read the treatment of missing values off the fitted results. What I am interested in is how the algorithm treats missing values while fitting the trees. E.g. does it consider a missing value to carry information, or does it essentially ignore that feature?

I have not been able to find this information online, so any responses would be much appreciated.

Best Answer

Update: the gbm package builds trees in which every split has three child nodes (left node, right node, and missing node). Observations with a missing value for the split variable go to the missing node, so the model treats missing values as a separate group.

This is explained in the gbm.object documentation, in the section on c.splits: https://www.rdocumentation.org/packages/gbm/versions/2.1.1/topics/gbm.object
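
One way to see the three-way split for yourself is to fit a small model on data with NAs and print a tree with pretty.gbm.tree. The snippet below is only a sketch: the simulated data, the variable names (x1, x2, y) and the tuning values are made up for illustration, and the missingness in x1 is deliberately made informative so that the missing node has something to pick up.

```r
library(gbm)

set.seed(1)
n <- 500
d <- data.frame(x1 = runif(n), x2 = runif(n))
d$y <- 2 * d$x1 + rnorm(n, sd = 0.1)

# Illustrative assumption: rows that will lose x1 get a shifted response,
# so "x1 is missing" carries information about y.
miss <- sample(n, 100)
d$y[miss] <- d$y[miss] + 3
d$x1[miss] <- NA

# gbm accepts NA in the predictors (not in the response).
fit <- gbm(y ~ x1 + x2, data = d, distribution = "gaussian",
           n.trees = 50, interaction.depth = 1, shrinkage = 0.1,
           n.minobsinnode = 10, bag.fraction = 0.5)

# Each split row lists LeftNode, RightNode and MissingNode; the row that
# MissingNode points to holds the prediction for observations with NA.
tree1 <- pretty.gbm.tree(fit, i.tree = 1)
print(tree1)

# Predicting for a new observation with a missing x1 also works.
p_na <- predict(fit, newdata = data.frame(x1 = NA_real_, x2 = 0.5),
                n.trees = 50)
print(p_na)
```

In the printed tree, compare the Prediction value of the node referenced by MissingNode with those of the left and right nodes. If missingness is informative, as it is by construction here, the missing node should get a noticeably different value.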
