What does interaction depth mean in GBM?

Tags: boosting, machine learning, r

I have a question about the interaction.depth parameter in gbm in R. This may be a noob question, for which I apologize, but how does the parameter, which I believe denotes the number of terminal nodes in a tree, indicate X-way interaction among the predictors? I'm just trying to understand how that works.

Additionally, I get pretty different models if I have a dataset with, say, two different factor variables versus the same dataset with those two factor variables combined into a single factor (e.g. X levels in factor 1, Y levels in factor 2, and X * Y levels in the combined variable). The latter is significantly more predictive than the former. I had thought increasing the interaction depth would pick this relationship up.

Best Answer

Both of the previous answers are wrong. The gbm package uses the interaction.depth parameter as the number of splits it performs on each tree (starting from a single node). Each split turns one terminal node into an internal node with three children (node $\to$ {left node, right node, NA node}), so it increases the total number of nodes by 3 and the number of terminal nodes by 2. With $N$ splits, the tree therefore has $3N+1$ nodes in total, of which $2N+1$ are terminal. This can be verified by looking at the output of the pretty.gbm.tree function.
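As a rough way to check the counts above, here is a minimal sketch that fits a small model on simulated data and counts the rows of the first tree. The data, the variable names (d, x1, x2, x3, y) and the tuning values are made up for illustration, and it assumes the gbm package is installed. It also assumes that terminal nodes show up with SplitVar equal to -1 in the pretty.gbm.tree output, which is how the table looks in the versions I have used.

```r
# Assumes the gbm package is installed; the data below are simulated
# purely for illustration and are not from the original question.
library(gbm)

set.seed(1)
n <- 1000
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- d$x1 * d$x2 + d$x3 + rnorm(n, sd = 0.1)

N <- 3  # interaction.depth: the number of splits requested per tree

fit <- gbm(y ~ x1 + x2 + x3, data = d, distribution = "gaussian",
           n.trees = 5, interaction.depth = N, shrinkage = 0.1,
           n.minobsinnode = 10, bag.fraction = 1)

# One row per node of the first tree (internal and terminal nodes alike).
tree <- pretty.gbm.tree(fit, i.tree = 1)

# Assumption: terminal nodes are marked by SplitVar == -1.
n_total    <- nrow(tree)
n_terminal <- sum(tree$SplitVar == -1)
n_splits   <- sum(tree$SplitVar != -1)

print(c(total = n_total, terminal = n_terminal, splits = n_splits))
```

If the tree manages all N requested splits, the counts should come out as 3N+1 total and 2N+1 terminal nodes. A tree can end up with fewer splits (for example when n.minobsinnode blocks further splitting), in which case the same relations should hold with the actual number of splits in place of N.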

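The behaviour is rather misleading, since a user would naturally expect the parameter to be the depth of the resulting tree. It is not: a tree with interaction.depth splits is at most that many levels deep, and it is shallower whenever the splits fall on different branches.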