Solved – About GBM function parameters (bag.fraction, interaction.depth)

Tags: bagging, boosting, interaction, r

I don't understand how gbm's bag.fraction parameter works.
As I understand it, gradient boosting works roughly like this:

  1. Fit a tree f_hat_b with d splits to the training data (X, r) (where r_i = y_i at the first step).
  2. Update f_hat by adding a shrunken version of the new tree: f_hat <- f_hat + lambda * f_hat_b(x).
  3. Update the residuals: r <- r - lambda * f_hat_b(x).

and repeat these 3 actions B times.
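The three steps above can be sketched in base R. This is only an illustration of my understanding, not gbm's actual code; the stump fitter (fit_stump, predict_stump) and the toy data are made up:

```r
# Depth-1 regression stump as the base learner f_hat_b (d = 1 split).
fit_stump <- function(x, r) {
  best <- list(sse = Inf)
  for (cp in sort(unique(x))) {           # try each observed value as a cut
    left <- r[x <= cp]; right <- r[x > cp]
    if (length(left) == 0 || length(right) == 0) next
    sse <- sum((left - mean(left))^2) + sum((right - mean(right))^2)
    if (sse < best$sse)
      best <- list(sse = sse, cut = cp, left = mean(left), right = mean(right))
  }
  best
}
predict_stump <- function(s, x) ifelse(x <= s$cut, s$left, s$right)

set.seed(1)
x <- runif(100)
y <- sin(2 * pi * x) + rnorm(100, sd = 0.1)

lambda <- 0.1; B <- 200
f_hat <- rep(0, length(y))                 # current ensemble prediction
r <- y                                     # residuals start as y (step 1)
for (b in 1:B) {
  s     <- fit_stump(x, r)                         # step 1: fit tree to (X, r)
  f_hat <- f_hat + lambda * predict_stump(s, x)    # step 2: shrunken update
  r     <- r - lambda * predict_stump(s, x)        # step 3: update residuals
}
mean((y - f_hat)^2)   # training error shrinks as B grows
```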

So, if we have N observations, we have N residuals at each step. Does bag.fraction = p mean that each tree is fit to only p*N residuals? And with which observations?

Also, since I'm interested in interactions between predictors, I would like to use interact.gbm. Do I first have to fit a gbm with interaction.depth strictly greater than 1?

Thank you !

Best Answer

Does bag.fraction = p mean that we will fit only p*N residuals?

No: all $N$ residuals are still updated at every iteration; the subsample only affects how each tree is trained.

For instance, at iteration $B$, the $p \cdot N$ sampled observations are used only for training the tree. Once the tree is trained, the $N$ new residuals (your step 3) are computed using the prediction, on all $N$ observations, of the function $F_B(x)=F_{B-1}(x) + \lambda\widehat{f}(x)$, where $\widehat{f}(x)$ is your newly built tree. On the next iteration ($B+1$), a new fraction $p$ is drawn to train a new tree, and so on.

For further details, I recommend checking https://statweb.stanford.edu/~jhf/ftp/stobst.pdf, section 2. In the Algorithm 2 box, lines 5 and 6 show that the tree is trained on the drawn subsample ($p \cdot N$ observations), and line 7 shows that the prediction is made using $F_B(x)=F_{B-1}(x) + \lambda\widehat{f}(x)$.
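That mechanism can be mimicked in a few lines of base R. This is a sketch, not gbm's actual implementation; the data, the fixed median-split stump, and all names are illustrative:

```r
set.seed(42)
N <- 1000; p <- 0.5; lambda <- 0.1
x <- runif(N)
y <- x^2 + rnorm(N, sd = 0.05)

F_pred <- rep(0, N)   # F_{B-1}(x) for all N observations
r <- y

for (b in 1:50) {
  idx <- sample(N, size = floor(p * N))    # draw p*N observations
  # Train the base learner on the SUBSAMPLE only
  # (here a trivial stump split at the subsample median).
  cut   <- median(x[idx])
  left  <- mean(r[idx][x[idx] <= cut])
  right <- mean(r[idx][x[idx] >  cut])
  f_b    <- ifelse(x <= cut, left, right)  # but predict on ALL N observations
  F_pred <- F_pred + lambda * f_b          # F_B = F_{B-1} + lambda * f_hat
  r      <- y - F_pred                     # N new residuals, not p*N
}
length(r)   # still N: residuals are updated for every observation
```

The subsample index idx never shrinks the residual vector; it only restricts which rows the tree sees while being built.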

Also, since I'm interested in interactions between predictors, I would like to use interact.gbm. Do I first have to fit a gbm with interaction.depth strictly greater than 1?

I am not sure I understood your question well. If you want interactions between your predictors, a necessary condition is to allow trees with at least two splits, i.e. to set interaction.depth > 1.
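A usage sketch, assuming the gbm package is installed; the toy data and column names (x1, x2) are made up, and the hyperparameter values are arbitrary:

```r
library(gbm)
set.seed(1)
d <- data.frame(x1 = runif(200), x2 = runif(200))
d$y <- d$x1 * d$x2 + rnorm(200, sd = 0.05)   # a true x1:x2 interaction

fit <- gbm(y ~ x1 + x2, data = d, distribution = "gaussian",
           n.trees = 500,
           interaction.depth = 2,   # > 1, so trees can capture interactions
           shrinkage = 0.05, bag.fraction = 0.5)

# Friedman's H-statistic for the x1:x2 interaction (roughly 0 to 1;
# larger values suggest a stronger interaction)
h <- interact.gbm(fit, data = d, i.var = c("x1", "x2"), n.trees = 500)
h
```

With interaction.depth = 1 (stumps), the fitted model is purely additive, so interact.gbm would have no interaction effect to measure.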
