I don't understand how gbm's parameter "bag.fraction" works.
As I understand it, gradient boosting works roughly like this:
- Fit a tree f_hat_b with d splits to the training data (X, r) (where r_i = y_i for the first step).
- Update f_hat by adding a shrunken version of the new tree: f_hat <- f_hat + lambda*f_hat_b(x).
- Update the residuals: r <- r - lambda*f_hat_b(x).
and repeat these three steps B times.
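The three steps above can be sketched in plain Python. This is a toy version only (one-dimensional inputs, one-split trees, squared-error loss); the names fit_stump and boost are mine for illustration, not gbm's API:

```python
def fit_stump(x, r):
    """Fit a one-split regression tree (a stump) to 1-D inputs x and
    residuals r by exhaustive search over split points (squared error)."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    best = None
    for k in range(1, n):
        thresh = x[order[k]]
        left = [r[i] for i in range(n) if x[i] < thresh]
        right = [r[i] for i in range(n) if x[i] >= thresh]
        if not left or not right:
            continue
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, thresh, ml, mr)
    _, t, ml, mr = best
    return lambda v: ml if v < t else mr

def boost(x, y, B=100, lam=0.1):
    """Gradient boosting for squared error: the three steps, repeated B times."""
    r = list(y)                 # r_i = y_i at the first step
    trees = []
    for _ in range(B):
        f_b = fit_stump(x, r)   # step 1: fit a tree to (X, r)
        trees.append(f_b)       # step 2: f_hat <- f_hat + lam * f_b
        r = [ri - lam * f_b(xi) for ri, xi in zip(r, x)]  # step 3: update residuals
    return lambda v: sum(lam * t(v) for t in trees)
```

With shrinkage lambda, each residual update removes only a fraction of the new tree's fit, which is why many iterations B are needed.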
So, if we have N observations, we have N residuals at each step. Does bag.fraction = p mean that the next tree is fit on only p*N residuals? And if so, which observations are used?
Also, since I'm interested in interactions between predictors, I would like to use interact.gbm. Do I first have to fit a gbm with interaction.depth > 1 (strictly!)?
Thank you!
Best Answer
No, there are still $N$ residuals at every step; only the tree itself is fit on a subsample.
For instance, at iteration $B$, the $p \cdot N$ sampled observations are used only to train the tree. Once the tree is trained, the $N$ new residuals (your step 3) are computed using predictions on all $N$ observations from $F_B(x)=F_{B-1}(x) + \lambda\widehat{f}(x)$, where $\widehat{f}(x)$ is the newly built tree. On the next iteration ($B+1$), a new fraction $p$ is drawn to train the next tree, and so on.
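That asymmetry (train on $p \cdot N$ points, update all $N$ residuals) can be sketched in Python. This is an illustrative skeleton, not gbm's internals; fit_tree stands for any weak learner that returns a prediction function:

```python
import random

def boost_subsampled(x, y, fit_tree, B=100, lam=0.1, bag_fraction=0.5):
    """Stochastic gradient boosting sketch: each tree is fit on a random
    bag_fraction * N subsample, but the residual update touches all N points."""
    n = len(x)
    m = max(1, int(bag_fraction * n))
    r = list(y)
    trees = []
    for _ in range(B):
        idx = random.sample(range(n), m)          # draw p*N observations
        # the tree is trained on the subsample only ...
        f_b = fit_tree([x[i] for i in idx], [r[i] for i in idx])
        trees.append(f_b)
        # ... but all N residuals are updated, using predictions on all N points
        r = [r[i] - lam * f_b(x[i]) for i in range(n)]
    return lambda v: sum(lam * t(v) for t in trees)
```

So the answer to "with what observations?" is: a fresh random subset of size $p \cdot N$ at every iteration, while the residual bookkeeping always covers the full data set.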
For further details, I recommend checking Section 2 of https://statweb.stanford.edu/~jhf/ftp/stobst.pdf. In the box "Algorithm 2", lines 5 and 6 show that the tree is trained on the drawn subsample ($p \cdot N$ observations), and line 7 shows that the prediction is made using $F_B(x)=F_{B-1}(x) + \lambda\widehat{f}(x)$.

I am not sure I fully understood your second question. If you want interactions between your predictors, a necessary condition is to allow trees with at least two splits, hence to set interaction.depth > 1.
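A quick way to see why interaction.depth > 1 is necessary: a boosted model built only from one-split trees is additive in the predictors, and an additive function cannot represent an interaction like XOR. A small Python sketch (the function names are mine, purely illustrative):

```python
def additive(g, h):
    """A boosted model made only of one-split trees (interaction.depth = 1)
    is a sum of functions of a single predictor each: F(x1, x2) = g(x1) + h(x2)."""
    return lambda x1, x2: g(x1) + h(x2)

def interaction_check(F):
    """Every additive F satisfies F(0,0) + F(1,1) == F(0,1) + F(1,0);
    a nonzero difference means F contains a genuine x1:x2 interaction."""
    return F(0, 0) + F(1, 1) - F(0, 1) - F(1, 0)

# XOR needs an interaction: the check gives -2 instead of 0.
xor = lambda x1, x2: float(x1 != x2)
```

Only trees that split on x1 and then, within a branch, on x2 (so at least two splits) can make the prediction for x1 depend on the value of x2.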