Solved – Does the ‘fit’ attribute of a gbm object contain the OOB estimates?

boosting

I have looked at the help (?gbm) and documentation for gbm.object, it only says that

fit: a vector containing the fitted values on the scale of
regression function (e.g. log-odds scale for bernoulli, log
scale for poisson)

Is it the OOB estimates, like in randomForest? That seems unlikely to me, since best_iter is not supplied to get it, unless gbm has secretly decided which method I prefer.

Is there an efficient way to get the OOB estimates at all, since I've already run 5-fold CV to get best_iter?

Best Answer

I don't think you're going to find what you're looking for. First of all, there is no real concept of "OOB Predictions" for a full gbm fit. It does save the OOB decrease (or increase) in error after each tree, but that does not equate to an OOB prediction. Since the trees are in sequence (boosted) instead of in parallel (bagged) there is no way to get "untainted" predictions for the training data.

It sounds like you are actually looking for out-of-fold (OOF) predictions. Calling gbm with cross-validation enabled will make k+1 fits, but I don't think it saves anything other than the mean cross-validation error metric at each iteration. I've moved away from the internal cross-validation functionality for this reason: I fold it (or bag it) myself and save the predictions from the held-out folds.
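The fold-it-yourself loop can be sketched generically. This is a toy illustration (in Python for brevity) where a trivial mean predictor stands in for gbm; with the real thing you would fit gbm on each training fold and predict the held-out fold the same way:

```python
import random

def fit_mean_model(y_train):
    """Toy stand-in for a learner (gbm in the answer): predicts the training mean."""
    mean = sum(y_train) / len(y_train)
    return lambda x: mean

def oof_predictions(X, y, k=5, seed=0):
    """Manual k-fold loop: each observation is predicted only by the
    one model that never saw it during fitting ('untainted' predictions)."""
    n = len(y)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint hold-out folds
    preds = [None] * n
    for fold in folds:
        holdout = set(fold)
        train_y = [y[i] for i in idx if i not in holdout]
        model = fit_mean_model(train_y)      # fit on the k-1 training folds
        for i in fold:
            preds[i] = model(X[i])           # predict only the held-out rows
    return preds

X = list(range(10))
y = [float(i) for i in range(10)]
oof = oof_predictions(X, y)
print(len(oof))  # every row gets exactly one out-of-fold prediction
```

The saved `preds` vector is exactly the OOF prediction set the answer describes: each row is scored by a model fit without it, so the predictions are safe to use for stacking or honest evaluation.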

And yes, these OOF predictions are valuable if you want to see untainted predictions of the training data or if you want to ensemble a gbm with another algorithm.