Based on the code presented in the thread "How to find a GBM Prediction Interval", I am trying to apply this approach to my dataset. Below is my full code; I am having issues with the bootstrap function.
library(caret)
require(foreign)
set.seed(825)
Ridership <- read.spss("V:/Metro/Coverage/ROUTE_MODEL2.sav",use.value.labels=TRUE, to.data.frame = TRUE)
set.seed(825)
fitControl <- trainControl(method = "cv", number = 2)
gbmGrid <- expand.grid(interaction.depth = (20:21), n.trees = (750), shrinkage = c(0.07))
x <- Ridership[, -148]
y <- Ridership[, 148]
gbmFit <- train(x=x,y=y,"gbm", tuneGrid = gbmGrid, n.minobsinnode = 2, trControl =fitControl, verbose=FALSE)
gbmFit
x.pt <- quantile(Ridership$TOT_RIDERSHIP, c(0.25, 0.5, 0.75))
p <- plot(gbmFit, newdata = Ridership[, -148], grid.levels = x.pt, return.grid = TRUE)
p
library(boot)
bootfun <- function(data, indices) {
  data <- data[indices,]
  x <- Ridership[, -148]
  y <- Ridership[, 148]
  gbmFit <- train(x=x,y=y,"gbm", tuneGrid = gbmGrid, n.minobsinnode = 2, trControl =fitControl, verbose=FALSE)
  plot(gbmFit, newdata = Ridership[, -148], grid.levels = x.pt, return.grid = TRUE)$y
}
b <- boot(data = Ridership, statistic = bootfun, R = 5)
lims <- t(apply(b$t, 2, FUN = function(x) quantile(x, c(0.025, 0.975))))
When I run the code, lims only shows 1, and nothing more. I am not sure what to define in the bootstrap function. I have flipped through the bootstrap package code, but it is still not clear to me what I am doing wrong. Thanks in advance!
Best Answer
Hard to say without a reproducible example, but here are some pointers based on what I can understand from the code:
For plot.gbm you need to pass a gbm object as well as the variable to plot. In your case, something like p <- plot(gbmFit$finalModel, i.var = ..., grid.levels = x.pt, ...), where i.var is the variable you want partial dependencies for.
The length defaults to 100 (see ?plot.gbm), and I think that is why you get 100 points. The presence of grid.levels should override this if plot.gbm is called correctly.
The book Applied Predictive Modeling by Max Kuhn and Kjell Johnson is a great go-to source for tuning gbm ensembles (and tuning predictive models in general).
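As a rough sketch of the plot.gbm call described above, the following fits a small gbm model directly (rather than through caret) and asks for the partial dependence grid of one variable. The data frame sim, the variables x1 and x2, and the tuning values are made up for illustration. The sketch sticks to i.var, continuous.resolution and return.grid and leaves grid.levels out; check ?plot.gbm for what your installed version accepts.

```r
library(gbm)

set.seed(825)
n <- 300
# Simulated data, purely hypothetical; stands in for the Ridership data
sim <- data.frame(x1 = runif(n), x2 = runif(n))
sim$y <- sin(2 * pi * sim$x1) + sim$x2 + rnorm(n, sd = 0.2)

# Fit gbm directly; with caret, the equivalent gbm object is gbmFit$finalModel
fit <- gbm(y ~ x1 + x2, data = sim, distribution = "gaussian",
           n.trees = 200, interaction.depth = 2, shrinkage = 0.07,
           n.minobsinnode = 5, verbose = FALSE)

# Partial dependence on x1, returned as a data frame instead of being drawn.
# continuous.resolution sets how many grid points are evaluated (default 100).
pd <- plot(fit, i.var = "x1", n.trees = 200,
           continuous.resolution = 5, return.grid = TRUE)
print(pd)
```

The returned data frame has one row per grid point, with the partial dependence values in the y column, which is what the $y at the end of the question's bootfun is reaching for.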
Also note that this code gives confidence intervals for the predicted values rather than prediction intervals, as pointed out in the comments on the above-mentioned thread.
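One more thing worth checking in the question's bootfun: after data <- data[indices,], x and y are still taken from Ridership, so every replicate refits on the full data rather than on the resample. The sketch below shows the general shape of a boot statistic that uses the resampled rows and returns one prediction per fixed point, and then compares the resulting bootstrap limits with a prediction interval. To keep it fast and self-contained it uses lm as a stand-in for the gbm fit and predict() at fixed points instead of plot.gbm; the data frame sim and the points in new_pts are made up for illustration.

```r
library(boot)

set.seed(825)
n <- 200
# Simulated data, purely hypothetical
sim <- data.frame(x = runif(n, 0, 10))
sim$y <- 2 + 0.5 * sim$x + rnorm(n, sd = 1)

# Fixed points at which predictions are wanted
new_pts <- data.frame(x = c(2.5, 5, 7.5))

# Statistic for boot(): refit on the resampled rows and return one
# prediction per row of new_pts, so every replicate has the same length.
boot_pred <- function(data, indices) {
  resampled <- data[indices, ]
  fit <- lm(y ~ x, data = resampled)  # lm is a stand-in for the gbm fit
  unname(predict(fit, newdata = new_pts))
}

b <- boot(data = sim, statistic = boot_pred, R = 200)

# Percentile limits of the bootstrapped predictions: one row per point
boot_ci <- t(apply(b$t, 2, quantile, probs = c(0.025, 0.975)))

# Prediction interval for a new observation at the same points
fit_all <- lm(y ~ x, data = sim)
pred_int <- predict(fit_all, newdata = new_pts, interval = "prediction")

boot_width <- boot_ci[, 2] - boot_ci[, 1]
pred_width <- pred_int[, "upr"] - pred_int[, "lwr"]
print(cbind(boot_width, pred_width))
```

The bootstrap limits only reflect uncertainty in the fitted value, while the prediction interval also includes the noise around a new observation, so the second column comes out much wider.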
Hope this helps!