EDIT: As pointed out in the comments below, this gives confidence intervals for the predictions and not, strictly speaking, prediction intervals. I was a bit trigger-happy with my reply and should have given this some extra thought.
Feel free to ignore this answer, or try to build on the code to get the prediction intervals.
I have used the simple bootstrap for creating prediction intervals a few times but there may be other (better) ways.
Consider the oil data in the caret package and suppose we want to generate partial dependencies and 95% intervals for the effect of Stearic on Palmitic. Below is just a simple example, but you can play around with it to suit your needs. Make sure the gbm package is up to date so that plot.gbm accepts the grid.levels argument.
library(caret)
data(oil)
#train the gbm using just the defaults.
tr <- train(Palmitic ~ ., method = "gbm", data = fattyAcids, verbose = FALSE)
#Points to be used for prediction. Use the quartiles here just for illustration
x.pt <- quantile(fattyAcids$Stearic, c(0.25, 0.5, 0.75))
#Generate the partial dependencies at the selected points. To get predictions instead, use predict() rather than plot()
p <- plot(tr$finalModel, "Stearic", grid.levels = x.pt, return.grid = TRUE)
#Bootstrap the process to get prediction intervals
library(boot)
bootfun <- function(data, indices) {
data <- data[indices,]
#As before, just the defaults in this example. Palmitic is the first variable, hence data[,1]
tr <- train(data[,-1], data[,1], method = "gbm", verbose = FALSE)
# ... other steps, e.g. using the oneSE rule etc ...
#Return partial dependencies (or predictions)
plot(tr$finalModel, "Stearic", grid.levels = x.pt, return.grid = TRUE)$y
#or predict(tr$finalModel, newdata = ..., n.trees = ...)
}
#Perform the bootstrap; this can be very time-consuming. Just 99 replicates here, but we usually want more, e.g. 500. Consider using the parallel option
b <- boot(data = fattyAcids, statistic = bootfun, R = 99)
#Get the 95% intervals from the boot object as the 2.5th and 97.5th percentiles
lims <- t(apply(b$t, 2, FUN = function(x) quantile(x, c(0.025, 0.975))))
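To inspect the result, the point estimates and the bootstrap limits can be put side by side. A minimal sketch, assuming p and lims from above and that the grid returned by return.grid = TRUE has columns Stearic and y:
#Combine the partial dependencies with the bootstrap percentile limits
ci <- data.frame(Stearic = p$Stearic,
                 estimate = p$y,
                 lower = lims[, 1],
                 upper = lims[, 2])
ci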
This is one way to do it that at least tries to account for the uncertainty arising from tuning the gbm. A similar approach has been used in http://onlinelibrary.wiley.com/doi/10.2193/2006-503/abstract
Sometimes the point estimate is outside the interval, but modifying the tuning grid (e.g., increasing the number of trees and/or the interaction depth) usually solves that.
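For example, something along these lines; the grid values are only placeholders, and depending on your caret version n.minobsinnode may or may not be a tuning parameter:
#Wider tuning grid; the values here are only for illustration
grd <- expand.grid(n.trees = c(500, 1000, 2000),
                   interaction.depth = 1:3,
                   shrinkage = 0.01,
                   n.minobsinnode = 10)
tr <- train(Palmitic ~ ., data = fattyAcids, method = "gbm",
            tuneGrid = grd, verbose = FALSE)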
Hope this helps!
Hard to say without a reproducible example but some pointers based on what I can understand from the code:
- For plot.gbm you need to pass a gbm object as well as the variable to plot. In your case something like p <- plot(gbmFit$finalModel, i.var = ..., grid.levels = x.pt, ...), where i.var is the variable you want partial dependencies for (see the sketch after this list). The grid length (continuous.resolution) defaults to 100, see ?plot.gbm, and I think that is why you get 100 points. Supplying grid.levels should override this if plot.gbm is called correctly.
- 2-fold CV is not likely to give a good estimate of performance, and with 75 data points I would use the bootstrap (or maybe leave-one-out CV) and restrict the trees to depth 1-3 (or so) unless you have very strong prior knowledge that you need trees of depth 20 or 21.
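Putting the two points together, a rough sketch; gbmFit and x.pt follow the question, dat, y and x1 are placeholders for your data, and the tuning/resampling settings are arbitrary (n.minobsinnode requires a newer caret):
#Sketch only: dat, y and x1 are placeholders
library(caret)
ctrl <- trainControl(method = "boot", number = 100)   #bootstrap instead of 2-fold CV
grd <- expand.grid(n.trees = c(500, 1000),
                   interaction.depth = 1:3,           #shallow trees
                   shrinkage = 0.01,
                   n.minobsinnode = 10)
gbmFit <- train(y ~ ., data = dat, method = "gbm",
                trControl = ctrl, tuneGrid = grd, verbose = FALSE)
x.pt <- quantile(dat$x1, c(0.25, 0.5, 0.75))
p <- plot(gbmFit$finalModel, i.var = "x1", grid.levels = x.pt, return.grid = TRUE)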
The book Applied Predictive Modeling by Max Kuhn and Kjell Johnson is a great go-to source for tuning gbm ensembles (and tuning predictive models in general).
Also note that, as pointed out in the comments in the above-mentioned thread, this code gives confidence intervals for the predicted values rather than prediction intervals.
Hope this helps!
Best Answer
In general, prediction intervals are considered better than point estimates. While it's great to have a good estimate of what a stock price will be tomorrow, it's much better to be able to give a range of values that the stock price is very likely to be in.
That being said, it's generally more difficult to produce reliable prediction intervals than to merely produce point estimates that have good prediction properties. For example, in many cases we can show that with non-constant variance, we can still produce a consistent estimator of the mean of the new value even if we ignore the non-constant variance issue. However, we definitely need a reliable estimate of the variance function to produce prediction intervals.
I've heard of people just treating this as another level of a machine learning problem: the first level is to produce a function $\hat f(x_i)$ that estimates $E[y_i \mid x_i]$, the mean of the values, and the second level is to produce a function $\hat V(x_i)$ that estimates $E[(y_i - \hat y_i)^2 \mid x_i]$, the variance of the errors given the inputs. In theory, this should work (given enough data and a stable underlying function), but in practice, it must be handled with a lot of care, as variance estimates are inherently much less stable than mean estimates. In short, you should expect to need much more data to accurately estimate $\hat V(x_i)$ than $\hat f(x_i)$.
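As a rough illustration of that two-level idea, here is a sketch in R; gbm is used only because it appears above, train.dat and new.dat are placeholders, the tuning settings are arbitrary, and in practice the residuals should come from out-of-fold predictions rather than the training fit (which will underestimate the variance):
library(gbm)
#Level 1: estimate the conditional mean E[y | x]
fit.mean <- gbm(y ~ ., data = train.dat, distribution = "gaussian",
                n.trees = 1000, interaction.depth = 2, shrinkage = 0.01)
mu.hat <- predict(fit.mean, newdata = train.dat, n.trees = 1000)
#Level 2: estimate the conditional variance from the squared residuals
train.dat$sq.resid <- (train.dat$y - mu.hat)^2
fit.var <- gbm(sq.resid ~ . - y, data = train.dat, distribution = "gaussian",
               n.trees = 1000, interaction.depth = 2, shrinkage = 0.01)
#Approximate 95% intervals for new data, assuming roughly normal errors
mu.new <- predict(fit.mean, newdata = new.dat, n.trees = 1000)
v.new  <- pmax(predict(fit.var, newdata = new.dat, n.trees = 1000), 0)
cbind(lower = mu.new - 1.96 * sqrt(v.new),
      upper = mu.new + 1.96 * sqrt(v.new))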
So there's definitely nothing about prediction intervals that is "cheating" compared with just producing point estimates. It's just harder to do. As an empirical example, in the M4 forecasting competition, only 2 of the 15 methods that produced 95% prediction intervals had nearly correct coverage; most of the other prediction intervals had coverage in the 80-90% range (see slide 35 in the link).