My question deals with GAMs in the mgcv R package. Due to a small sample size, I want to determine the prediction error using leave-one-out cross-validation. Is this reasonable? Is there a package or some code I can use to do this? The errorest() function in the ipred package does not work. A simple test dataset is:
library(mgcv)
set.seed(0)
dat <- gamSim(1, n = 400, dist = "normal", scale = 2)
b <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat)
summary(b)
pred <- predict(b, type="response")
Thank you very much for your helping hand!
Best Answer
I really like the caret package for things like this, but unfortunately I just read that you can't specify the formula in gam exactly for it.
"When you use train with this model, you cannot (at this time) specify the gam formula. caret has an internal function that figures out a formula based on how many unique levels each predictor has etc. In other words, train currently determines which terms are smoothed and which are plain old linear main effects."
source: https://stackoverflow.com/questions/20044014/error-with-train-from-caret-package-using-method-gam
But if you let train select the smooth terms, in this case it produces your model exactly anyway. The default performance metric in this case is RMSE, but you can change it using the summaryFunction argument of the trainControl function.
I think one of the main drawbacks of LOOCV is that it takes forever when the dataset is large. Since your dataset is small and it runs quite fast, I think it is a sensible option.
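If you would rather keep the exact formula from the question than rely on train choosing the terms, a hand-written loop with mgcv alone is one alternative. The following is only a minimal sketch of that idea, not what caret does internally: it refits the model n times, each time leaving out one row and predicting it. The names loo_pred, loo_rmse and loo_mae are made up for this illustration.

```r
library(mgcv)

set.seed(0)
# Simulated data, exactly as in the question
dat <- gamSim(1, n = 400, dist = "normal", scale = 2)

n <- nrow(dat)
loo_pred <- numeric(n)  # held-out prediction for each observation

for (i in seq_len(n)) {
  # Fit on all rows except i, keeping the formula from the question
  fit_i <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat[-i, ])
  # Predict the single left-out row
  loo_pred[i] <- predict(fit_i, newdata = dat[i, , drop = FALSE],
                         type = "response")
}

# Summaries of the leave-one-out prediction error
loo_rmse <- sqrt(mean((dat$y - loo_pred)^2))  # root mean squared error
loo_mae  <- mean(abs(dat$y - loo_pred))       # mean absolute error
print(c(RMSE = loo_rmse, MAE = loo_mae))
```

Because each iteration refits the full model, the run time grows with the number of rows, which is the drawback mentioned above. Any other error measure can be computed from dat$y and loo_pred in the same way.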
Hope this helps.