Solved – Bootstrapping estimates of out-of-sample error

bootstrap, out-of-sample, resampling

I know how to use bootstrap resampling to find confidence intervals for in-sample error or R²:

# Bootstrap 95% CI for R-Squared
library(boot)
# function to obtain R-Squared from the data 
rsq <- function(formula, data, indices) {
  d <- data[indices,] # allows boot to select sample 
  fit <- lm(formula, data=d)
  return(summary(fit)$r.squared)
} 
# bootstrapping with 1000 replications 
results <- boot(data=mtcars, statistic=rsq,
                R=1000, formula=mpg~wt+disp)

# view results
results 
plot(results)

# get 95% confidence interval 
boot.ci(results, type="bca")

But what if I want to estimate out-of-sample error (somewhat akin to cross-validation)? Could I fit a model to each bootstrap sample, use that model to predict on each of the other bootstrap samples, and then average the RMSE of those predictions?
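For concreteness, here is a rough sketch of what I have in mind (the choice of B = 50, the mpg ~ wt + disp model, and RMSE are only for illustration):

# Fit a model to each bootstrap resample, predict on every other resample,
# and average the resulting RMSEs (illustrative sketch only)
set.seed(1)
B <- 50
n <- nrow(mtcars)
idx <- replicate(B, sample(n, n, replace = TRUE), simplify = FALSE)

rmse <- sapply(seq_len(B), function(i) {
  fit <- lm(mpg ~ wt + disp, data = mtcars[idx[[i]], ])
  # RMSE of this model on each of the other bootstrap samples
  sapply(seq_len(B)[-i], function(j) {
    test <- mtcars[idx[[j]], ]
    sqrt(mean((test$mpg - predict(fit, newdata = test))^2))
  })
})

mean(rmse)  # average RMSE over all model/test-sample pairs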

Best Answer

The short answer, if I understand the question, is "no". Out-of-sample error is, by definition, outside your sample, and no bootstrapping or other analytical effort applied to your sample can calculate it.

In answer to your comment on whether the bootstrap can be used to check a model against data outside a training set, there are two possible interpretations.

It would be fine, and absolutely standard, to fit a model on your training set with traditional methods and then use bootstrapping on the training set to check things like the distribution of your estimators. Then use the final model from that training set to test against the test set.
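A minimal sketch of that workflow, assuming an arbitrary 70/30 split of mtcars and the same mpg ~ wt + disp model as in the question (both choices are purely illustrative):

# Split the data once, fit the final model on the training set only
library(boot)
set.seed(1)
train_rows <- sample(nrow(mtcars), floor(0.7 * nrow(mtcars)))
train <- mtcars[train_rows, ]
test  <- mtcars[-train_rows, ]
fit <- lm(mpg ~ wt + disp, data = train)

# Bootstrap the training set to examine the distribution of the coefficients
coefs <- function(formula, data, indices) {
  coef(lm(formula, data = data[indices, ]))
}
boot(data = train, statistic = coefs, R = 1000, formula = mpg ~ wt + disp)

# Only then evaluate the final model once against the held-out test set
sqrt(mean((test$mpg - predict(fit, newdata = test))^2))  # test RMSE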

It would be possible to do a bootstrap-like procedure that involves a loop around:

  • selecting a subset of the whole sample as your training set
  • fitting a model to that training subset of the data
  • comparing that model against the remaining data as a test set, and generating some kind of test statistic that measures how well the model from the training set performs on the test set.

And then considering the results of doing that many times, as in the sketch below. Certainly, it would give you some insight into the robustness of your train/test process, and it would reassure you that the particular model you got was not just an artifact of which observations happened to end up in the test set in your one split.
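A sketch of such a loop, again assuming an arbitrary 70/30 split, the mpg ~ wt + disp model, and RMSE as the test statistic (strictly, this is repeated random splitting rather than a bootstrap):

# Repeat the split/fit/evaluate cycle many times and look at the
# distribution of the test-set statistic across splits
set.seed(1)
test_rmse <- replicate(1000, {
  train_rows <- sample(nrow(mtcars), floor(0.7 * nrow(mtcars)))
  fit <- lm(mpg ~ wt + disp, data = mtcars[train_rows, ])
  holdout <- mtcars[-train_rows, ]
  sqrt(mean((holdout$mpg - predict(fit, newdata = holdout))^2))
})

summary(test_rmse)
hist(test_rmse)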

However, although it is difficult to say exactly why, there seems to me to be a philosophical clash between the idea of a training/testing division and the bootstrap. Perhaps if I thought of it not as a bootstrap but simply as a robustness test of the train/test process, it would be OK.
