I'm currently reading the book An Introduction to Statistical Learning, and I'm struggling a little with the bootstrap approach. As far as I understand, the bootstrap can be used in almost any situation to obtain a standard error for a particular statistic. Does it make sense to use the bootstrap when computing the MSE of a linear regression model? If so, do I resample both the training data and the test data, or do I train the model once and then draw different test sets? In the latter case, do I draw the test data from the same collection of data as the training data, or should I always keep my test data separate?
In other words, does the following R code make sense?
MSE <- function(model, data) { ... }  # mean squared prediction error of `model` on `data`

# boot() calls the statistic as statistic(data, index, ...), so data and
# index must be the first two arguments; extra arguments such as `object`
# are passed through boot()'s ... argument.
boot.mse <- function(data, index, object) {
  train <- head(index, ceiling(length(index) * 0.9))  # ceiling(), not ceil()
  test  <- tail(index, floor(length(index) * 0.1))
  MSE(lm(object, data[train, ]), data[test, ])  # calculate test MSE
}

boot(my_data, boot.mse, R = 1000, object = some_model_or_formula)
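For comparison, here is a minimal runnable sketch of the conventional `boot()` pattern, in which the statistic function refits the model on each full bootstrap resample and returns the statistic of interest (here, the training MSE). The simulated data frame and the `y ~ x` model are assumptions of mine, not taken from the book:

```r
library(boot)  # ships with R as a recommended package

set.seed(1)
# Hypothetical data for illustration only
my_data <- data.frame(x = rnorm(100))
my_data$y <- 2 * my_data$x + rnorm(100)

# boot() supplies the resampled row indices; we refit on the resample
# and return the training MSE of that fit.
boot.stat <- function(data, index) {
  fit <- lm(y ~ x, data = data[index, ])
  mean(residuals(fit)^2)
}

results <- boot(my_data, boot.stat, R = 1000)
results  # bootstrap estimate of the standard error of the MSE
```

Note that this estimates the variability of the *training* MSE; it does not by itself answer whether a held-out test set should be drawn separately.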