The short answer, if I understand the question, is "no". Out-of-sample error is out of your sample, and no bootstrapping or other analytical effort with your sample can calculate it.
In answer to your comment on whether the bootstrap can be used in checking a model with data outside a training set: there are two possible interpretations.
It would be fine, and absolutely standard, to fit a model on your training set with traditional methods and then use bootstrapping on the training set to check for things like the distribution of your estimators. Then use your final model from that training set to test against the test set.
It would be possible to do a bootstrap-like procedure that involves a loop around:
- selecting a subset of the whole sample as your training set;
- fitting a model to that training set;
- comparing that model to the remaining data (the test set) and generating some kind of test statistic that says how well the model from the training set performs on the test set.
And then considering the results of doing that many times. Certainly, it would give you some insight into the robustness of your train/test process, and it would reassure you that the particular model you got was not just an artefact of which observations happened to end up in the test set in your one split.
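A minimal sketch of that loop in R. The data, the linear model, and the use of out-of-sample RMSE as the test statistic are all placeholders for illustration, not part of anyone's actual analysis:

```r
# Sketch: repeat the train/test split many times and record an
# out-of-sample statistic for each split (illustrative data and model)
set.seed(42)
dat <- data.frame(x = rnorm(100))
dat$y <- 2 * dat$x + rnorm(100)

reps <- 200
rmse <- numeric(reps)
for (i in 1:reps) {
  train.idx <- sample(nrow(dat), size = 70)          # random training subset
  fit <- lm(y ~ x, data = dat[train.idx, ])          # fit on training set
  pred <- predict(fit, newdata = dat[-train.idx, ])  # predict on test set
  rmse[i] <- sqrt(mean((dat$y[-train.idx] - pred)^2))
}
summary(rmse)  # the spread shows how sensitive results are to the split
```

The spread of `rmse` across repetitions is the point: a tight distribution suggests your single train/test split was not a fluke.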
However, although it is difficult to say exactly why, there seems to me to be a philosophical clash between the idea of a testing/training division and the bootstrap. Perhaps if I thought of it not as a bootstrap, but just as a robustness test of the train/test process, it would be OK.
Yes, you can approximate $\mathbb{P}\left(\bar{X}_n \leq x\right)$ by $\mathbb{P}\left(\bar{X}_n^* \leq x\right)$, but it is not optimal. This is a form of the percentile bootstrap. However, the percentile bootstrap does not perform well for inferences about the population mean unless you have a large sample size. (It does perform well with many other inference problems, including when the sample size is small.) I take this conclusion from Wilcox's Modern Statistics for the Social and Behavioral Sciences, CRC Press, 2012. A theoretical proof is beyond me, I'm afraid.
A variant on the centering approach goes one step further and scales your centered bootstrap statistic by the re-sample standard deviation and sample size, calculated in the same way as a t statistic. The quantiles of the distribution of these t statistics can be used to construct a confidence interval or perform a hypothesis test. This is the bootstrap-t method, and it gives superior results when making inferences about the mean.
Let $s^*$ be the re-sample standard deviation based on a bootstrap re-sample, using $n-1$ as the denominator, and let $s$ be the standard deviation of the original sample. Let
$T^*=\frac{\bar{X}_n^*-\bar{X}}{s^*/\sqrt{n}}$
The 97.5th and 2.5th percentiles of the simulated distribution of $T^*$ give a confidence interval for $\mu$:
$\left(\bar{X}-T^*_{0.975} \frac{s}{\sqrt{n}},\ \bar{X}-T^*_{0.025} \frac{s}{\sqrt{n}}\right)$
Consider the simulation results below, showing that with a badly skewed mixture distribution the confidence intervals from this method contain the true value more frequently than those from either the percentile bootstrap method or a traditional inversion of a t statistic with no bootstrapping.
compare.boots <- function(samp, reps = 599){
  # "samp" is the actual original observed sample
  # "s" is a re-sample for bootstrap purposes
  n <- length(samp)
  boot.t <- numeric(reps)
  boot.p <- numeric(reps)
  for(i in 1:reps){
    s <- sample(samp, replace = TRUE)
    boot.t[i] <- (mean(s) - mean(samp)) / (sd(s) / sqrt(n))
    boot.p[i] <- mean(s)
  }
  # note the deliberately reversed quantile order for the bootstrap-t interval
  conf.t <- mean(samp) - quantile(boot.t, probs = c(0.975, 0.025)) * sd(samp) / sqrt(n)
  conf.p <- quantile(boot.p, probs = c(0.025, 0.975))
  return(rbind(conf.t, conf.p, "Trad T test" = t.test(samp)$conf.int))
}
# Tests below will be for case where sample size is 15
n <- 15
# Create a population that is normally distributed
set.seed(123)
pop <- rnorm(1000,10,1)
my.sample <- sample(pop,n)
# All three methods give similar results when the population is normally distributed
compare.boots(my.sample)
This gives the following (conf.t is the bootstrap t method; conf.p is the percentile bootstrap method).
97.5% 2.5%
conf.t 9.648824 10.98006
conf.p 9.808311 10.95964
Trad T test 9.681865 11.01644
With a single sample drawn from a skewed distribution:
# create a population that is a mixture of two normal and one gamma distribution
set.seed(123)
pop <- c(rnorm(1000,10,2),rgamma(3000,3,1)*4, rnorm(200,45,7))
my.sample <- sample(pop,n)
mean(pop)
compare.boots(my.sample)
This gives the following. Note that conf.t (the bootstrap-t version) gives a wider confidence interval than the other two; basically, it is better at responding to the unusual distribution of the population.
> mean(pop)
[1] 13.02341
> compare.boots(my.sample)
97.5% 2.5%
conf.t 10.432285 29.54331
conf.p 9.813542 19.67761
Trad T test 8.312949 20.24093
Finally, here are 1,000 simulations to see which version gives confidence intervals that are most often correct:
# simulation study
set.seed(123)
sims <- 1000
results <- matrix(FALSE, sims, 3)
colnames(results) <- c("Bootstrap T", "Bootstrap percentile", "Trad T test")
for(i in 1:sims){
  pop <- c(rnorm(1000, 10, 2), rgamma(3000, 3, 1) * 4, rnorm(200, 45, 7))
  my.sample <- sample(pop, n)
  mu <- mean(pop)
  x <- compare.boots(my.sample)
  for(j in 1:3){
    results[i, j] <- x[j, 1] < mu & x[j, 2] > mu
  }
}
apply(results, 2, sum)
This gives the results below; the numbers are the number of times out of 1,000 that the confidence interval contained the true value of the simulated population mean. Notice that the true success rate of every version is considerably less than 95%.
Bootstrap T Bootstrap percentile Trad T test
901 854 890
Did you find a satisfactory answer for this question?
I recently found this reference:
http://www.wseas.us/e-library/conferences/2009/hangzhou/ACACOS/ACACOS21.pdf
but I am pretty sure that the issue must have been investigated before. While it is easy to justify the use of observation weights (in practice, by weighting observations, you are hoping to use a better estimate of the unknown distribution function $F$), I would like to find the relevant background literature.