I found this thought experiment helpful when thinking about confidence intervals. It also answers your question 3.
Let $X\sim U(0,1)$ and $Y=X+a-\frac{1}{2}$. Consider two observations of $Y$ taking the values $y_1$ and $y_2$ corresponding to observations $x_1$ and $x_2$ of $X$, and let $y_l=\min(y_1,y_2)$ and $y_u=\max(y_1,y_2)$. Then $[y_l,y_u]$ is a 50% confidence interval for $a$ (since the interval includes $a$ if $x_1<\frac12<x_2$ or $x_1>\frac12>x_2$, each of which has probability $\frac14$).
However, if $y_u-y_l>\frac12$ then we know that the probability that the interval contains $a$ is $1$, not $\frac12$. The subtlety is that a $z\%$ confidence interval for a parameter means that the endpoints of the interval (which are random variables) lie either side of the parameter with probability $z\%$ before you calculate the interval, not that the probability of the parameter lying within the interval is $z\%$ after you have calculated the interval.
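A quick simulation (my own sketch, not part of the original thought experiment) confirms both claims: the interval covers $a$ half the time overall, but always covers it when its width exceeds $\frac12$.

```r
# simulate the thought experiment: X ~ U(0,1), Y = X + a - 1/2
set.seed(1)
a      <- 3                       # arbitrary true parameter
n.sims <- 100000
x1 <- runif(n.sims); x2 <- runif(n.sims)
y1 <- x1 + a - 1/2;  y2 <- x2 + a - 1/2
y.l <- pmin(y1, y2); y.u <- pmax(y1, y2)
covers <- y.l < a & a < y.u       # does [y.l, y.u] contain a?
wide   <- (y.u - y.l) > 1/2       # intervals wider than 1/2
mean(covers)                      # approximately 0.5
mean(covers[wide])                # exactly 1
```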
If the bootstrapping procedure and the formation of the confidence interval were performed correctly, it means the same as any other confidence interval. From a frequentist perspective, a 95% CI implies that if the entire study were repeated identically ad infinitum, 95% of such confidence intervals formed in this manner will include the true value. Of course, in your study, or in any given individual study, the confidence interval either will include the true value or not, but you won't know which. To understand these ideas further, it may help you to read my answer here: Why does a 95% Confidence Interval (CI) not imply a 95% chance of containing the mean?
Regarding your further questions, the 'true value' refers to the actual parameter of the relevant population. (Samples don't have parameters, they have statistics; e.g., the sample mean, $\bar x$, is a sample statistic, but the population mean, $\mu$, is a population parameter.) As to how we know this, in practice we don't. You are correct that we are relying on some assumptions--we always are. If those assumptions are correct, it can be proven that the properties hold. This was the point of Efron's work back in the late 1970s and early 1980s, but the math is difficult for most people to follow. For a somewhat mathematical explanation of the bootstrap, see @StasK's answer here: Explaining to laypeople why bootstrapping works. For a quick demonstration short of the math, consider the following simulation using R:
# a function to perform bootstrapping
boot.mean.sampling.distribution = function(raw.data, B=1000){
    # this function will take 1,000 (by default) bootsamples, calculate the mean
    # of each one, store it, & return the bootstrapped sampling distribution of
    # the mean
    boot.dist = vector(length=B)     # this will store the means
    N         = length(raw.data)     # this is the N from your data
    for(i in 1:B){
        boot.sample  = sample(x=raw.data, size=N, replace=TRUE)
        boot.dist[i] = mean(boot.sample)
    }
    boot.dist = sort(boot.dist)
    return(boot.dist)
}

# simulate bootstrapped CIs from a population w/ true mean = 0; on each pass
# through the loop, we get a sample of data from the population, get the
# bootstrapped sampling distribution of the mean, & see if the population mean
# is included in the 95% confidence interval implied by that distribution
set.seed(00)                     # this makes the simulation reproducible
includes = vector(length=1000)   # this will store our results
for(i in 1:1000){
    sim.data    = rnorm(100, mean=0, sd=1)
    boot.dist   = boot.mean.sampling.distribution(raw.data=sim.data)
    includes[i] = boot.dist[25]<0 & 0<boot.dist[976]
}
mean(includes)  # this tells us the proportion of CIs that included the true mean
[1] 0.952
Best Answer
In the usual way:
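(The code block appears to have been lost here; judging from the text that follows, it presumably resembled the `predict()` call below. The model `mod`, the simulated data, and `newdata` are my own illustrative additions.)

```r
library(mgcv)
set.seed(2)                          # data and model invented for illustration
dat   <- data.frame(x = runif(200))
dat$y <- rpois(200, exp(1 + sin(2 * pi * dat$x)))
mod   <- gam(y ~ s(x), family = poisson, data = dat)

# predictions on the scale of the linear predictor, with standard errors
newdata <- data.frame(x = seq(0, 1, length = 100))
p <- predict(mod, newdata, type = "link", se.fit = TRUE)
```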
Then note that `p` contains a component `$se.fit` with standard errors of the predictions for observations in `newdata`. You can then form a CI by multiplying the SE by a value appropriate to your desired level; e.g., an approximate 95% confidence interval is formed as the fitted values plus or minus roughly twice the standard error. You substitute in an appropriate value from a $t$ or Gaussian distribution for the interval you need.
Note that I use `type = "link"` as you don't say whether you have a GAM or just an AM. With a GAM, you need to form the confidence interval on the scale of the linear predictor and then transform it to the scale of the response by applying the inverse of the link function.

Now note that these are very approximate intervals. In addition, these intervals are point-wise on the predicted values, and they don't take into account the fact that smoothness selection was performed.
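The interval-forming and back-transformation steps just described can be sketched as follows (again with an invented Poisson model `mod`, so the inverse link here is `exp()`):

```r
library(mgcv)
set.seed(2)                          # illustrative setup, as above
dat   <- data.frame(x = runif(200))
dat$y <- rpois(200, exp(1 + sin(2 * pi * dat$x)))
mod   <- gam(y ~ s(x), family = poisson, data = dat)
newd  <- data.frame(x = seq(0, 1, length = 100))
p     <- predict(mod, newd, type = "link", se.fit = TRUE)

crit <- qnorm(0.975)                 # ~1.96 for an approximate 95% interval
# form the interval on the link scale, then map to the response scale
upr <- mod$family$linkinv(p$fit + crit * p$se.fit)
lwr <- mod$family$linkinv(p$fit - crit * p$se.fit)
```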
A simultaneous confidence interval can be computed via simulation from the posterior distribution of the parameters. I have an example of that on my blog.
If you want a confidence interval that is not conditional upon the smoothing parameters (i.e. one that takes into account that we do not know, but instead estimate, the values of the smoothness parameters), then add `unconditional = TRUE` to the `predict()` call.

Also, if you don't want to do this yourself, note that newer versions of mgcv have a `plot.gam()` function that returns an object with all the data used to create the plots of the smooths and their confidence intervals. You can just save the output from `plot.gam()` in an object, `obj`, and then inspect `obj`, which is a list with one component per smooth. Add `seWithMean = TRUE` to the `plot()` call to get confidence intervals that are not conditional upon the smoothness parameters.
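A short sketch of that workflow, assuming a sufficiently recent mgcv version (the model and data are invented for illustration):

```r
library(mgcv)
set.seed(2)                          # illustrative data and model
dat   <- data.frame(x = runif(200))
dat$y <- sin(2 * pi * dat$x) + rnorm(200, sd = 0.3)
mod   <- gam(y ~ s(x), data = dat)

# plot.gam() invisibly returns a list with one component per smooth,
# holding the data used to draw the smooth and its confidence band
obj <- plot(mod, seWithMean = TRUE)
str(obj[[1]])   # typically includes x, fit, and se for the smooth of x
```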