The terminology is probably not used consistently, so the following is only how I understand the original question. From my understanding, the normal CIs you computed are not what was asked for. Each set of bootstrap replicates gives you one confidence interval, not many. The way to compute different CI-types from the results of a set of bootstrap replicates is as follows:
B <- 999 # number of replicates
muH0 <- 100 # for generating data: true mean
sdH0 <- 40 # for generating data: true sd
N <- 200 # sample size
DV <- rnorm(N, muH0, sdH0) # simulated data: original sample
Since I want to compare the calculations against the results from package boot, I first define a function that will be called for each replicate. Its arguments are the original sample and an index vector specifying the cases for a single replicate. It returns $M^{\star}$, the plug-in estimate for $\mu$, as well as $S_{M}^{2\star}$, the plug-in estimate for the variance of the mean $\sigma_{M}^{2}$. The latter is required only for the bootstrap $t$-CI.
> getM <- function(orgDV, idx) {
+ bsM <- mean(orgDV[idx]) # M*
+ bsS2M <- (((N-1) / N) * var(orgDV[idx])) / N # S^2*(M)
+ c(bsM, bsS2M)
+ }
> library(boot) # for boot(), boot.ci()
> bOut <- boot(DV, statistic=getM, R=B)
> boot.ci(bOut, conf=0.95, type=c("basic", "perc", "norm", "stud"))
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 999 bootstrap replicates
CALL :
boot.ci(boot.out = bOut, conf = 0.95, type = c("basic", "perc", "norm", "stud"))
Intervals :
Level Normal Basic Studentized Percentile
95% ( 95.6, 106.0 ) ( 95.7, 106.2 ) ( 95.4, 106.2 ) ( 95.4, 106.0 )
Calculations and Intervals on Original Scale
Without using package boot, you can simply use replicate() to get a set of bootstrap replicates.
boots <- t(replicate(B, getM(DV, sample(seq(along=DV), replace=TRUE))))
But let's stick with the results from boot.ci() to have a reference.
boots <- bOut$t # estimates from all replicates
M <- mean(DV) # M from original sample
S2M <- (((N-1)/N) * var(DV)) / N # S^2(M) from original sample
Mstar <- boots[ , 1] # M* for each replicate
S2Mstar <- boots[ , 2] # S^2*(M) for each replicate
biasM <- mean(Mstar) - M # bias of estimator M
The basic, percentile, and $t$-CI rely on the empirical distribution of bootstrap estimates. To get the $\alpha/2$ and $1 - \alpha/2$ quantiles, we find the corresponding indices into the sorted vector of bootstrap estimates (note that boot.ci() will do a more complicated interpolation to find the empirical quantiles when the indices are not natural numbers).
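To illustrate the difference (a hedged sketch only: quantile() type=6 is just one common interpolation scheme, not the exact algorithm boot.ci() uses), you can compare the plain order statistics with interpolated quantiles when $(B + 1)\alpha$ is not an integer:

```r
## Compare plain order statistics with interpolated quantiles when
## (B+1)*alpha is not an integer (here B = 1000, so (B+1)*0.025 = 25.025)
set.seed(123)
B   <- 1000
est <- rnorm(B)                               # stand-in for bootstrap estimates
idx <- trunc((B + 1) * c(0.025, 0.975))       # truncated indices 25 and 975
sort(est)[idx]                                # plain order statistics
quantile(est, probs=c(0.025, 0.975), type=6)  # interpolated quantiles (one scheme)
```

With $B = 999$ as above, $(B + 1)\alpha$ is an integer and both approaches coincide; with other choices of $B$ they differ slightly.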
> (idx <- trunc((B + 1) * c(0.05/2, 1 - 0.05/2))) # indices for sorted vector of estimates
[1] 25 975
> (ciBasic <- 2*M - sort(Mstar)[idx]) # basic CI (upper limit first)
[1] 106.21826 95.65911
> (ciPerc <- sort(Mstar)[idx]) # percentile CI
[1] 95.42188 105.98103
For the $t$-CI, we need the bootstrap $t^{\star}$ estimates to calculate the critical $t$-values. For the standard normal CI, the critical value will just be the $z$-value from the standard normal distribution.
# standard normal CI with bias correction
> zCrit <- qnorm(c(0.025, 0.975)) # z-quantiles from std-normal distribution
> (ciNorm <- M - biasM + zCrit * sqrt(var(Mstar)))
[1] 95.5566 106.0043
> tStar <- (Mstar-M) / sqrt(S2Mstar) # t*
> tCrit <- sort(tStar)[idx] # t-quantiles from empirical t* distribution
> (ciT <- M - tCrit * sqrt(S2M)) # studentized t-CI (upper limit first)
[1] 106.20690 95.44878
In order to estimate the coverage probabilities of these CI types, you will have to run this simulation many times. Just wrap the code into a function, return a list with the CI results, and run it with replicate() as demonstrated in this gist.
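A minimal sketch of such a coverage simulation (the helper name getCIs and the number of runs are my own illustrative choices; the linked gist may be structured differently):

```r
## Sketch: estimate coverage probabilities by repeating the whole simulation.
## Only the percentile and basic CIs are shown here for brevity.
B <- 999; N <- 200; muH0 <- 100; sdH0 <- 40    # same setup as above
getCIs <- function() {
    DV    <- rnorm(N, muH0, sdH0)              # fresh sample each run
    M     <- mean(DV)
    Mstar <- replicate(B, mean(sample(DV, replace=TRUE)))
    idx   <- trunc((B + 1) * c(0.025, 0.975))
    qs    <- sort(Mstar)[idx]
    c(percLo=qs[1],      percHi=qs[2],         # percentile CI
      basicLo=2*M-qs[2], basicHi=2*M-qs[1])    # basic CI
}
CIs <- replicate(100, getCIs())                # 100 runs (a toy number)
## proportion of runs whose CI covers the true mean muH0
rowMeans(rbind(perc =CIs["percLo", ]  < muH0 & muH0 < CIs["percHi", ],
               basic=CIs["basicLo", ] < muH0 & muH0 < CIs["basicHi", ]))
```

With only 100 runs the coverage estimates will be noisy; for a serious check you would use several thousand runs.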
If you just want a simple bootstrap CI for the median, all you need is a median function that accepts indices.
med <- function(y, indices) median(y[indices])
Then you can just...
b <- boot(dat, med, 1000)
boot.ci(b)
(you might want to plot(b) to examine your bootstrap)
But in your case, assuming your simple exponential model describes the data well, I would be tempted to let the entire simulation speak for itself in a figure. Plot your data distribution and overlay your simulation. If you use density curves, your simulation can have a much higher N than the data.
densDat <- density(dat)
plot(densDat)
densSim <- density(sim)
lines(densSim, col = 'red')
(You might want to adjust the bw argument in density() to smooth the curve.)
Best Answer
There won't be any advantage to bootstrapping the SE in your example because you have a very large sample size. The distribution of means at that sample size is going to be normal, not skewed, because of the central limit theorem (CLT); try hist(skewLeftbootData). Even if it were skewed, the SE is going to be so small because of N that it is not going to be appreciably skewed anyway. Then you're using as proof the backward calculation of an SD based on the SE of the bootstrap distribution calculated through conventional means. Even if the bootstrap distribution were skewed, you've just tossed out one of the reasons you might do a bootstrap in this case.
Bootstrapping would be more compelling if you had a substantially smaller sample (say 12) and calculated your SE as the middle 67% of the bootstrapped data by cutoffs of the sorted bootstrap distribution. Then you would see that this is a different estimate than an SE calculated from the conventional SD. You also wouldn't then calculate a bootstrapped SD based on the cutoffs.