Is centering needed when bootstrapping the sample mean?

bootstrap, centering, distributions, resampling

When reading about how to approximate the distribution of the sample mean, I came across the nonparametric bootstrap method. Apparently one can approximate the distribution of $\bar{X}_n-\mu$ by the distribution of $\bar{X}_n^*-\bar{X}_n$, where $\bar{X}_n^*$ denotes the sample mean of the bootstrap sample.

My question then is: Do I need the centering? What for?

Couldn't I just approximate $\mathbb{P}\left(\bar{X}_n \leq x\right)$ by $\mathbb{P}\left(\bar{X}_n^* \leq x\right)$?
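
For concreteness, in code the two approximations would look something like this (the exponential population, sample size and cut-off value below are purely illustrative):

set.seed(1)
obs <- rexp(25)           # one observed sample of size n = 25 (illustrative)
B <- 5000
xbar.star <- replicate(B, mean(sample(obs, replace = TRUE)))  # bootstrap sample means

# Centered: approximate the distribution of xbar_n - mu by that of xbar_n* - xbar_n
centered <- xbar.star - mean(obs)
# Uncentered: approximate P(xbar_n <= x) by the proportion of xbar_n* below x
mean(xbar.star <= 1.1)    # e.g. at the arbitrary cut-off x = 1.1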

Best Answer

Yes, you can approximate $\mathbb{P}\left(\bar{X}_n \leq x\right)$ by $\mathbb{P}\left(\bar{X}_n^* \leq x\right)$, but it is not optimal. This is a form of the percentile bootstrap. However, the percentile bootstrap does not perform well for inferences about the population mean unless you have a large sample size. (It does perform well with many other inference problems, including when the sample size is small.) I take this conclusion from Wilcox's Modern Statistics for the Social and Behavioral Sciences, CRC Press, 2012. A theoretical proof is beyond me, I'm afraid.
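
For concreteness, the percentile bootstrap interval for the mean is just a pair of quantiles of the re-sample means; a minimal sketch with illustrative data (the same idea as conf.p in the function further down):

set.seed(1)
x <- rexp(30)                                          # illustrative sample
boot.means <- replicate(2000, mean(sample(x, replace = TRUE)))
quantile(boot.means, probs = c(0.025, 0.975))          # 95% percentile interval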

A variant on the centering approach goes a step further and scales your centered bootstrap statistic by the re-sample standard deviation and the sample size, calculated in the same way as a t statistic. The quantiles of the distribution of these t statistics can be used to construct a confidence interval or perform a hypothesis test. This is the bootstrap-t method, and it gives superior results when making inferences about the mean.

Let $s^*$ be the standard deviation of a bootstrap re-sample (using $n-1$ as the denominator) and $s$ the standard deviation of the original sample. Define

$T^*=\frac{\bar{X}_n^*-\bar{X}_n}{s^*/\sqrt{n}}$

The 97.5th and 2.5th percentiles of the simulated distribution of $T^*$ then give a confidence interval for $\mu$:

$\left(\bar{X}_n-T^*_{0.975}\,\frac{s}{\sqrt{n}},\ \bar{X}_n-T^*_{0.025}\,\frac{s}{\sqrt{n}}\right)$
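
A direct translation of that formula into R might look like the sketch below (the sample and the number of re-samples are illustrative; the compare.boots function further down does the same calculation inside a comparison harness):

set.seed(1)
x <- rexp(30)                                 # illustrative sample
n <- length(x)
t.star <- replicate(2000, {
    xs <- sample(x, replace = TRUE)           # bootstrap re-sample
    (mean(xs) - mean(x)) / (sd(xs) / sqrt(n)) # T* as defined above
})
q <- quantile(t.star, probs = c(0.975, 0.025))
mean(x) - q * sd(x) / sqrt(n)                 # (lower, upper) bootstrap-t interval for mu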

Consider the simulation results below, showing that with a badly skewed mixture distribution the confidence intervals from this method contain the true value more frequently than those from either the percentile bootstrap method or a traditional inversion of a t statistic with no bootstrapping.

compare.boots <- function(samp, reps = 599){
    # "samp" is the actual original observed sample
    # "s" is a re-sample for bootstrap purposes

    n <- length(samp)

    boot.t <- numeric(reps)   # bootstrap-t statistics
    boot.p <- numeric(reps)   # re-sample means (for the percentile method)

    for(i in 1:reps){
        s <- sample(samp, replace=TRUE)
        boot.t[i] <- (mean(s)-mean(samp)) / (sd(s)/sqrt(n))   # T* as defined above
        boot.p[i] <- mean(s)
    }

    # Bootstrap-t interval: quantiles applied in reverse order, so despite the
    # column labels the values come out as (lower, upper)
    conf.t <- mean(samp)-quantile(boot.t, probs=c(0.975,0.025))*sd(samp)/sqrt(n)
    # Percentile interval: quantiles of the re-sample means themselves
    conf.p <- quantile(boot.p, probs=c(0.025, 0.975))

    return(rbind(conf.t, conf.p, "Trad T test"=t.test(samp)$conf.int))
}

# Tests below will be for case where sample size is 15
n <- 15

# Create a population that is normally distributed
set.seed(123)
pop <- rnorm(1000,10,1)
my.sample <- sample(pop,n)
# All three methods have similar results when normally distributed
compare.boots(my.sample)

This gives the following (conf.t is the bootstrap-t method; conf.p is the percentile bootstrap method; in each row the first column is the lower limit and the second the upper limit, despite the quantile labels).

          97.5%     2.5%
conf.t      9.648824 10.98006
conf.p      9.808311 10.95964
Trad T test 9.681865 11.01644

With a single example from a skewed distribution:

# create a population that is a mixture of two normal and one gamma distribution
set.seed(123)
pop <- c(rnorm(1000,10,2),rgamma(3000,3,1)*4, rnorm(200,45,7))
my.sample <- sample(pop,n)
mean(pop)
compare.boots(my.sample)

This gives the following. Note that conf.t, the bootstrap-t version, gives a wider confidence interval than the other two. Basically, it is better at responding to the unusual distribution of the population.

> mean(pop)
[1] 13.02341
> compare.boots(my.sample)
                97.5%     2.5%
conf.t      10.432285 29.54331
conf.p       9.813542 19.67761
Trad T test  8.312949 20.24093

Finally, here are 1,000 simulations to see which version gives confidence intervals that are correct most often:

# simulation study
set.seed(123)
sims <- 1000
results <- matrix(FALSE, sims,3)
colnames(results) <- c("Bootstrap T", "Bootstrap percentile", "Trad T test")

for(i in 1:sims){
    # draw a fresh skewed population and a sample of size n = 15 from it
    pop <- c(rnorm(1000,10,2),rgamma(3000,3,1)*4, rnorm(200,45,7))
    my.sample <- sample(pop,n)
    mu <- mean(pop)
    x <- compare.boots(my.sample)
    for(j in 1:3){
        # does each method's interval contain the true population mean?
        results[i,j] <- x[j,1] < mu & x[j,2] > mu
    }
}

apply(results,2,sum)   # number of intervals (out of 1,000) containing the true mean

This gives the results below; the numbers are the counts, out of 1,000 simulations, of confidence intervals that contained the true mean of the simulated population. Notice that the coverage of every method is considerably below the nominal 95%.

     Bootstrap T Bootstrap percentile          Trad T test 
             901                  854                  890 
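
If coverage proportions are more convenient than counts, the same results matrix can be summarised directly (a one-line variation on the code above):

colMeans(results)   # proportion of intervals containing the true mean, by method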