Solved – Mean of the bootstrap sample vs statistic of the sample

bootstrap, estimation

Say I have a sample and, from this sample, a bootstrap sample for a statistic $\chi$ (e.g. the mean). As is well known, this bootstrap sample estimates the sampling distribution of the estimator of the statistic.

Now, is the mean of this bootstrap sample a better estimate of the population statistic than the statistic of the original sample? Under what conditions would that be the case?

Best Answer

Let's generalize, so as to focus on the crux of the matter. I will spell out the tiniest details so as to leave no doubts. The analysis requires only the following:

  1. The arithmetic mean of a set of numbers $z_1, \ldots, z_m$ is defined to be

    $$\frac{1}{m}\left(z_1 + \cdots + z_m\right).$$

  2. Expectation is a linear operator. That is, when $Z_i, i=1,\ldots,m$ are random variables and $\alpha_i$ are numbers, then the expectation of a linear combination is the linear combination of the expectations,

    $$\mathbb{E}\left(\alpha_1 Z_1 + \cdots + \alpha_m Z_m\right) = \alpha_1 \mathbb{E}(Z_1) + \cdots + \alpha_m\mathbb{E}(Z_m).$$

Let $B$ be a sample $(B_1, \ldots, B_k)$ obtained from a dataset $x = (x_1, \ldots, x_n)$ by taking $k$ elements uniformly from $x$ with replacement. Let $m(B)$ be the arithmetic mean of $B$. This is a random variable. Then

$$\mathbb{E}(m(B)) = \mathbb{E}\left(\frac{1}{k}\left(B_1+\cdots+B_k\right)\right) = \frac{1}{k}\left(\mathbb{E}(B_1) + \cdots + \mathbb{E}(B_k)\right)$$

follows by linearity of expectation. Since the elements of $B$ are all obtained in the same fashion, they all have the same expectation, $b$ say:

$$\mathbb{E}(B_1) = \cdots = \mathbb{E}(B_k) = b.$$

This simplifies the foregoing to

$$\mathbb{E}(m(B)) = \frac{1}{k}\left(b + b + \cdots + b\right) = \frac{1}{k}\left(k b\right) = b.$$

By definition, the expectation is the probability-weighted sum of values. Since each element of $x$ has an equal chance of $1/n$ of being selected,

$$\mathbb{E}(m(B)) = b = \mathbb{E}(B_1) = \frac{1}{n}x_1 + \cdots + \frac{1}{n}x_n = \frac{1}{n}\left(x_1 + \cdots + x_n\right) = \bar x,$$

the arithmetic mean of the data.
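The identity $\mathbb{E}(m(B)) = \bar x$ can be checked empirically by averaging the means of many bootstrap resamples; here is a minimal sketch in Python (the dataset and resample count are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# A small illustrative dataset (values chosen arbitrarily).
x = np.array([2.0, 5.0, 1.0, 7.0, 4.0])
n = len(x)

# Draw many bootstrap samples (k = n, sampling with replacement)
# and record each sample's arithmetic mean m(B).
n_boot = 100_000
boot_means = rng.choice(x, size=(n_boot, n), replace=True).mean(axis=1)

# The average of the bootstrap means approaches the data mean x-bar,
# up to Monte Carlo error.
print(x.mean())
print(boot_means.mean())
```

The individual bootstrap means scatter widely (they estimate the sampling distribution of the mean), but their average settles on $\bar x$, exactly as the linearity argument predicts.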

To answer the question: if one uses the data mean $\bar x$ to estimate the population mean, then the expectation of the bootstrap mean (the case $k=n$) also equals $\bar x$, and therefore the two are identical as estimators of the population mean.


For statistics that are not linear functions of the data, the same result does not necessarily hold. However, it would be wrong simply to substitute the bootstrap mean for the statistic's value on the data: that is not how bootstrapping works. Instead, comparing the bootstrap mean to the data statistic yields information about the bias of the statistic, which can be used to adjust the original statistic and remove that bias. The bias-corrected estimate is thereby an algebraic combination of the original statistic and the bootstrap mean. For more information, look up "BCa" (bias-corrected and accelerated bootstrap) and "ABC". Wikipedia provides some references.
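As a sketch of the simple bias-correction idea (not the full BCa procedure), consider the plug-in variance, a nonlinear statistic that is biased downward. The bootstrap bias estimate is the bootstrap mean of the statistic minus its value on the data, and subtracting that estimate combines the two quantities exactly as described above. The statistic and sample below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)

# A nonlinear statistic: the plug-in variance, which divides by n
# and is therefore biased downward for the population variance.
def plug_in_var(a):
    return np.mean((a - a.mean()) ** 2)

x = rng.normal(size=30)
theta_hat = plug_in_var(x)

# Bootstrap estimate of the statistic's bias:
# (mean of the statistic over resamples) - (statistic on the data).
n_boot = 20_000
boot_stats = np.array([
    plug_in_var(rng.choice(x, size=len(x), replace=True))
    for _ in range(n_boot)
])
bias_est = boot_stats.mean() - theta_hat

# Bias-corrected estimate: an algebraic combination of the original
# statistic and the bootstrap mean,
#   theta_hat - bias_est = 2 * theta_hat - boot_stats.mean()
theta_corrected = theta_hat - bias_est
print(theta_hat, theta_corrected)
```

Because the plug-in variance underestimates, the estimated bias comes out negative and the corrected value is pulled upward; note that the correction uses the *difference* between the bootstrap mean and the data statistic, never the bootstrap mean alone.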
