Bootstrap Variance – Detailed Analysis of Bootstrap Variance for Squared Sample Mean

bootstrap, mathematical-statistics, nonparametric, variance

The following is question 8 of chapter 8 in Wasserman's All of Statistics:

Let $T_n = \overline{X}_n^2$, $\mu = \mathbb{E}(X_1)$,
$\alpha_k = \int|x - \mu|^k\,dF(x)$, and $\hat{\alpha}_k = n^{-1}\sum_{i=1}^n|X_i - \overline{X}_n|^k$.

Show that $$v_{\mathrm{boot}} = \frac{4\overline{X}_n^2\hat{\alpha}_2}{n} + \frac{4\overline{X}_n\hat{\alpha}_3}{n^2} + \frac{\hat{\alpha}_4}{n^3} \>.$$

He previously defines
$v_{\mathrm{boot}} = \frac{1}{B}\sum_{b=1}^B\left(T_{n,b}^* - \frac{1}{B}\sum_{r=1}^BT_{n,r}^*\right)^2$, where $T_{n,b}^*$ is the desired statistic from the $b$th bootstrap replication of the sample $X_1,\dots,X_n$.
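For concreteness, the simulated quantity in this definition can be sketched in Python for $T_n = \overline{X}_n^2$. This is only a rough sketch: the function name `v_boot`, the default number of replications, the seed, and the data are illustrative assumptions, not anything from the book.

```python
import random

def v_boot(xs, B=2000, seed=0):
    """Monte Carlo bootstrap variance of T_n = (sample mean)^2.

    xs, B and seed are illustrative; the name v_boot just mirrors the notation.
    """
    rng = random.Random(seed)
    n = len(xs)
    # T*_{n,b}: squared mean of the b-th resample, drawn with replacement
    t_star = [(sum(rng.choices(xs, k=n)) / n) ** 2 for _ in range(B)]
    t_bar = sum(t_star) / B
    # Average squared deviation of the replicated statistics (divisor B)
    return sum((t - t_bar) ** 2 for t in t_star) / B

xs = [1.0, 2.0, 4.0, 7.0]  # hypothetical data
estimate = v_boot(xs)
print(estimate)
```

The result changes with the seed and with `B`, which is exactly the dependence on simulation that the question is about.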

It seems that the question as stated does not make sense: how can there be a formula for the bootstrap variance if the quantity requires simulation? Perhaps he meant to ask for the variance of the sampling distribution, but I get $\frac{\sigma^2}{n}$ for that. Any hints on how to interpret or solve this?

Best Answer

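A little late, but anyway. The formula refers to the ideal bootstrap variance, that is, the limit of the simulated $v_{\mathrm{boot}}$ as $B \to \infty$. This is the variance of $T_n^* = (\bar{X}_n^*)^2$ when $X_1^*,\dots,X_n^*$ are drawn i.i.d. from the empirical distribution of the sample. All expectations below are taken under this empirical measure, so $\bar{X}_n$ is the mean of each $X_i^*$ and is treated as a constant.

First, to simplify later calculations, rewrite the bootstrap sample mean in terms of deviations from $\bar{X}_n$. Let $S_n = \frac{1}{n}\sum_{i=1}^n(X_i^* - \bar{X}_n)$, so that $E(S_n) = 0$ and
$$ \bar{X}_n^* = S_n + \bar{X}_n = \frac{1}{n}\sum_{i=1}^n (X_i^* - \bar{X}_n) + \bar{X}_n . $$

Now, $\mathrm{Var}\big((\bar{X}_n^*)^2\big) = E\big((\bar{X}_n^*)^4\big) - \big(E(\bar{X}_n^*)^2\big)^2$. We'll tackle the first term:
$$ \begin{aligned} E\big((\bar{X}_n^*)^4\big) &= E(S_n + \bar{X}_n)^4 \\ &= E(S_n^4 + 4\bar{X}_nS_n^3 + 6\bar{X}_n^2S_n^2 + 4\bar{X}_n^3S_n + \bar{X}_n^4) \\ &= E(S_n^4) + 4\bar{X}_nE(S_n^3) + 6\bar{X}_n^2E(S_n^2) + \bar{X}_n^4 \end{aligned} $$
where we used $E(S_n) = 0$ to drop the second-to-last term.

In the following expansions, the $X_i^*$ are independent and each deviation $X_i^* - \bar{X}_n$ has mean zero, so every cross term containing a lone factor of some deviation has zero expectation. Those terms are not written.
$$ \begin{aligned} E(S_n^4) &= \frac{1}{n^4}E\left[\sum_{i}(X_i^* - \bar{X}_n)^4 + 3\sum_{i \neq j}(X_i^* - \bar{X}_n)^2(X_j^* - \bar{X}_n)^2\right] \\ &= \frac{\hat{\alpha}_4}{n^3} + \frac{3(n-1)\hat{\alpha}_2^2}{n^3}\\ E(S_n^3) &= \frac{1}{n^3}E\left[\sum_{i}(X_i^* - \bar{X}_n)^3\right] = \frac{\hat{\alpha}_3}{n^2}\\ E(S_n^2) &= \frac{1}{n^2}E\left[\sum_{i}(X_i^* - \bar{X}_n)^2\right] = \frac{\hat{\alpha}_2}{n} \end{aligned} $$
These are straightforward sums of products with some combinatorics to count the number of terms: there are $n$ diagonal terms in each sum, and in $S_n^4$ each of the $n(n-1)$ ordered pairs $i \neq j$ appears $3$ times. Here $\hat{\alpha}_3$ has to be read as the signed third central moment $n^{-1}\sum_i (X_i - \bar{X}_n)^3$; for even $k$ the absolute values in the definition make no difference.

For the second term of the variance, the same calculation gives $E\big((\bar{X}_n^*)^2\big) = E(S_n^2) + \bar{X}_n^2 = \frac{\hat{\alpha}_2}{n} + \bar{X}_n^2$, so
$$ \big(E(\bar{X}_n^*)^2\big)^2 = \bar{X}_n^4 + \frac{2\bar{X}_n^2\hat{\alpha}_2}{n} + \frac{\hat{\alpha}_2^2}{n^2} . $$

Putting it all together:
$$ \mathrm{Var}\big((\bar{X}_n^*)^2\big) = \frac{4\bar{X}_n^2\hat{\alpha}_2}{n} + \frac{4\bar{X}_n\hat{\alpha}_3}{n^2} + \frac{\hat{\alpha}_4 + (2n - 3)\hat{\alpha}_2^2}{n^3} $$
This agrees with the formula in the exercise except for the additional term $(2n - 3)\hat{\alpha}_2^2/n^3$.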
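The final expression can be checked by brute force on a small sample, since the variance under the empirical measure is just an average over all $n^n$ equally likely resamples. The snippet below is a minimal sketch of such a check, not part of the derivation: the function names and the data are made up for illustration, and the third moment is taken signed (no absolute value), as the derivation requires.

```python
from itertools import product

def central_moment(xs, k):
    """k-th signed central moment under the empirical distribution."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** k for x in xs) / len(xs)

def closed_form_variance(xs):
    """Closed form from the derivation, including the (2n - 3) * a2^2 / n^3 term."""
    n = len(xs)
    m = sum(xs) / n
    a2, a3, a4 = (central_moment(xs, k) for k in (2, 3, 4))
    return (4 * m**2 * a2 / n
            + 4 * m * a3 / n**2
            + (a4 + (2 * n - 3) * a2**2) / n**3)

def exact_bootstrap_variance(xs):
    """Variance of the squared resample mean over all n**n ordered resamples."""
    n = len(xs)
    stats = [(sum(r) / n) ** 2 for r in product(xs, repeat=n)]
    mean = sum(stats) / len(stats)
    return sum((t - mean) ** 2 for t in stats) / len(stats)

xs = [1.0, 2.0, 4.0, 7.0]  # hypothetical data, small enough to enumerate
exact = exact_bootstrap_variance(xs)
formula = closed_form_variance(xs)
print(exact, formula)
```

Enumeration is only feasible for very small $n$ because the number of resamples grows as $n^n$; for larger samples a Monte Carlo estimate with large $B$ would have to stand in for the exact value.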
