Bootstrap – Does Bootstrap Work for Combined Estimates Like $\widehat{\alpha\beta}$?


Let $\textbf{X}_1, \dots, \textbf{X}_n$ be an (iid) random sample. From this random sample, we compute $\hat{\alpha}$ (an estimation of a certain parameter $\alpha$). Let $\textbf{Y}_1, \dots, \textbf{Y}_n$ be an (iid) random sample. From this random sample, we compute $\hat{\beta}$ (an estimation of a certain parameter $\beta$).

I would like to obtain CI (confidence intervals) for parameter ${\alpha\beta}$.

I use the basic bootstrap technique to obtain CIs for $\alpha$ and $\beta$ separately. This involves sampling, with replacement, from the original sample to generate multiple bootstrap samples, each of the same size as the original dataset. For each bootstrap sample, I calculate the statistic of interest. I then use the 2.5% and 97.5% quantiles of the resampled statistics to form a 95% confidence interval.
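The procedure just described (quantiles of the resampled statistics, i.e. the percentile flavour) can be sketched as follows for $\alpha$ alone; the data-generating line is purely illustrative, and the mean is used as a stand-in for whatever statistic $\hat\alpha$ actually is:

```r
# Minimal sketch of the resampling scheme described above, for alpha alone.
# In practice, x is the observed sample X_1, ..., X_n.
set.seed(1)
x <- rnorm(100, mean = 2)            # illustrative stand-in for the data
alphahat <- mean(x)                  # estimate of alpha (mean as example)

# Resample with replacement, same size as the original, many times
boot <- replicate(10000, mean(sample(x, replace = TRUE)))

# 95% CI from the 2.5% and 97.5% quantiles of the resampled statistics
ci <- quantile(boot, c(0.025, 0.975))
ci
```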

Say that we have proved the asymptotic validity of the bootstrap CIs for $\hat{\alpha}$ and for $\hat{\beta}$.

Question:
How can we estimate confidence intervals for the parameter ${\alpha\beta}$ if the $X$-sequence and the $Y$-sequence are independent? Can we do something if they are not?

Best Answer

This is more complicated than it sounds, so it is a good question.

To start off, the bootstrap doesn't work for $\hat\alpha^2$ when $\alpha=0$; this is a classic example of bootstrap failure. The reason is that the bootstrap relies on the delta method, which doesn't exactly fail but becomes much less helpful when the function you're computing has derivative zero at the true parameters (but not at the 'true' parameters in the bootstrap world).

(Here the function in question is $(a,b)\mapsto ab$.)

So, this deserves simulation. Below are Q-Q plots of the sampling distribution, together with four examples of the bootstrap distribution, in samples of size 1000 from some Normal distributions:

f <- function(m1, m2){
 ## Sampling distribution of alphahat*betahat - alpha*beta
 r <- replicate(10000,{
  x <- rnorm(1000, m=m1)
  y <- rnorm(1000, m=m2)
  alphahat <- mean(x)
  betahat <- mean(y)
  alphahat*betahat - m1*m2
  })
 qqnorm(r, main=paste0("x mean is ",m1,"; y mean is ",m2))

 ## Four bootstrap distributions, one per observed dataset
 for (i in 1:4){
  x <- rnorm(1000, m=m1)
  y <- rnorm(1000, m=m2)

  s <- replicate(10000, {
    xstar <- sample(x, replace=TRUE)
    ystar <- sample(y, replace=TRUE)
    mean(xstar)*mean(ystar) - mean(x)*mean(y)
   })

  ## Overlay the bootstrap Q-Q points on the sampling-distribution plot
  a <- qqnorm(s, plot.it=FALSE)
  points(a, col=adjustcolor(i+1, alpha.f=.2))
 }
}

[Figure: Normal Q-Q plots of the sampling distribution (black) with four bootstrap distributions overlaid in colour, for the four $(m_1, m_2)$ settings discussed below.]

At (0,0) the shape is qualitatively right, but the bootstrap distributions are further than you might expect from the sampling distribution. At (10,10) and (10,0) everything is fine: normality, good bootstrap approximation. At (0.1, 0) it's starting to go bad again.

When does it work?

If $\alpha$ and $\beta$ are both not near zero then we have $\sqrt{n}(\hat\alpha-\alpha)$ and $\sqrt{n}(\hat\beta-\beta)$ both asymptotically Normal and the bootstrap is correct for each. The product function is differentiable at $(\alpha,\beta)$ with non-zero derivative, and everything is fine.
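To make the delta-method step explicit (a first-order Taylor expansion of the product, with the variance formula assuming the two samples are independent):

$$\hat\alpha\hat\beta-\alpha\beta \approx \beta(\hat\alpha-\alpha)+\alpha(\hat\beta-\beta),$$

so if $\sqrt{n}(\hat\alpha-\alpha)\to N(0,\sigma_\alpha^2)$ and $\sqrt{n}(\hat\beta-\beta)\to N(0,\sigma_\beta^2)$, then

$$\sqrt{n}(\hat\alpha\hat\beta-\alpha\beta)\to N\!\left(0,\ \beta^2\sigma_\alpha^2+\alpha^2\sigma_\beta^2\right),$$

whose limiting variance degenerates to zero exactly when $\alpha=\beta=0$.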

If $\alpha=0$ and $\beta$ is large, the well-behaved delta-method term $$(\hat\alpha-\alpha)\frac{\partial(\alpha\beta)}{\partial \alpha}=(\hat\alpha-\alpha)\beta$$ dominates the small badly-behaved term and everything is still fine (and similarly if $\beta=0$ and $\alpha$ is large)

But if $\alpha=0$ and $\beta$ is small or vice versa, you get breakdown in the bootstrap and no asymptotic Normality.
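Putting this together for the independent case: here is a minimal sketch of a basic (reflected) bootstrap CI for $\alpha\beta$, resampling the two samples separately. The data and the means, chosen well away from zero so that the delta method behaves, are purely illustrative:

```r
# Sketch: basic-bootstrap CI for alpha*beta with independent samples.
# Illustrative data; means far from zero, the well-behaved regime.
set.seed(2)
x <- rnorm(1000, mean = 10)   # sample for estimating alpha
y <- rnorm(1000, mean = 10)   # independent sample for estimating beta
est <- mean(x) * mean(y)      # plug-in estimate of alpha*beta

# Resample each sample separately (valid because they are independent)
boot <- replicate(10000, {
  mean(sample(x, replace = TRUE)) * mean(sample(y, replace = TRUE))
})

# Basic (reflected) bootstrap interval: 2*est - upper/lower quantiles
ci <- 2 * est - quantile(boot, c(0.975, 0.025))
ci
```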

Extra credit

Things get even worse with non-zero correlation. Here I take $X\sim N(0,1)$ and $Y=X+N(0,1)$:

[Figure: Normal Q-Q plot of the sampling and bootstrap distributions under this correlated setting.]

Or with a Q-Q plot of r versus s, where you can see the $X^2$ component of $XY$ behaving differently from the error component. [Figure: Q-Q plot of the sampling distribution r against the bootstrap distribution s.]
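For the "can we do something if they are not independent" part of the question: the mechanical fix is to resample the $(X_i, Y_i)$ pairs jointly, so the bootstrap preserves the dependence. A hedged sketch follows; note that this keeps the dependence structure, but it does not rescue the zero-means failure mode shown above:

```r
# Sketch: paired (joint) resampling for dependent X and Y.
# Mimics the example above: Y = X + noise, both means zero.
# NOTE: means at zero is exactly the bad case for alpha*beta, so the
# resulting interval should not be trusted here; this only illustrates
# the resampling mechanics for dependent data.
set.seed(3)
x <- rnorm(1000)
y <- x + rnorm(1000)
est <- mean(x) * mean(y)

boot <- replicate(10000, {
  idx <- sample(seq_along(x), replace = TRUE)  # one index set for both series
  mean(x[idx]) * mean(y[idx])                  # pairs stay together
})

ci <- 2 * est - quantile(boot, c(0.975, 0.025))  # basic bootstrap interval
ci
```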
