Solved – Estimating quantiles by simulation

Tags: bootstrap, monte carlo, quantiles

I'm a bit confused about how I would go about estimating quantiles by simulation. Say I have some statistical model $f(x,\theta)$. I can estimate the parameter $\theta$ and am able to generate random variates $X_{1}, \ldots, X_{n}$. There are two ways I can think of to estimate a quantile of the distribution:

  1. Generate a very large sample and compute the sample quantile (is this a Monte Carlo approach?). This gives me a point estimate.

  2. Generate many large samples, compute the sample quantile I'm interested in for each, and then take the mean (this seems to be the parametric bootstrap).

I realise that option 2 is actually an estimate of the expected value of the sampling distribution of the sample quantile (which, if the sample quantile is an (asymptotically) unbiased estimator, is (asymptotically) equal to the true quantile I'm trying to estimate). So this should also provide a point estimate.

Option 1 seems to require the consistency of the sample quantile as justification.

I'm still a bit confused as to when I would do what and why. Maybe I can get some clarification here.

Thanks!

Best Answer

If you first estimate $\theta$ by $\hat\theta$, a direct estimate of the $\alpha$-quantile is $F_{\hat\theta}^{-1}(\alpha)$. This is a consistent but biased estimator, whose asymptotic variance can be derived by the delta method.
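
For concreteness, here is a minimal R sketch of this plug-in estimate, assuming a $\mathcal{N}(\mu,\sigma^2)$ model fitted by maximum likelihood (the toy data y and all names below are illustrative choices, not part of a fixed recipe):

# plug-in estimate of the 80% quantile under a fitted normal model
set.seed(42)
y=rnorm(500)                      # toy "observed" data
mu_hat=mean(y)                    # MLE of the mean
sig_hat=sqrt(mean((y-mu_hat)^2))  # MLE of the standard deviation
q_hat=qnorm(.80,mu_hat,sig_hat)   # F_{theta-hat}^{-1}(0.80)
# delta method: q_hat = mu_hat + z*sig_hat with z = qnorm(.80), and
# Var(mu_hat) = sigma^2/n, Var(sig_hat) ~ sigma^2/(2n), hence
z=qnorm(.80)
se_q=sig_hat*sqrt(1/length(y)+z^2/(2*length(y)))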

Your first solution is validated by the Glivenko–Cantelli theorem, namely the fact that the empirical cdf converges to the cdf the variates are drawn from, here $F_{\hat\theta}$:
$$\hat{F}_n(x)=\frac{1}{n}\sum_{i=1}^n \mathbb{I}_{X_i\le x} \stackrel{n\to\infty}{\longrightarrow}F_{\hat\theta}(x)\,.$$
Once again, $\hat{F}_n^{-1}(\alpha)$ is a consistent but biased estimator, whose variance can be estimated by bootstrap.
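
In R, reusing mu_hat and sig_hat from the sketch above (the simulation size n_sim and bootstrap size B are arbitrary illustrative choices):

# method 1 in practice: simulate from the fitted model, take the sample quantile
n_sim=10^4
x=rnorm(n_sim,mu_hat,sig_hat)   # X_1,...,X_n from F_{theta-hat}
q_emp=quantile(x,prob=.80)      # empirical quantile F_n^{-1}(0.80)
# bootstrap its variance: resample x with replacement
B=500
q_boot=replicate(B,quantile(sample(x,replace=TRUE),prob=.80))
sd(q_boot)                      # bootstrap standard error of q_emp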

Your second method uses an average of estimators validated by your first method, hence it is equally valid and, for a given sample size, equally biased. However, for a fixed computing budget, i.e., a pre-determined total number of simulations, the two methods do not use the same sample size per quantile, so you have to run an experiment to compare them.

For instance, a toy experiment aimed at estimating the 80% quantile of the standard normal distribution (equal to 0.8416212) shows the difference between your two approaches.

# method 1: N=10^4 replications of the sample 80% quantile,
# each computed from a sample of size R=10^3 (one sample per row)
R=10^3
N=10^4
x=matrix(rnorm(R*N),ncol=R)        # N x R matrix: N samples of size R
kant=apply(x,1,quantile,prob=.80)  # 80% quantile of each row

leads to

> sd(kant)
[1] 0.04513716
> mean(kant)
[1] 0.8404416

while

# method 2: R=10^3 replications, each averaging the 80% quantiles
# of N=10^2 samples of size M=10^2
R=10^3
M=N=10^2
kant=rep(0,R)
for (r in 1:R){
  x=matrix(rnorm(M*N),ncol=M)                 # N samples of size M (rows)
  kant[r]=mean(apply(x,1,quantile,prob=.80))  # average of the N row quantiles
}

leads to

> sd(kant)
[1] 0.01375016
> mean(kant)
[1] 0.8285708

hence to a smaller variance but to a larger bias, since each individual quantile in the second method is based on a much smaller sample. (This experiment does not account for the variability induced by replacing $\theta$ with $\hat\theta$.)
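
To fold in that extra variability, one natural extension (sketched here for a standard normal toy model, with arbitrary sizes) is to re-estimate $\theta$ from a fresh data set inside each replication:

# method 1, now re-estimating theta at each replication
R=10^3; n_obs=500; n_sim=10^3
kant=rep(0,R)
for (r in 1:R){
  y=rnorm(n_obs)                    # fresh toy data set
  mu_hat=mean(y)                    # MLEs of mu and sigma
  sig_hat=sqrt(mean((y-mu_hat)^2))
  x=rnorm(n_sim,mu_hat,sig_hat)     # simulate from the fitted model
  kant[r]=quantile(x,prob=.80)
}
sd(kant)                            # now includes the uncertainty in theta-hat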