I suppose you are dealing with a data vector of length $n$, where $n$ is so large that the computations must be spread over many machines and the vector cannot fit inside the memory of any single one of them. This squares neatly with the definition of big data as defined here.
Suppose that the dataset $\{X_i\}_{i=1}^n$ is composed of independent draws from a distribution $F$ and denote $q(\alpha)$ the $\alpha$ quantile of $F$. Throughout, I will assume that $F$ is differentiable at $q(\alpha)$ and that $F^\prime(q(\alpha))>0$.
The sample estimator of $q(\alpha)$ is $\hat{q}_n(\alpha)$ and is defined as the minimizer over $t\in\mathbb{R}$ of:
$$(0)\quad h_n(t)=\sum_{i=1}^n\rho_\alpha(X_i-t).$$
where $\rho_{\alpha}(u)=|u(\alpha-\mathbb{I}_{(u<0)})|$ and
$\mathbb{I}$ is the indicator function [0].
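As a quick sanity check (a minimal sketch, with simulated data standing in for the sample and an arbitrary $\alpha$), minimizing $h_n$ numerically recovers the usual sample quantile:

    # check function and objective (0)
    rho <- function(u, alpha) abs(u * (alpha - (u < 0)))
    h_n <- function(t, x, alpha) sum(rho(x - t, alpha))
    set.seed(1)
    x <- rnorm(1e4)                        # stand-in for the data vector
    alpha <- 0.9
    optimize(h_n, interval = range(x), x = x, alpha = alpha)$minimum
    quantile(x, probs = alpha, type = 1)   # essentially the same value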
Assume that the dataset $\{X_i\}_{i=1}^n$ is partitioned into $k$ non-overlapping sub-samples of lengths $\{m_j\}_{j=1}^k$. I will denote $\{\hat{q}_n^{j}(\alpha)\}_{j=1}^k$ the estimators of
$\hat{q}_n(\alpha)$ obtained by solving (0) on the respective sub-samples. Then, if:
$$\min_{j=1}^k\lim_{n\to\infty} \frac{m_j}{n}>0$$ the estimator:
$$\bar{q}_n(\alpha)=\sum_{j=1}^k\frac{m_j}{n}\hat{q}_n^{j}(\alpha)$$
satisfies:
$$\sqrt{n}(\bar{q}_n(\alpha)-\hat{q}_n(\alpha))=o_p(1).$$
(You can find more details of these computations including the asymptotic variances of $\bar{q}_n(\alpha)$ in [1]).
Therefore, letting $m_j=m\;\forall j$, you can use non-overlapping sub-samples of your original data set to get a computationally convenient estimate of $\hat{q}_n(\alpha)$. Given $p$ computing units, this divide-and-conquer strategy has cost $O(km/p)$ and storage $O(km)$ (versus $O(n)$ and $O(n)$ for computing (0) directly) at an asymptotically negligible cost in terms of precision. The finite-sample cost depends on how small $F^\prime(q(\alpha))$ is, but will typically be acceptable as soon as $m$ is of the order of $2^6$ to $2^7$.
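As an in-memory illustration (a sketch only; in a genuinely distributed setting each block's quantile would be computed on its own machine), the weighted average $\bar{q}_n(\alpha)$ can be written as:

    # divide-and-conquer estimate: weighted average of sub-sample quantiles
    set.seed(2)
    x <- rnorm(1e6)                        # stand-in for the full data vector
    alpha <- 0.9
    k <- 2^10                              # number of sub-samples
    blocks <- split(x, rep(seq_len(k), length.out = length(x)))
    m <- lengths(blocks)
    q_j <- sapply(blocks, quantile, probs = alpha, type = 1)
    q_bar <- sum(m / length(x) * q_j)      # bar{q}_n(alpha)
    q_bar - quantile(x, probs = alpha, type = 1)   # discrepancy with the full-sample estimate

The last line is only a check against the full-sample estimate; with data that genuinely do not fit in memory, only the $k$ sub-sample quantiles and block sizes need to be collected.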
Compared to running the parallel version of $\texttt{nth\_element}$ I linked to in my comment above, the implementation based on sub-samples will be simpler (and will also scale much better once $p$ becomes larger than the number of cores of a single computing machine). Another advantage of the sub-sample approach over the one based on a parallel $\texttt{nth\_element}$ is that its $O(km)$ space requirement can be split across the memory of multiple machines, even if the full $O(n)$ cannot fit inside the memory of any single one of them.
On the minus side, the solution based on $\bar{q}_n(\alpha)$ will not be permutation invariant (changing the order of the observations will give a different result).
- [0] Koenker, R. Quantile Regression.
- [1] Knight, K. and Bassett, G.W. Second order improvements of sample quantiles using subsamples.
The remark is rather ill-considered, as it confuses the theoretical quantile, which solves $F(q_\alpha)=\alpha$, with the empirical quantile, which solves $\hat{F}_n(\hat{q}_\alpha)=\alpha$. Assuming you have an iid sample $X_1,\ldots,X_n$ from $F$, it is always possible to compute $\hat{F}_n$ with the $\texttt{ecdf}$ function in R:
    # assuming x is the notation for the sample
    Fn <- ecdf(x)
    plot(Fn)
    # taking a particular value of alpha
    alpha <- 0.1017
    abline(h = alpha)
This picture tells you roughly where the empirical quantile should be, without giving you the solution to the equation $\hat{F}_n(\hat{q}_\alpha)=\alpha$, which you can solve by dyadic divide-and-conquer strategies or by calling the $\texttt{quantile}$ function.
If the probability $\alpha$ is arbitrary, there will be no exact solution to the equation$$\hat{F}_n(\hat{q}_\alpha)=\alpha$$since $\hat{F}_n$ only takes $n+1$ possible values. (This is also visible from the above graph.) In that case, the "solution" is found as the smallest observation for which $\hat{F}_n(x)$ is at least $\alpha$, mimicking the resolution in the theoretical case, where $$F^{-1}(\alpha)=\inf\{x\,|\, F(x)\geq \alpha \}.$$
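Continuing the snippet above (with x, Fn and alpha as before, so this is only a sketch), this generalized inverse is directly available:

    # smallest observation at which Fn reaches alpha
    q_hat <- min(x[Fn(x) >= alpha])
    abline(v = q_hat)
    # equivalently, the type-1 (inverse-ecdf) sample quantile
    quantile(x, probs = alpha, type = 1)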
Best Answer
If you first estimate $\theta$ by $\hat\theta$, a direct estimate of the $\alpha$-quantile is $F_{\hat\theta}^{-1}(\alpha)$. This is a convergent but biased estimator, whose asymptotic variance can be derived by the delta method.
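For instance (a sketch only, taking a normal model as an arbitrary example, with $\theta=(\mu,\sigma)$ estimated from the sample):

    # plug-in estimate F_{theta_hat}^{-1}(alpha) under a normal model
    set.seed(3)
    x <- rnorm(200, mean = 1, sd = 2)      # placeholder for the observed sample
    alpha <- 0.8
    mu_hat <- mean(x); sigma_hat <- sd(x)
    q_plugin <- qnorm(alpha, mean = mu_hat, sd = sigma_hat)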
Your first solution is validated by the Glivenko–Cantelli theorem, namely the fact that the empirical cdf converges to the true cdf:$$\hat{F}_n(x)=\frac{1}{n}\sum_{i=1}^n \mathbb{I}_{X_i\le x} \stackrel{n\to\infty}{\longrightarrow}F_{\hat\theta}(x)$$Once again, $\hat{F}_n^{-1}(\alpha)$ is a convergent but biased estimator, whose variance can be estimated by bootstrap.
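Continuing the sketch above, the first solution then amounts to simulating from $F_{\hat\theta}$, taking the empirical quantile, and bootstrapping its variance:

    # simulate from F_{theta_hat} and take the empirical quantile
    sims <- rnorm(1e4, mean = mu_hat, sd = sigma_hat)
    q_emp <- quantile(sims, probs = alpha, type = 1)
    # bootstrap estimate of its variance
    boot <- replicate(500, quantile(sample(sims, replace = TRUE), probs = alpha, type = 1))
    var(boot)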
Your second method uses an average of estimators validated by your first method, hence it is equally valid and equally biased. However, for a given computing budget, i.e., a pre-determined total number of simulations, you have to run an experiment to compare both methods.
For instance, running a toy experiment aiming at estimating the normal 80% quantile (equal to 0.8416212) shows the difference between both your approaches: with the total number of simulations held fixed, taking a single empirical quantile from one large simulated sample and averaging empirical quantiles over many smaller simulated samples behave differently, the averaging approach leading to a smaller variance but to a larger bias. (This experiment does not account for the variability in replacing $\theta$ by $\hat\theta$.)
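A minimal sketch of such a toy experiment (not the original code; the sample sizes, number of sub-samples, and number of replications are arbitrary choices, and $\theta$ is taken as known to isolate the Monte Carlo error):

    # fixed budget of N draws: one empirical quantile vs. an average of k quantiles from N/k draws
    set.seed(4)
    alpha <- 0.8; N <- 1e4; k <- 100; R <- 1e3
    direct   <- replicate(R, quantile(rnorm(N), probs = alpha, type = 1))
    averaged <- replicate(R, mean(replicate(k, quantile(rnorm(N / k), probs = alpha, type = 1))))
    c(bias = mean(direct) - qnorm(alpha), sd = sd(direct))
    c(bias = mean(averaged) - qnorm(alpha), sd = sd(averaged))

Both estimators use the same total of $N$ draws per replication, so the comparison matches the fixed-budget setting described above.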