[Math] Parametric vs Non-parametric Estimation of Quantiles

pr.probability, st.statistics

Motivation

Suppose that we need to estimate the median of a normal distribution with known variance. One non-parametric approach is to use the sample median as an estimator. However, this does not take advantage of our distributional assumptions. Exploiting the symmetry of the normal distribution, we can instead use the sample mean as an estimator of the median. This new estimator is more efficient than the sample median. In fact, it is known that the sample median is asymptotically inefficient in normal sampling, and the ratio of its asymptotic variance to that of the sample mean is $\frac \pi 2$.

The previous paragraph suggests a similar approach for any given quantile: we estimate the mean, fit a normal distribution, and analytically compute the quantile of the fitted distribution. It is not hard to show that the relative efficiency of the non-parametric estimator is never better than $\frac \pi 2$.
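A small Monte Carlo sketch (Python/NumPy; the sample size, number of replications and seed are arbitrary choices of mine) illustrates this for a standard normal population with known variance: the empirical variance ratio is close to $\frac \pi 2$ at $p = \frac 1 2$ and larger for other quantiles.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
mu, sigma, n, reps = 0.0, 1.0, 400, 20_000

for p in (0.5, 0.75, 0.9):
    z_p = norm.ppf(p)
    samples = rng.normal(mu, sigma, size=(reps, n))
    q_np = np.quantile(samples, p, axis=1)        # sample p-quantile (non-parametric)
    q_par = samples.mean(axis=1) + sigma * z_p    # mean-based plug-in (known variance)
    print(f"p={p:4.2f}  variance ratio ~ {q_np.var() / q_par.var():.3f}"
          f"   (pi/2 ~ {np.pi / 2:.3f})")
```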

Does the previous result hold in a more general sense?

Problem

Consider a sample $X_1,\ldots, X_n$ drawn independently from a population with p.d.f. $f(x|\theta)$, where $\theta$ is a scalar unknown parameter to be estimated. We impose the following regularity conditions for the distribution: (i) the p.d.f. is continuously differentiable in $\theta$, (ii) the c.d.f. $F(x|\theta)$ is strictly monotonic, (iii) the distributions $f(x|\theta)$ have common support.

We are interested in estimating the $p$-quantile, which is given by $q=F^{-1}(p|\theta)$. We are going to consider two estimators. The first one is the sample quantile. The second is our parametric estimator, which is constructed as follows. First, we obtain the maximum likelihood estimator of the unknown parameter $\theta$. Then, we compute the quantile analytically using the fitted distribution.

Non-parametric estimator. Our non-parametric estimator is the sample $p$-quantile, which we denote by $\hat{q}_n$. From Bahadur's representation, we know that our estimator is asymptotically normal and satisfies
$$
\sqrt{n} (\hat{q}_n - q) \Rightarrow \mathcal{N}(0,u^\prime(\theta)),
$$
where the variance is $u^\prime(\theta) = \frac {p(1 - p)} {f(q|\theta)^2}$.

Parametric estimator. Next, let $\hat{\theta}_n^{\rm mle}$ be the maximum likelihood estimator of the unknown parameter. The parametric estimator is given by $\hat{q}_n^{\rm mle}=F^{-1}(p|\hat{\theta}_n^{\rm mle})$. In turn, by the invariance property of the MLE, it is the case that $\hat{q}_n^{\rm mle}$ is the maximum likelihood estimator of the quantile $q$. As a consequence, under some regularity conditions, we have that our new estimator is asymptotically normal
$$
\sqrt{n} (\hat{q}_n^{\rm mle} - q) \Rightarrow \mathcal{N}(0,u(\theta)),
$$
where $u(\theta)$ is the Cramér-Rao lower bound on the variance of any unbiased estimator of $q$. That is, $\hat{q}_n^{\rm mle}$ is a consistent and asymptotically efficient estimator of $q$. The Cramér-Rao lower bound is $u(\theta) = \left( \frac {\partial F^{-1}} {\partial \theta} (p | \theta) \right)^2 I(\theta)^{-1}$, where $I(\theta)$ is the Fisher information of the parameter $\theta$.
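As a quick sanity check of this delta-method variance, here is a minimal sketch (assuming an exponential population with rate $\theta$, anticipating the example below, for which $u(\theta) = \ln^2(1-p)/\theta^2$) that compares $n$ times the empirical variance of $\hat{q}_n^{\rm mle}$ with $u(\theta)$.

```python
import numpy as np

rng = np.random.default_rng(1)
theta, p, n, reps = 2.0, 0.9, 500, 20_000

# Exp(rate = theta) samples; the MLE plug-in quantile is q_hat = -log(1 - p) / theta_hat.
samples = rng.exponential(scale=1.0 / theta, size=(reps, n))
theta_hat = 1.0 / samples.mean(axis=1)     # MLE of the rate
q_mle = -np.log(1.0 - p) / theta_hat       # parametric quantile estimator

# u(theta) = (dF^{-1}/dtheta)^2 / I(theta) = ln(1 - p)^2 / theta^2 for Exp(rate theta)
u = np.log(1.0 - p) ** 2 / theta ** 2
print("n * empirical variance of q_mle:", round(n * q_mle.var(), 3))
print("u(theta):                       ", round(u, 3))
```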

Analysis. Both the parametric and the non-parametric estimators converge, as the number of samples increases, to the true quantile. We want to study the efficiency, as given by the variance, of these estimators. The relative efficiency of the non-parametric estimator to the MLE, denoted by $\varepsilon(\theta)$, is
$$
\varepsilon(\theta) = \frac {u^\prime(\theta)} {u(\theta)} = p (1 - p) I(\theta) \left( \frac {\partial F} {\partial \theta} (q | \theta) \right)^{-2},
$$
where we have used the fact that $\frac {\partial F^{-1}} {\partial \theta} (p | \theta) = - \frac {\partial F} {\partial \theta} (q | \theta) / f(q|\theta)$. In view of the Cramér-Rao lower bound, we have that $\varepsilon(\theta) \ge 1$.
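To connect this back to the motivating example, here is a minimal sketch (assuming a normal location family with known variance, so that $I(\theta) = \sigma^{-2}$ and $\frac {\partial F} {\partial \theta} (q|\theta) = -\phi(z_p)/\sigma$) that evaluates $\varepsilon = p(1-p)/\phi(z_p)^2$, which is minimized at $p = \frac 1 2$ where it equals $\frac \pi 2$.

```python
import numpy as np
from scipy.stats import norm

# Normal location family with known variance: I(theta) = 1/sigma^2 and
# dF/dtheta (q|theta) = -phi(z_p)/sigma, so eps(p) = p (1 - p) / phi(z_p)^2.
for p in (0.5, 0.75, 0.9, 0.99):
    eps = p * (1.0 - p) / norm.pdf(norm.ppf(p)) ** 2
    print(f"p={p:4.2f}  eps={eps:7.3f}")
print("pi/2 =", np.pi / 2)
```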

An example. To fix ideas we consider a simple example. Suppose that $X \sim {\rm Exp}(\theta)$, where $\theta$ is the rate parameter. The maximum likelihood estimator is given by $\hat{\theta}_n^{\rm mle} = \left( \frac 1 n \sum_{i=1}^n X_i \right)^{-1}$, and the Fisher information is $I(\theta) = \theta^{-2}$. The true quantile is $q = - \theta^{-1} \ln (1-p)$. Hence, the efficiency is $\varepsilon(\theta) = \frac {p} {(1-p) \ln^2(1-p)}$. In this case, the efficiency is lower bounded by $\varepsilon(\theta) \ge 1.544$. The bound is tight, and attained at $p \approx 0.80$.
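A quick numerical check of the claimed bound (a sketch; the choice of SciPy's bounded scalar minimizer is incidental):

```python
import numpy as np
from scipy.optimize import minimize_scalar

# eps(p) = p / ((1 - p) * ln(1 - p)^2) for the exponential example
eps = lambda p: p / ((1.0 - p) * np.log(1.0 - p) ** 2)

res = minimize_scalar(eps, bounds=(1e-6, 1.0 - 1e-6), method="bounded")
print(f"min efficiency ~ {res.fun:.3f} attained at p ~ {res.x:.3f}")
# prints roughly: min efficiency ~ 1.544 attained at p ~ 0.797
```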

Questions

  1. What is known about the parametric estimation of quantiles?

  2. In experiments with some common distributions we have seen that $\varepsilon(\theta) \ge \frac 3 2$. Is it possible to prove, under some assumptions, a tighter bound for the relative efficiency than $\varepsilon(\theta) \ge 1$?

Best Answer

Just in case someone is following, I want to post a somewhat negative answer to my second question. I found an example that satisfies the assumptions and achieves an efficiency arbitrarily close to 1.

The example is inspired by the Laplace distribution with an unknown location parameter $\theta$ and p.d.f. $f(x|\theta) = \frac 1 2 e^{-|x-\theta|}$. In this case, when $p=\frac 1 2$, both estimators coincide and the efficiency is 1. This is due to the fact that (i) the MLE of the location parameter of a Laplace distribution is the sample median, and (ii) the distribution is symmetric.

The problem with the Laplace distribution is that it does not satisfy our assumptions: its log-likelihood is not differentiable because of the absolute value in the exponent. The trick is to replace the absolute value by an analytic approximation, such as $\frac 1 k \ln(\cosh(k x))$, which converges pointwise to the absolute value as $k\rightarrow\infty$. Indeed, the sequence of distributions given by $$ f_k(x|\theta) = \frac {a_k} 2 \cosh(k (x-\theta))^{- 1 / k}, $$ where $a_k$ is a normalization constant, achieves an efficiency that tends to 1 as $k$ goes to infinity.
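A numerical check of this limit (a sketch of mine, using the location-family identity $\frac {\partial F} {\partial \theta} (q|\theta) = -f(q|\theta)$, so that at $p=\frac 1 2$ the efficiency reduces to $\frac 1 4 I_k / f_k(\theta|\theta)^2$; the quadrature is done with SciPy):

```python
import numpy as np
from scipy.integrate import quad

def logcosh(y):
    """Numerically stable log(cosh(y))."""
    a = np.abs(y)
    return a + np.log1p(np.exp(-2.0 * a)) - np.log(2.0)

def efficiency(k, p=0.5):
    # Unnormalized density cosh(k x)^(-1/k), location parameter fixed at 0.
    g = lambda x: np.exp(-logcosh(k * x) / k)
    Z, _ = quad(g, -np.inf, np.inf)        # Z = 2 / a_k
    f0 = 1.0 / Z                           # f_k(0 | 0) = a_k / 2, since cosh(0) = 1
    # Score w.r.t. the location parameter is tanh(k (x - theta)).
    I_k, _ = quad(lambda x: np.tanh(k * x) ** 2 * g(x) / Z, -np.inf, np.inf)
    return p * (1.0 - p) * I_k / f0 ** 2   # efficiency at the median (q = 0)

for k in (1, 2, 5, 10, 50):
    print(f"k={k:3d}  eps ~ {efficiency(k):.4f}")
# eps approaches 1 as k grows (and is always >= 1, by the Cramér-Rao bound)
```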

Bummer.
