Solved – What inferential method produces the empirical CDF

empirical-cumulative-distr-fn, estimation, nonparametric

The empirical cdf is an estimate of the cdf. What kind of estimation method (such as method of moments, MLE, …) constructs the empirical cdf?

Is the empirical cdf a nonparametric estimate? Do nonparametric estimates have construction methods different from MOM, MLE, and others?

For example, the empirical pmf of a finite-valued discrete distribution is constructed by MLE. But that is only a special case, not the whole story, and it is an empirical pmf rather than an empirical cdf.

Best Answer

In An Introduction to the Bootstrap, Efron and Tibshirani find it useful to characterize the empirical cumulative distribution function (ecdf) as the nonparametric maximum likelihood estimate of the "underlying population" $F$.


Given data $x_1, x_2, \ldots, x_n$, the likelihood function (by definition) is the product of the probabilities

$$L(F) = \prod_{i=1}^n {\Pr}_F(x_i).$$

E&T claim this is maximized by the ecdf. Since they leave it as an exercise, let's work out the solution here. It's not completely trivial, because we have to account for the possibility of duplicates among the data. Let's take care with the notation, then. Let $x_1, \ldots, x_m$ be the distinct data values, with $x_i$ appearing $k_i \ge 1$ times in the dataset. (Thus, $x_{m+1}, \ldots, x_n$ are all duplicates of the first $m$ values.) The ecdf is the discrete distribution that assigns probability $k_i/n$ to $x_i$ for $1 \le i \le m$.
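
To make this construction concrete, here is a minimal Python sketch (using made-up data, not anything from E&T) of the ecdf viewed as a discrete distribution putting mass $k_i/n$ on each distinct value:

```python
# Minimal sketch: the ecdf as a discrete distribution that assigns
# probability k_i / n to each distinct data value x_i.
from collections import Counter

data = [2.7, 3.1, 2.7, 5.0, 3.1, 2.7]   # hypothetical sample with duplicates
n = len(data)

counts = Counter(data)                   # k_i for each distinct x_i
ecdf_pmf = {x: k / n for x, k in counts.items()}

def ecdf(t):
    """F_hat(t): the fraction of observations that are <= t."""
    return sum(p for x, p in ecdf_pmf.items() if x <= t)

print(ecdf_pmf)    # {2.7: 0.5, 3.1: 0.333..., 5.0: 0.166...}
print(ecdf(3.0))   # 0.5
```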

For any distribution $F$, the likelihood $L(F)$ has $k_i$ terms equal to $p_i = {\Pr}_F(x_i)$ for each $i$. It therefore is completely determined by the vector $p=(p_1, p_2, \ldots, p_m)$ and can be computed as

$$L(F) = L(p) = \prod_{i=1}^m p_i^{k_i}.$$
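
As a quick illustration (my own sketch, continuing the hypothetical multiplicities from the snippet above), the likelihood can be coded directly as a function of $p$:

```python
# Sketch: log L(p) = sum_i k_i * log(p_i), a function of p alone.
import math

def log_likelihood(p, k):
    """Log-likelihood of a candidate probability vector p given multiplicities k."""
    return sum(k_i * math.log(p_i) for p_i, k_i in zip(p, k))

k = [3, 2, 1]                        # hypothetical multiplicities k_i, so n = 6
n = sum(k)
p_ecdf = [k_i / n for k_i in k]      # the ecdf's probabilities
p_other = [1/3, 1/3, 1/3]            # any competing probability vector

print(log_likelihood(p_ecdf, k))     # larger (closer to zero) ...
print(log_likelihood(p_other, k))    # ... than this, as the argument below shows
```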

Since the likelihood for the ecdf is nonzero, the maximum likelihood will be nonzero. Therefore, for any distribution $\hat F$ that maximizes the likelihood, $p_i = {\Pr}_{\hat F}(x_i)$ must be nonzero for every data value. The axiom of total probability asserts that the sum of the $p_i$ is at most $1$. This reduces the problem to a constrained optimization:

$$\text{Maximize } L(p) = \prod_{i=1}^m p_i^{k_i}$$

subject to

$$p_i \gt 0,\ i=1, 2, \ldots, m;\quad \sum_{i=1}^m p_i \le 1.$$

This can be solved in many ways. Perhaps the most direct is to use a Lagrange multiplier $\lambda$ to optimize $\log L$. Because $\log L$ increases in every $p_i$, the constraint must be active at the maximum, $\sum_{i=1}^m p_i = 1$; setting the gradient of $\log L$ proportional to the gradient of this constraint produces the critical equations

$$\left(\frac{p_1}{k_1}, \frac{p_2}{k_2}, \ldots, \frac{p_m}{k_m}\right) = \lambda\left(1, 1, \ldots, 1\right)$$

with unique solution $$\hat p_i = \frac{k_i}{k_1+\cdots+k_m} = \frac{k_i}{n},$$

precisely the ecdf, QED.
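
If you prefer a numerical sanity check over the Lagrange argument, here is a sketch (my own, assuming NumPy and SciPy are available, with hypothetical multiplicities) that maximizes $L(p)$ under the stated constraints and recovers $\hat p_i = k_i/n$:

```python
# Numerical check: maximize log L(p) subject to p_i > 0 and sum(p_i) <= 1.
import numpy as np
from scipy.optimize import minimize

k = np.array([3.0, 2.0, 1.0])                 # hypothetical multiplicities
n = k.sum()

neg_log_L = lambda p: -np.sum(k * np.log(p))  # minimize the negative log-likelihood
bounds = [(1e-9, 1.0)] * len(k)               # keep each p_i strictly positive
constraints = [{"type": "ineq", "fun": lambda p: 1.0 - p.sum()}]  # sum p_i <= 1

result = minimize(neg_log_L, x0=np.full(len(k), 1.0 / len(k)),
                  method="SLSQP", bounds=bounds, constraints=constraints)

print(result.x)   # approximately [0.5, 0.333, 0.167]
print(k / n)      # the ecdf probabilities k_i / n
```

The optimizer lands on the boundary $\sum_i p_i = 1$, consistent with the observation that $\log L$ increases in every $p_i$.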


Why is this point of view important? Here are E&T:

As a result, [any] functional statistic $t(\hat F)$ is the nonparametric maximum likelihood estimate of the parameter $t(F)$. In this sense, the nonparametric bootstrap carries out nonparametric maximum likelihood inference.

[Section 21.7, p. 310]

Some words of explanation: "as a result" follows from the (easily proven) fact that the MLE (maximum likelihood estimate) of any function of a parameter is that function of the MLE of the parameter. A "functional" (or "plug-in") statistic is one that depends on the data only through the empirical distribution function. As an example of the distinction, E&T point out that the usual unbiased variance estimator $s^2 = \sum_i (x_i-\bar x)^2/(n-1)$ is not a functional statistic: if you duplicate every data value (so the sample size doubles from $n$ to $2n$), the ecdf does not change, but $s^2$ is multiplied by $2(n-1)/(2n-1)$, which does change (albeit only slightly). Functional statistics are crucial to understanding and analyzing the bootstrap.
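
Here is a small numerical illustration of that last point (my own example, with hypothetical data): duplicating every observation leaves the ecdf untouched but rescales $s^2$ by exactly the stated factor.

```python
# Illustration: s^2 is not a functional statistic. Duplicating every value
# leaves the ecdf unchanged but multiplies s^2 by 2(n - 1) / (2n - 1).
import statistics

x = [1.0, 4.0, 4.0, 7.0, 9.0]          # hypothetical data
doubled = x + x                         # every value appears twice: same ecdf

n = len(x)
s2 = statistics.variance(x)             # unbiased estimator, divisor n - 1
s2_doubled = statistics.variance(doubled)

print(s2_doubled / s2)                  # 0.888... for n = 5
print(2 * (n - 1) / (2 * n - 1))        # the predicted factor 2(n-1)/(2n-1)
```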


Reference

Bradley Efron and Robert J. Tibshirani, An Introduction to the Bootstrap. Chapman & Hall, 1993.
