Solved – Is Maximum Likelihood Estimation (MLE) a parametric approach?

bootstrap, intuition, machine learning, maximum likelihood, statistical significance

There are two main probabilistic approaches to novelty detection: parametric and non-parametric. The non-parametric approach derives the distribution or density function directly from the training data, as in kernel density estimation (e.g., the Parzen window), while the parametric approach assumes that the data come from a known family of distributions and estimates its parameters.
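To make the distinction concrete, here is a minimal Python sketch (the simulated data and the choice of a Gaussian family are assumptions for illustration only): a parametric fit via `scipy.stats.norm.fit` versus a non-parametric Parzen-style estimate via `scipy.stats.gaussian_kde`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=500)  # simulated training data

# Parametric: assume a known family (here Gaussian) and estimate its parameters.
mu_hat, sigma_hat = stats.norm.fit(x)  # maximum likelihood fit of mean and sd

# Non-parametric: let the data determine the density (Parzen-style KDE).
kde = stats.gaussian_kde(x)

grid = np.linspace(-3.0, 7.0, 5)
print(stats.norm.pdf(grid, mu_hat, sigma_hat))  # parametric density estimate
print(kde(grid))                                # non-parametric density estimate
```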

I am not familiar with the parametric approach. Could anyone point me to some well-known algorithms? Also, can MLE be considered a kind of parametric approach (the form of the density is known, and we then seek the parameter values that maximize the likelihood)?

Best Answer

Usually, maximum likelihood is used in a parametric context, but the same principle can be applied nonparametrically. For example, if you have data consisting of observations of a continuous random variable $X$, say observations $x_1, x_2, \dots, x_n$, and the model is unrestricted, that is, it only says the data come from a distribution with cumulative distribution function $F$, then the empirical distribution function $$ \hat{F}_n(x) = \frac{\text{number of observations $x_i$ with $x_i \le x$}}{n} $$ is the non-parametric maximum likelihood estimator.
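As a quick illustration, here is a minimal NumPy sketch of $\hat{F}_n$ (the `ecdf` helper is hypothetical, written just for this example):

```python
import numpy as np

def ecdf(sample, x):
    """Empirical distribution function: fraction of observations <= x."""
    sample = np.asarray(sample)
    x = np.atleast_1d(x)
    return np.mean(sample[:, None] <= x, axis=0)

obs = [0.3, 1.2, 0.7, 2.5, 1.9]
print(ecdf(obs, 1.0))  # [0.4] -- two of the five observations are <= 1.0
```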

This is related to bootstrapping. In bootstrapping, we repeatedly sample with replacement from the original sample $X_1, X_2, \dots, X_n$. That is exactly the same as taking an iid sample from $\hat{F}_n$ defined above. In that way, bootstrapping can be seen as nonparametric maximum likelihood.
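A small sketch of this equivalence, assuming NumPy and simulated data: resampling with replacement agrees with inverse-transform sampling from $\hat{F}_n$, whose quantile function is $\hat{F}_n^{-1}(u) = x_{(\lceil nu \rceil)}$.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=100)  # original sample

# Bootstrap resample: sampling with replacement from the data ...
boot_a = rng.choice(x, size=x.size, replace=True)

# ... is the same as iid draws from F_hat_n by inverse-transform sampling:
# F_hat_n^{-1}(u) is the order statistic x_(ceil(n*u)).
u = rng.uniform(size=x.size)
idx = np.clip(np.ceil(u * x.size).astype(int) - 1, 0, x.size - 1)
boot_b = np.sort(x)[idx]

# boot_a and boot_b are draws from the same distribution, namely F_hat_n.
```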

EDIT (answer to a question in the comments by @Martijn Weterings)

If the model is $X_1, X_2, \dotsc, X_n$ iid from some distribution with cdf $F$, without any restrictions on $F$, then one can show that $\hat{F}_n(x)$ is the mle (maximum likelihood estimator) of $F(x)$. That is shown in What inferential method produces the empirical CDF?, so I will not repeat it here. Now, if $\theta$ is a real parameter describing some aspect of $F$, it can be written as a function $\theta(F)$. This is called a functional parameter. Some examples are $$ \DeclareMathOperator{\E}{\mathbb{E}} \E_F X=\int x \; dF(x)\quad (\text{a Stieltjes integral}) \\ \text{median}_F X = F^{-1}(0.5) $$ and many others. The parameter space is $$\Theta =\left\{ F \colon \text{$F$ is a distribution function on the real line} \right\}.$$
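For illustration, here is a short sketch (assuming SciPy; the particular $F$ is arbitrary) evaluating these two functionals at one fixed $F$:

```python
from scipy import stats

# Two functional parameters theta(F), evaluated at one particular F
# (here Normal(2, 1.5), chosen purely for illustration):
F = stats.norm(loc=2.0, scale=1.5)
mean_F = F.mean()        # E_F X = integral of x dF(x)
median_F = F.ppf(0.5)    # median_F X = F^{-1}(0.5)
print(mean_F, median_F)  # both 2.0, since this F is symmetric
```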

By the invariance property (Invariance property of maximum likelihood estimator?) we then find mles by $$ \widehat{\E_F X} = \int x \; d\hat{F}_n(x), \\ \widehat{\text{median}_F X} = \hat{F}_n^{-1}(0.5). $$ It should be clearer now: we do not (as you ask) use the empirical distribution function to define the likelihood. The likelihood function is completely nonparametric, and $\hat{F}_n$ is the mle. The bootstrap is then used to describe the variability/uncertainty in the mles of the $\theta(F)$'s of interest, by resampling (which is simple random sampling from $\hat{F}_n$).
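Here is a minimal sketch of the plug-in estimates and the bootstrap step, assuming NumPy and a simulated sample:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.gamma(shape=2.0, scale=1.0, size=200)  # original sample

# Plug-in (invariance) mles: apply each functional to F_hat_n.
mean_hat = np.mean(x)      # integral of x dF_hat_n(x)
median_hat = np.median(x)  # ~ F_hat_n^{-1}(0.5); np.median interpolates for even n

# Bootstrap: resample from F_hat_n to describe the estimator's variability.
boot_medians = np.array([
    np.median(rng.choice(x, size=x.size, replace=True))
    for _ in range(2000)
])
se_hat = boot_medians.std(ddof=1)              # bootstrap standard error
ci = np.percentile(boot_medians, [2.5, 97.5])  # percentile interval
print(median_hat, se_hat, ci)
```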

EDIT In the comment thread many seem to disbelieve this result (which really is a standard result!), so let me try to make it clearer. The likelihood function is nonparametric; the parameter is $F$, the unknown cumulative distribution function. For a given cutoff point $x \in \mathbb{R}$, a function of the parameter is $\DeclareMathOperator{\P}{\mathbb{P}} x(F)=F(x)=\P(X \le x)$. A corresponding transformation of the random variable $X$ is $I_x=\mathbb{I}(X\le x)$, which is a Bernoulli random variable with parameter $x(F)$. The maximum likelihood estimate of $x(F)$ based on the sample $I_x(X_1), \dotsc, I_x(X_n)$ is the usual fraction of the $X_i$'s that are less than or equal to $x$, and the empirical cumulative distribution function expresses this simultaneously for all $x$. Hopefully this is clearer now!
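A tiny numerical check of this argument (a NumPy sketch with simulated data and an arbitrary cutoff):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=50)
cutoff = 0.5

# I_x(X_i) = 1{X_i <= cutoff} is a Bernoulli sample; its mle is the sample mean.
indicators = (x <= cutoff).astype(int)
bernoulli_mle = indicators.mean()

# The empirical cdf at the cutoff counts exactly the same fraction.
ecdf_at_cutoff = np.searchsorted(np.sort(x), cutoff, side="right") / x.size

print(bernoulli_mle, ecdf_at_cutoff)  # identical values
```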
