“Consistency” vs. “Convergence” of Estimators: Are ALL “MLEs” ALWAYS Consistent?

convergence-divergence, estimation, maximum-likelihood, probability, statistics

I have heard the terms "Consistency" and "Convergence" being used interchangeably – for example:

  • In Machine Learning applications, I have heard the term "Convergence" describe a situation where the successive differences between iterations become smaller than some threshold. For example, when we say that a "Machine Learning Model converged", this means that the value of the Loss Function at "iteration n" and its value at "iteration n+1" are almost identical.

  • In Statistics and Probability, I have heard a similar concept being used to describe the properties of estimators. For example, we can say that the "sample average" is a "Consistent" estimator of the "population average" – this means that as the number of observations in our sample gets larger and larger, the value of the "sample average" will get closer and closer to the "population average". (I suppose we could say that as the sample size becomes larger and larger, the "sample average" converges to the "population average".) Both usages are illustrated in the sketch after this list.

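To make the two usages above concrete, here is a minimal Python sketch (the quadratic loss, learning rate, tolerance, and the Exponential distribution are all illustrative choices of mine, not part of the question): the first loop stops when successive loss values differ by less than a tolerance ("convergence" in the Machine Learning sense), and the second loop shows the sample average approaching the population average as the sample size grows ("consistency" in the statistical sense).

```python
import numpy as np

# "Convergence" of an iterative procedure: stop once successive loss values
# differ by less than a tolerance (illustrative gradient descent on a simple
# quadratic loss, minimised at theta = 3).
theta, learning_rate, tol = 10.0, 0.1, 1e-8
prev_loss = np.inf
for i in range(10_000):
    loss = (theta - 3.0) ** 2
    if abs(prev_loss - loss) < tol:              # "iteration n" vs. "iteration n+1"
        print(f"converged after {i} iterations, theta = {theta:.4f}")
        break
    prev_loss = loss
    theta -= learning_rate * 2 * (theta - 3.0)   # gradient step

# "Consistency" of an estimator: the sample average of i.i.d. draws approaches
# the population average as the sample size n grows.
rng = np.random.default_rng(0)
true_mean = 2.0                                  # population mean of Exponential(scale=2)
for n in [10, 1_000, 100_000, 10_000_000]:
    sample_mean = rng.exponential(scale=true_mean, size=n).mean()
    print(f"n = {n:>10,}: sample mean = {sample_mean:.4f} (population mean = {true_mean})")
```
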
Regarding this, I had the following question:

  • When we consider the Maximum Likelihood Estimator for any Probability Distribution Function (e.g. "Mu-hat-MLE" from a Normal Distribution, "Lambda-hat-MLE" from an Exponential Distribution, "p-hat-MLE" from a Binomial Distribution, etc.) – do we know if ALL Maximum Likelihood Estimators (in theory) are ALWAYS "Consistent"?

For example, suppose we have some never-before-seen Probability Distribution Function, yet we somehow manage to maximize the corresponding likelihood function and obtain a maximum likelihood estimate for its parameters (e.g. "mu", "lambda", etc.) – by virtue of the fact that we have the MLE, will this MLE automatically be Consistent?

Or do we still have to prove that this MLE will be Consistent?

Thank you!

Best Answer

No, not all MLEs are consistent. See this for an example: Example of an inconsistent Maximum likelihood estimator. The example within considers a mixture of two Gaussians, $N(0,1)$ and $N(\mu, \sigma^2)$, whose density is given by $$f(x, \theta) = \frac{1}{2}\phi(x) + \frac{1}{2\sigma} \phi\left(\frac{x-\mu}{\sigma}\right),$$ where $\theta = (\mu, \sigma)$ and the parameter space is $\mu \in \mathbb{R}$, $\sigma \in \mathbb{R}^+$. One sees that the likelihood can be made arbitrarily large by taking $\hat{\mu} = X_1$ and $\hat{\sigma}$ arbitrarily small. More generally, if the MLE fails to exist or fails to be unique, one cannot expect consistency.
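
To see the failure numerically, here is a minimal sketch assuming NumPy and SciPy (the sample size and the grid of $\sigma$ values are arbitrary illustrative choices): with $\hat{\mu} = X_1$, the log-likelihood grows without bound as $\sigma \to 0$, so maximising the likelihood does not produce a sensible estimate.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
x = rng.normal(size=50)                      # any i.i.d. sample works for the illustration

def log_likelihood(x, mu, sigma):
    # Mixture density 0.5 * phi(x) + 0.5 * phi((x - mu) / sigma) / sigma
    dens = 0.5 * norm.pdf(x) + 0.5 * norm.pdf(x, loc=mu, scale=sigma)
    return np.log(dens).sum()

mu_hat = x[0]                                # centre the second component on X_1
for sigma in [1.0, 1e-1, 1e-3, 1e-6, 1e-9]:
    print(f"sigma = {sigma:.0e}: log-likelihood = {log_likelihood(x, mu_hat, sigma):.2f}")
```

The printed log-likelihood keeps increasing as $\sigma$ shrinks, which is the sense in which this likelihood has no well-behaved maximiser.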

For general conditions that guarantee consistency of MLEs, you may consult Asymptotic Statistics by van der Vaart.

Finally, "consistency" and "convergence" are not interchangeable, as consistency considers what happens in the infinite data limit (i.e. $n \to \infty$), whereas convergence refers to the behaviour of an algorithm with a fixed amount of data (e.g. maximising the likelihood for a generalised linear model).
