Solved – Proof of consistency of the Maximum Likelihood Estimator (MLE)

consistency, maximum likelihood, proof, self-study

I would appreciate some help understanding a logical step in the proof below about the consistency of the MLE. It comes directly from Introduction to Mathematical Statistics by Hogg and Craig, and it is slightly different from the standard intuitive one that uses the Weak Law of Large Numbers.

So here goes:

Assume that $\hat{\theta_n}$ solves the estimating equation $\frac{\partial l(\theta)}{\partial \theta}=0$. We also assume the usual regularity conditions. Denote by $\theta_0$ the true parameter, which by assumption is an interior point of some set $\Omega$. Then $\hat{\theta_n} \xrightarrow{P} \theta_0$.

Proof

Let $\mathbf{X}=(x_1, x_2, \ldots, x_n)$ be the vector of observations.
Since $\theta_0$ is an interior point of $\Omega$, $(\theta_0 - a, \theta_0 + a) \subset \Omega$ for some $a > 0$. Define $S_n$ to be the event

$$S_n= \{ \mathbf {X} : l(\theta_0 ; \mathbf{X}) > l(\theta_0 -a ; \mathbf{X}) \}
\cap \{ \mathbf{X}: l(\theta_0; \mathbf{X}) > l( \theta_0 +a ;\mathbf{X}) \} $$

(On $S_n$ the log-likelihood is larger at the true parameter than at the two endpoints $\theta_0 \pm a$; all good so far.)

But on $S_n$, $l(\theta)$ has a local maximum $\hat{\theta_n}$ such that $\theta_0 - a < \hat{\theta_n} < \theta_0 + a$ and $l^{\prime}(\hat{\theta_n}) = 0$ (the continuous function $l$ attains a maximum on the closed interval $[\theta_0 - a, \theta_0 + a]$, and on $S_n$ the endpoint values are smaller than $l(\theta_0)$, so the maximum is interior and the derivative vanishes there).

That is

$$ S_n \subset \{ \mathbf{X}: | \hat{ \theta_{n} } \left( \mathbf{X} \right) -\theta_{0} | < a \} \cap \{ \mathbf{X}: l^{ \prime} \left( \hat{\theta_n} \left( \mathbf{X} \right) \right) =0 \} $$

It is precisely at this point that I find their proof a little obscure. Why do they consider $S_n$ to be a subset of that other set? Their explanation is unclear. Of course the proof is not complete at this point, but if I have this clarified, I can take it from there. Thank you in advance.

Best Answer

Comparing the question with the actual proof from the referenced book, some subtle but important aspects have been left out of the former:
1) This part of the proof is about the existence of a solution to the likelihood equation $\frac{\partial l(\theta)}{\partial \theta}=0$ that converges to the true parameter, and not yet about "consistency of the MLE".
2) The probability of $S_n$ tends to $1$ (a sketch of why is given right after this list). Then, by necessity, a $\hat \theta \in (\theta_0 -a, \theta_0 +a)$ exists for every $\mathbf X$ that belongs to $S_n$.
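
A sketch of why $P(S_n) \to 1$ (this is the standard Weak Law of Large Numbers argument; it is not spelled out in the excerpt above and relies on the i.i.d. and identifiability assumptions contained in the regularity conditions):

$$\frac{1}{n}\Big[ l(\theta_0 ; \mathbf X) - l(\theta_0 - a ; \mathbf X) \Big] = \frac{1}{n} \sum_{i=1}^{n} \log \frac{f(x_i ; \theta_0)}{f(x_i ; \theta_0 - a)} \xrightarrow{P} E_{\theta_0}\!\left[ \log \frac{f(X ; \theta_0)}{f(X ; \theta_0 - a)} \right] > 0,$$

where the strict inequality is Jensen's inequality (the Kullback–Leibler divergence between two distinct densities is strictly positive). The same argument applies to $\theta_0 + a$, so each of the two events whose intersection defines $S_n$ has probability tending to $1$, and hence so does $S_n$.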

Then the proof states that as a consequence, $$S_n \subset \{ \mathbf{X}: | \hat{ \theta_{n} } \left( \mathbf{X} \right) -\theta_{0} | < a \} \cap \{ \mathbf{X}: l^{ \prime} \left( \hat{\theta_n} \left( \mathbf{X} \right) \right) =0 \}$$

What does the set intersection on the right-hand side describe? It describes the samples $\mathbf X$ for which a) $\hat \theta$ is a solution to the likelihood equation (2nd set) and
b) it is less than $a$ away from the true parameter $\theta_0$ (1st set). And the display asserts that the set of samples forming $S_n$ is a subset of this intersection.
And indeed it is, since $\hat \theta$ may satisfy the two conditions (being a solution to the likelihood equation and being less than $a$ away from the true parameter) for a larger set of samples than the one forming $S_n$, which is characterized by a condition on the value of the likelihood at the true parameter (a condition that does not mention $\hat \theta$ at all).
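
Put formally: for every fixed $\mathbf X \in S_n$, the argument quoted in the question produces an interior local maximum $\hat \theta_n(\mathbf X)$ with the two stated properties, that is

$$\mathbf X \in S_n \;\Longrightarrow\; | \hat \theta_n(\mathbf X) - \theta_0 | < a \ \text{ and } \ l^{\prime}\big( \hat \theta_n(\mathbf X) \big) = 0 ,$$

and a pointwise implication between the defining conditions of two events is exactly a subset relation between the corresponding sets of sample points.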
The proof then goes on to show that these imply that $\hat \theta$ lies within $a$ of the true parameter with probability tending to $1$, and then that, if $\hat \theta$ is unique and so coincides with the MLE, the latter is consistent.
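
To make that last step explicit (it follows directly from the inclusion above, the monotonicity of probability, and $P(S_n) \to 1$):

$$1 \geq P\big( | \hat \theta_n - \theta_0 | < a \big) \geq P(S_n) \rightarrow 1 \quad \text{for every } a > 0, \qquad \text{i.e. } \hat \theta_n \xrightarrow{P} \theta_0 .$$

If a numerical illustration helps, here is a small simulation sketch (not part of the proof; the Exponential model and all names in it are my own choices for illustration). For $X_i \sim \text{Exponential}(\theta_0)$ the likelihood equation $n/\theta - \sum_i x_i = 0$ has the unique root $\hat \theta_n = 1/\bar{x}$, and the fraction of simulated samples with $| \hat \theta_n - \theta_0 | < a$ approaches $1$ as $n$ grows, mirroring $P(S_n) \to 1$:

```python
import numpy as np

# Illustrative only: Exponential(rate = theta_0) model, where the likelihood
# equation n/theta - sum(x) = 0 has the unique root theta_hat = 1 / mean(x).
rng = np.random.default_rng(0)
theta_0 = 2.0    # true rate parameter
a = 0.1          # half-width of the neighbourhood around theta_0
reps = 2000      # Monte Carlo repetitions per sample size

for n in (10, 100, 1000, 10000):
    samples = rng.exponential(scale=1.0 / theta_0, size=(reps, n))
    theta_hat = 1.0 / samples.mean(axis=1)  # root of the likelihood equation
    coverage = np.mean(np.abs(theta_hat - theta_0) < a)
    print(f"n = {n:6d}:  fraction with |theta_hat - theta_0| < a: {coverage:.3f}")
```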