Solved – Consistency and Asymptotic Normality for the MLE of Independent, Non-Identically Distributed Normals

asymptotics, consistency, likelihood-ratio, maximum likelihood

I have the following setting:
$$ x_k \sim N(\mu,\ \sigma^2 + \hat{\delta}^2_k), \quad k=1,\dots,K, $$

where $\{x_k,\ k=1,\dots,K\}$ are the observed data, $\{\hat{\delta}^2_k,\ k=1,\dots,K\}$ are known parameters (just consider them fixed), and $(\mu,\sigma^2)$ are unknown. My primary goal is to make inferences about $\mu$. Does anyone know of results for this particular case concerning consistency and the asymptotic behavior of the MLE of $\mu$ as $K\rightarrow \infty$?

For the time being I am simply using a $\chi^2$ approximation for the likelihood ratio test, but I don't have a solid theoretical argument for that, since the data are not identically distributed. It also complicates things that I cannot get a closed-form solution for $\hat{\mu}_{MLE}$.
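For concreteness, here is a minimal numerical sketch of this approach (hypothetical true values $\mu = 1$, $\sigma^2 = 0.5$, and uniformly drawn $\hat{\delta}^2_k$). One helpful simplification: for fixed $\sigma^2$, the likelihood is maximized in $\mu$ by the precision-weighted mean $\hat{\mu}(\sigma^2) = \sum_k w_k x_k / \sum_k w_k$ with $w_k = 1/(\sigma^2 + \hat{\delta}^2_k)$, so the optimization reduces to a one-dimensional search over $\sigma^2$:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import chi2

def neg_loglik(log_s2, x, d2, mu=None):
    """Negative log-likelihood; mu is profiled out (weighted mean) unless fixed."""
    v = np.exp(log_s2) + d2                      # variances sigma^2 + delta_k^2
    w = 1.0 / v
    m = np.sum(w * x) / np.sum(w) if mu is None else mu
    return 0.5 * np.sum(np.log(2 * np.pi * v) + (x - m) ** 2 / v)

# Hypothetical data: delta_k^2 known and bounded, (mu, sigma^2) unknown
rng = np.random.default_rng(0)
K = 200
d2 = rng.uniform(0.1, 2.0, size=K)               # the known delta_k^2
x = rng.normal(1.0, np.sqrt(0.5 + d2))           # true mu = 1.0, sigma^2 = 0.5

# Unrestricted fit: one-dimensional search over log(sigma^2)
fit = minimize_scalar(neg_loglik, bounds=(-10.0, 5.0), args=(x, d2), method="bounded")
v_hat = np.exp(fit.x) + d2
mu_hat = np.sum(x / v_hat) / np.sum(1.0 / v_hat)

# Likelihood ratio test of H0: mu = mu0, referred to chi^2 with 1 df
mu0 = 0.0
fit0 = minimize_scalar(neg_loglik, bounds=(-10.0, 5.0), args=(x, d2, mu0), method="bounded")
lrt = 2.0 * (fit0.fun - fit.fun)
print(f"mu_hat = {mu_hat:.3f}, LRT = {lrt:.2f}, p = {chi2.sf(lrt, df=1):.3g}")
```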

I have tried Hoadley's paper "Asymptotic Properties of MLE for Independent Non-Identically distributed case" (http://projecteuclid.org/download/pdf_1/euclid.aoms/1177693066), but verifying its general conditions has not been easy for me so far.

Please let me know if anyone has encountered this particular kind of problem and knows what regularity conditions are needed for the nice classical MLE properties. Presumably there should be some upper bound on the values $\{\hat{\delta}^2_k,\ k=1,2,\dots\}$, but what kind of bound, and even with such a bound, how does one prove consistency/asymptotic normality?

Best Answer

You can find much more accessible conditions for consistency and asymptotic normality of the MLE in Hayashi's Econometrics, ch. 7, in the general context of extremum estimators and their sub-class, the M-estimators. Hayashi also gives references to detailed proofs of the conditions.

The MLE with independent observations belongs to this subclass, because it maximizes a "sample average": an average of a real-valued function of the data and the unknown parameters. (Note that with independent observations the log-likelihood of the sample is a sum, and we can divide it by the sample size without affecting the solution.)

So (in general notation)

$$\hat \theta_{MLE} = \text{argmax}_{\theta} \left\{\frac 1n \sum_{i=1}^n \ell_i(x_i;\theta)\right\}$$

where $\ell_i$ is the log-likelihood of observation $i$.
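For instance, in the OP's model the summands are just the normal log-densities with observation-specific variances:

$$\ell_k(x_k;\mu,\sigma^2) = -\frac{1}{2}\log\!\big(2\pi(\sigma^2+\hat{\delta}^2_k)\big) - \frac{(x_k-\mu)^2}{2(\sigma^2+\hat{\delta}^2_k)}.$$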

For consistency, there are two or three alternative sets of conditions. Common to all of them are:
1) The true parameter vector lies in the interior of the parameter space

2) $\ell_i(x_i;\theta)$ is measurable (if it is continuous, it is measurable)

3) The objective function $\frac 1n \sum_{i=1}^n \ell_i(x_i;\theta)$ converges in probability to some limit function, say $\ell_0(\theta)$ (for the i.n.i.d. case, see the note after this list)

4) $\ell_0(\theta)$ is uniquely maximized at the true parameter vector (say $\theta_0$)
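For independent but non-identically distributed observations, the natural candidate for this limit is the average of the expected per-observation log-likelihoods, provided the limit exists:

$$\ell_0(\theta) = \lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^n \mathbb{E}\left[\ell_i(x_i;\theta)\right],$$

with the convergence in condition 3) delivered by a law of large numbers for independent, non-identically distributed variables (for example, Kolmogorov's criterion $\sum_{i=1}^{\infty} \operatorname{Var}[\ell_i(x_i;\theta)]/i^2 < \infty$ suffices).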

Then moreover:
1st Alternative: if the parameter space is compact and convergence is uniform, we obtain consistency.

2nd Alternative: if the parameter space is not compact, then if the log-likelihood is concave and convergence is just pointwise, we again obtain consistency (an informal simulation check follows below).
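As an informal complement (a sanity check, not a proof), here is a small Monte Carlo sketch for the model in the question, with hypothetical true values $\mu = 1$, $\sigma^2 = 0.5$, and bounded $\hat{\delta}^2_k$ (consistent with the OP's intuition that a bound is needed); the spread of $\hat{\mu}_{MLE}$ should shrink roughly like $1/\sqrt{K}$:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def mu_mle(x, d2):
    """MLE of mu: profile mu out as a weighted mean, search over log(sigma^2)."""
    def nll(log_s2):
        v = np.exp(log_s2) + d2                  # sigma^2 + delta_k^2
        w = 1.0 / v
        m = np.sum(w * x) / np.sum(w)            # weighted-mean mu for this sigma^2
        return 0.5 * np.sum(np.log(v) + (x - m) ** 2 * w)
    fit = minimize_scalar(nll, bounds=(-10.0, 5.0), method="bounded")
    v = np.exp(fit.x) + d2
    return np.sum(x / v) / np.sum(1.0 / v)

rng = np.random.default_rng(1)
for K in (50, 500, 5000):
    ests = []
    for _ in range(200):
        d2 = rng.uniform(0.1, 2.0, size=K)       # known, bounded delta_k^2
        x = rng.normal(1.0, np.sqrt(0.5 + d2))   # true mu = 1.0, sigma^2 = 0.5
        ests.append(mu_mle(x, d2))
    print(K, np.mean(ests), np.std(ests))        # std should shrink ~ 1/sqrt(K)
```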

I'll leave asymptotic normality for the OP to look up and explore.
