Confidence interval for mean based on MLE for normal distribution


My understanding is that if you find the maximum likelihood estimate of $\mu$ assuming the data came from a normal distribution, and then want a confidence interval for $\mu$, the estimate is $\bar{X}$ and the Fisher information per observation is $I(\mu)=\frac{1}{\sigma^2}$, so the confidence interval is:

$$\bar{X}-z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\leq \mu \leq \bar{X}+z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
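Here is a short derivation sketch of where $I(\mu)=1/\sigma^2$ and this interval come from. It assumes i.i.d. $X_i\sim N(\mu,\sigma^2)$ with $\sigma$ treated as known:

$$\ell(\mu)=-\frac{n}{2}\log(2\pi\sigma^2)-\frac{1}{2\sigma^2}\sum_{i=1}^n (X_i-\mu)^2,\qquad \frac{\partial \ell}{\partial \mu}=\frac{1}{\sigma^2}\sum_{i=1}^n (X_i-\mu)=0 \;\Rightarrow\; \hat{\mu}=\bar{X}$$

$$I(\mu)=-E\left[\frac{\partial^2 \log f(X_i;\mu)}{\partial \mu^2}\right]=\frac{1}{\sigma^2},\qquad \text{Var}(\hat{\mu})\approx\frac{1}{nI(\mu)}=\frac{\sigma^2}{n}$$

The interval $\hat{\mu}\pm z_{\alpha/2}/\sqrt{nI(\mu)}$ is then the one displayed above. For the normal mean the "approximation" is in fact exact, because $\bar{X}\sim N(\mu,\sigma^2/n)$.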

This is the same formula you use when you know the data come from a normal distribution, you know the variance, and you want a confidence interval for the mean of the distribution.

But my understanding is also that when the asymptotic variance of an MLE contains a population parameter, such as the population variance or a population proportion, you can simply replace it with its estimate, because the estimate converges to the true parameter. In this case I would naturally replace $\sigma$ with $S$. However, the standard formula for a confidence interval for the mean of a normal distribution with unknown variance is

$$\bar{X}-t_{n-1,\alpha/2}\frac{S}{\sqrt{n}}\leq \mu \leq \bar{X}+t_{n-1,\alpha/2}\frac{S}{\sqrt{n}}$$
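A sketch of the two pivots behind these formulas, assuming i.i.d. sampling. If $X_i\sim N(\mu,\sigma^2)$ (Student's classical result):

$$\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}\sim N(0,1),\qquad \frac{\bar{X}-\mu}{S/\sqrt{n}}\sim t_{n-1}\quad\text{exactly, for every } n.$$

Without normality, but with finite variance, the CLT gives $\sqrt{n}(\bar{X}-\mu)/\sigma\xrightarrow{d}N(0,1)$ and $S\xrightarrow{p}\sigma$, so by Slutsky's theorem

$$\frac{\bar{X}-\mu}{S/\sqrt{n}}\xrightarrow{d}N(0,1).$$

Because $t_{n-1,\alpha/2}\to z_{\alpha/2}$ as $n\to\infty$, the two statements are consistent: plugging in $S$ is justified asymptotically, while under exact normality the $t$ percentiles account for the extra sampling variability of $S$ at finite $n$.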

So it uses percentiles of a $t$-distribution, which seems strange, because I thought all you had to do was replace the unknown parameters in the asymptotic variance with estimates, and now I'm doubting that. When estimating the asymptotic variance, can you replace any occurrence of $\sigma$ with $S$? Or a proportion $p$ with $\hat{p}$? And how does this affect the confidence interval? Any help would be appreciated.

Best Answer

Replacing $\sigma$ with $S$ in your first equation is the result of inverting a Wald test. Using percentiles of the $t$-distribution is the result of inverting a $t$-test. If your data are indeed sampled from a normal distribution, the $t$ interval has exact coverage, while the Wald interval only approximates it: it is slightly too narrow and undercovers for small $n$, and the two intervals converge as $n$ grows because $t_{n-1,\alpha/2}\to z_{\alpha/2}$.
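A small Monte Carlo sketch can make the coverage difference concrete. It assumes i.i.d. $N(\mu,\sigma^2)$ data with $n=10$, $\mu=5$, $\sigma=2$ (all arbitrary illustrative choices) and compares the plug-in Wald interval $\bar{X}\pm z_{\alpha/2}S/\sqrt{n}$ with the $t$ interval. The value of $t_{9,0.025}$ is hardcoded from standard $t$ tables to keep the example dependency-free. With SciPy installed, `scipy.stats.t.ppf(0.975, 9)` gives the same number.

```python
import random
import statistics
from statistics import NormalDist

# Two-sided 95% critical values.
Z_CRIT = NormalDist().inv_cdf(0.975)  # z_{alpha/2}, about 1.96
# t_{9, alpha/2} for n = 10 (df = 9), taken from standard t tables.
T_CRIT_DF9 = 2.262157


def coverage(n, crit, reps=20_000, mu=5.0, sigma=2.0, seed=1):
    """Fraction of intervals xbar +/- crit * S / sqrt(n) that contain mu,
    for i.i.d. N(mu, sigma^2) samples of size n."""
    rng = random.Random(seed)  # same seed -> same samples for every crit
    hits = 0
    for _ in range(reps):
        x = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(x)
        se = statistics.stdev(x) / n ** 0.5  # stdev uses the n - 1 divisor, i.e. S
        if xbar - crit * se <= mu <= xbar + crit * se:
            hits += 1
    return hits / reps


n = 10
wald_cov = coverage(n, Z_CRIT)
t_cov = coverage(n, T_CRIT_DF9)
print(f"Wald (z, S plugged in): {wald_cov:.3f}")
print(f"t interval:             {t_cov:.3f}")
```

With $n=10$ the $t$ interval should land close to 0.95, while the Wald interval should come out noticeably lower (its exact coverage here is $P(|T_9|\le z_{\alpha/2})$, a bit above 0.9). Increasing `n` and swapping in the matching $t$ percentile should shrink the gap. Reusing the seed means both intervals are evaluated on identical samples, so the comparison reflects only the critical value.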

If the data-generating process for $X_1,\dots,X_n$ is not normal but the sampling distribution of $\bar{X}$ is well approximated by a normal distribution (CLT), then both the $t$ interval and the Wald interval are good approximate solutions for constructing a confidence interval. In these settings the standard error is estimated but treated as known, and not as a function of the unknown fixed true $\mu$. This is the same as saying $\bar{X}\overset{\text{approx}}{\sim}N(\mu,\text{Var}[\bar{X}])$ and treating $\widehat{\text{Var}}[\bar{X}]$ as if it were the unknown true variance $\text{Var}[\bar{X}]$. This is all a matter of convenience.

To improve the normal approximation one might include a link function such as a log link (for a positive mean), i.e. $\log\{\bar{X}\}\overset{\text{approx}}{\sim}N(\log\{\mu\},\text{Var}[\log\{\bar{X}\}])$. More elaborate tests and confidence intervals exist, such as score and likelihood-ratio intervals, which profile out nuisance parameters (estimate the nuisance parameters under the restricted null space and treat them as known).
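Here is a minimal sketch of the log-link idea, under a few stated assumptions. The data are positive, and $\text{Var}[\log\{\bar{X}\}]$ is estimated with the delta method as $S^2/(n\bar{X}^2)$. The delta-method step is an assumption of this sketch and is not specified above. The interval is built on the log scale and back-transformed with $\exp$. Exponential data with mean 2 and $n=30$ stand in as a hypothetical skewed, non-normal process, and both intervals treat the estimated standard error as known.

```python
import math
import random
import statistics
from statistics import NormalDist

Z_CRIT = NormalDist().inv_cdf(0.975)  # z_{alpha/2} for a 95% interval


def wald_interval(x):
    """Plug-in Wald interval on the original scale: xbar +/- z * S / sqrt(n)."""
    n = len(x)
    xbar = statistics.fmean(x)
    se = statistics.stdev(x) / math.sqrt(n)
    return xbar - Z_CRIT * se, xbar + Z_CRIT * se


def log_wald_interval(x):
    """Wald interval for log(mu), back-transformed with exp.
    Uses the delta-method estimate Var[log xbar] ~= S^2 / (n * xbar^2);
    requires positive data so that log(xbar) is defined."""
    n = len(x)
    xbar = statistics.fmean(x)
    se_log = statistics.stdev(x) / (math.sqrt(n) * xbar)
    centre = math.log(xbar)
    return math.exp(centre - Z_CRIT * se_log), math.exp(centre + Z_CRIT * se_log)


def coverage(interval, n=30, mu=2.0, reps=10_000, seed=2):
    """Coverage of `interval` for i.i.d. exponential data with mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = [rng.expovariate(1 / mu) for _ in range(n)]  # rate 1/mu -> mean mu
        lo, hi = interval(x)
        hits += lo <= mu <= hi
    return hits / reps


raw_cov = coverage(wald_interval)
log_cov = coverage(log_wald_interval)
print(f"Wald, original scale: {raw_cov:.3f}")
print(f"Wald, log scale:      {log_cov:.3f}")
```

The back-transformed interval is asymmetric around $\bar{X}$ and always stays positive, which is the practical appeal of a log link for a positive mean. Which interval covers better depends on the distribution and on $n$. The sketch only shows the mechanics of estimating the standard error on the transformed scale and mapping the endpoints back.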

Here are some related threads: [1], [2], [3].