Solved – 4 cases of Maximum Likelihood Estimation of Gaussian distribution parameters

maximum likelihood, mean, normal distribution, random variable, variance

Let $x_1,x_2,…,x_n$ be normally distributed observations, so
$\vec{x}=\begin{bmatrix}x_1 & x_2 & … & x_n\end{bmatrix}^{T}$
In the context of my research I am trying to estimate (using the ML rule) the parameter
$\theta$ in the following 4 cases, where the distribution is Gaussian:

  1. Estimate $\theta=\hat{\mu}$ where $\sigma^2$ is unknown
  2. Estimate $\theta=\hat{\sigma^2}$ where $\mu=0$
  3. Estimate $\theta=\hat{\sigma^2}$ where $\mu=\mu_0$ is known
  4. Estimate $\vec{\theta}=\begin{bmatrix}\hat{\mu} & \hat{\sigma^2}\end{bmatrix}^{T}$ (both are unknown)

Starting from case 1:

The joint PDF is:
$f_{\vec{X}}(\vec{x};\ \theta)=f_{\vec{X}}(\vec{x};\ \mu)={\left(\frac{1}{\sqrt{2\pi}\sigma}\right)}^{n}e^{-\frac{{\left\lVert
\vec{x}-\vec{\mu}\right\rVert}^{2}}{2\sigma^2}}$ where $\vec{\mu}=\begin{bmatrix}\mu & \mu & … & \mu\end{bmatrix}^{T}$ (n $\times$ 1)
$=\prod_{i=1}^{n} f_{{X}}({x_i};\ \mu)$ since the samples are i.i.d. (with $f_{{X}}({x};\ \mu)=\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{{(x-\mu)}^2}{2\sigma^2}}$)
$=\prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{{(x_i-\mu)}^2}{2\sigma^2}}$
$=\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{{(x_1-\mu)}^2}{2\sigma^2}}\cdot\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{{(x_2-\mu)}^2}{2\sigma^2}}\cdot\ …\ \cdot\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{{(x_n-\mu)}^2}{2\sigma^2}}$
$={(\frac{1}{\sqrt{2\pi}\sigma})}^ne^{-\frac{{(x_1-\mu)}^2}{2\sigma^2}-\frac{{(x_2-\mu)}^2}{2\sigma^2}-…-\frac{{(x_n-\mu)}^2}{2\sigma^2}}$
$={(\frac{1}{\sqrt{2\pi}\sigma})}^ne^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}{(x_i-\mu)}^2}$

So we need to maximize the exponent, or equivalently minimize the sum:
${(x_1-\mu)}^2+{(x_2-\mu)}^2+…+{(x_n-\mu)}^2$
$=({x_1}^2-2x_1\mu+{\mu}^2)+({x_2}^2-2x_2\mu+{\mu}^2)+…+({x_n}^2-2x_n\mu+{\mu}^2)$
$=n{\mu}^2-2(x_1+x_2+…+x_n)\mu+({x_1}^2+{x_2}^2+…+{x_n}^2)$

In order to find the $\mu$ that minimizes this expression, let's complete the square in $Ax^2+Bx+C$ with $A>0$:
$Ax^2+Bx+C$
$=A(x^2+\frac{B}{A}x)+C$
$=A(x^2+\frac{B}{A}x+\frac{B^2}{4A^2}-\frac{B^2}{4A^2})+C$
$=A(x^2+\frac{B}{A}x+\frac{B^2}{4A^2})-\frac{B^2}{4A}+C$
$=A(x+\frac{B}{2A})^2-\frac{B^2}{4A}+C$, which is minimized at $x=-\frac{B}{2A}$

or alternatively:

$\frac{d(Ax^2+Bx+C)}{dx}=0 \Leftrightarrow 2Ax+B=0 \Leftrightarrow x=-\frac{B}{2A}$ (a minimum, since $A>0$)
So, with $A=n$ and $B=-2(x_1+x_2+…+x_n)$:
$$\theta=\hat{\mu}=\frac{2(x_1+x_2+…+x_n)}{2n}=\frac{x_1+x_2+…+x_n}{n}=\frac{1}{n}\sum_{i=1}^{n}x_i=\bar{x}$$
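
As a quick sanity check on this result (not part of the derivation), here is a minimal Python sketch assuming simulated data; the values $\mu=2.0$, $\sigma=1.5$ and the grid are arbitrary illustrative choices, and the fixed `sigma2` does not affect where the maximum over $\mu$ lies:

```python
import numpy as np

# Simulated Gaussian observations (mu = 2.0, sigma = 1.5 are illustrative).
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=100)

def log_likelihood_mu(mu, x, sigma2=1.5**2):
    # Log of the joint PDF, keeping only the term that depends on mu;
    # sigma2 scales the exponent but does not move the argmax.
    return -np.sum((x - mu) ** 2) / (2 * sigma2)

grid = np.linspace(x.min(), x.max(), 10_001)
mu_grid = grid[np.argmax([log_likelihood_mu(m, x) for m in grid])]

print(mu_grid)   # brute-force argmax over the grid
print(x.mean())  # closed-form MLE: the sample mean
```

Both printed values should agree to within the grid spacing.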

Let's proceed to the 4th case, for which we estimate:
$\vec{\theta}=\begin{bmatrix}\hat{\mu} & \hat{\sigma^2}\end{bmatrix}^{T}= \underset{\mu,\ \sigma^2}{\mathrm{argmax}}\ f_{\vec{X}}(\vec{x};\ \mu,{\sigma}^2)$

The joint PDF, as before, is:
$f_{\vec{X}}(\vec{x};\ \mu,{\sigma}^2)=…={(\frac{1}{\sqrt{2\pi}\sigma})}^ne^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}{(x_i-\mu)}^2}$

And the log likelihood function is:
$f_1=\ln[f_{\vec{X}}(\vec{x};\ \mu,{\sigma}^2)]\\
=\ln{(2\pi)^{-n/2}}+\ln{(\sigma^2)^{-n/2}}-\frac{1}{2{\sigma}^2}\sum_{i=1}^{n}{(x_i-\mu)}^2\\
=-\frac{n}{2}\ln(2\pi)-\frac{n}{2}\ln(\sigma^2)-\frac{1}{2{\sigma}^2}\sum_{i=1}^{n}{(x_i-\mu)}^2$

Let's maximize that:
$\frac{\partial f_1}{\partial\mu}=\frac{2}{2\sigma^2}\sum_{i=1}^{n}{(x_i-\mu)}=\frac{1}{\sigma^2}\sum_{i=1}^{n}{(x_i-\mu)}$

So:
$\frac{\partial f_1}{\partial\mu}=0 \\\Leftrightarrow \frac{1}{\sigma^2}\left(\sum_{i=1}^{n}{x_i}-\sum_{i=1}^{n}{\mu}\right)=0 \\\Leftrightarrow \sum_{i=1}^{n}{x_i}=\sum_{i=1}^{n}{\mu} \Leftrightarrow \sum_{i=1}^{n}{x_i}=n\mu \\\Leftrightarrow \hat{\mu}=\frac{1}{n}\sum_{i=1}^{n}{x_i}=\bar{x}$

Also:
$\frac{\partial f_1}{\partial\sigma^2}=-\frac{n}{2\sigma^2}+\frac{1}{2\sigma^4}\sum_{i=1}^{n}{(x_i-\mu)}^2$

So:
$\frac{\partial f_1}{\partial\sigma^2}=0 \\\Leftrightarrow \frac{1}{2\sigma^4}\sum_{i=1}^{n}{(x_i-\mu)}^2=\frac{n}{2\sigma^2} \\\Leftrightarrow \frac{1}{\sigma^2}\sum_{i=1}^{n}{(x_i-\mu)}^2=n \\\Leftrightarrow \hat{\sigma^2}=\frac{1}{n}\sum_{i=1}^{n}{(x_i-\mu)}^2 \\\Leftrightarrow \hat{\sigma^2}=\frac{1}{n}\sum_{i=1}^{n}{(x_i-\bar{x})}^2$ (substituting $\mu=\hat{\mu}=\bar{x}$ from above)

which leads to $$\vec{\theta}=\begin{bmatrix}\hat{\mu} & \hat{\sigma^2}\end{bmatrix}^{T}=\begin{bmatrix}\bar{x} & \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2\end{bmatrix}^{T}$$ as estimated above.
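
To see case 4 end to end numerically, here is a small sketch assuming numpy/scipy and simulated data (the data-generating values are arbitrary): it minimizes the negative log-likelihood $-f_1$ with a generic optimizer and compares the result with the closed-form pair $(\bar{x},\ \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2)$:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=100)  # illustrative simulated data

def neg_log_likelihood(theta, x):
    # theta = (mu, log sigma^2); the log-parametrization keeps sigma^2 > 0.
    mu, log_s2 = theta
    s2 = np.exp(log_s2)
    n = len(x)
    return 0.5 * n * np.log(2 * np.pi * s2) + np.sum((x - mu) ** 2) / (2 * s2)

res = minimize(neg_log_likelihood, x0=[0.0, 0.0], args=(x,), method="Nelder-Mead")
mu_hat, s2_hat = res.x[0], np.exp(res.x[1])

print(mu_hat, s2_hat)       # numerical MLE of (mu, sigma^2)
print(x.mean(), np.var(x))  # closed form: np.var defaults to the 1/n divisor
```

The log-parametrization is just a convenience so the optimizer never proposes a negative variance.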

First, I want to ask whether I am correct in these two cases. Also, I think cases 1–3 are specializations of case 4. If that is true, can I just substitute the known values of $\mu$ to cover cases 2 and 3, or should I do something else?

Best Answer

Since case 2 is a special case of case 3 (take $\mu_0=0$), consider $$X_1,\ldots,X_n\sim\mathcal N(\mu_0,\sigma^2)$$ The log-likelihood is then $$L(\sigma|x_1,\ldots,x_n)=\frac{-1}{2\sigma^2}\sum_{i=1}^n(x_i-\mu_0)^2 -n\log(\sigma)-n\log\sqrt{2\pi}$$ which is maximised at $$\hat\sigma^2=\frac{1}{n} \sum_{i=1}^n(x_i-\mu_0)^2$$
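
A short numerical check of this formula, assuming numpy and simulated data with a known mean $\mu_0$ (the values are illustrative): the MLE is the mean squared deviation about $\mu_0$, not about $\bar{x}$.

```python
import numpy as np

rng = np.random.default_rng(0)
mu0 = 2.0                         # known mean (illustrative value)
x = rng.normal(loc=mu0, scale=1.5, size=100)

s2_hat = np.mean((x - mu0) ** 2)  # MLE of sigma^2 with mu fixed at mu0
print(s2_hat)                     # close to 1.5**2 = 2.25 for large n
```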