Statistics – MLE of Multivariate (Bivariate) Normal Distribution

Tags: normal distribution, statistics

Suppose that $X$ (an $n$ by $2$ matrix) follows a bivariate normal distribution $N(\mu,\sigma^2I)$, where $I$ is the $2\times 2$ identity matrix. How do you find the maximum likelihood estimates of $\mu$ and $\sigma^2$?

Best Answer

In your parametrization, the two components of each observation $X_i=[X_{i1},X_{i2}]$ are uncorrelated (and, being jointly normal, independent), so you can treat the two columns of your matrix $X$ separately: use the column sample means as the estimator of $\mu$, and, since both columns share the single $\sigma^2$, pool their squared deviations to estimate the variance.
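A minimal numerical sketch of this (pure Python; the helper name `mle_mu_sigma2` is mine, not from the question): the MLE of $\mu$ is the vector of column means, and the MLE of the common $\sigma^2$ averages the squared deviations over all $n\times 2$ entries.

```python
# MLE under the N(mu, sigma^2 I) model of the question, with X given as an
# n-by-2 list of rows. The helper name is hypothetical, not from the post.
def mle_mu_sigma2(X):
    n = len(X)
    k = len(X[0])                      # k = 2 in the bivariate case
    # MLE of mu: columnwise sample means
    mu = [sum(row[j] for row in X) / n for j in range(k)]
    # MLE of the common sigma^2: squared deviations pooled over all n*k entries
    ss = sum((row[j] - mu[j]) ** 2 for row in X for j in range(k))
    return mu, ss / (n * k)

X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
mu_hat, s2_hat = mle_mu_sigma2(X)      # mu_hat = [3.0, 4.0], s2_hat = 8/3
```

For the toy matrix above this gives $\hat\mu=[3,4]$ and $\hat\sigma^2=16/6=8/3$, which you can confirm by hand.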

In addition: if you change your parametrization and allow a full covariance matrix $\Sigma$, then you can use the following estimator:

$\hat{\Sigma}=\frac{1}{n-1}\sum_{i=1}^n (X_i-\bar{X})(X_i-\bar{X})^T$

where $X_i=[X_{i1},\ldots,X_{im}]^T$ is the $i$th row of $X$ written as a column vector (i.e. the $i$th column of $X^T$) and $\bar{X}=\frac{1}{n}\sum_{i=1}^n X_i$ is your sample mean.

You can show that this, up to the factor $\frac{1}{n-1}$ versus $\frac{1}{n}$, agrees with the maximum likelihood estimates of the mean and covariance. Let us start with the likelihood function:

$f(X|\mu,\Sigma)=\frac{1}{\sqrt{|\det(2\pi\Sigma)|^{n}}}\,e^{-\frac{1}{2}\sum_i (X_i-\mu)^T\Sigma^{-1}(X_i-\mu)}$

Therefore:

$\log f(X|\mu,\Sigma)=\frac{-n}{2}\log(|\det(2\pi\Sigma)|)-\frac{1}{2}\sum_i (X_i-\mu)^T\Sigma^{-1}(X_i-\mu)$

Since $\det(2\pi\Sigma)=(2\pi)^2\det(\Sigma)$ in the bivariate case ($k=2$):

$\log f(X|\mu,\Sigma)=-n\log(2\pi)-\frac{n}{2}\log(|\det(\Sigma)|)-\frac{1}{2}\sum_i (X_i-\mu)^T\Sigma^{-1}(X_i-\mu)$ (I)

$\Rightarrow \log f(X|\mu,\Sigma)=-n\log(2\pi)-\frac{n}{2}\log(|\det(\Sigma)|)-\frac{1}{2}\sum_i (X_i^T\Sigma^{-1} X_i-2\mu^T\Sigma^{-1} X_i+\mu^T\Sigma^{-1}\mu)$

$\Rightarrow \frac{\partial}{\partial \mu}\log f(X|\mu,\Sigma)=-\frac{1}{2}\sum_i (-2 \Sigma^{-1} X_i+2\Sigma^{-1}\mu)=0$

$\Rightarrow \sum_i (-\Sigma^{-1} X_i+\Sigma^{-1}\mu)=0$; multiplying both sides by $\Sigma$ gives:

$\sum_i X_i=n\mu$, therefore $\hat{\mu}_{MLE}=\frac{1}{n}\sum_i X_i$.
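As a numerical sanity check on this stationarity argument (pure Python, 2-D case; the helper name `quad`, the data, and the hand-picked $\Sigma$ are all mine): with $\Sigma$ held fixed, the $\mu$-dependent term $\sum_i (X_i-\mu)^T\Sigma^{-1}(X_i-\mu)$ is indeed smallest at the sample mean.

```python
# The only mu-dependent part of the log-likelihood is this quadratic term;
# perturbing mu away from the sample mean should strictly increase it.
def quad(X, mu, Sinv):
    total = 0.0
    for x in X:
        d = [x[0] - mu[0], x[1] - mu[1]]
        total += (d[0] * (Sinv[0][0] * d[0] + Sinv[0][1] * d[1])
                  + d[1] * (Sinv[1][0] * d[0] + Sinv[1][1] * d[1]))
    return total

X = [[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]]
n = len(X)
mu_hat = [sum(x[0] for x in X) / n, sum(x[1] for x in X) / n]
# any positive-definite Sigma works; Sigma = [[2,1],[1,2]] has inverse (1/3)[[2,-1],[-1,2]]
Sinv = [[2 / 3, -1 / 3], [-1 / 3, 2 / 3]]
base = quad(X, mu_hat, Sinv)
for ex, ey in [(0.1, 0.0), (0.0, -0.1), (0.05, 0.05)]:
    assert quad(X, [mu_hat[0] + ex, mu_hat[1] + ey], Sinv) > base
```

The perturbations increase the term by exactly $n\,\varepsilon^T\Sigma^{-1}\varepsilon>0$, since the deviations from $\hat\mu$ sum to zero.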

Edit: Now I add the derivation of the MLE for $\Sigma$ here, starting from (I):

$\log f(X|\mu,\Sigma)=-n\log(2\pi)-\frac{n}{2}\log(|\det(\Sigma)|)-\frac{1}{2}\sum_i (X_i-\mu)^T\Sigma^{-1}(X_i-\mu)$

W.l.o.g. assume $\Sigma$ is positive definite (if it were only positive semi-definite we would need the pseudo-inverse and pseudo-determinant instead); then $\det(\Sigma)>0$, so the absolute value can be dropped:

$\Rightarrow \log f(X|\mu,\Sigma)=-n\log(2\pi)-\frac{n}{2}\log(\det(\Sigma))-\frac{1}{2}\sum_i (X_i-\mu)^T\Sigma^{-1}(X_i-\mu)$

Note that, for $a,b \in \mathbb{R}^k$ and $M \in \mathbb{R}^{k\times k}$, $a^TMb=\operatorname{tr}(a^TMb)=\operatorname{tr}(ba^TM)$, since $a^TMb$ is a scalar and the last equality follows from the cyclic property of the trace.
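This identity is easy to spot-check numerically; a small pure-Python example in $\mathbb{R}^2$ (all values are arbitrary, chosen only for illustration):

```python
# Numeric spot-check of a^T M b = tr(a^T M b) = tr(b a^T M) in R^2.
a = [1.0, 2.0]
b = [3.0, -1.0]
M = [[2.0, 1.0], [0.5, 3.0]]
Mb = [M[0][0] * b[0] + M[0][1] * b[1],
      M[1][0] * b[0] + M[1][1] * b[1]]
scalar = a[0] * Mb[0] + a[1] * Mb[1]                       # a^T M b
baT = [[b[i] * a[j] for j in range(2)] for i in range(2)]  # outer product b a^T
trace = sum(sum(baT[i][k] * M[k][i] for k in range(2)) for i in range(2))
assert abs(scalar - trace) < 1e-12
```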

$\Rightarrow \log f(X|\mu,\Sigma)=-n\log(2\pi)-\frac{n}{2}\log(\det(\Sigma))-\frac{1}{2}\sum_i tr((X_i-\mu)(X_i-\mu)^T\Sigma^{-1})$

Using the identity $\frac{\partial}{\partial\Sigma}\log(\det(\Sigma))=(\Sigma^{-1})^T$:

$\Rightarrow \frac{\partial}{\partial\Sigma}\log f(X|\mu,\Sigma)=-\frac{\partial}{\partial\Sigma}\frac{n}{2}\log(\det(\Sigma))-\frac{1}{2}\sum_i \frac{\partial}{\partial\Sigma}tr((X_i-\mu)(X_i-\mu)^T\Sigma^{-1})$

$\Rightarrow \frac{\partial}{\partial\Sigma}\log f(X|\mu,\Sigma)=-\frac{n}{2}(\Sigma^{-1})^T-\frac{1}{2}\sum_i \frac{\partial}{\partial\Sigma}tr((X_i-\mu)(X_i-\mu)^T\Sigma^{-1})$

With some abuse of notation: $\Rightarrow \frac{\partial}{\partial\Sigma}\log f(X|\mu,\Sigma)=-\frac{n}{2}(\Sigma^{-1})^T-\frac{1}{2}\sum_i \frac{1}{\partial\Sigma}tr((X_i-\mu)(X_i-\mu)^T\partial\Sigma^{-1})$

$\partial\Sigma^{-1}=-\Sigma^{-1}\partial\Sigma\Sigma^{-1}$, by substitution:

$\Rightarrow \frac{\partial}{\partial\Sigma}\log f(X|\mu,\Sigma)=-\frac{n}{2}(\Sigma^{-1})^T-\frac{1}{2}\sum_i \frac{1}{\partial\Sigma}tr((X_i-\mu)(X_i-\mu)^T(-\Sigma^{-1}\partial\Sigma\Sigma^{-1}))$

$=-\frac{n}{2}(\Sigma^{-1})^T+\frac{1}{2}\sum_i \frac{1}{\partial\Sigma}tr(\Sigma^{-1}(X_i-\mu)(X_i-\mu)^T\Sigma^{-1}\partial\Sigma)$

$=-\frac{n}{2}(\Sigma^{-1})^T+\frac{1}{2}\sum_i (\Sigma^{-1}(X_i-\mu)(X_i-\mu)^T\Sigma^{-1})^T$

$\Rightarrow \frac{\partial}{\partial\Sigma}\log f(X|\mu,\Sigma)=-\frac{n}{2}(\Sigma^{-1})^T+\frac{1}{2}\sum_i (\Sigma^{-1}(X_i-\mu)(X_i-\mu)^T\Sigma^{-1})^T=0$

$\frac{1}{2}\sum_i (\Sigma^{-1}(X_i-\mu)(X_i-\mu)^T\Sigma^{-1})^T=\frac{n}{2}(\Sigma^{-1})^T$

$\Rightarrow \sum_i (\Sigma^{-1}(X_i-\mu)(X_i-\mu)^T\Sigma^{-1})=n\Sigma^{-1}$

Multiplying from the left and the right by $\Sigma$:

$\Rightarrow \sum_i \Sigma(\Sigma^{-1}(X_i-\mu)(X_i-\mu)^T\Sigma^{-1})\Sigma=n\Sigma\Sigma^{-1}\Sigma$

$\Rightarrow \sum_i (X_i-\mu)(X_i-\mu)^T=n\Sigma$

$\Rightarrow \hat{\Sigma}_{MLE}=\frac{1}{n}\sum_i (X_i-\hat{\mu})(X_i-\hat{\mu})^T$
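As a quick check of this closed form (a pure-Python sketch; the helper names `mle_cov` and `loglik` and the toy data are mine), one can compute $\hat{\Sigma}_{MLE}$ for a small $2$-column data set and confirm that rescaling it strictly decreases the log-likelihood:

```python
import math

# Sigma_hat = (1/n) sum_i (X_i - mu_hat)(X_i - mu_hat)^T for 2-D data, plus
# the log-likelihood from the derivation above; the 2x2 determinant and
# inverse are written out by hand.
def mle_cov(X):
    n = len(X)
    mu = [sum(x[0] for x in X) / n, sum(x[1] for x in X) / n]
    S = [[0.0, 0.0], [0.0, 0.0]]
    for x in X:
        d = [x[0] - mu[0], x[1] - mu[1]]
        for a in range(2):
            for b in range(2):
                S[a][b] += d[a] * d[b] / n
    return mu, S

def loglik(X, mu, S):
    n = len(X)
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    Sinv = [[S[1][1] / det, -S[0][1] / det],
            [-S[1][0] / det, S[0][0] / det]]
    q = 0.0
    for x in X:
        d = [x[0] - mu[0], x[1] - mu[1]]
        q += (d[0] * (Sinv[0][0] * d[0] + Sinv[0][1] * d[1])
              + d[1] * (Sinv[1][0] * d[0] + Sinv[1][1] * d[1]))
    return -n * math.log(2 * math.pi) - (n / 2) * math.log(det) - q / 2

X = [[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [0.0, 1.0]]
mu_hat, S_hat = mle_cov(X)
best = loglik(X, mu_hat, S_hat)
# rescaling Sigma_hat in either direction strictly lowers the log-likelihood
for c in (0.8, 1.2):
    S_c = [[c * S_hat[a][b] for b in range(2)] for a in range(2)]
    assert loglik(X, mu_hat, S_c) < best
```

The assertion works because at $\mu=\hat\mu$ the log-likelihood of $c\hat\Sigma$ equals a constant minus $n\log c + n/c$, which peaks at $c=1$.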

This is a biased estimator and you can fix it by using:

$\Rightarrow \hat{\Sigma}=\frac{1}{n-1}\sum_i (X_i-\hat{\mu})(X_i-\hat{\mu})^T$

instead. I hope this helps.
