Suppose that $X$ ($n$ by $2$ matrix) follows a bivariate normal distribution $N(\mu,\sigma^2I)$, where $I$ is the $2\times 2$ identity matrix. How to find the maximum likelihood estimates of $\mu$ and $\sigma^2$?
Statistics – MLE of Multivariate (Bivariate) Normal Distribution
normal distributionstatistics
Best Answer
In your parametrization, the random variables $X_1$ and $X_2$ (where $X_i=[X_{i1},X_{i2}]$) are not correlated, therefore you can treat them independently (for each column of your matirx $X$), i.e. use the mean and sample variance as the estimators.
In additions: If you change your parametrization, and allow a full covariance matrix $\Sigma$ then you can use the following estimator:
$\Sigma=\frac{1}{n-1}\Sigma_{i=1}^n (X_i-\bar{X})((X_i-\bar{X}))^T$
where $X_i=[X_{i1},\ldots,X_{im}]^T$ is the $i$th column of matrix $X^T$ and $\bar{X}=\frac{1}{n}\Sigma_{i=1}^n X_i$ is your sample mean.
You can easily show that, this results in maximum likelihood estimation of you the mean and covariance, let start by the the likelihood function:
$f(X|\mu,\Sigma)=\frac{1}{\sqrt|\det(2\pi\Sigma)|^n}e^{\frac{-1}{2}\sum_i (X_i-\mu)^T\Sigma^{-1}(X_i-\mu)}$
Therefore:
$\log f(X|\mu,\Sigma)=\frac{-n}{2}\log(|\det(2\pi\Sigma)|)-\frac{1}{2}\sum_i (X_i-\mu)^T\Sigma^{-1}(X_i-\mu)$
$\log f(X|\mu,\Sigma)=-n\log(2\pi)-\frac{n}{2}\log(|\det(\Sigma)|)-\frac{1}{2}\sum_i (X_i-\mu)^T\Sigma^{-1}(X_i-\mu)$ (I)
$\Rightarrow \log f(X|\mu,\Sigma)=-n\log(2\pi)-\frac{n}{2}\log(|\det(\Sigma)|)-\frac{1}{2}\sum_i (X_i\Sigma^{-1} X_i^T-2\mu\Sigma^{-1} X_i^T+\mu\Sigma^{-1}\mu^T)$
$\Rightarrow \frac{\partial}{\partial \mu}\log f(X|\mu,\Sigma)=-\frac{1}{2}\sum_i (-2 \Sigma^{-1} X_i+2\Sigma\mu)=0$
$\Rightarrow \sum_i (-\Sigma^{-1} X_i+\Sigma^{-1}\mu)=0$, multiply both sides by $\Sigma$, you get:
$\sum_i X_i=n\mu$, therefore $\hat{\mu}_{MLE}=\frac{1}{n}\sum_i X_i$.
Edit: Now I add the derivation of MLE for $\Sigma$ here, start from (I):
$\log f(X|\mu,\Sigma)=-n\log(2\pi)-\frac{n}{2}\log(|\det(\Sigma)|)-\frac{1}{2}\sum_i (X_i-\mu)^T\Sigma^{-1}(X_i-\mu)$
W.l.g. assume $\Sigma$ is PD (not PSD, then we should use pseudo-inverse and pseudo-determinant), $\det(\Sigma)\geq 0$, therefore:
$\Rightarrow \log f(X|\mu,\Sigma)=-n\log(2\pi)-\frac{n}{2}\log(\det(\Sigma))-\frac{1}{2}\sum_i (X_i-\mu)^T\Sigma^{-1}(X_i-\mu)$
Note that, for $a,b \in R^k$, and $M \in R^{k\times k}$, $a^TMb=tr(a^TMb)=tr(ba^TM)$ ($tr()$ is the trace function and the last equality is by circularity of trace.)
$\Rightarrow \log f(X|\mu,\Sigma)=-n\log(2\pi)-\frac{n}{2}\log(\det(\Sigma))-\frac{1}{2}\sum_i tr((X_i-\mu)(X_i-\mu)^T\Sigma^{-1})$
We have that $\frac{\partial}{\partial\Sigma}\log(\det(\Sigma))=(\Sigma^{-1})^T$:
$\Rightarrow \frac{\partial}{\partial\Sigma}\log f(X|\mu,\Sigma)=-\frac{\partial}{\partial\Sigma}\frac{n}{2}\log(\det(\Sigma))-\frac{1}{2}\sum_i \frac{\partial}{\partial\Sigma}tr((X_i-\mu)(X_i-\mu)^T\Sigma^{-1})$
$\Rightarrow \frac{\partial}{\partial\Sigma}\log f(X|\mu,\Sigma)=-\frac{n}{2}(\Sigma^{-1})^T-\frac{1}{2}\sum_i \frac{\partial}{\partial\Sigma}tr((X_i-\mu)(X_i-\mu)^T\Sigma^{-1})$
With some abuse of notation: $\Rightarrow \frac{\partial}{\partial\Sigma}\log f(X|\mu,\Sigma)=-\frac{n}{2}(\Sigma^{-1})^T-\frac{1}{2}\sum_i \frac{1}{\partial\Sigma}tr((X_i-\mu)(X_i-\mu)^T\partial\Sigma^{-1})$
$\partial\Sigma^{-1}=-\Sigma^{-1}\partial\Sigma\Sigma^{-1}$, by substitution:
$\Rightarrow \frac{\partial}{\partial\Sigma}\log f(X|\mu,\Sigma)=-\frac{n}{2}(\Sigma^{-1})^T-\frac{1}{2}\sum_i \frac{1}{\partial\Sigma}tr((X_i-\mu)(X_i-\mu)^T(-\Sigma^{-1}\partial\Sigma\Sigma^{-1}))$
$=-\frac{n}{2}(\Sigma^{-1})^T+\frac{1}{2}\sum_i \frac{1}{\partial\Sigma}tr(\Sigma^{-1}(X_i-\mu)(X_i-\mu)^T\Sigma^{-1}\partial\Sigma)$
$=-\frac{n}{2}(\Sigma^{-1})^T+\frac{1}{2}\sum_i (\Sigma^{-1}(X_i-\mu)(X_i-\mu)^T\Sigma^{-1})^T$
$\Rightarrow \frac{\partial}{\partial\Sigma}\log f(X|\mu,\Sigma)=-\frac{n}{2}(\Sigma^{-1})^T+\frac{1}{2}\sum_i (\Sigma^{-1}(X_i-\mu)(X_i-\mu)^T\Sigma^{-1})^T=0$
$\frac{1}{2}\sum_i (\Sigma^{-1}(X_i-\mu)(X_i-\mu)^T\Sigma^{-1})^T=\frac{n}{2}(\Sigma^{-1})^T$
$\Rightarrow \sum_i (\Sigma^{-1}(X_i-\mu)(X_i-\mu)^T\Sigma^{-1})=n\Sigma^{-1}$
From left and right multiply by $\Sigma$:
$\Rightarrow \sum_i \Sigma(\Sigma^{-1}(X_i-\mu)(X_i-\mu)^T\Sigma^{-1})\Sigma=n\Sigma\Sigma^{-1}\Sigma$
$\Rightarrow \sum_i (X_i-\mu)(X_i-\mu)^T=n\Sigma$
$\Rightarrow \hat{\Sigma}_{MLE}=\frac{1}{n}\sum_i (X_i-\hat{\mu})(X_i-\hat{\mu})^T$
This is a biased estimator and you can fix it by using:
$\Rightarrow \hat{\Sigma}=\frac{1}{n-1}\sum_i (X_i-\hat{\mu})(X_i-\hat{\mu})^T$
instead, I hope this helps.