Suppose that $X = (x_{ij})_{n\times 2}$ follows a bivariate normal distribution $\mathcal{N}(\mu, \sigma^2 I)$, where
$I$ is the $2\times 2$ identity matrix. How do I find the maximum likelihood estimates of $\mu$ and $\sigma^2$? Specifically, how do I deal with the determinant in the density of the bivariate normal distribution? Thanks!
[Math] Find the MLE of bivariate normal
normal-distribution, statistics
Related Solutions
In your parametrization, the two components $X_{i1}$ and $X_{i2}$ of each observation $X_i=[X_{i1},X_{i2}]$ are uncorrelated, so you can treat them independently (one per column of your matrix $X$), i.e., use the sample mean and sample variance as the estimators.
In addition: if you change your parametrization and allow a full covariance matrix $\Sigma$, then you can use the following estimator:
$\hat{\Sigma}=\frac{1}{n-1}\sum_{i=1}^n (X_i-\bar{X})(X_i-\bar{X})^T$
where $X_i=[X_{i1},\ldots,X_{im}]^T$ is the $i$th column of the matrix $X^T$ (i.e., the $i$th observation as a column vector) and $\bar{X}=\frac{1}{n}\sum_{i=1}^n X_i$ is the sample mean.
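As a quick numerical sanity check of these formulas, here is a minimal numpy sketch; the "true" mean and covariance used to simulate the data are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
# Arbitrary "true" parameters, chosen only for this illustration.
mu_true = np.array([1.0, -2.0])
Sigma_true = np.array([[2.0, 0.5], [0.5, 1.0]])
X = rng.multivariate_normal(mu_true, Sigma_true, size=n)  # row i is X_i^T

X_bar = X.mean(axis=0)            # (1/n) sum_i X_i
D = X - X_bar
Sigma_hat = D.T @ D / (n - 1)     # (1/(n-1)) sum_i (X_i - X_bar)(X_i - X_bar)^T

print(X_bar)      # close to mu_true
print(Sigma_hat)  # close to Sigma_true
print(np.allclose(Sigma_hat, np.cov(X, rowvar=False)))  # agrees with numpy's estimator
```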
You can show that this yields the maximum likelihood estimates of the mean and covariance. Start with the likelihood function:
$f(X|\mu,\Sigma)=\frac{1}{|\det(2\pi\Sigma)|^{n/2}}e^{-\frac{1}{2}\sum_i (X_i-\mu)^T\Sigma^{-1}(X_i-\mu)}$
Therefore:
$\log f(X|\mu,\Sigma)=\frac{-n}{2}\log(|\det(2\pi\Sigma)|)-\frac{1}{2}\sum_i (X_i-\mu)^T\Sigma^{-1}(X_i-\mu)$
Since $\Sigma$ is $2\times 2$ here, $\det(2\pi\Sigma)=(2\pi)^2\det(\Sigma)$, so $\log f(X|\mu,\Sigma)=-n\log(2\pi)-\frac{n}{2}\log(|\det(\Sigma)|)-\frac{1}{2}\sum_i (X_i-\mu)^T\Sigma^{-1}(X_i-\mu)$ (I)
$\Rightarrow \log f(X|\mu,\Sigma)=-n\log(2\pi)-\frac{n}{2}\log(|\det(\Sigma)|)-\frac{1}{2}\sum_i (X_i^T\Sigma^{-1} X_i-2\mu^T\Sigma^{-1} X_i+\mu^T\Sigma^{-1}\mu)$
$\Rightarrow \frac{\partial}{\partial \mu}\log f(X|\mu,\Sigma)=-\frac{1}{2}\sum_i (-2 \Sigma^{-1} X_i+2\Sigma^{-1}\mu)=0$
$\Rightarrow \sum_i (-\Sigma^{-1} X_i+\Sigma^{-1}\mu)=0$; multiplying both sides by $\Sigma$ gives:
$\sum_i X_i=n\mu$, therefore $\hat{\mu}_{MLE}=\frac{1}{n}\sum_i X_i$.
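You can check this numerically by minimizing the negative log-likelihood over $\mu$ directly; the optimizer should land on the sample mean. A sketch (simulated data, with $\Sigma$ held fixed and known):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])  # treated as known here
X = rng.multivariate_normal([1.0, -2.0], Sigma, size=200)
Sigma_inv = np.linalg.inv(Sigma)

def neg_loglik(mu):
    # Up to constants: (1/2) * sum_i (X_i - mu)^T Sigma^{-1} (X_i - mu)
    D = X - mu
    return 0.5 * np.einsum('ij,jk,ik->', D, Sigma_inv, D)

res = minimize(neg_loglik, x0=np.zeros(2))
print(res.x)           # numerical maximizer of the likelihood
print(X.mean(axis=0))  # sample mean; the two agree
```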
Edit: here is the derivation of the MLE for $\Sigma$, starting from (I):
$\log f(X|\mu,\Sigma)=-n\log(2\pi)-\frac{n}{2}\log(|\det(\Sigma)|)-\frac{1}{2}\sum_i (X_i-\mu)^T\Sigma^{-1}(X_i-\mu)$
W.l.o.g. assume $\Sigma$ is positive definite (if it were merely PSD, we would need the pseudo-inverse and pseudo-determinant). Then $\det(\Sigma)>0$, therefore:
$\Rightarrow \log f(X|\mu,\Sigma)=-n\log(2\pi)-\frac{n}{2}\log(\det(\Sigma))-\frac{1}{2}\sum_i (X_i-\mu)^T\Sigma^{-1}(X_i-\mu)$
Note that for $a,b \in \mathbb{R}^k$ and $M \in \mathbb{R}^{k\times k}$, $a^TMb=tr(a^TMb)=tr(ba^TM)$ ($tr()$ is the trace function; the last equality holds by circularity of the trace).
$\Rightarrow \log f(X|\mu,\Sigma)=-n\log(2\pi)-\frac{n}{2}\log(\det(\Sigma))-\frac{1}{2}\sum_i tr((X_i-\mu)(X_i-\mu)^T\Sigma^{-1})$
We have that $\frac{\partial}{\partial\Sigma}\log(\det(\Sigma))=(\Sigma^{-1})^T$:
$\Rightarrow \frac{\partial}{\partial\Sigma}\log f(X|\mu,\Sigma)=-\frac{\partial}{\partial\Sigma}\frac{n}{2}\log(\det(\Sigma))-\frac{1}{2}\sum_i \frac{\partial}{\partial\Sigma}tr((X_i-\mu)(X_i-\mu)^T\Sigma^{-1})$
$\Rightarrow \frac{\partial}{\partial\Sigma}\log f(X|\mu,\Sigma)=-\frac{n}{2}(\Sigma^{-1})^T-\frac{1}{2}\sum_i \frac{\partial}{\partial\Sigma}tr((X_i-\mu)(X_i-\mu)^T\Sigma^{-1})$
With some abuse of notation: $\Rightarrow \frac{\partial}{\partial\Sigma}\log f(X|\mu,\Sigma)=-\frac{n}{2}(\Sigma^{-1})^T-\frac{1}{2}\sum_i \frac{1}{\partial\Sigma}tr((X_i-\mu)(X_i-\mu)^T\partial\Sigma^{-1})$
$\partial\Sigma^{-1}=-\Sigma^{-1}\partial\Sigma\Sigma^{-1}$, by substitution:
$\Rightarrow \frac{\partial}{\partial\Sigma}\log f(X|\mu,\Sigma)=-\frac{n}{2}(\Sigma^{-1})^T-\frac{1}{2}\sum_i \frac{1}{\partial\Sigma}tr((X_i-\mu)(X_i-\mu)^T(-\Sigma^{-1}\partial\Sigma\Sigma^{-1}))$
$=-\frac{n}{2}(\Sigma^{-1})^T+\frac{1}{2}\sum_i \frac{1}{\partial\Sigma}tr(\Sigma^{-1}(X_i-\mu)(X_i-\mu)^T\Sigma^{-1}\partial\Sigma)$
$=-\frac{n}{2}(\Sigma^{-1})^T+\frac{1}{2}\sum_i (\Sigma^{-1}(X_i-\mu)(X_i-\mu)^T\Sigma^{-1})^T$
$\Rightarrow \frac{\partial}{\partial\Sigma}\log f(X|\mu,\Sigma)=-\frac{n}{2}(\Sigma^{-1})^T+\frac{1}{2}\sum_i (\Sigma^{-1}(X_i-\mu)(X_i-\mu)^T\Sigma^{-1})^T=0$
$\frac{1}{2}\sum_i (\Sigma^{-1}(X_i-\mu)(X_i-\mu)^T\Sigma^{-1})^T=\frac{n}{2}(\Sigma^{-1})^T$
$\Rightarrow \sum_i (\Sigma^{-1}(X_i-\mu)(X_i-\mu)^T\Sigma^{-1})=n\Sigma^{-1}$
Multiplying by $\Sigma$ on the left and on the right:
$\Rightarrow \sum_i \Sigma(\Sigma^{-1}(X_i-\mu)(X_i-\mu)^T\Sigma^{-1})\Sigma=n\Sigma\Sigma^{-1}\Sigma$
$\Rightarrow \sum_i (X_i-\mu)(X_i-\mu)^T=n\Sigma$
$\Rightarrow \hat{\Sigma}_{MLE}=\frac{1}{n}\sum_i (X_i-\hat{\mu})(X_i-\hat{\mu})^T$
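As an aside, if you want to convince yourself of the two matrix-calculus identities used above, $\frac{\partial}{\partial\Sigma}\log(\det(\Sigma))=(\Sigma^{-1})^T$ and $\partial\Sigma^{-1}=-\Sigma^{-1}\partial\Sigma\,\Sigma^{-1}$, a finite-difference check is easy (a sketch with an arbitrary positive definite $\Sigma$):

```python
import numpy as np

Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])       # arbitrary PD matrix
dS = 1e-6 * np.array([[0.3, -0.2], [0.1, 0.4]])  # small perturbation

# d log det(Sigma) = tr(Sigma^{-1} dSigma), i.e. the gradient is (Sigma^{-1})^T
lhs = np.linalg.slogdet(Sigma + dS)[1] - np.linalg.slogdet(Sigma)[1]
rhs = np.trace(np.linalg.inv(Sigma) @ dS)
print(lhs, rhs)  # agree to first order

# d Sigma^{-1} = -Sigma^{-1} dSigma Sigma^{-1}
lhs2 = np.linalg.inv(Sigma + dS) - np.linalg.inv(Sigma)
rhs2 = -np.linalg.inv(Sigma) @ dS @ np.linalg.inv(Sigma)
print(np.abs(lhs2 - rhs2).max())  # O(||dS||^2), essentially zero
```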
The estimator $\hat{\Sigma}_{MLE}$ is biased; you can fix this by using:
$\Rightarrow \hat{\Sigma}=\frac{1}{n-1}\sum_i (X_i-\hat{\mu})(X_i-\hat{\mu})^T$
instead. I hope this helps.
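To see the bias numerically, here is a small simulation sketch (arbitrary true $\Sigma$, a deliberately small sample size $n=5$, averaged over many replications):

```python
import numpy as np

rng = np.random.default_rng(2)
Sigma_true = np.array([[2.0, 0.5], [0.5, 1.0]])  # arbitrary, for illustration
n, reps = 5, 50_000

mle_avg = np.zeros((2, 2))
unbiased_avg = np.zeros((2, 2))
for _ in range(reps):
    X = rng.multivariate_normal([0.0, 0.0], Sigma_true, size=n)
    D = X - X.mean(axis=0)
    C = D.T @ D
    mle_avg += C / n / reps             # MLE: divides by n
    unbiased_avg += C / (n - 1) / reps  # bias-corrected: divides by n - 1

print(mle_avg)       # ~ (n-1)/n * Sigma_true: shrunk toward zero
print(unbiased_avg)  # ~ Sigma_true
```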
You have $$ \begin{bmatrix} W \\ V \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ \rho & \sqrt{1-\rho^2} \end{bmatrix} \begin{bmatrix} X \\ Y \end{bmatrix}. $$ The determinant of this matrix is $\sqrt{1-\rho^2}$.
You have the density $$ f_{X,Y}(x,y) = \frac{1}{2\pi} \exp\left( \frac{-1}{2}(x^2+y^2) \right) $$ and $$ \begin{bmatrix} 1 & 0 \\ \rho & \sqrt{1-\rho^2} \end{bmatrix}^{-1} = \begin{bmatrix} 1 & 0 \\ \frac{-\rho}{\sqrt{1-\rho^2}} & \frac{1}{\sqrt{1-\rho^2}} \end{bmatrix}, $$ and the determinant of this inverse matrix is $\frac{1}{\sqrt{1-\rho^2}}$.
That and your assertion about the density will give you the joint density of $W$ and $V$.
If you're looking for the correlation, you can read the covariance and the two variances out of the density function, but that should not be necessary. If you have two random variables $X,Y$ whose covariance matrix is $M$, and you've got $$ \begin{bmatrix} W \\ V \end{bmatrix} = A \begin{bmatrix} X \\ Y \end{bmatrix}, $$ then the covariance matrix of $\begin{bmatrix} W \\ V \end{bmatrix}$ is $$ AMA^T. $$ In this case that is $$ \begin{bmatrix} 1 & 0 \\ \rho & \sqrt{1-\rho^2} \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & \rho \\ 0 & \sqrt{1-\rho^2} \end{bmatrix} = \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}. $$ That gives you $\operatorname{cov}(W,V)$ and the two variances, and since both variances are $1$, the correlation is the covariance.
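Both the inverse and the covariance computation are easy to verify numerically; a short numpy sketch ($\rho = 0.6$ is an arbitrary value for illustration):

```python
import numpy as np

rho = 0.6  # arbitrary correlation, for illustration
A = np.array([[1.0, 0.0], [rho, np.sqrt(1 - rho**2)]])

# Inverse and determinant match the closed forms above
print(np.linalg.inv(A))  # [[1, 0], [-rho/sqrt(1-rho^2), 1/sqrt(1-rho^2)]]
print(np.linalg.det(A))  # sqrt(1 - rho^2)

# Covariance of (W, V) = A (X, Y) with cov(X, Y) = I
M = np.eye(2)
print(A @ M @ A.T)       # [[1, rho], [rho, 1]]
```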
Best Answer
There is no need to worry about determinants here, since the off-diagonal entries in the variance are $0$ and the diagonal entries are all equal.
Generally, if $X\sim\operatorname{\mathcal N}_2(\mu, \Sigma)$, where the variance $\Sigma$ is a $2\times2$ matrix $\Sigma = \left[ \begin{array}{cc} \sigma^2 & \rho\sigma\tau \\ \rho\sigma\tau & \tau^2 \end{array} \right]$ with $\det\Sigma\ne0,$ then the probability density is $$ \mathbf x \mapsto \frac 1 {2\pi}\cdot \frac 1 {(\det\Sigma)^{1/2}} \exp\left( -\tfrac1 2 (\mathbf x - \mu)^T \Sigma^{-1} (\mathbf x-\mu) \right) $$
But when $\sigma^2=\tau^2$ and $\rho=0$, the two scalar components of $X$ are independent, with $X_j\sim\mathcal N_1(\mu_j,\sigma^2)$ for $j=1,2$. If you observe $n$ such independent vectors, then you have observed $2n$ independent scalar normal variables that all share the same variance $\sigma^2$, so the problem reduces to univariate estimation: each $\mu_j$ is estimated by the mean of its column, and $\sigma^2$ by pooling the squared deviations of all $2n$ observations.
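Concretely, under the model in the question ($\Sigma=\sigma^2 I$), the MLE of $\mu$ is the vector of column means and the MLE of $\sigma^2$ averages the squared deviations over all $2n$ scalars. A numpy sketch under that assumption (simulated data with arbitrary parameters):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
mu_true, sigma2_true = np.array([1.0, -2.0]), 1.5  # arbitrary, for illustration
X = rng.normal(loc=mu_true, scale=np.sqrt(sigma2_true), size=(n, 2))  # N(mu, sigma^2 I)

mu_hat = X.mean(axis=0)                           # MLE of mu: column means
sigma2_hat = ((X - mu_hat) ** 2).sum() / (2 * n)  # MLE of sigma^2: pool all 2n deviations

print(mu_hat)      # close to mu_true
print(sigma2_hat)  # close to sigma2_true
```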