Suppose that $X = (x_{ij})_{n\times 2}$ follows a bivariate normal distribution $\mathcal{N}(\mu, \sigma^2 I)$, where
$I$ is the $2\times 2$ identity matrix. How do I find the maximum likelihood estimates of $\mu$ and $\sigma^2$? Specifically, how do I deal with the determinant in the density of the bivariate normal distribution? Thanks!
[Math] Find the MLE of bivariate normal
normal-distribution, statistics
Related Solutions
In your parametrization, the two components $X_{i1}$ and $X_{i2}$ of each observation $X_i=[X_{i1},X_{i2}]$ are uncorrelated, so you can treat them independently (one per column of your matrix $X$), i.e., use the sample mean and sample variance as the estimators.
In addition: if you change your parametrization and allow a full covariance matrix $\Sigma$, then you can use the following estimator:
$\hat{\Sigma}=\frac{1}{n-1}\sum_{i=1}^n (X_i-\bar{X})(X_i-\bar{X})^T$
where $X_i=[X_{i1},\ldots,X_{im}]^T$ is the $i$th column of the matrix $X^T$ (i.e., the $i$th observation as a column vector) and $\bar{X}=\frac{1}{n}\sum_{i=1}^n X_i$ is the sample mean.
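As a quick numerical sanity check of these formulas, here is a minimal numpy sketch; the "true" mean and covariance used to simulate the data are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
# Arbitrary "true" parameters, chosen only for this illustration.
mu_true = np.array([1.0, -2.0])
Sigma_true = np.array([[2.0, 0.5], [0.5, 1.0]])
X = rng.multivariate_normal(mu_true, Sigma_true, size=n)  # row i is X_i^T

X_bar = X.mean(axis=0)            # (1/n) sum_i X_i
D = X - X_bar
Sigma_hat = D.T @ D / (n - 1)     # (1/(n-1)) sum_i (X_i - X_bar)(X_i - X_bar)^T

print(X_bar)      # close to mu_true
print(Sigma_hat)  # close to Sigma_true
print(np.allclose(Sigma_hat, np.cov(X, rowvar=False)))  # agrees with numpy's estimator
```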
You can show that this yields the maximum likelihood estimates of the mean and covariance. Start with the likelihood function:
$f(X|\mu,\Sigma)=\frac{1}{|\det(2\pi\Sigma)|^{n/2}}e^{-\frac{1}{2}\sum_i (X_i-\mu)^T\Sigma^{-1}(X_i-\mu)}$
Therefore:
$\log f(X|\mu,\Sigma)=\frac{-n}{2}\log(|\det(2\pi\Sigma)|)-\frac{1}{2}\sum_i (X_i-\mu)^T\Sigma^{-1}(X_i-\mu)$
Since $\Sigma$ is $2\times 2$ here, $\det(2\pi\Sigma)=(2\pi)^2\det(\Sigma)$, so $\log f(X|\mu,\Sigma)=-n\log(2\pi)-\frac{n}{2}\log(|\det(\Sigma)|)-\frac{1}{2}\sum_i (X_i-\mu)^T\Sigma^{-1}(X_i-\mu)$ (I)
$\Rightarrow \log f(X|\mu,\Sigma)=-n\log(2\pi)-\frac{n}{2}\log(|\det(\Sigma)|)-\frac{1}{2}\sum_i (X_i^T\Sigma^{-1} X_i-2\mu^T\Sigma^{-1} X_i+\mu^T\Sigma^{-1}\mu)$
$\Rightarrow \frac{\partial}{\partial \mu}\log f(X|\mu,\Sigma)=-\frac{1}{2}\sum_i (-2 \Sigma^{-1} X_i+2\Sigma^{-1}\mu)=0$
$\Rightarrow \sum_i (-\Sigma^{-1} X_i+\Sigma^{-1}\mu)=0$; multiplying both sides by $\Sigma$ gives:
$\sum_i X_i=n\mu$, therefore $\hat{\mu}_{MLE}=\frac{1}{n}\sum_i X_i$.
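You can check this numerically by minimizing the negative log-likelihood over $\mu$ directly; the optimizer should land on the sample mean. A sketch (simulated data, with $\Sigma$ held fixed and known):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])  # treated as known here
X = rng.multivariate_normal([1.0, -2.0], Sigma, size=200)
Sigma_inv = np.linalg.inv(Sigma)

def neg_loglik(mu):
    # Up to constants: (1/2) * sum_i (X_i - mu)^T Sigma^{-1} (X_i - mu)
    D = X - mu
    return 0.5 * np.einsum('ij,jk,ik->', D, Sigma_inv, D)

res = minimize(neg_loglik, x0=np.zeros(2))
print(res.x)           # numerical maximizer of the likelihood
print(X.mean(axis=0))  # sample mean; the two agree
```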
Edit: here is the derivation of the MLE for $\Sigma$, starting from (I):
$\log f(X|\mu,\Sigma)=-n\log(2\pi)-\frac{n}{2}\log(|\det(\Sigma)|)-\frac{1}{2}\sum_i (X_i-\mu)^T\Sigma^{-1}(X_i-\mu)$
W.l.o.g. assume $\Sigma$ is positive definite (if it were merely PSD, we would need the pseudo-inverse and pseudo-determinant). Then $\det(\Sigma)>0$, therefore:
$\Rightarrow \log f(X|\mu,\Sigma)=-n\log(2\pi)-\frac{n}{2}\log(\det(\Sigma))-\frac{1}{2}\sum_i (X_i-\mu)^T\Sigma^{-1}(X_i-\mu)$
Note that for $a,b \in \mathbb{R}^k$ and $M \in \mathbb{R}^{k\times k}$, $a^TMb=tr(a^TMb)=tr(ba^TM)$ ($tr()$ is the trace function; the last equality holds by circularity of the trace).
$\Rightarrow \log f(X|\mu,\Sigma)=-n\log(2\pi)-\frac{n}{2}\log(\det(\Sigma))-\frac{1}{2}\sum_i tr((X_i-\mu)(X_i-\mu)^T\Sigma^{-1})$
We have that $\frac{\partial}{\partial\Sigma}\log(\det(\Sigma))=(\Sigma^{-1})^T$:
$\Rightarrow \frac{\partial}{\partial\Sigma}\log f(X|\mu,\Sigma)=-\frac{\partial}{\partial\Sigma}\frac{n}{2}\log(\det(\Sigma))-\frac{1}{2}\sum_i \frac{\partial}{\partial\Sigma}tr((X_i-\mu)(X_i-\mu)^T\Sigma^{-1})$
$\Rightarrow \frac{\partial}{\partial\Sigma}\log f(X|\mu,\Sigma)=-\frac{n}{2}(\Sigma^{-1})^T-\frac{1}{2}\sum_i \frac{\partial}{\partial\Sigma}tr((X_i-\mu)(X_i-\mu)^T\Sigma^{-1})$
With some abuse of notation: $\Rightarrow \frac{\partial}{\partial\Sigma}\log f(X|\mu,\Sigma)=-\frac{n}{2}(\Sigma^{-1})^T-\frac{1}{2}\sum_i \frac{1}{\partial\Sigma}tr((X_i-\mu)(X_i-\mu)^T\partial\Sigma^{-1})$
$\partial\Sigma^{-1}=-\Sigma^{-1}\partial\Sigma\Sigma^{-1}$, by substitution:
$\Rightarrow \frac{\partial}{\partial\Sigma}\log f(X|\mu,\Sigma)=-\frac{n}{2}(\Sigma^{-1})^T-\frac{1}{2}\sum_i \frac{1}{\partial\Sigma}tr((X_i-\mu)(X_i-\mu)^T(-\Sigma^{-1}\partial\Sigma\Sigma^{-1}))$
$=-\frac{n}{2}(\Sigma^{-1})^T+\frac{1}{2}\sum_i \frac{1}{\partial\Sigma}tr(\Sigma^{-1}(X_i-\mu)(X_i-\mu)^T\Sigma^{-1}\partial\Sigma)$
$=-\frac{n}{2}(\Sigma^{-1})^T+\frac{1}{2}\sum_i (\Sigma^{-1}(X_i-\mu)(X_i-\mu)^T\Sigma^{-1})^T$
$\Rightarrow \frac{\partial}{\partial\Sigma}\log f(X|\mu,\Sigma)=-\frac{n}{2}(\Sigma^{-1})^T+\frac{1}{2}\sum_i (\Sigma^{-1}(X_i-\mu)(X_i-\mu)^T\Sigma^{-1})^T=0$
$\frac{1}{2}\sum_i (\Sigma^{-1}(X_i-\mu)(X_i-\mu)^T\Sigma^{-1})^T=\frac{n}{2}(\Sigma^{-1})^T$
$\Rightarrow \sum_i (\Sigma^{-1}(X_i-\mu)(X_i-\mu)^T\Sigma^{-1})=n\Sigma^{-1}$
Multiplying by $\Sigma$ on the left and on the right:
$\Rightarrow \sum_i \Sigma(\Sigma^{-1}(X_i-\mu)(X_i-\mu)^T\Sigma^{-1})\Sigma=n\Sigma\Sigma^{-1}\Sigma$
$\Rightarrow \sum_i (X_i-\mu)(X_i-\mu)^T=n\Sigma$
$\Rightarrow \hat{\Sigma}_{MLE}=\frac{1}{n}\sum_i (X_i-\hat{\mu})(X_i-\hat{\mu})^T$
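As an aside, if you want to convince yourself of the two matrix-calculus identities used above, $\frac{\partial}{\partial\Sigma}\log(\det(\Sigma))=(\Sigma^{-1})^T$ and $\partial\Sigma^{-1}=-\Sigma^{-1}\partial\Sigma\,\Sigma^{-1}$, a finite-difference check is easy (a sketch with an arbitrary positive definite $\Sigma$):

```python
import numpy as np

Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])       # arbitrary PD matrix
dS = 1e-6 * np.array([[0.3, -0.2], [0.1, 0.4]])  # small perturbation

# d log det(Sigma) = tr(Sigma^{-1} dSigma), i.e. the gradient is (Sigma^{-1})^T
lhs = np.linalg.slogdet(Sigma + dS)[1] - np.linalg.slogdet(Sigma)[1]
rhs = np.trace(np.linalg.inv(Sigma) @ dS)
print(lhs, rhs)  # agree to first order

# d Sigma^{-1} = -Sigma^{-1} dSigma Sigma^{-1}
lhs2 = np.linalg.inv(Sigma + dS) - np.linalg.inv(Sigma)
rhs2 = -np.linalg.inv(Sigma) @ dS @ np.linalg.inv(Sigma)
print(np.abs(lhs2 - rhs2).max())  # O(||dS||^2), essentially zero
```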
The estimator $\hat{\Sigma}_{MLE}$ is biased; you can fix this by using:
$\Rightarrow \hat{\Sigma}=\frac{1}{n-1}\sum_i (X_i-\hat{\mu})(X_i-\hat{\mu})^T$
instead. I hope this helps.
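To see the bias numerically, here is a small simulation sketch (arbitrary true $\Sigma$, a deliberately small sample size $n=5$, averaged over many replications):

```python
import numpy as np

rng = np.random.default_rng(2)
Sigma_true = np.array([[2.0, 0.5], [0.5, 1.0]])  # arbitrary, for illustration
n, reps = 5, 50_000

mle_avg = np.zeros((2, 2))
unbiased_avg = np.zeros((2, 2))
for _ in range(reps):
    X = rng.multivariate_normal([0.0, 0.0], Sigma_true, size=n)
    D = X - X.mean(axis=0)
    C = D.T @ D
    mle_avg += C / n / reps             # MLE: divides by n
    unbiased_avg += C / (n - 1) / reps  # bias-corrected: divides by n - 1

print(mle_avg)       # ~ (n-1)/n * Sigma_true: shrunk toward zero
print(unbiased_avg)  # ~ Sigma_true
```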
You have $$ \begin{bmatrix} W \\ V \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ \rho & \sqrt{1-\rho^2} \end{bmatrix} \begin{bmatrix} X \\ Y \end{bmatrix}. $$ The determinant of this matrix is $\sqrt{1-\rho^2}$.
You have the density $$ f_{X,Y}(x,y) = \frac{1}{2\pi} \exp\left( \frac{-1}{2}(x^2+y^2) \right) $$ and $$ \begin{bmatrix} 1 & 0 \\ \rho & \sqrt{1-\rho^2} \end{bmatrix}^{-1} = \begin{bmatrix} 1 & 0 \\ \frac{-\rho}{\sqrt{1-\rho^2}} & \frac{1}{\sqrt{1-\rho^2}} \end{bmatrix}, $$ and the determinant of this inverse matrix is $\frac{1}{\sqrt{1-\rho^2}}$.
That and your assertion about the density will give you the joint density of $W$ and $V$.
If you're looking for the correlation, you can read the covariance and the two variances out of the density function, but that should not be necessary. If you have two random variables $X,Y$ whose covariance matrix is $M$, and you've got $$ \begin{bmatrix} W \\ V \end{bmatrix} = A \begin{bmatrix} X \\ Y \end{bmatrix}, $$ then the covariance matrix of $\begin{bmatrix} W \\ V \end{bmatrix}$ is $$ AMA^T. $$ In this case that is $$ \begin{bmatrix} 1 & 0 \\ \rho & \sqrt{1-\rho^2} \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & \rho \\ 0 & \sqrt{1-\rho^2} \end{bmatrix} = \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}. $$ That gives you $\operatorname{cov}(W,V)$ and the two variances, and since both variances are $1$, the correlation is the covariance.
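Both the inverse and the covariance computation are easy to verify numerically; a short numpy sketch ($\rho = 0.6$ is an arbitrary value for illustration):

```python
import numpy as np

rho = 0.6  # arbitrary correlation, for illustration
A = np.array([[1.0, 0.0], [rho, np.sqrt(1 - rho**2)]])

# Inverse and determinant match the closed forms above
print(np.linalg.inv(A))  # [[1, 0], [-rho/sqrt(1-rho^2), 1/sqrt(1-rho^2)]]
print(np.linalg.det(A))  # sqrt(1 - rho^2)

# Covariance of (W, V) = A (X, Y) with cov(X, Y) = I
M = np.eye(2)
print(A @ M @ A.T)       # [[1, rho], [rho, 1]]
```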
Best Answer
There is no need to worry about determinants here, since the off-diagonal entries in the variance are $0$ and the diagonal entries are all equal.
Generally, if $X\sim\operatorname{\mathcal N}_2(\mu, \Sigma)$, where the variance $\Sigma$ is a $2\times2$ matrix $\Sigma = \left[ \begin{array}{cc} \sigma^2 & \rho\sigma\tau \\ \rho\sigma\tau & \tau^2 \end{array} \right]$ with $\det\Sigma\ne0,$ then the probability density is $$ \mathbf x \mapsto \frac 1 {2\pi}\cdot \frac 1 {(\det\Sigma)^{1/2}} \exp\left( -\tfrac1 2 (\mathbf x - \mu)^T \Sigma^{-1} (\mathbf x-\mu) \right) $$
But when $\sigma^2=\tau^2$ and $\rho=0$, the two scalar components of $X$ are independent, with $X_j\sim\mathcal N_1(\mu_j,\sigma^2)$ for $j=1,2$. If you observe $n$ such independent vectors, then you have observed $2n$ independent scalar normal variables that all share the same variance $\sigma^2$, so the problem reduces to univariate estimation: each $\mu_j$ is estimated by the mean of its column, and $\sigma^2$ by pooling the squared deviations of all $2n$ observations.
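Concretely, under the model in the question ($\Sigma=\sigma^2 I$), the MLE of $\mu$ is the vector of column means and the MLE of $\sigma^2$ averages the squared deviations over all $2n$ scalars. A numpy sketch under that assumption (simulated data with arbitrary parameters):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
mu_true, sigma2_true = np.array([1.0, -2.0]), 1.5  # arbitrary, for illustration
X = rng.normal(loc=mu_true, scale=np.sqrt(sigma2_true), size=(n, 2))  # N(mu, sigma^2 I)

mu_hat = X.mean(axis=0)                           # MLE of mu: column means
sigma2_hat = ((X - mu_hat) ** 2).sum() / (2 * n)  # MLE of sigma^2: pool all 2n deviations

print(mu_hat)      # close to mu_true
print(sigma2_hat)  # close to sigma2_true
```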