# How to Take Derivative of Multivariate Normal Density

matrixnormal distributionself-study

Say I have multivariate normal $N(\mu, \Sigma)$ density. I want to get the second (partial) derivative w.r.t. $\mu$. Not sure how to take derivative of a matrix.

Wiki says take the derivative element by element inside the matrix.

I am working with Laplace approximation
$$\log{P}_{N}(\theta)=\log {P}_{N}-\frac{1}{2}{(\theta-\hat{\theta})}^{T}{\Sigma}^{-1}(\theta-\hat{\theta}) \>.$$
The mode is $\hat\theta=\mu$.

I was given $${\Sigma}^{-1}=-\frac{{{\partial }^{2}}}{\partial {{\theta }^{2}}}\log p(\hat{\theta }|y),$$ how did this come about?

What I have done:
$$\log P(\theta|y) = -\frac{k}{2} \log 2 \pi – \frac{1}{2} \log \left| \Sigma \right| – \frac{1}{2} {(\theta-\hat \theta)}^{T}{\Sigma}^{-1}(\theta-\hat\theta)$$

So, I take derivative w.r.t to $\theta$, first off, there is a transpose, secondly, it is a matrix. So, I am stuck.

Note: If my professor comes across this, I am referring to the lecture.

In chapter 2 of the Matrix Cookbook there is a nice review of matrix calculus stuff that gives a lot of useful identities that help with problems one would encounter doing probability and statistics, including rules to help differentiate the multivariate Gaussian likelihood.

If you have a random vector ${\boldsymbol y}$ that is multivariate normal with mean vector ${\boldsymbol \mu}$ and covariance matrix ${\boldsymbol \Sigma}$, then use equation (86) in the matrix cookbook to find that the gradient of the log likelihood ${\bf L}$ with respect to ${\boldsymbol \mu}$ is

\begin{align} \frac{ \partial {\bf L} }{ \partial {\boldsymbol \mu}} &= -\frac{1}{2} \left( \frac{\partial \left( {\boldsymbol y} - {\boldsymbol \mu} \right)' {\boldsymbol \Sigma}^{-1} \left( {\boldsymbol y} - {\boldsymbol \mu}\right) }{\partial {\boldsymbol \mu}} \right) \nonumber \\ &= -\frac{1}{2} \left( -2 {\boldsymbol \Sigma}^{-1} \left( {\boldsymbol y} - {\boldsymbol \mu}\right) \right) \nonumber \\ &= {\boldsymbol \Sigma}^{-1} \left( {\boldsymbol y} - {\boldsymbol \mu} \right) \end{align}

I'll leave it to you to differentiate this again and find the answer to be $-{\boldsymbol \Sigma}^{-1}$.

As "extra credit", use equations (57) and (61) to find that the gradient with respect to ${\boldsymbol \Sigma}$ is

\begin{align} \frac{ \partial {\bf L} }{ \partial {\boldsymbol \Sigma}} &= -\frac{1}{2} \left( \frac{ \partial \log(|{\boldsymbol \Sigma}|)}{\partial{\boldsymbol \Sigma}} + \frac{\partial \left( {\boldsymbol y} - {\boldsymbol \mu}\right)' {\boldsymbol \Sigma}^{-1} \left( {\boldsymbol y}- {\boldsymbol \mu}\right) }{\partial {\boldsymbol \Sigma}} \right)\\ &= -\frac{1}{2} \left( {\boldsymbol \Sigma}^{-1} - {\boldsymbol \Sigma}^{-1} \left( {\boldsymbol y} - {\boldsymbol \mu} \right) \left( {\boldsymbol y} - {\boldsymbol \mu} \right)' {\boldsymbol \Sigma}^{-1} \right) \end{align}

I've left out a lot of the steps, but I made this derivation using only the identities found in the matrix cookbook, so I'll leave it to you to fill in the gaps.

I've used these score equations for maximum likelihood estimation, so I know they are correct :)