The log likelihood function is $$l(\theta_0)=\sum_{i=1}^n \log(f(x_i)) \tag{1}$$ Since $\hat{\theta}$ maximizes the log likelihood function, we know that $l'(\hat{\theta})=0$.

Next we take a Taylor expansion of $l'(\hat{\theta})$ around $\theta_0$, where $\theta^*$ lies between $\hat{\theta}$ and $\theta_0$:

$$l'(\hat{\theta})=l'(\theta_0)+\frac{l''(\theta_0)}{1!}(\hat{\theta}-\theta_0)+\frac{l'''(\theta^*)}{2!}(\hat{\theta}-\theta_0)^2$$

Since $l'(\hat{\theta})=0$, rearranging gives

$$-l''(\theta_0)(\hat{\theta}-\theta_0)-\frac{l'''(\theta^*)}{2}(\hat{\theta}-\theta_0)^2=l'(\theta_0)$$
$$(\hat{\theta}-\theta_0)=\frac{l'(\theta_0)}{-l''(\theta_0)-\frac{l'''(\theta^*)}{2}(\hat{\theta}-\theta_0)}$$

Multiplying both sides by $\sqrt{n}$, we get

$$\sqrt{n}(\hat{\theta}-\theta_0)=\frac{\frac{1}{\sqrt{n}}l'(\theta_0)}{-\frac{1}{n}l''(\theta_0)-\frac{l'''(\theta^*)}{2n}(\hat{\theta}-\theta_0)} \tag{2}$$

Next we need to show that $\frac{1}{\sqrt{n}} l'(\theta_0)$ converges in distribution to $N(0,I(\theta_0))$.

From $(1)$ we get

$$l'(\theta_0)=\sum_{i=1}^n\frac{\partial \log(f(x_i))}{\partial \theta_0}$$

We multiply both sides by $\frac{1}{\sqrt{n}}$:

$$\frac{1}{\sqrt{n}}l'(\theta_0)=\frac{1}{\sqrt{n}}\sum_{i=1}^n\frac{\partial \log(f(x_i))}{\partial \theta_0} \tag{3}$$

Now we apply the CLT to the right-hand side of $(3)$.

We treat the terms $\frac{\partial \log(f(x_i))}{\partial \theta_0}$ as i.i.d. random variables here.

We can show $E\left(\frac{\partial \log(f(x_i))}{\partial \theta_0}\right)=0$ as follows:

$$1=\int_{-\infty}^{\infty}f(x)dx$$ Take the derivative of both sides (assuming we may interchange differentiation and integration):

$$0=\int_{-\infty}^{\infty}\frac{\partial f(x)}{\partial \theta_0}dx=\int_{-\infty}^{\infty}\frac{1}{f(x)}\frac{\partial f(x)}{\partial \theta_0}f(x)dx=\int_{-\infty}^{\infty}\frac{\partial \log(f(x))}{\partial \theta_0}f(x)dx$$

which shows that $E\left(\frac{\partial \log(f(x_i))}{\partial \theta_0}\right)=0$.

Similarly, we can show that the variance of $\frac{\partial \log(f(x_i))}{\partial \theta_0}$ is $I(\theta_0)$, the Fisher information.

Therefore, by the CLT, $$\frac{1}{\sqrt{n}}l'(\theta_0)\xrightarrow{d} N(0,I(\theta_0))$$
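As a quick sanity check, here is a minimal Monte Carlo sketch (assuming `numpy`; the Poisson model and all variable names are illustrative choices, not part of the derivation) confirming that the per-observation score has mean $0$ and variance $I(\theta_0)$:

```python
import numpy as np

# For a Poisson(lam0) model, log f(x; lam) = x*log(lam) - lam - log(x!),
# so the per-observation score at lam0 is x/lam0 - 1 and I(lam0) = 1/lam0.
rng = np.random.default_rng(0)
lam0 = 3.0
x = rng.poisson(lam0, size=1_000_000)
score = x / lam0 - 1.0

print(score.mean())   # approximately 0
print(score.var())    # approximately 1/lam0
```

Both printed values should match the theory ($0$ and $1/\lambda_0 \approx 0.333$) up to Monte Carlo error.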

We can also show that $-\frac{1}{n}l''(\theta_0)\xrightarrow{p}I(\theta_0)$ by the law of large numbers, since $E\left(-\frac{\partial^2 \log(f(x_i))}{\partial \theta_0^2}\right)=I(\theta_0)$. I will not do the detailed derivation here.

We can also ignore the $\frac{l'''(\theta^*)}{2n}(\hat{\theta}-\theta_0)$ part in $(2)$: since $\hat{\theta}$ is consistent and $\frac{1}{n}l'''(\theta^*)$ is bounded in probability under the usual regularity conditions, this term converges to $0$ in probability.

Now we wrap up $(2)$. By Slutsky's theorem,

$$\sqrt{n}(\hat{\theta}-\theta_0) \xrightarrow{d} \frac{N(0,I(\theta_0))}{I(\theta_0)}=N\left(0,\frac{1}{I(\theta_0)}\right)$$

By some rearrangement, you can see that $\hat{\theta}$ is also approximately normally distributed: $\hat{\theta}\approx N\left(\theta_0,\frac{1}{nI(\theta_0)}\right)$ for large $n$.
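To see the final result empirically, one can simulate many replications of the MLE and compare its variance to $\frac{1}{nI(\theta_0)}$. A minimal sketch, assuming `numpy` and an illustrative Poisson model (where the MLE is the sample mean and $1/I(\lambda_0)=\lambda_0$):

```python
import numpy as np

# Simulate the sampling distribution of the Poisson MLE (the sample mean)
# and compare its variance to 1/(n*I(lam0)) = lam0/n.
rng = np.random.default_rng(2)
lam0, n, reps = 3.0, 500, 20_000
samples = rng.poisson(lam0, size=(reps, n))
mle = samples.mean(axis=1)   # MLE of lambda in each replication

print(mle.var())             # approximately lam0 / n
```

The empirical variance of the MLE should land near $\lambda_0/n = 0.006$, matching the asymptotic theory.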

## Best Answer

In chapter 2 of the Matrix Cookbook there is a nice review of matrix calculus that gives a lot of useful identities for problems one encounters in probability and statistics, including rules for differentiating the multivariate Gaussian likelihood.

If you have a random vector ${\boldsymbol y}$ that is multivariate normal with mean vector ${\boldsymbol \mu}$ and covariance matrix ${\boldsymbol \Sigma}$, then use equation (86) in the matrix cookbook to find that the gradient of the log likelihood ${\bf L}$ with respect to ${\boldsymbol \mu}$ is

$$\begin{align} \frac{ \partial {\bf L} }{ \partial {\boldsymbol \mu}} &= -\frac{1}{2} \left( \frac{\partial \left( {\boldsymbol y} - {\boldsymbol \mu} \right)' {\boldsymbol \Sigma}^{-1} \left( {\boldsymbol y} - {\boldsymbol \mu}\right) }{\partial {\boldsymbol \mu}} \right) \nonumber \\ &= -\frac{1}{2} \left( -2 {\boldsymbol \Sigma}^{-1} \left( {\boldsymbol y} - {\boldsymbol \mu}\right) \right) \nonumber \\ &= {\boldsymbol \Sigma}^{-1} \left( {\boldsymbol y} - {\boldsymbol \mu} \right) \end{align}$$
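A quick way to check such matrix-calculus identities is a finite-difference comparison. The sketch below (assuming `numpy`; the `log_lik` helper and the random test instance are illustrative) verifies the gradient with respect to $\boldsymbol\mu$:

```python
import numpy as np

def log_lik(y, mu, Sigma):
    # Full multivariate normal log likelihood, normalizing constant included.
    d = y - mu
    k = len(y)
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * (k * np.log(2 * np.pi) + logdet + d @ np.linalg.solve(Sigma, d))

rng = np.random.default_rng(3)
k = 4
A = rng.normal(size=(k, k))
Sigma = A @ A.T + k * np.eye(k)   # a well-conditioned covariance matrix
y, mu = rng.normal(size=k), rng.normal(size=k)

# Analytic gradient: Sigma^{-1} (y - mu)
analytic = np.linalg.solve(Sigma, y - mu)

# Central finite differences along each coordinate of mu
eps = 1e-6
numeric = np.array([
    (log_lik(y, mu + eps * e, Sigma) - log_lik(y, mu - eps * e, Sigma)) / (2 * eps)
    for e in np.eye(k)
])
print(np.max(np.abs(analytic - numeric)))   # near 0
```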

I'll leave it to you to differentiate this again and find the answer to be $-{\boldsymbol \Sigma}^{-1}$.

As "extra credit", use equations (57) and (61) to find that the gradient with respect to ${\boldsymbol \Sigma}$ is

$$ \begin{align} \frac{ \partial {\bf L} }{ \partial {\boldsymbol \Sigma}} &= -\frac{1}{2} \left( \frac{ \partial \log(|{\boldsymbol \Sigma}|)}{\partial{\boldsymbol \Sigma}} + \frac{\partial \left( {\boldsymbol y} - {\boldsymbol \mu}\right)' {\boldsymbol \Sigma}^{-1} \left( {\boldsymbol y}- {\boldsymbol \mu}\right) }{\partial {\boldsymbol \Sigma}} \right)\\ &= -\frac{1}{2} \left( {\boldsymbol \Sigma}^{-1} - {\boldsymbol \Sigma}^{-1} \left( {\boldsymbol y} - {\boldsymbol \mu} \right) \left( {\boldsymbol y} - {\boldsymbol \mu} \right)' {\boldsymbol \Sigma}^{-1} \right) \end{align} $$
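This gradient can be checked the same way by perturbing each entry of $\boldsymbol\Sigma$ independently (an unconstrained finite difference; `numpy` assumed, all names illustrative):

```python
import numpy as np

def log_lik(y, mu, Sigma):
    # Full multivariate normal log likelihood, normalizing constant included.
    d = y - mu
    k = len(y)
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * (k * np.log(2 * np.pi) + logdet + d @ np.linalg.solve(Sigma, d))

rng = np.random.default_rng(4)
k = 3
A = rng.normal(size=(k, k))
Sigma = A @ A.T + k * np.eye(k)
y, mu = rng.normal(size=k), rng.normal(size=k)

# Analytic gradient: -0.5 * (Sigma^{-1} - Sigma^{-1} (y-mu)(y-mu)' Sigma^{-1})
Sinv = np.linalg.inv(Sigma)
d = y - mu
analytic = -0.5 * (Sinv - Sinv @ np.outer(d, d) @ Sinv)

# Perturb each entry Sigma[i, j] on its own (symmetry not enforced)
eps = 1e-6
numeric = np.zeros((k, k))
for i in range(k):
    for j in range(k):
        E = np.zeros((k, k))
        E[i, j] = eps
        numeric[i, j] = (log_lik(y, mu, Sigma + E)
                         - log_lik(y, mu, Sigma - E)) / (2 * eps)
print(np.max(np.abs(analytic - numeric)))   # near 0
```

Perturbing entries independently matches the unconstrained derivative formula above; a symmetry-constrained derivative would differ off the diagonal.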

I've left out a lot of the steps, but I made this derivation using only the identities found in the matrix cookbook, so I'll leave it to you to fill in the gaps.

I've used these score equations for maximum likelihood estimation, so I know they are correct :)