Maximum Likelihood – How to Ensure Properties of Covariance Matrix When Fitting Multivariate Normal Model

covariance · maximum-likelihood · optimization

Suppose I have the following model

$$y_i=f(x_i,\theta)+\varepsilon_i$$

where $y_i\in \mathbb{R}^K$, $x_i$ is a vector of explanatory variables, $\theta$ is the parameter vector of the non-linear function $f$, and $\varepsilon_i\sim N(0,\Sigma)$, where $\Sigma$ is naturally a $K\times K$ matrix.

The goal, as usual, is to estimate $\theta$ and $\Sigma$. The obvious choice is the maximum likelihood method. The log-likelihood for this model (assuming we have a sample $(y_i,x_i)$, $i=1,\dots,n$) is

$$l(\theta,\Sigma)=-\frac{nK}{2}\log(2\pi)-\frac{n}{2}\log\det\Sigma-\frac{1}{2}\sum_{i=1}^n(y_i-f(x_i,\theta))'\Sigma^{-1}(y_i-f(x_i,\theta))$$

Now this seems simple: the log-likelihood is specified, so put in the data and use some algorithm for non-linear optimisation. The problem is how to ensure that $\Sigma$ is positive definite. Using, for example, `optim` in R (or any other non-linear optimisation algorithm) will not guarantee that $\Sigma$ is positive definite.

So the question is: how can I ensure that $\Sigma$ stays positive definite? I see two possible solutions:

  1. Reparametrise $\Sigma$ as $RR'$, where $R$ is an upper-triangular or symmetric matrix. Then $\Sigma$ is always positive semi-definite (positive definite when $R$ is non-singular) and $R$ can be left unconstrained (see the sketch after this list).

  2. Use profile likelihood. Derive the formulas for $\hat\theta(\Sigma)$ and $\hat{\Sigma}(\theta)$. Start with some $\theta_0$ and iterate $\hat{\Sigma}_j=\hat\Sigma(\hat\theta_{j-1})$, $\hat{\theta}_j=\hat\theta(\hat\Sigma_{j})$ until convergence.
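
Approach 1 is straightforward to wire up directly. Below is a minimal sketch in R, with a made-up two-dimensional model $f(x,\theta)=(\theta_1 e^{x},\ \theta_2 x)$ and simulated data purely for illustration; the point is only that optimising over the free entries of an upper-triangular $R$ with $\Sigma=RR'$ keeps $\Sigma$ positive semi-definite by construction.

```r
set.seed(1)
n <- 200; K <- 2
f <- function(x, theta) cbind(theta[1] * exp(x), theta[2] * x)

## simulate data from the model (parameter values are made up)
x      <- runif(n)
Sigma0 <- matrix(c(1, 0.5, 0.5, 1), K, K)
y      <- f(x, c(1.5, -0.8)) + matrix(rnorm(n * K), n, K) %*% chol(Sigma0)

## negative log-likelihood; par packs theta and the K*(K+1)/2 free
## entries of the upper-triangular factor R, all unconstrained
negloglik <- function(par) {
  theta <- par[1:2]
  R <- matrix(0, K, K)
  R[upper.tri(R, diag = TRUE)] <- par[-(1:2)]
  Sigma <- tcrossprod(R)            # RR', positive semi-definite by construction
  E <- y - f(x, theta)              # n x K residual matrix
  n * K / 2 * log(2 * pi) +
    n / 2 * as.numeric(determinant(Sigma)$modulus) +  # log det Sigma
    sum((E %*% solve(Sigma)) * E) / 2                 # sum_i e_i' Sigma^{-1} e_i
}

## start from theta = (1, -1) and R = I (i.e. Sigma = I)
fit <- optim(c(1, -1, 1, 0, 1), negloglik, method = "BFGS")
theta_hat <- fit$par[1:2]
```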

Is there some other way? Will these two approaches work, and are they standard? This seems like a pretty standard problem, but a quick search did not give me any pointers. I know that Bayesian estimation would also be possible, but for the moment I would rather not engage in it.

Best Answer

As it turns out, you can use profile maximum likelihood to ensure the necessary properties. One can prove that, for a given $\hat\theta$, $l(\hat\theta,\Sigma)$ is maximised by

$$\hat\Sigma=\frac{1}{n}\sum_{i=1}^n\hat{\varepsilon}_i\hat{\varepsilon}_i',$$

where

$$\hat{\varepsilon}_i=y_i-f(x_i,\hat\theta)$$

Then, writing the quadratic form as a trace, it is possible to show that

$$\sum_{i=1}^n(y_i-f(x_i,\hat\theta))'\hat\Sigma^{-1}(y_i-f(x_i,\hat\theta))=\operatorname{tr}\Big(\hat\Sigma^{-1}\sum_{i=1}^n\hat{\varepsilon}_i\hat{\varepsilon}_i'\Big)=\operatorname{tr}\big(\hat\Sigma^{-1}\cdot n\hat\Sigma\big)=nK,$$

which does not depend on $\theta$, hence we only need to maximise

$$l_R(\theta)=-\frac{n}{2}\log\det\hat\Sigma(\theta).$$

Naturally, in this case $\hat\Sigma$ will satisfy all the necessary properties, since it is the sample covariance matrix of the residuals. The proofs are identical to those for the case when $f$ is linear, which can be found in Time Series Analysis by J. D. Hamilton, page 295, hence I omit them.
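
As a concrete illustration of the concentrated likelihood, here is a minimal sketch in R, reusing the made-up `y`, `x`, `f`, `n`, and `K` from the earlier sketch. The optimiser searches over $\theta$ alone, and $\hat\Sigma(\theta)$, being a sample covariance matrix of residuals, is positive semi-definite by construction.

```r
## negative concentrated log-likelihood: Sigma is profiled out in closed form,
## so optim() searches over theta alone (y, x, f, n, K as in the earlier sketch)
neg_profile <- function(theta) {
  E <- y - f(x, theta)                  # residuals at this theta
  Sigma_hat <- crossprod(E) / n         # (1/n) sum_i e_i e_i', PSD by construction
  n / 2 * as.numeric(determinant(Sigma_hat)$modulus)  # (n/2) log det Sigma_hat
}

fit <- optim(c(1, -1), neg_profile, method = "BFGS")
theta_hat <- fit$par
Sigma_hat <- crossprod(y - f(x, theta_hat)) / n       # plug-in ML estimate of Sigma
```

Note that the search space drops from $\dim(\theta)+K(K+1)/2$ parameters to $\dim(\theta)$, which is the practical payoff of concentrating $\Sigma$ out.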