Solved – the intuitive (geometric?) meaning of minimizing the log determinant of a matrix

entropy, matrix, optimization

I have come across optimization problems that seek a positive semi-definite matrix $A$ minimizing some possibly non-convex function that includes a term $\frac{1}{D}\log\det(A)$, where $D$ is the dimension. For context, consider this paper, equation I.1, which, given a dataset $\mathcal X$ of $N$ $D$-variate observations, seeks a unit-trace, symmetric, positive-definite matrix $A$ minimizing

$$F(A) = \frac{1}{N}\sum_{x\in\mathcal X} \log\left(x^\prime A^{-1} x\right) + \frac{1}{D}\log\det(A).$$

The minimizer is, in some sense, an estimate of the covariance matrix.
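For readers who want to experiment, here is a minimal NumPy sketch of this objective (the function name and array layout are my own choices, not from the paper):

```python
import numpy as np

def objective(A, X):
    """Equation I.1: (1/N) * sum_i log(x_i' A^{-1} x_i) + (1/D) * log det(A).

    X: (N, D) array of observations; A: (D, D) symmetric positive-definite matrix.
    """
    N, D = X.shape
    A_inv = np.linalg.inv(A)
    quad = np.einsum('ni,ij,nj->n', X, A_inv, X)  # x_i' A^{-1} x_i for every row
    _, logdet = np.linalg.slogdet(A)              # numerically stable log det
    return np.log(quad).mean() + logdet / D
```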

I thought this might have something to do with differential entropy, a concept that is not clear to me, but from the comments I gather that this was not correct.

Any help building an intuition here would be appreciated. I am curious how the usefulness of the minimizer of I.1 can be understood from the log-determinant term; the first term is more or less clear to me. I was hoping for a more general insight, but from whuber's comment perhaps the log determinant is not generally meaningful on its own.

Best Answer

Short answer: the determinant appears because it is the Jacobian of the multivariate change of variables, and the logarithm appears because we take the log-likelihood.

Long answer: Let's start with the univariate standard normal density (parameter-free), $$\frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}t^2\right).$$ When we parametrize it via $x=\sigma t + \mu$, the change of variable requires $dt=\frac{1}{\sigma}\,dx$, making the general normal density $$\frac{1}{\sqrt{2\pi}} \frac{1}{\sigma}\exp\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right).$$

Let us also work out the ML estimation of $\mu$ and $\sigma$ simultaneously (both unknown). Given observations $x_1,\dots,x_n$, the log-likelihood is $$\text{a constant} - \frac{n}{2}\log(\sigma^2) -\frac{1}{2}\sum_{i=1}^n\left(\frac{x_i-\mu}{\sigma}\right)^2,$$ whose maximization is equivalent to minimizing $$n \log(\sigma^2) + \sum_{i=1}^n\left(\frac{x_i-\mu}{\sigma}\right)^2,$$ and both terms involving $\sigma^2$ must be accounted for in the minimization (with respect to $\sigma^2$).
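As a quick numerical sanity check (a sketch; the seed, sample size, and parametrization are arbitrary choices of mine), minimizing this expression recovers the familiar closed-form MLEs $\hat\mu = \bar x$ and $\hat\sigma^2 = \frac{1}{n}\sum_{i}(x_i-\bar x)^2$:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=500)
n = x.size

# The criterion above: n*log(sigma^2) + sum(((x_i - mu)/sigma)^2),
# parametrized by (mu, log sigma^2) so that sigma^2 stays positive.
def criterion(theta):
    mu, log_s2 = theta
    return n * log_s2 + np.sum((x - mu) ** 2) / np.exp(log_s2)

res = minimize(criterion, x0=[0.0, 0.0])
mu_hat, s2_hat = res.x[0], np.exp(res.x[1])

print(mu_hat, x.mean())  # both approximately the sample mean
print(s2_hat, x.var())   # both approximately the biased sample variance (divide by n)
```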

Multivariate analogues (say, with $d$ dimensions) behave in the same way. Start with the generating (standard) density $$\left(\sqrt{2\pi}\right)^{-d} \exp\left(-\frac{1}{2}\mathbf{z}^t\mathbf{z}\right);$$ the change of variables $\mathbf{x}=\boldsymbol{\Sigma}^{1/2}\mathbf{z}+\boldsymbol{\mu}$ has Jacobian determinant $\left|\boldsymbol{\Sigma}\right|^{1/2}$, giving the general MVN density $$\left(\sqrt{2\pi}\right)^{-d} \left|\boldsymbol{\Sigma}\right|^{-1/2}\exp\left(-\frac{1}{2}\left(\mathbf{x}-\boldsymbol{\mu}\right)^t\boldsymbol{\Sigma}^{-1}\left(\mathbf{x}-\boldsymbol{\mu}\right)\right).$$

Observe that $\left|\boldsymbol{\Sigma}\right|^{-1/2}$ (the reciprocal of the square root of the determinant of the covariance matrix $\boldsymbol{\Sigma}$) plays the role in the multivariate case that $1/\sigma$ plays in the univariate case, and $\boldsymbol{\Sigma}^{-1}$ plays the role of $1/\sigma^2$. In simpler terms, $\left|\boldsymbol{\Sigma}\right|^{-1/2}$ is the change-of-variable "adjustment".
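To see this concretely, here is a small sketch verifying that dividing the standard density by the Jacobian $\left|\boldsymbol{\Sigma}\right|^{1/2}$ reproduces the general MVN density (SciPy's `multivariate_normal` is used only as a reference; the specific $\boldsymbol{\Sigma}$ is arbitrary):

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)
d = 3
mu = rng.normal(size=d)
M = rng.normal(size=(d, d))
Sigma = M @ M.T + d * np.eye(d)        # an arbitrary positive-definite covariance
L = np.linalg.cholesky(Sigma)          # L @ L.T == Sigma, so det(L) == det(Sigma)**0.5

z = rng.normal(size=d)                 # a point from the standard (generating) density
x = L @ z + mu                         # the change of variables

standard = (2 * np.pi) ** (-d / 2) * np.exp(-0.5 * z @ z)
adjusted = standard / np.sqrt(np.linalg.det(Sigma))   # divide by the Jacobian

# The adjusted standard density at z equals the general MVN density at x.
print(np.isclose(adjusted, multivariate_normal(mean=mu, cov=Sigma).pdf(x)))  # True
```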

Maximizing the likelihood then leads to minimizing (analogously to the univariate case) $$n \log\left|\boldsymbol{\Sigma}\right| + \sum_{i=1}^n\left(\mathbf{x}_i-\boldsymbol{\mu}\right)^t\boldsymbol{\Sigma}^{-1}\left(\mathbf{x}_i-\boldsymbol{\mu}\right).$$ Again, in simpler terms, $n \log\left|\boldsymbol{\Sigma}\right|$ takes the spot of the $n \log(\sigma^2)$ from the univariate case; these terms account for the corresponding change-of-variable adjustments in each scenario.
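Mirroring the univariate check, a sketch (with $\boldsymbol{\mu}$ fixed at $\mathbf{0}$ to keep it short, and a Cholesky-style parametrization of my own choosing to keep $\boldsymbol{\Sigma}$ positive-definite) confirming that minimizing this criterion yields the sample covariance:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
d, n = 2, 2000
X = rng.multivariate_normal(mean=np.zeros(d), cov=[[2.0, 0.6], [0.6, 1.0]], size=n)

def unpack(g):
    """Build Sigma = G G' from the lower-triangular entries of G."""
    G = np.zeros((d, d))
    G[np.tril_indices(d)] = g
    return G @ G.T

# The criterion above, with mu = 0: n*log|Sigma| + sum_i x_i' Sigma^{-1} x_i.
def criterion(g):
    Sigma = unpack(g)
    _, logdet = np.linalg.slogdet(Sigma)
    quad = np.einsum('ni,ij,nj->', X, np.linalg.inv(Sigma), X)
    return n * logdet + quad

res = minimize(criterion, x0=[1.0, 0.0, 1.0])
print(unpack(res.x))   # approximately equal to ...
print(X.T @ X / n)     # ... the (mu = 0) sample covariance, i.e. the MLE
```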

The above is based on taking $\rho(x)=x$, in the language of http://arxiv.org/pdf/1206.1386v2. Using $\rho(x)=\frac{d}{2}\log x$ (discussed after I.5 on p. 2) changes things accordingly (although, as noted in the paper, this $\rho(x)$ does not give a valid density).
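To close the loop with the question (a sketch of the correspondence, under my reading that the density analogue is proportional to $\left|\boldsymbol{\Sigma}\right|^{-1/2}\exp\left(-\rho\left(\mathbf{x}^t\boldsymbol{\Sigma}^{-1}\mathbf{x}\right)\right)$, taking $\boldsymbol{\mu}=\mathbf{0}$ as in I.1): with $\rho(x)=\frac{d}{2}\log x$, twice the negative log-likelihood becomes $$n \log\left|\boldsymbol{\Sigma}\right| + 2\sum_{i=1}^n \rho\left(\mathbf{x}_i^t\boldsymbol{\Sigma}^{-1}\mathbf{x}_i\right) = n \log\left|\boldsymbol{\Sigma}\right| + d\sum_{i=1}^n \log\left(\mathbf{x}_i^t\boldsymbol{\Sigma}^{-1}\mathbf{x}_i\right),$$ and dividing by $nd$ (which does not change the minimizer) recovers exactly the objective from equation I.1 (with $d=D$, $n=N$): $$\frac{1}{n}\sum_{i=1}^n \log\left(\mathbf{x}_i^t\boldsymbol{\Sigma}^{-1}\mathbf{x}_i\right) + \frac{1}{d}\log\left|\boldsymbol{\Sigma}\right|.$$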
