OK, I think I have a full answer at this point, so let me post it.
Step 1. (algebra).
If $P$ is an $n\times n$ stochastic matrix and $\lambda$ is an eigenvalue of $P$ with $|\lambda|=1$, then $\lambda^k=1$ for some $k\le n$ and $1,\lambda,\lambda^2,\dots,\lambda^{k-1}$ are eigenvalues of $P$.
Indeed, let $x$ be an eigenvector. WLOG, $\lambda\ne 1$. Let $S=\{j:|x_j|=\max_k|x_k|\}$. If $i\in S$ and $p_{ij}>0$, then $j\in S$ and, moreover, $x_j=\lambda x_i$ (since $\lambda x_i=\sum_j p_{ij}x_j$ is a convex combination of numbers of modulus at most $|x_i|$, equality of moduli forces every term with $p_{ij}>0$ to equal $\lambda x_i$). Thus, if some number occurs as an entry of $x$, so does $\lambda^k$ times that number for every $k$; but the number of distinct entries is at most $n$, so $\lambda^k=1$ for some $k\le n$. Moreover, we can assume that one of the entries is $1$ and split the indices in $S$ into groups $S_m$ by the rule $j\in S_m$ iff $x_j$ is in the half-open counterclockwise arc from $\lambda^m$ to $\lambda^{m+1}$, so that $i\in S_m$, $p_{ij}>0$ imply $j\in S_{m+1}$. From here we immediately see that $P-\lambda^qI$ is not invertible for any $q$ (the $S$-block annihilates the vector $y_j=\lambda^{qm}$ for $j\in S_m$, and the full determinant has the determinant of the $S$-block as a factor). Thus, all powers of $\lambda$ are eigenvalues.
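A quick numerical sanity check of Step 1 (my own illustration, not part of the argument), assuming NumPy: the matrix below is a period-$3$ stochastic matrix I made up, whose peripheral eigenvalue $\lambda=e^{2\pi i/3}$ indeed drags all of its powers into the spectrum.

```python
import numpy as np

# a period-3 stochastic matrix: classes {0,1} -> {2} -> {3} -> {0,1}
P = np.array([
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.3, 0.7, 0.0, 0.0],
])
eigs = np.linalg.eigvals(P)
lam = np.exp(2j * np.pi / 3)    # peripheral eigenvalue of modulus 1
for q in range(3):
    # distance from lambda^q to the spectrum: ~1e-16 for q = 0, 1, 2
    print(q, np.abs(eigs - lam**q).min())
```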
Step 2. (compactness argument).
Consider a convergent sequence $P_k$ of $n\times n$ stochastic matrices with limit $P$. Assume that each $P_k$ has an eigenvalue $\lambda_k$ and that the $\lambda_k$ are not contained in any fixed Stolz angle at $1$. Then we may assume that $\lambda_k\to\lambda$ with $|\lambda|=1$. Clearly, $\lambda$ is an eigenvalue of $P$. If $\lambda\ne 1$, then $P$ has several eigenvalues summing to $0$ (the powers of $\lambda$, by Step 1), so $\operatorname{Tr} P\le n-2$, which makes it not a limit point of your set. But if $\lambda=1$, it is even worse: if the $\lambda_k$ approach $1$ tangentially, then $\lambda_k^{m_k}$ can tend to any point on the unit circle, but these powers are also eigenvalues of $n\times n$ stochastic matrices (namely of $P_k^{m_k}$), and so are their limits. Thus, there is some fixed (but depending on $n$) Stolz angle containing all the eigenvalues of your matrices.
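The tangential phenomenon is easy to see numerically (the specific numbers below are arbitrary, NumPy assumed): an eigenvalue much closer to the unit circle than to $1$ can be raised to a power that rotates it almost anywhere while barely shrinking the modulus.

```python
import numpy as np

lam = 0.9999 * np.exp(0.01j)   # very close to the circle, approaching 1 tangentially
m = round(np.pi / 0.01)        # rotate the argument by ~pi
print(lam**m, abs(lam**m))     # ~(-0.969 + 0.002j), modulus still ~0.969
```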
Step 3. (harmonic analysis).
Let $f(m)=\sum_{k=1}^n c_k\lambda_k^m$ for $m\ge 0$ and $f(m)=0$ for $m<0$, where the $\lambda_k$ lie in some fixed Stolz angle $A$. Then
$$
Vf=\sum_{m\in\mathbb Z}|f(m+1)-f(m)|\le C(A,n)\max_m |f(m)|
$$
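Before the proof, a quick numerical illustration of the claim (my own sketch, assuming NumPy; the $90^\circ$ Stolz angle and the horizon are arbitrary choices): the ratio $Vf/\max_m|f(m)|$ stays modest over re-runs.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
# lambda_k = exp(-mu_k) with |Im mu_k| <= Re mu_k: a 90-degree Stolz angle at 1
re = rng.uniform(0.01, 0.5, n)
im = re * rng.uniform(-1.0, 1.0, n)
lam = np.exp(-(re + 1j * im))
c = rng.standard_normal(n) + 1j * rng.standard_normal(n)

m = np.arange(5000)                            # horizon long enough for decay
f = (c * lam ** m[:, None]).sum(axis=1)
Vf = np.abs(f[0]) + np.abs(np.diff(f)).sum()   # includes the jump at m = 0
print(Vf / np.abs(f).max())                    # stays bounded over re-runs
```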
Proof:
We begin with a complex analysis lemma: let $F(z)=\sum_{k=1}^n c_k e^{\mu_k z}$, where $\mu_k\in\mathbb C$, $|\mu_k|\le 1$. Then $F$ has at most $C(n)$ zeroes in the unit disk.
Proof: Let $m$ be the maximum of $|F|$ over the unit disk. Then the first $n$ derivatives of $F$ at the origin are bounded by $C(n)m$. But $\Phi(t)=F(zt)$ (for fixed $|z|=1$) satisfies an $n$-th order differential equation $\Phi^{(n)}=\sum_{k=0}^{n-1}b_k\Phi^{(k)}$, whose coefficients $b_k$ come from expanding the polynomial $\prod_{k=1}^n (x-z\mu_k)$ and are therefore bounded by $2^n$, say. Standard ODE theory then implies that $\Phi$ is bounded by $C'(n)m$ on $[-2,2]$, so the ratio of the maximum of $|F|$ over the disk of radius $2$ to its maximum over the unit disk is bounded, which is enough to control the number of zeroes in the unit disk (each zero contributes a Blaschke factor that pushes this ratio up by a fixed factor). Rescaling and covering, we conclude that if $|\mu_k|\le\mu$, then $F$ has at most $C(n,K)$ zeroes in the disk of radius $K/\mu$.
Now we proceed by induction on $n$.
If $n=1$, the claim is obvious: the maximum is just $|c_1|$ and $|\lambda_1^m-\lambda_1^{m+1}|=|\lambda_1|^m|1-\lambda_1|\le C(A)(|\lambda_1|^m-|\lambda_1|^{m+1})$, and the right-hand side telescopes when summed over $m$ (the jump at $m=0$ contributes one more $|c_1|$).
Let $n>1$. Write $\lambda_k=e^{-\mu_k}$ with $|\mu_k|\le C(A)\operatorname{Re}\mu_k$. Let $\mu=\max_k|\mu_k|=|\mu_n|$ (reordering if necessary). Note that $f$ is the restriction of $F(z)=\sum_{k=1}^n c_ke^{-\mu_k z}$ to the non-negative integers. By the lemma, the derivative of the real or imaginary part of $F(t)$ can have only $C(2n,K)$ zeroes on $[0,K/\mu]$, so the real and imaginary parts have a bounded number of intervals of monotonicity there, whence the variation of $f$ on $[0,K/\mu]$ is dominated by its maximum. Now, choose $K=K(A)$ so that $\gamma=\lambda_n^N$ satisfies $|\gamma|<1/2$, where $N\approx K/\mu$. The function $g(m)=f(m+N)-\gamma f(m)$ for $m\ge 0$ (and $0$ for $m<0$) is bounded by $2\max|f|$ and has one term fewer (the $k=n$ term cancels because $\lambda_n^{m+N}=\gamma\lambda_n^m$). Thus, by the induction assumption, $Vg\le C(A,n)\max|f|$.
To recover $f$ from $g$, note that $f(m)-\gamma f(m-N)=G(m)$, where $G(m)=g(m-N)$ for $m\ge N$ and $G(m)=f(m)$ for $0\le m<N$ (and $G(m)=0$ for $m<0$). Note that $VG$ is still under control because we have bounded both $Vg$ and the part of $Vf$ coming from the interval $[0,N]$. Now it remains to iterate this recurrence to get $f(m)=G(m)+\gamma G(m-N)+\gamma^2 G(m-2N)+\dots$ and to use the shift invariance of, and the triangle inequality for, the total variation: $Vf\le\sum_{q\ge 0}|\gamma|^q\,VG\le 2\,VG$.
Step 4. (the end)
Each entry of the matrix $P^k$, viewed as a function of $k$, is of this form (assuming that the eigenvalues of $P$ are distinct, which holds on a dense set of matrices). Thus, the total variation of each entry is bounded by some $C(n)$ depending on $n$ only. This is equivalent to $\sum_k\|P^k-P^{k+1}\|\le C(n)$; but the sequence of norms (in the operator norm induced by $\ell^\infty$) is non-increasing, since $\|P^{k+1}-P^{k+2}\|=\|P(P^k-P^{k+1})\|\le\|P^k-P^{k+1}\|$, and a non-negative non-increasing sequence with a bounded sum is $O_n(k^{-1})$ (because $k\,a_k\le\sum_{j\le k}a_j$).
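As a closing sanity check (mine, assuming NumPy), one can watch $k\,\|P^k-P^{k+1}\|$ in the induced $\ell^\infty$ norm for a random stochastic matrix; it stays bounded, as the argument predicts.

```python
import numpy as np
from numpy.linalg import matrix_power

rng = np.random.default_rng(0)
n = 5
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)        # row-normalize: P is stochastic

for k in [1, 10, 100, 1000]:
    D = matrix_power(P, k) - matrix_power(P, k + 1)
    norm = np.abs(D).sum(axis=1).max()   # operator norm induced by l-infinity
    # k * norm stays bounded (here it actually decays geometrically,
    # since a random positive P is far from periodic)
    print(k, k * norm)
```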
Feel free to ask questions :). I suspect all of this is written in some obscure textbooks, but doing the literature search now is beyond my abilities.
I found a proof of this problem for the case of $\varphi(x)=\sum_i{(x_i\log x_i-x_i)}$. If you find any mistake in the proof, please let me know. Thank you!
The case of $\varphi(x)=\sum_i{(x_i\log x_i-x_i)}$ can be proved with a method similar to that of Lindblad, "Completely positive maps and entropy inequalities" (1975) and Lindblad, "Expectations and entropy inequalities for finite quantum systems" (1974). The inequality can be strengthened to
$$
\sum_i{\mathrm{KL}\left(\mathbf{x}_i^\top A\mathbf{x}_i,\mathbf{x}_i^\top B\mathbf{x}_i\right)}\leq
D_{vN}\left(A,B\right)
$$
Actually, a very similar result has appeared in papers on quantum information theory, such as the two papers cited above. The result is that for any trace-preserving map $\Phi$, given by $\Phi(A)=\sum_{i=1}^n{V_iAV_i^\top}$ with $\sum_{i=1}^n{V_i^\top V_i}=\mathbf{I}$, we have $D_{vN}\left(\Phi(A),\Phi(B)\right)\leq D_{vN}(A,B)$, where $A,B$ are density operators (Hermitian positive definite matrices satisfying $\mathrm{tr}A=\mathrm{tr}B=1$) and $\varphi(x)=x\log x$.
We found that if the trace constraints $\mathrm{tr}A=\mathrm{tr}B=1$ are dropped and $\varphi(x)=x\log x$ is replaced with $\varphi(x)=x\log x-x$, the inequality still holds.
The proof is outlined as follows.
The von Neumann divergence has the following additivity property under Kronecker products:
$$D_{vN}(A\otimes P,B\otimes P)=D_{vN}(A,B)\cdot\mathrm{tr}P$$
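This identity is easy to test numerically. A minimal sketch, assuming NumPy/SciPy and reading the divergence for $\varphi(x)=x\log x-x$ as $D_{vN}(A,B)=\mathrm{tr}(A\log A-A\log B-A+B)$:

```python
import numpy as np
from scipy.linalg import logm

def d_vn(A, B):
    # von Neumann divergence for phi(x) = x log x - x
    return np.trace(A @ logm(A) - A @ logm(B) - A + B).real

rng = np.random.default_rng(0)
def rand_pd(k):
    G = rng.standard_normal((k, k))
    return G @ G.T + k * np.eye(k)       # symmetric positive definite

A, B, P = rand_pd(3), rand_pd(3), rand_pd(2)
lhs = d_vn(np.kron(A, P), np.kron(B, P))
rhs = d_vn(A, B) * np.trace(P)
print(lhs, rhs)                          # agree up to numerical error
```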
Using the joint convexity and the additivity, we can prove that the von Neumann divergence is monotone under partial trace (the second equality below uses the standard representation of $X\mapsto\mathrm{tr}_2(X)\otimes\frac{\mathbf{I}_2}{m}$ as a mixture $\sum_{j=1}^N{p_jW_jXW_j^+}$ of unitary conjugations):
\begin{equation*}
\begin{split}
D_{vN}(\mathrm{tr}_2(A),\mathrm{tr}_2(B))
=&D_{vN}\left(\mathrm{tr}_2(A)\otimes\frac{\mathbf{I}_2}{m},
\mathrm{tr}_2(B)\otimes\frac{\mathbf{I}_2}{m}\right)
\Big/\mathrm{tr}\left(\frac{\mathbf{I}_2}{m}\right)\\
=&D_{vN}\left(\sum_{j=1}^N{p_jW_jAW_j^+},\sum_{j=1}^N{p_jW_jBW_j^+}\right)\\
\leq&\sum_{j=1}^{N}{p_jD_{vN}\left(W_jAW_j^+,W_jBW_j^+\right)}\\
=&D_{vN}(A,B)
\end{split}
\end{equation*}
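Here is a numerical spot check of this monotonicity (my own sketch, assuming NumPy/SciPy; `ptrace2` contracts the second tensor factor):

```python
import numpy as np
from scipy.linalg import logm

def d_vn(A, B):
    return np.trace(A @ logm(A) - A @ logm(B) - A + B).real

def ptrace2(M, n, m):
    # tr_2: trace out the second factor of an (n*m) x (n*m) matrix
    return M.reshape(n, m, n, m).trace(axis1=1, axis2=3)

rng = np.random.default_rng(1)
def rand_pd(k):
    G = rng.standard_normal((k, k))
    return G @ G.T + k * np.eye(k)

n, m = 3, 2
A, B = rand_pd(n * m), rand_pd(n * m)
# the first number should not exceed the second
print(d_vn(ptrace2(A, n, m), ptrace2(B, n, m)), d_vn(A, B))
```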
Any trace-preserving map $\Phi$, given by $\Phi(A)=\sum_{i=1}^n{V_iAV_i^\top}$ with $\sum_{i=1}^n{V_i^\top V_i}=\mathbf{I}$, can be represented as a unitary conjugation followed by a partial trace (a Stinespring dilation), i.e., $\Phi(A)=\mathrm{tr}_2\left(W(A\otimes\mathbf{s}\mathbf{s}^\top)W^\top\right)$ for some unitary $W$ and unit vector $\mathbf{s}$. Therefore, we have that
\begin{equation*}
\begin{split}
D_{vN}\left(\Phi(A),\Phi(B)\right)
=&D_{vN}\left(\mathrm{tr}_2(W(A\otimes\mathbf{s}\mathbf{s}^\top)W^\top),
\mathrm{tr}_2(W(B\otimes\mathbf{s}\mathbf{s}^\top)W^\top)\right)\\
\leq&D_{vN}\left(W(A\otimes\mathbf{s}\mathbf{s}^\top)W^\top,
W(B\otimes\mathbf{s}\mathbf{s}^\top)W^\top\right)\\
=&D_{vN}\left(A,B\right)
\end{split}
\end{equation*}
Then the sum of the relative entropies of the quadratic forms can be represented as a matrix divergence and bounded:
\begin{equation*}
\begin{split}
\sum_i{\mathrm{KL}\left(\mathbf{x}_i^\top A\mathbf{x}_i,\mathbf{x}_i^\top B\mathbf{x}_i\right)}
=&\sum_{i,j}{(\mathbf{x}_i^\top\mathbf{x}_j)^2
\mathrm{KL}(\mathbf{x}_i^\top A\mathbf{x}_i,\mathbf{x}_j^\top B\mathbf{x}_j)}\\
=&D_{vN}\left(\sum_i{X_iAX_i^\top},\sum_i{X_iBX_i^\top}\right)\\
\leq&D_{vN}\left(A,B\right)
\end{split}
\end{equation*}
where $X_i=\mathbf{x}_i\mathbf{x}_i^\top$ and the $\mathbf{x}_i$ form an orthonormal basis, so that $\sum_i{X_i^\top X_i}=\mathbf{I}$ and the map $A\mapsto\sum_i{X_iAX_i^\top}$ is trace preserving.
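A numerical spot check of the final inequality (my own sketch, assuming NumPy/SciPy), taking the $\mathbf{x}_i$ to be the columns of a random orthogonal matrix and $\mathrm{KL}(a,b)=a\log(a/b)-a+b$ for scalars:

```python
import numpy as np
from scipy.linalg import logm, qr

def d_vn(A, B):
    return np.trace(A @ logm(A) - A @ logm(B) - A + B).real

rng = np.random.default_rng(2)
n = 4
def rand_pd(k):
    G = rng.standard_normal((k, k))
    return G @ G.T + k * np.eye(k)

A, B = rand_pd(n), rand_pd(n)
X, _ = qr(rng.standard_normal((n, n)))   # orthonormal x_i = columns of X

kl_sum = 0.0
for i in range(n):
    a = X[:, i] @ A @ X[:, i]            # quadratic forms, positive for PD A, B
    b = X[:, i] @ B @ X[:, i]
    kl_sum += a * np.log(a / b) - a + b
print(kl_sum, d_vn(A, B))                # kl_sum should not exceed d_vn(A, B)
```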
If there is any mistake in the proof, please let me know. Any other suggestions are also welcome. Thank you very much!
Best Answer
Define,
$$A(\alpha) = \sum_i \alpha_i A_i.$$
In your case (ignoring constraints for the moment), you have
$$\min_{\alpha}\quad\mbox{trace}(A(\alpha)^{-1}) = \sum_i e_i^TA(\alpha)^{-1}e_i,$$
where $e_i$ is the $i$-th standard basis vector, which makes the objective somewhat tricky to optimize.
So, as per your request, here is a somewhat simpler setup that you may find useful. Consider the following two problems:
$$\min_{\alpha}\quad \|A(\alpha)\|_2,$$
and
$$\min_{\alpha}\quad c^TA(\alpha)^{-1}c.$$
Both of these can be cast into SDPs. For the first, we replace it by $$\begin{eqnarray} \min_{\alpha,t}\quad &t\\ &\begin{bmatrix} tI & A(\alpha)\\ A(\alpha) & tI \end{bmatrix} \succeq 0 \end{eqnarray}$$
For the second, use a similar Schur-complement based idea to obtain $$\begin{eqnarray} \min_{\alpha,t}\quad &t\\ &\begin{bmatrix} A(\alpha) & c\\ c^T & t \end{bmatrix} \succeq 0 \end{eqnarray}$$
You can add more constraints on $\alpha$ if you want. However, because these problems are SDPs, they won't be easy to solve for really large matrices.
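For what it's worth, here is a minimal CVXPY sketch of the second problem; the data $A_i$, $c$, and the simplex constraint on $\alpha$ are placeholders of mine. `matrix_frac(c, A)` is CVXPY's built-in atom for $c^TA(\alpha)^{-1}c$, compiled via the same Schur-complement LMI written above (for the first problem, `sigma_max` plays the analogous role).

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, p = 4, 3
A_list = []
for _ in range(p):
    G = rng.standard_normal((n, n))
    A_list.append(G @ G.T + n * np.eye(n))   # symmetric positive definite A_i
c = rng.standard_normal(n)

alpha = cp.Variable(p, nonneg=True)
A = sum(alpha[i] * A_list[i] for i in range(p))   # A(alpha)
prob = cp.Problem(cp.Minimize(cp.matrix_frac(c, A)),
                  [cp.sum(alpha) == 1])           # placeholder constraint
prob.solve(solver=cp.SCS)                         # SCS ships with CVXPY and handles SDPs
print(prob.value, alpha.value)
```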