Maximize $\log\det(I+X\Sigma)$

linear algebramatricesoptimization

\begin{array}{ll}\max\limits_{{X:\,\mathrm{tr}(X)\leq a\\}} \log\det(I+X\Sigma),\end{array}

where $\Sigma=\mathrm{diag}\{\sigma_1,\ldots,\sigma_n\}$, $\sigma_i\geq0$ and $a>0$ are given. And $X\geq 0$.

Here I am trying to maximize a concave function with a convex constraints.
Any hints on how the problem can be solved?

Best Answer

Define the constrained matrix variable $X$ in terms of an unconstrained matrix $Y$ as follows $$X = \frac{\alpha Y}{{\rm Tr}(Y)} \implies {\rm Tr}(X) = \alpha$$ The following variables will also be useful $$B = I+X\Sigma,\quad W^T=\Sigma B^{-1},\quad \tau={\rm Tr}(Y) = I:Y$$ Find the gradient of the function wrt $Y$ $$\eqalign{ \phi &= \log\det(B) \\ d\phi &= B^{-T}:dB \\ &= B^{-T}:dX\,\Sigma \\ &= W:dX \\ &= W:\Big({\alpha\tau^{-1}\,dY-\alpha\tau^{-2}Y(I:dY)}\Big) \\ &= \Big(\alpha\tau^{-1}W-(W:Y)\alpha\tau^{-2}I\Big):dY \\ \frac{\partial\phi}{\partial Y} &= \alpha\tau^{-1}W-(W:Y)\alpha\tau^{-2}I \\ }$$ Set the gradient to zero and solve, per usual for an unconstrained problem. $$\eqalign{ W &= (W:Y)\tau^{-1}I \;=\; \lambda I \\ \lambda I &= W = W^T = \Sigma B^{-1} = \Sigma(I+X\Sigma)^{-1} \\ \Sigma &= \lambda(I+X\Sigma) \\ X &= (\Sigma - \lambda I)(\lambda\Sigma)^{+} \;=\; \lambda^{-1}\Sigma\Sigma^{+}-\Sigma^{+} \\ }$$ where $\Sigma^+$ is the pseudoinverse of $\Sigma$.

All that remains is to find the value of $\lambda$, which can be calculated by enforcing the original trace constraint. $$\eqalign{ \alpha &= {\rm Tr}(X) \\ &= \lambda^{-1}{\rm Tr}(\Sigma\Sigma^{+}) - {\rm Tr}(\Sigma^{+}) \\ &= \lambda^{-1}{\rm rank}(\Sigma) - {\rm Tr}(\Sigma^{+}) \\ \lambda &= \frac{{\rm rank}(\Sigma)}{\alpha+{\rm Tr}(\Sigma^{+})} }$$ In the above derivation, the symmetry of $\Sigma$ was utilized, but the fact that it is diagonal was not needed. However, since $(\Sigma,\Sigma^+)$ are diagonal, $X$ is as well.

Finally, enforce non-negativity by taking $$X = (\lambda\Sigma)^{+}\max(0, \,\Sigma - \lambda I)$$

NB: In several steps, a colon is used to denote the trace/Frobenius product, i.e. $$A:B = {\rm Tr}(A^TB)$$