Maximize $ \log {\rm det} (\boldsymbol{I} +\boldsymbol{X} \boldsymbol{A} \boldsymbol{X}^T)$

Tags: matrices, nonlinear optimization, optimization

I want to solve the following maximization problem in $\boldsymbol{X}={\rm diag}(x_1,\ldots, x_n)$

$$\begin{array}{ll} \text{maximize} & \log {\rm det} (\boldsymbol{I} +\boldsymbol{X} \boldsymbol{A} \boldsymbol{X}^T)\\ \text{subject to} & \mbox{tr}( \boldsymbol{X}) =1\\ & 0 \leq x_i \leq 1, \quad i = 1, \ldots, n\end{array}$$

where $\boldsymbol{A}$ is a given positive semidefinite matrix.

To understand it, I first suppose $n=1$; then the objective function is $\log(1 + a x^2)$, which is non-convex (see the second derivative below). Does this mean the objective function is non-convex for all $n \geq 1$? How can this problem be solved? Any comments would be appreciated!
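For the $n=1$ case the curvature can be checked directly:

$$\frac{d^2}{dx^2}\log(1+ax^2)=\frac{2a\,(1-ax^2)}{(1+ax^2)^2},$$

which is positive where $ax^2<1$ and negative where $ax^2>1$, so the objective is convex near $0$ and concave further out; in particular the problem is not a concave maximization.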

Best Answer

$\textbf{1. The theoretical method}$. Let $A$ be symmetric $\geq 0$ and $f:X\in \Delta\mapsto \log(\det(I+XAX))$ where $\Delta=\{X=diag(x_1,\cdots,x_n);x_i\in [0,1],\sum_i x_i=1\}$.

$\textbf{Proposition 1}$. If $f$ admits an extremum at $X=diag(x_i)$ with every $x_i\in (0,1)$, then the diagonal entries of the matrix $AX\operatorname{adj}(I+XAX)$ are all equal, where $\operatorname{adj}$ denotes the adjugate (classical adjoint).

$(*)$ Together with the relation $\sum_i x_i=1$, this gives $n$ equations in the $n$ unknowns $x_i$.

$\textbf{Proof}$. We use the Lagrange multiplier method. There is $\lambda$ such that, for every diagonal $H$,

$Df_X(H)+\lambda tr(H)=0$, that is,

$0=tr\big((HAX+XAH)(I+XAX)^{-1}+\lambda H\big)=tr\big(H\,(AX(I+XAX)^{-1}+(I+XAX)^{-1}XA+\lambda I)\big)$.

That implies that the diagonal entries of the symmetric matrix

$U+U^T=\dfrac{1}{\det(I+XAX)}\big(AX\operatorname{adj}(I+XAX)+\operatorname{adj}(I+XAX)XA\big)$, where $U=AX(I+XAX)^{-1}$,

are all equal to $-\lambda$. $\square$
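The stationarity formula used in the proof can be checked numerically. Below is a minimal NumPy sketch (the random $A$ and all names are illustrative, not taken from the answer) comparing $\partial f/\partial x_i$, i.e. the $i$-th diagonal entry of $AX(I+XAX)^{-1}+(I+XAX)^{-1}XA$, with a finite-difference approximation.

```python
import numpy as np

# Sketch: verify that the i-th partial derivative of f(X) = log det(I + XAX)
# equals the i-th diagonal entry of  A X (I+XAX)^{-1} + (I+XAX)^{-1} X A.
rng = np.random.default_rng(0)
n = 4
B = rng.standard_normal((n, n))
A = B @ B.T                                   # random symmetric positive semidefinite A
x = rng.uniform(0.1, 0.9, n)                  # an interior point
X = np.diag(x)
M = np.linalg.inv(np.eye(n) + X @ A @ X)

def f(x):
    X = np.diag(x)
    return np.linalg.slogdet(np.eye(n) + X @ A @ X)[1]

grad_formula = np.diag(A @ X @ M + M @ X @ A)
eps = 1e-6
grad_fd = np.array([(f(x + eps * np.eye(n)[i]) - f(x - eps * np.eye(n)[i])) / (2 * eps)
                    for i in range(n)])
print(np.max(np.abs(grad_formula - grad_fd)))  # small: formula and finite differences agree
```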

EDIT. Unfortunately,

i) the system $(*)$ has many solutions (up to $7$ real ones when $n=3$) and it is hard to find them all when $n$ is large.

ii) There are instances where the required maximum is reached at a point $X$ such that some of the $x_i$ are $0$.

$\textbf{2. Using software}$.

$\textbf{Proposition 2}$. $f$ is increasing wrt each $x_i$.

$\textbf{Sketch of the proof}$. Since $I+XAX$ is symmetric $>0$, the sign of $\dfrac{\partial f}{\partial x_1}(X)=Df_X(diag(1,0,\cdots,0))$ (cf. above) is the same as the sign of $\big(AX\operatorname{adj}(I+XAX)\big)[1,1]$... $\square$
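Proposition 2 can also be sanity-checked numerically; here is a small sketch (random positive semidefinite $A$, illustrative only) confirming that the partial derivatives of $f$ stay nonnegative on $[0,1]^n$.

```python
import numpy as np

# Sketch: at random points with x_i in [0, 1], all partial derivatives of
# f(X) = log det(I + XAX) are nonnegative, i.e. f is nondecreasing in each x_i.
rng = np.random.default_rng(1)
n = 5
for _ in range(200):
    B = rng.standard_normal((n, n))
    A = B @ B.T                               # random positive semidefinite A
    X = np.diag(rng.uniform(0.0, 1.0, n))
    M = np.linalg.inv(np.eye(n) + X @ A @ X)
    grad = np.diag(A @ X @ M + M @ X @ A)     # partial derivatives of f
    assert np.all(grad >= -1e-10)             # nonnegative up to rounding
```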

Thus we can replace $\Delta$ with $Z=\{X=diag(x_1,\cdots,x_n);x_i\in [0,1],\sum_i x_i\leq 1\}$. This allows more directions for the gradient method.

I use Maple's NLPSolve command; it relies on various methods implemented in a built-in library provided by the Numerical Algorithms Group (NAG).

I propose $2$ methods (some tests seem to show that the first one is better); a numerical sketch of method i) is given after the two variants.

i) Over $\Delta$, with initial point $\dfrac{1}{n}I_n$.

ii) Over $Z$, with initial point $\dfrac{1}{kn}I_n$ where $k>1$.
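For readers without Maple, method i) can be sketched with SciPy's SLSQP solver (this is only an illustration with a random $A$ and freely chosen names; it is not the NAG routine that NLPSolve calls).

```python
import numpy as np
from scipy.optimize import minimize

# Sketch of method i): maximize f over Delta, starting from (1/n) I_n.
rng = np.random.default_rng(2)
n = 5
B = rng.standard_normal((n, n))
A = B @ B.T                                   # random positive semidefinite A

def neg_f(x):                                 # SciPy minimizes, so negate f
    X = np.diag(x)
    return -np.linalg.slogdet(np.eye(n) + X @ A @ X)[1]

res = minimize(neg_f,
               x0=np.full(n, 1.0 / n),        # initial point (1/n) I_n
               method="SLSQP",
               bounds=[(0.0, 1.0)] * n,       # 0 <= x_i <= 1
               constraints=[{"type": "eq", "fun": lambda x: np.sum(x) - 1.0}])
print(res.x, -res.fun)                        # diagonal of the maximizer X and the maximal value
```

Method ii) corresponds to relaxing the equality constraint to $\sum_i x_i\leq 1$ and shrinking the initial point to $\dfrac{1}{kn}I_n$.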

$\textbf{Remark}$. The solutions are (very often) of the following form: there are $p<n$ indices $i$ such that $x_i=0$, and the other $n-p$ values $x_i$ are close to $\dfrac{1}{n-p}$.
