Extending the trace maximization principle

eigenvalues-eigenvectors, linear-algebra, matrices, optimization

Ky Fan's principle [1] states that, given a symmetric $n\times n$ real matrix $A$ with distinct eigenvalues and an integer $p<n$:

$$\max_{X^\top X = I_p} \mathrm{Tr}(X^\top A X) = \max_{X^\top X = I_p} \sum_{i=1}^p x_i^\top A x_i = \sum_{i=1}^p \lambda_i(A)$$

with $\lambda_i(A)$ the $i$-th eigenvalue of $A$, sorted in decreasing order, and $x_i$ the $i$-th column of $X$. The maximum is reached for $x_i$ equal to the $i$-th eigenvector of $A$.
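As a quick numerical sanity check of the principle, here is a minimal sketch (the seed, dimensions, and $p$ are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 6, 3

# A random symmetric test matrix
M = rng.standard_normal((n, n))
A = (M + M.T) / 2

# np.linalg.eigh returns eigenvalues in increasing order; flip to decreasing
eigvals, eigvecs = np.linalg.eigh(A)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

# The claimed maximiser: the top-p eigenvectors
X = eigvecs[:, :p]
print(np.trace(X.T @ A @ X))  # matches the sum of the p largest eigenvalues
print(eigvals[:p].sum())
```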

My question

I am interested in the following variant of the problem above:
$$\max_{X^\top X = I_p} \sum_{i=1}^p x_i^\top (A + \mu_i I)^2 x_i\,,$$
where the values of $\mu_i$ are distinct and non-zero.

I strongly suspect that the optimal value is reached for eigenvectors of $A$; however, I cannot figure out how to prove it. For simplicity, and without loss of generality, I assume that $A$ is a diagonal matrix.
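Note that when $A = \operatorname{diag}(\lambda_1,\ldots,\lambda_n)$ and each $x_i$ is a standard basis vector $e_{\sigma(i)}$, the objective equals $\sum_{i=1}^p (\lambda_{\sigma(i)} + \mu_i)^2$, so the best choice of eigenvectors is a linear assignment problem. A minimal sketch of computing that baseline (the random values are arbitrary stand-ins for the data of the problem):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(1)
n, p = 6, 3
lam = np.sort(rng.standard_normal(n))[::-1]  # eigenvalues of a diagonal A
mu = rng.standard_normal(p)                  # stand-ins for the distinct shifts

# Value of the objective when x_i = e_k: (lam[k] + mu[i])^2
C = (mu[:, None] + lam[None, :]) ** 2        # shape (p, n)
rows, cols = linear_sum_assignment(C, maximize=True)
print(C[rows, cols].sum())                   # best value over eigenvector choices
```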

What I have tried so far

  1. I have numerically checked that there always seems to be a choice of eigenvectors of $A$ that achieves the maximum value. Taking a large number of random matrices $X$ and performing gradient descent over the set $\{X^\top X=I_p\}$ consistently yields values below (or equal to) the one achieved by the best choice of eigenvectors; see the sketch after this list.

  2. At the optimal point $X^*$, the gradient of the objective function is orthogonal to the tangent space of the constraint set $\{X^\top X=I_p\}$ at $X^*$. Writing out this necessary optimality condition gives

    a) $x_i^\top A x_j = 0$ for $i\neq j$

    b) $\bar x_j^\top (A + \mu_i I)^2 x_i = 0$ for all $i,j$, where $(\bar x_j)$ denotes the vectors completing the family $(x_i)$ into an orthonormal basis.

    However, I am not sure that these two conditions suffice to conclude that the $x_i$'s are eigenvectors of $A$.

  3. The recent work of Liang et al. [2] explicitly solves the following closely related problem, which in my notation reads
    $$\max_{X^\top X = I_p} \sum_{i=1}^p \mu_i x_i^\top A x_i\,.$$
    They also find that the optimal value is attained at eigenvectors of $A$. Although the problems are very close, their (surprisingly simple) proof does not quite adapt to my case.
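As a concrete version of the check in item 1, here is a minimal sketch of projected gradient ascent on $\{X^\top X = I_p\}$, assuming a diagonal $A$ as above; the step size, iteration counts, and number of restarts are arbitrary choices, and the polar retraction is just one convenient way to stay on the constraint set:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(2)
n, p = 6, 3
lam = np.sort(rng.standard_normal(n))[::-1]
A = np.diag(lam)                             # WLOG diagonal, as in the question
mu = rng.standard_normal(p)
B = [(A + m * np.eye(n)) @ (A + m * np.eye(n)) for m in mu]  # B_i = (A + mu_i I)^2

def objective(X):
    return sum(X[:, i] @ B[i] @ X[:, i] for i in range(p))

def retract(Y):
    # Polar retraction: nearest matrix with orthonormal columns
    U, _, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ Vt

best = -np.inf
for _ in range(20):                          # random restarts
    X = retract(rng.standard_normal((n, p)))
    for _ in range(2000):                    # projected gradient ascent
        G = np.column_stack([2 * B[i] @ X[:, i] for i in range(p)])
        X = retract(X + 0.01 * G)
    best = max(best, objective(X))

# Baseline: best value achievable with eigenvectors (the assignment from above)
C = (mu[:, None] + lam[None, :]) ** 2
eig_best = C[linear_sum_assignment(C, maximize=True)].sum()
print(best, eig_best)                        # empirically, best <= eig_best
```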

Any help would be appreciated!

[1] Fan, K. (1949). On a theorem of Weyl concerning eigenvalues of linear transformations I. Proceedings of the National Academy of Sciences of the United States of America, 35(11), 652. https://www.pnas.org/content/35/11/652

[2] Liang, X., Wang, L., Zhang, L. H., & Li, R. C. (2021). On Generalizing Trace Minimization. arXiv preprint arXiv:2104.00257. http://arxiv.org/abs/2104.00257

Best Answer

It is true. More generally, if $p\le n$ and $B_1,B_2,\ldots,B_p\in M_n(\mathbb R)$ form a commuting family of real symmetric matrices (in your case $B_j=(A+\mu_jI)^2$), then $$ \max_{X^\top X=I_p}\sum_{j=1}^px_j^\top B_jx_j $$ is attained at a certain set of common eigenvectors $x_1,x_2,\ldots,x_p$ of all the $B_j$s.

Since the $B_j$s commute with each other, they can be simultaneously orthogonally diagonalised. Therefore we may assume that $B_j=\operatorname{diag}(b_{j1},b_{j2},\ldots,b_{jn})$ for each $j$. Let $B\in M_n(\mathbb R)$ be the matrix whose $j$-th row is $(b_{j1},b_{j2},\ldots,b_{jn})$ for each $j\le p$ and whose other rows are zero. Complete $X$ to an $n\times n$ orthogonal matrix $Q$. Since the last $n-p$ rows of $B$ are zero, the extra columns of $Q$ do not contribute, and the maximisation problem can be rephrased as
$$ \max_{X^\top X=I_p}\sum_{j=1}^p\sum_{i=1}^n b_{ji}x_{ij}^2 =\max_{Q^\top Q=I_n}\operatorname{tr}\left(B(Q\circ Q)\right). $$

Note that the Hadamard product $Q\circ Q$ is a doubly stochastic matrix. Therefore
$$ \max_{Q^\top Q=I_n}\operatorname{tr}\left(B(Q\circ Q)\right) \le\max_{S \text{ doubly stochastic}}\operatorname{tr}(BS).\tag{1} $$

Since the set of all doubly stochastic matrices is the convex hull of the permutation matrices (Birkhoff-von Neumann theorem) and $\operatorname{tr}(BS)$ is linear in $S$, the maximum on the RHS of $(1)$ is attained at a permutation matrix $S$. But such an $S$ is orthogonal and satisfies $S=S\circ S$, so the LHS of $(1)$ is also attained, at $Q=S$. Since each $B_j$ is assumed to be diagonal, the columns of $Q$ (which are the standard basis vectors) are eigenvectors of all the $B_j$s. Hence the result follows, because the columns of $X$ are taken from those of $Q$.
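To illustrate inequality $(1)$ and its tightness, here is a small numerical sketch (seed, dimensions, and sample count are arbitrary choices): it computes the permutation maximum on the RHS via a linear assignment, samples $\operatorname{tr}(B(Q\circ Q))$ over random orthogonal $Q$, and checks that the optimal permutation matrix itself attains the bound.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(3)
n, p = 5, 3
b = rng.standard_normal((p, n))             # row j holds the diagonal of B_j
B = np.vstack([b, np.zeros((n - p, n))])    # pad with zero rows, as in the proof

# RHS of (1): the maximum of tr(BS) over permutations is a linear assignment
rows, cols = linear_sum_assignment(b, maximize=True)
perm_max = b[rows, cols].sum()

def random_orthogonal(k):
    # Orthogonal matrix from the QR decomposition of a Gaussian matrix
    Q, R = np.linalg.qr(rng.standard_normal((k, k)))
    return Q * np.sign(np.diag(R))

# LHS of (1): tr(B(Q o Q)) sampled over random orthogonal Q never exceeds perm_max
samples = [np.trace(B @ (Q * Q)) for Q in (random_orthogonal(n) for _ in range(10000))]
print(perm_max, max(samples), max(samples) <= perm_max + 1e-9)

# Tightness: the optimal permutation matrix is itself an orthogonal Q with Q o Q = Q
S = np.zeros((n, n))
S[cols, rows] = 1
free_rows = sorted(set(range(n)) - set(cols))
free_cols = sorted(set(range(n)) - set(rows))
S[free_rows, free_cols] = 1
print(np.isclose(np.trace(B @ S), perm_max), np.allclose(S.T @ S, np.eye(n)))
```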
