Optimization – Trace Minimization with Constraints

Tags: linear-algebra, matrices, optimization, qcqp

For positive semidefinite matrices $A,B$, how can I find an $X$ that minimizes $\text{Trace}(AX^TBX)$ under either one of these constraints:

a) The sum of squares of the Euclidean distances between pairs of rows of $X$ is a constant $\nu$.

or

b) $X$ is orthogonal.

This is not a homework problem, though the style of writing might suggest so.
All matrices have real entries; $A$ and $B$ are square while $X$ is rectangular. Thanks.

This is what I have:

Write $B=F^{T}F$ (such a factorization exists since $B$ is positive semidefinite) and define $Y=FX$. Then the problem above becomes
\begin{align}
\min_{Y}~ \text{trace}(AY^{T}Y)
\end{align}

But now I want the $X^*$ that minimizes the original problem. This is what is confusing me!
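As a quick numerical check of this reformulation (my own sketch, not part of the original question), one can take $F$ to be the symmetric square root of $B$, which exists because $B$ is positive semidefinite; all names below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 3

# Random PSD matrices: A is m x m, B is n x n, X is n x m.
Ra = rng.standard_normal((m, m))
Rb = rng.standard_normal((n, n))
A = Ra @ Ra.T
B = Rb @ Rb.T
X = rng.standard_normal((n, m))

# F = B^{1/2} via eigendecomposition, so that B = F.T @ F.
w, V = np.linalg.eigh(B)
F = V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T

Y = F @ X
# trace(A X^T B X) equals trace(A Y^T Y).
print(np.trace(A @ X.T @ B @ X))
print(np.trace(A @ Y.T @ Y))
```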

Best Answer

If you want to solve the minimization problem individually in each of the two constrained cases, then (b) can be reduced to a long-solved problem. [Edit: by "(b) $X$ is orthogonal", I suppose you mean that $X$ has orthonormal columns. If the columns of $X$ are merely required to be orthogonal rather than orthonormal, then the minimum trace is clearly $0$, attained at $X=0$.]

Suppose $A$ is $m\times m$ and $B$ is $n\times n$ (hence $X$ is $n\times m$). We may assume that $A$, $B$ and $X$ all have the same size: if $X$ is "wide" ($n<m$, in which case the orthogonality constraint reads $XX^T=I_n$), we can transform the problem into the form $\min\mathrm{tr}(AQ^T\tilde{B}Q)$ subject to $QQ^T=I$, where $A$, $\tilde{B}$ and $Q$ are all $m\times m$:
\begin{align}
AX^TBX = A\underbrace{[X^T\ Y^T]}_{Q^T} \underbrace{\begin{bmatrix}B&\\&0_{(m-n)\times(m-n)}\end{bmatrix}}_{\tilde{B}} \underbrace{\begin{bmatrix}X\\ Y\end{bmatrix}}_{Q} = AQ^T\tilde{B}Q.
\end{align}
The solution $X$ to the original problem is then a submatrix of the solution $Q$ to the transformed problem; more specifically, $$X=(I_n,\,0_{n\times(m-n)})Q.$$ A similar transformation (padding $A$ instead of $B$) can be performed when $X$ is "tall" and the size of $B$ is larger than the size of $A$.
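To make the padding concrete, here is a small numpy sketch (my illustration, assuming the wide-case constraint $XX^T=I_n$); it completes a wide $X$ with orthonormal rows to an orthogonal $Q$ and pads $B$ with zeros:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 5, 3

Ra = rng.standard_normal((m, m))
Rb = rng.standard_normal((n, n))
A = Ra @ Ra.T
B = Rb @ Rb.T

# A wide X (n < m) with orthonormal rows: X @ X.T == I_n.
Xq, _ = np.linalg.qr(rng.standard_normal((m, n)))
X = Xq.T

# Complete the rows of X to an orthonormal basis of R^m via the SVD.
_, _, Vt = np.linalg.svd(X)
Y = Vt[n:, :]                  # rows orthogonal to the rows of X
Q = np.vstack([X, Y])          # Q @ Q.T == I_m

# Pad B with zeros to an m x m matrix B_tilde.
B_tilde = np.zeros((m, m))
B_tilde[:n, :n] = B

# The two objective values agree.
print(np.trace(A @ X.T @ B @ X))
print(np.trace(A @ Q.T @ B_tilde @ Q))
```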

Next, as $A$ and $B$ are positive semidefinite, we may orthogonally diagonalize them. So let $A=U\Lambda U^T$ and $B=V\Sigma V^T$ (writing $n$ for the common size after the reduction above), where the eigenvalues, i.e. the diagonal entries, of $\Lambda$ are arranged in ascending order and those of $\Sigma$ in descending order:
\begin{align}
\Lambda&=\mathrm{diag}(\lambda_1,\ldots,\lambda_n):=\mathrm{diag}(\lambda^\uparrow_1(A),\ldots,\lambda^\uparrow_n(A)),\\
\Sigma&=\mathrm{diag}(\sigma_1,\ldots,\sigma_n):=\mathrm{diag}(\lambda^\downarrow_1(B),\ldots,\lambda^\downarrow_n(B)).
\end{align}
Putting $\tilde{Q}=V^TQU$, we get
$$\min\mathrm{tr}(AQ^T\tilde{B}Q)=\min\mathrm{tr}(\Lambda\tilde{Q}^T\Sigma\tilde{Q}).$$
In other words, by absorbing $U$ and $V$ into $Q$, we may further assume that $A$ and $B$ are nonnegative diagonal matrices.
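The change of variables can be checked numerically as well (again my own sketch, in the square case $m=n$):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4

Ra = rng.standard_normal((n, n))
Rb = rng.standard_normal((n, n))
A = Ra @ Ra.T
B = Rb @ Rb.T

lam, U = np.linalg.eigh(A)               # eigenvalues of A, ascending
sig, V = np.linalg.eigh(B)
sig, V = sig[::-1], V[:, ::-1]           # eigenvalues of B, descending
Lam, Sig = np.diag(lam), np.diag(sig)

Q, _ = np.linalg.qr(rng.standard_normal((n, n)))   # an arbitrary orthogonal Q
Q_tilde = V.T @ Q @ U                              # absorb U and V into Q

# trace(A Q^T B Q) equals trace(Lam Q_tilde^T Sig Q_tilde).
print(np.trace(A @ Q.T @ B @ Q))
print(np.trace(Lam @ Q_tilde.T @ Sig @ Q_tilde))
```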

We have transformed our problem into a nicer form; it is now time to solve it. There are two popular approaches: one looks more elegant, while the other has wider applicability.

The more elegant approach makes use of Birkhoff's theorem. Let $\tilde{Q}=(q_{ij})$. Since $\tilde{Q}$ is orthogonal, $\tilde{Q}\circ \tilde{Q}=(q_{ij}^2)$ is a doubly stochastic matrix. Therefore
\begin{align}
\min_{\tilde{Q}\tilde{Q}^T=I} \mathrm{tr}(\Lambda \tilde{Q}^T\Sigma\tilde{Q})
&=\min_{\tilde{Q}\tilde{Q}^T=I} \sum_{i,j}\sigma_i\lambda_jq_{ij}^2\\
&\ge\min_S \left\{\sum_{i,j}\sigma_i\lambda_js_{ij}:\ S=(s_{ij}) \textrm{ is doubly stochastic}\right\}.\quad(\ast)
\end{align}
Observe that $\sum_{i,j}\sigma_i\lambda_js_{ij}$ is a linear function of $S$ and the set of all doubly stochastic matrices is compact and convex. Hence the minimum on the right must occur at an extreme point of this convex set. By Birkhoff's theorem, the extreme points of this set are exactly the permutation matrices, and a permutation matrix is of course real orthogonal. Therefore equality holds in $(\ast)$, and $\min\limits_{\tilde{Q}\tilde{Q}^T=I} \mathrm{tr}(\Lambda \tilde{Q}^T\Sigma\tilde{Q})$ is the minimum of $\mathrm{tr}(\Lambda P^T\Sigma P)$ over all permutation matrices $P$. As the diagonal entries of $\Lambda$ and $\Sigma$ are nonnegative and arranged in opposite orders, the rearrangement inequality shows that the global minimum occurs at $\tilde{Q}=P=I$, i.e. $Q=VU^T$, which translates to $X=(I_n,\,0_{n\times(m-n)})VU^T$ when $X$ is wide.
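The closed-form minimizer is easy to test numerically. The sketch below (mine, square case) builds $Q=VU^T$ and compares the objective at $Q$ with random orthogonal matrices and with the predicted value $\sum_i \lambda^\uparrow_i(A)\,\lambda^\downarrow_i(B)$:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4

Ra = rng.standard_normal((n, n))
Rb = rng.standard_normal((n, n))
A = Ra @ Ra.T
B = Rb @ Rb.T

lam, U = np.linalg.eigh(A)               # ascending eigenvalues of A
sig, V = np.linalg.eigh(B)
sig, V = sig[::-1], V[:, ::-1]           # descending eigenvalues of B

f = lambda X: np.trace(A @ X.T @ B @ X)
Q_opt = V @ U.T                          # candidate global minimizer

print("predicted minimum: ", np.sum(lam * sig))
print("objective at V U^T:", f(Q_opt))
for _ in range(3):                       # random orthogonal comparisons
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    print("random orthogonal:", f(Q))
```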

The seemingly less elegant approach makes use of calculus. The major steps are as follows (a numerical sanity check of the stationarity condition appears after the list):

  1. Setting the first derivative of $\mathrm{tr}(\Lambda \tilde{Q}^T \Sigma \tilde{Q})$ with respect to $\tilde{Q}$ (subject to $\tilde{Q}^T\tilde{Q}=I$) to zero gives $\mathrm{tr}(-\Lambda \tilde{Q}^TK\Sigma \tilde{Q} + \Lambda \tilde{Q}^T \Sigma K\tilde{Q})=0$ for every skew-symmetric matrix $K$. Hence $M=-\Sigma \tilde{Q}\Lambda \tilde{Q}^T + \tilde{Q}\Lambda \tilde{Q}^T \Sigma$ is a symmetric matrix.
  2. By definition, however, $M$ is skew-symmetric. Hence $M=0$, i.e. $\tilde{Q}\Lambda \tilde{Q}^T\Sigma$ is symmetric.
  3. Now suppose the $2n$ diagonal entries of $\Lambda$ and $\Sigma$ are all nonzero and distinct. Since $\tilde{Q}\Lambda \tilde{Q}^T\Sigma$ is symmetric, $\tilde{Q}$ must be a permutation matrix, and we obtain the same result as in the previous approach. When the entries of $\Lambda$ and $\Sigma$ are not all distinct, consider two sequences of nonnegative diagonal matrices $\{\Lambda_k\}$ and $\{\Sigma_k\}$ such that $\Lambda_k\rightarrow\Lambda$ and $\Sigma_k\rightarrow\Sigma$ as $k\rightarrow\infty$, where for each $k$ the $2n$ diagonal entries of $\Lambda_k$ and $\Sigma_k$ are nonzero and distinct. The desired conclusion then follows by a continuity argument.
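As a sanity check on steps 1–2 (my own sketch, not from the answer), the matrix $M=\tilde{Q}\Lambda\tilde{Q}^T\Sigma-\Sigma\tilde{Q}\Lambda\tilde{Q}^T$ should vanish at the stationary point $\tilde{Q}=I$ but generically not at a random orthogonal $\tilde{Q}$:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4

# Distinct nonzero diagonal entries, arranged in opposite orders.
Lam = np.diag(np.sort(rng.uniform(0.1, 1.0, n)))          # ascending
Sig = np.diag(np.sort(rng.uniform(0.1, 1.0, n))[::-1])    # descending

def residual(Q):
    """M = Q Lam Q^T Sig - Sig Q Lam Q^T; zero iff Q Lam Q^T commutes with Sig."""
    S = Q @ Lam @ Q.T
    return S @ Sig - Sig @ S

print(np.linalg.norm(residual(np.eye(n))))                # ~ 0 at the minimizer
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
print(np.linalg.norm(residual(Q)))                        # generically nonzero
```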

This second approach requires more work (because we have no nice theorem to rely on), but I think the technique involved, matrix calculus, is applicable to more problems and hence more useful.
