Linear Algebra – Minimize tr(AX) Where X is an Orthogonal Matrix

Tags: linear-algebra, matrices, optimization

Let $A, X\in\mathbb{R}^{n\times n}$. The scalar objective function is
$$J=\mathrm{tr}(AX)$$
With no constraints, setting the derivative of $J$ with respect to $X$ to zero (the gradient is $A^T$) gives
$$A=0$$
Suppose further that $A$ is itself a complicated function of other variables; from $A=0$ I can then carry the calculation further.
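
As a quick sanity check of this gradient (a minimal sketch, assuming numpy): the derivative of $J=\mathrm{tr}(AX)$ with respect to $X$ is $A^T$, which the following finite-difference computation confirms.

```python
# Finite-difference check that d tr(AX) / dX = A^T.
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
X = rng.standard_normal((n, n))

eps = 1e-6
grad = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n))
        E[i, j] = eps
        # central difference in the (i, j) coordinate direction
        grad[i, j] = (np.trace(A @ (X + E)) - np.trace(A @ (X - E))) / (2 * eps)

assert np.allclose(grad, A.T, atol=1e-4)  # gradient is A^T, so J is stationary iff A = 0
```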

My question is: what if $X$ is an orthogonal matrix, i.e., $X^TX=I$? Then it becomes a constrained optimization problem. Can I still use matrix differential techniques to derive the minimizer? Thanks.

Best Answer

The method of Lagrange multipliers yields the solution.

One studies the (real-valued) function $F(X)$ under the (matrix) constraint $C(X)=I$, where $F(X)=\text{trace}(AX)$ and $C(X)=X^TX$, hence $$ F(X)=\sum\limits_{ij}A_{ji}X_{ij},\quad C_{ij}(X)=\sum\limits_{k}X_{ki}X_{kj}. $$

The gradients of $F$ and $C$ are given by $\partial_{ij}F(X)=A_{ji}$ and $$ \partial_{ij}C_{k\ell}(X)=X_{i\ell}[k=j]+X_{ik}[\ell=j], $$ where $[\cdot]$ denotes the Iverson bracket. One requires that there exist multipliers $\lambda_{ij}$ such that, for every $i$ and $j$, $$ \partial_{ij}F(X)=\sum\limits_{k\ell}\lambda_{k\ell}\partial_{ij}C_{k\ell}(X). $$

This condition reads $$ A_{ji}=\sum\limits_{k,\ell}\lambda_{k\ell}\left(X_{i\ell}[k=j]+X_{ik}[\ell=j]\right)=\sum\limits_{\ell}\lambda_{j\ell}X_{i\ell}+\sum\limits_{k}\lambda_{kj}X_{ik}, $$ or, equivalently, introducing the matrix $\Lambda=(\lambda_{ij})$, $$ A^T=X\Lambda^T+X\Lambda. $$

The matrix $M=\Lambda^T+\Lambda$ satisfies $M^T=M$ and $XM=A^T$; hence $X^TX=I$ implies that $M^TM=M^2=AA^T$.
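
As a numerical sanity check (a minimal sketch assuming numpy, anticipating the closed form for $M$ derived below), one can verify that $M=(AA^T)^{1/2}$ and $X=A^TM^{-1}$ satisfy all of these conditions for a generic $A$:

```python
# Check the Lagrange conditions: M symmetric, M^2 = A A^T, X M = A^T, X^T X = I.
import numpy as np

rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n))          # a generic A is invertible

# symmetric square root of the SPD matrix A A^T via its eigendecomposition
w, V = np.linalg.eigh(A @ A.T)
M = V @ np.diag(np.sqrt(w)) @ V.T

X = A.T @ np.linalg.inv(M)

assert np.allclose(M, M.T)               # M is symmetric
assert np.allclose(M @ M, A @ A.T)       # M^2 = A A^T
assert np.allclose(X @ M, A.T)           # X M = A^T
assert np.allclose(X.T @ X, np.eye(n))   # X is orthogonal
```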

When $A$ is invertible, one can use the usual definition of the square root of a symmetric positive definite matrix to find $M$. The maximizing and minimizing matrices $X_0$ and the maximum and minimum values of $F$ are $$ X_0=A^TM^{-1}=\pm A^T(AA^T)^{-1/2},\quad F(X_0)=\pm\text{trace}((AA^T)^{1/2}). $$

When $A$ is not invertible, the formula $X_0=A^TM^{-1}$, whichever meaning one gives to the notation $M^{-1}$, cannot yield an orthogonal matrix $X_0$: if $A$ is not invertible, then neither is $A^T$, nor is $A^T$ times any matrix. On the other hand, the formula $F(X_0)=\pm\text{trace}((AA^T)^{1/2})$ is still valid; the extrema of $F$ are $\pm$ the sum of the singular values of $A$.
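
To illustrate (a short sketch assuming numpy; the SVD identities below are standard facts, not part of the derivation above): writing $A=U\Sigma V^T$, one has $(AA^T)^{1/2}=U\Sigma U^T$, so $X_0=A^T(AA^T)^{-1/2}=VU^T$ and $F(X_0)=\text{trace}(\Sigma)$, the sum of the singular values. The sketch checks the closed form and that no random orthogonal matrix beats the claimed extrema:

```python
# Verify the extrema of tr(AX) over orthogonal X against the closed form.
import numpy as np

rng = np.random.default_rng(2)
n = 5
A = rng.standard_normal((n, n))

U, s, Vt = np.linalg.svd(A)
f_max = s.sum()                          # = trace((A A^T)^{1/2})
X_max = Vt.T @ U.T                       # maximizer V U^T

assert np.isclose(np.trace(A @ X_max), f_max)

# X_max agrees with the closed form A^T (A A^T)^{-1/2}
M = U @ np.diag(s) @ U.T                 # (A A^T)^{1/2}
assert np.allclose(A.T @ np.linalg.inv(M), X_max)

# random orthogonal matrices via QR; none should beat the claimed extrema
for _ in range(1000):
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    f = np.trace(A @ Q)
    assert -f_max - 1e-9 <= f <= f_max + 1e-9
```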
