Impose orthonormality constraints by the method of Lagrange multipliers

Tags: eigenvalues-eigenvectors, lagrange-multiplier, linear-algebra, orthonormal

I want to find the matrix $\Phi: \mathbb{R}^n \to \mathbb{R}^m$, with $m<n$, that minimizes
$$V={\rm tr}(\Phi R \Phi^T)$$
subject to the orthonormality constraint
$$\Phi\Phi^T=I$$
where $R: \mathbb{R}^n \to \mathbb{R}^n$ is a given symmetric positive definite matrix.

How do I apply the Lagrange multiplier method to this constrained optimization problem?

I tried:
$$\tilde{V}={\rm tr}(\Phi R \Phi^T) - {\rm tr}\left(\Lambda(\Phi\Phi^T-I)\right)$$
where $\Lambda:\mathbb{R}^m \to \mathbb{R}^m$ is a general matrix of Lagrange multipliers, and this yields the necessary conditions for minimality:
$$\frac{\partial \tilde{V}}{\partial \Lambda}=\Phi\Phi^T-I=0$$
$$\frac{\partial \tilde{V}}{\partial \Phi}=R\Phi^T-\Phi^T\Lambda=0$$
The first equation recovers the orthonormality condition; so far, so good. But how is the second equation supposed to yield candidate minimizers/critical points? Intuitively, I expect that at the global minimum the rows of $\Phi$ must form a basis of the eigenspace of the lowest $m$ eigenvalues of $R$. But $\Lambda$ is a general $m\times m$ matrix, and nothing seems to constrain it to be a diagonal matrix with $m$ eigenvalues of $R$ (let alone the lowest ones) on the diagonal.

Or am I applying the Lagrange multiplier method the wrong way?

By the way: it seems easy for $m=1$. Then $\Phi^T$ simply becomes a vector and $\Lambda=\lambda$ becomes a scalar. This yields the eigenvalue problem for one eigenvector, which is 'diagonal' in the trivial sense:
$$R\Phi^T-\lambda\Phi^T=0$$
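A quick numerical sanity check of the $m=1$ case (a minimal NumPy sketch; the particular $R$ and $n$ are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
R = A @ A.T + n * np.eye(n)          # symmetric positive definite

# eigh returns eigenvalues of a symmetric matrix in ascending order
eigvals, eigvecs = np.linalg.eigh(R)

# For m = 1, take Phi^T to be the unit eigenvector of the smallest
# eigenvalue; then V = Phi R Phi^T equals that eigenvalue.
phi = eigvecs[:, 0]
print(np.isclose(phi @ R @ phi, eigvals[0]))   # True

# Any other unit vector gives a value at least as large.
u = rng.standard_normal(n)
u /= np.linalg.norm(u)
print(u @ R @ u >= eigvals[0])                 # True
```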

Best Answer

The Lagrange multiplier method does put a special constraint on the structure of $\Lambda$, but that isn't what you expected.

From $\Phi\Phi^T=I$ and $R\Phi^T=\Phi^T\Lambda$, we obtain $\Lambda=\Phi\Phi^T\Lambda=\Phi R\Phi^T$, which is symmetric because $R$ is. So the constraint placed on the structure of $\Lambda$ is not that $\Lambda$ must be diagonal, but that $\Lambda$ must be symmetric.
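A quick numerical check of this (a NumPy sketch; the particular $R$, $n$, $m$, and rotation angle are arbitrary choices): take $\Phi^T$ to be an eigenbasis of the $m$ lowest eigenvalues mixed by a rotation. The stationarity condition still holds, and $\Lambda=\Phi R\Phi^T$ comes out symmetric but not diagonal.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 5, 2
A = rng.standard_normal((n, n))
R = A @ A.T + n * np.eye(n)            # symmetric positive definite
eigvals, eigvecs = np.linalg.eigh(R)   # ascending eigenvalues

# A stationary point: an orthonormal basis of the span of the m
# lowest eigenvectors, mixed by an arbitrary rotation.
theta = 0.7
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
PhiT = eigvecs[:, :m] @ Q              # columns are NOT eigenvectors of R
Lam = PhiT.T @ R @ PhiT                # Lambda = Phi R Phi^T

print(np.allclose(R @ PhiT, PhiT @ Lam))        # True: R Phi^T = Phi^T Lambda
print(np.allclose(Lam, Lam.T))                  # True: Lambda is symmetric
print(np.allclose(Lam, np.diag(np.diag(Lam))))  # False: but not diagonal
```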

Since $\Lambda$ is symmetric, it can be orthogonally diagonalised as $QDQ^T$. Therefore, $R\Phi^T=\Phi^T\Lambda$ implies that $R(\Phi^TQ)=(\Phi^TQ)D$. The eigenvectors of $R$ are the columns of $\Phi^TQ$ rather than the columns of $\Phi^T$.

This makes sense if you look at the original objective function. Since $\Phi R\Phi^T$ has the same trace as $(Q^T\Phi) R(\Phi^TQ)$ for every $Q\in SO(m)$, there is no reason why the columns of any $\Phi^T$ that minimises $\operatorname{tr}(\Phi R\Phi^T)$ must be eigenvectors of $R$.
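This invariance is easy to confirm numerically (again a NumPy sketch with an arbitrary $R$): rotating $\Phi^T$ by an orthogonal $Q$ leaves the trace unchanged, and at the minimum the trace equals the sum of the $m$ smallest eigenvalues of $R$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 5, 2
A = rng.standard_normal((n, n))
R = A @ A.T + n * np.eye(n)            # symmetric positive definite
eigvals, eigvecs = np.linalg.eigh(R)   # ascending eigenvalues

Um = eigvecs[:, :m]                    # eigenbasis of the m lowest eigenvalues
theta = 1.1                            # arbitrary rotation angle
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

V_eig = np.trace(Um.T @ R @ Um)              # Phi^T = Um
V_rot = np.trace((Um @ Q).T @ R @ (Um @ Q))  # Phi^T = Um Q
print(np.isclose(V_eig, V_rot))              # True: rotation leaves V unchanged
print(np.isclose(V_eig, eigvals[:m].sum()))  # True: the minimum value
```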
