[Math] Orthogonality constraints when formulating SVD as optimization problem

convex-optimization, nonlinear-optimization, optimization, orthogonality, svd

I am working on a project that requires me to formulate the singular value decomposition (SVD) as an optimization problem. To do this I need to require that $U$ and $V$ in $A = U\Sigma V^{T}$ are orthogonal, so my problem looks something like this:

$$\arg\min_{U,\Sigma,V}\left\Vert A-U\Sigma V^{T}\right\Vert _{F}^{2}$$

subject to

$$ U^{T} U = I,\quad V^{T} V = I,\quad \mbox{diag}(\Sigma) \geq 0, \quad \mbox{rank}(U) = \mbox{rank}(V)=n $$

(As a side note I am not too worried about enforcing the size and rank of U and V, nor the sign of the singular values.)
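(As a quick sanity check on the formulation, here is a NumPy sketch — the matrix $A$ is just random test data of my own choosing — showing that the factors returned by a standard SVD routine satisfy exactly these constraints and drive the objective to zero:)

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))  # arbitrary test matrix

# Thin SVD: A = U @ Sigma @ V.T with orthonormal columns in U and V
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Sigma = np.diag(s)
V = Vt.T

print(np.allclose(U.T @ U, np.eye(4)))   # U^T U = I
print(np.allclose(V.T @ V, np.eye(4)))   # V^T V = I
print(np.all(s >= 0))                    # diag(Sigma) >= 0
print(np.allclose(A, U @ Sigma @ Vt))    # objective is (numerically) zero
# prints True four times
```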

I have no idea how to handle the orthogonality constraints, and I am new to optimization. I have seen the paper "A feasible method for optimization with orthogonality constraints", which says that orthogonality constraints are not convex; however, someone I am working with assures me that the above problem is a nonlinear convex problem.
I could attempt to use the transformation mentioned in the paper, but I have seen similar problems with orthogonality constraints that don't mention a Cayley transform, which makes me wonder if there are other methods.

Here is what I would like to figure out:

  • Is there a general way to handle the constraints $U^TU = I$ or even $\|u_i\| = 1$?

  • Can I formulate the orthogonality constraints as bilinear constraints?

  • Is there a known formulation of the SVD as an optimization problem? (along with solution algorithm ideally)

For now I may formulate PCA as an optimization problem and use that to find the SVD of my centered dataset. I would prefer, however, to find the SVD of the original dataset.

Best Answer

I found out that you can approach this problem using optimization over manifolds.

The easiest example to consider is the following eigenvalue problem, which comes from the Rayleigh quotient (with the constraint in place, only the numerator matters): $$\max_{x\in \mathbb{R}^n} x^T A x$$ subject to $$\|x\| = 1.$$ For symmetric $A$, the maximum is the largest eigenvalue of $A$, and the maximizing $x$ is an associated eigenvector. You can think of the unit-length constraint on $x$ as forcing it to lie on the surface of an $(n-1)$-sphere, which is easy to picture mentally when $n = 2$ or $3$.
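To make this concrete, here is a minimal sketch of projected gradient ascent for this problem in plain NumPy (the step size, iteration count, and test matrix are my own choices): take a gradient step for $x^T A x$, then renormalize to stay on the sphere.

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2               # symmetric test matrix

x = rng.standard_normal(5)
x /= np.linalg.norm(x)          # start on the unit sphere
for _ in range(1000):
    x = x + 0.1 * (A @ x)       # gradient of x^T A x is 2 A x (factor 2 absorbed in step)
    x /= np.linalg.norm(x)      # project back onto the sphere

# The Rayleigh quotient converges to the largest eigenvalue of A
print(x @ A @ x, np.linalg.eigvalsh(A).max())
```

Renormalizing after each step is the simplest possible "retraction" back onto the manifold; for small steps this iteration behaves much like the power method.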

In my question above, the constraint $U^TU=I$ says that $U$ lies on a Stiefel manifold (the Grassmann manifold arises instead when only the column span of $U$ matters, not the particular orthonormal basis). The same goes for $V$.
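For illustration, here is a bare-bones NumPy sketch of Riemannian gradient ascent on the Stiefel manifold (the step size, iteration count, and QR-based retraction are my own choices; a library like Manopt handles all of this for you). It maximizes $\operatorname{tr}(U^T A A^T U)$ over $U^TU=I$, whose solution spans the top left singular vectors of $A$:

```python
import numpy as np

def qr_retract(Y):
    # Retraction onto the Stiefel manifold via thin QR,
    # with signs fixed so the retraction is well defined
    Q, R = np.linalg.qr(Y)
    return Q * np.sign(np.diag(R))

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 5))  # arbitrary test matrix
B = A @ A.T                      # maximize trace(U^T B U) over U in St(8, 3)

p = 3
U = qr_retract(rng.standard_normal((8, p)))
step = 0.01
for _ in range(3000):
    G = 2 * B @ U                              # Euclidean gradient
    xi = G - U @ ((U.T @ G + G.T @ U) / 2)     # project onto the tangent space at U
    U = qr_retract(U + step * xi)              # ascent step + retraction

# trace(U^T B U) approaches the sum of the top-3 eigenvalues of B
print(np.trace(U.T @ B @ U), np.sort(np.linalg.eigvalsh(B))[-p:].sum())
```

The two ingredients — projecting the Euclidean gradient onto the tangent space, then retracting back onto the manifold — are exactly what manifold-optimization packages automate for more general objectives.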

I found the following MATLAB package to perform such optimizations: https://manopt.org/index.html

See this blog post for an intro: https://afonsobandeira.wordpress.com/2015/03/16/optimizing-in-smooth-waters-optimization-on-manifolds/
