[Math] Solve least-squares minimization from overdetermined system with orthonormal constraint

Tags: least-squares, non-convex-optimization, optimization, procrustes-problem, qcqp

I would like to find the rectangular matrix $X \in \mathbb{R}^{n \times k}$ that solves the following minimization problem:

$$
\mathop{\text{minimize }}_{X \in \mathbb{R}^{n \times k}} \left\| A X - B \right\|_F^2 \quad \text{ subject to } X^T X = I_k
$$

where $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{m \times k}$ are given. This appears to be a form of the orthogonal Procrustes problem, but I'm getting tripped up in my case when $X$ is not square and $n \gg k$ and $m > n$.
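For orientation, here is a sketch of a special case where the problem does reduce to the classical (thin) orthogonal Procrustes problem: $A^TA = c^2 I_n$. Expanding gives $\|AX - B\|_F^2 = \operatorname{tr}(X^TA^TAX) - 2\operatorname{tr}(X^TA^TB) + \|B\|_F^2$. Under $X^TX = I_k$ the first term is then the constant $c^2 k$, so the minimizer maximizes $\operatorname{tr}(X^TA^TB)$ and is $X = UV^T$ from a thin SVD $A^TB = U\Sigma V^T$. The function name `procrustes_special_case` and the random test data are made up for illustration. For a general $A$ the quadratic term does not drop out, and this shortcut does not apply.

```julia
using LinearAlgebra, Random

# Special case only: assumes AᵀA = c²I. Then on XᵀX = I,
# ‖AX - B‖² = c²k - 2tr(XᵀAᵀB) + ‖B‖², so the minimizer maximizes tr(XᵀAᵀB).
function procrustes_special_case(A, B)
    F = svd(A' * B)      # thin SVD of the n×k matrix AᵀB
    return F.U * F.Vt    # X = U Vᵀ (n×k with orthonormal columns)
end

Random.seed!(1)
m, n, k = 20, 8, 3
c = 2.5
A = c * Matrix(qr(randn(m, n)).Q)   # hypothetical m×n test matrix with AᵀA = c²I
B = randn(m, k)

obj(X) = norm(A * X - B)^2
X = procrustes_special_case(A, B)

# Sanity check against random feasible points (random n×k orthonormal matrices)
rand_feasible() = Matrix(qr(randn(n, k)).Q)
best_random = minimum(obj(rand_feasible()) for _ in 1:1000)
println("objective at X = ", obj(X), ", best of 1000 random feasible = ", best_random)
```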

Optimistically, I'm looking for a solution that involves a singular value decomposition of a small $k \times k$ matrix, but I'm not seeing it. I'm especially interested in the case when $$A = \left(\begin{array}{c} D_1 \\ D_2\end{array}\right) \in \mathbb{R}^{2n \times n}$$ and $D_1,D_2$ are full-rank (invertible) diagonal matrices, so a solution involving $D_1^{-1}$ and $D_2^{-1}$ would be acceptable. The closest I've come (using a thin SVD of $Y$) is:

$$
\begin{aligned}
Y &= (A^TA)^{-1}(A^T B) \\
Y &= U \Sigma V^T \\
X &= UV^T
\end{aligned}
$$
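Here is a minimal sketch of this recipe for the stacked diagonal case. It assumes `A = [D₁; D₂]` with `D₁ = Diagonal(d1)` and `D₂ = Diagonal(d2)`, and that `B` is split conformally into `B1` (top $n$ rows) and `B2`. Because $A^TA = D_1^2 + D_2^2$ is diagonal, $Y = (A^TA)^{-1}A^TB$ reduces to a row scaling, so no $n \times n$ inverse is formed. The function name and data are hypothetical. The sketch only reproduces the proposed construction and says nothing about whether it is the minimizer.

```julia
using LinearAlgebra

# Proposed recipe for A = [Diagonal(d1); Diagonal(d2)], B = [B1; B2].
# (AᵀA)⁻¹AᵀB = (D₁B1 + D₂B2) ./ (d1² + d2²), applied row by row.
function proposed_stacked(d1, d2, B1, B2)
    Y = (d1 .* B1 .+ d2 .* B2) ./ (d1 .^ 2 .+ d2 .^ 2)   # n×k
    F = svd(Y)                                          # thin SVD: Y = U Σ Vᵀ
    return F.U * F.Vt                                   # X = U Vᵀ
end

# Small hypothetical example, checked against the dense formula
n, k = 6, 2
d1 = collect(1.0:n)
d2 = fill(0.5, n)
B1 = [sin(i + j) for i in 1:n, j in 1:k]
B2 = [cos(i * j) for i in 1:n, j in 1:k]

X = proposed_stacked(d1, d2, B1, B2)

A = [Matrix(Diagonal(d1)); Matrix(Diagonal(d2))]   # dense 2n×n, for comparison only
Ydense = (A' * A) \ (A' * [B1; B2])
Fd = svd(Ydense)
Xdense = Fd.U * Fd.Vt
```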

Clearly $X^T X = I_k$, but

- I haven't convinced myself that this is the minimizer,
- this involves inverting a potentially huge $n \times n$ matrix (perhaps unavoidable, and not so bad in the stacked diagonal case above, where $(A^TA)^{-1} = (D_1^2 + D_2^2)^{-1}$), and
- this involves an SVD of a large rectangular $n \times k$ matrix.
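On the last point, whatever the merits of $Y$ itself, the factor $UV^T$ of an $n \times k$ matrix $Y$ with full column rank (its orthonormal polar factor) can be computed from a thin QR factorization $Y = QR$ and an SVD of the small $k \times k$ factor $R = U_R \Sigma V^T$. Then $Y = (QU_R)\Sigma V^T$, so $UV^T = QU_RV^T$. Equivalently, $UV^T = Y(Y^TY)^{-1/2}$. Both routes still cost $O(nk^2)$ for $n \gg k$. This version just makes explicit that the only SVD needed is of a $k \times k$ matrix. The function name and test matrix below are hypothetical, and the change affects only how the recipe is computed, not the answer it produces.

```julia
using LinearAlgebra

# Orthonormal polar factor U Vᵀ of a tall n×k matrix Y (assumed full column rank),
# using a thin QR of Y and an SVD of the k×k factor R only.
function polar_factor_qr(Y)
    F = qr(Y)
    Q = Matrix(F.Q)           # n×k thin Q
    S = svd(F.R)              # k×k: R = U_R Σ Vᵀ
    return Q * (S.U * S.Vt)   # Y = (Q U_R) Σ Vᵀ, hence U Vᵀ = Q U_R Vᵀ
end

Y = [1.0 2.0; 3.0 4.0; 5.0 7.0; 0.5 -1.0; 2.0 0.0]   # hypothetical 5×2 example
X = polar_factor_qr(Y)

S = svd(Y)
X_svd = S.U * S.Vt                          # thin SVD of the full n×k matrix
X_sqrt = Y * inv(sqrt(Symmetric(Y' * Y)))   # Y (YᵀY)^(-1/2), only k×k work besides products
```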

Is this correct and as good as it gets? Or, is there a more efficient solution?

Best Answer

Your proposed solution is not correct. Let's consider the simplest case: $m=n$, $k=1$, and $A$ is invertible. Then our problem is $$\min_{x\in\mathbb R^n} \|Ax-b\|^2\quad\text{s.t.}\quad \|x\|^2=1.$$ The set $\{x:\|x\|^2=1\}$ is the unit sphere, so the transformed set $\{Ax:\|x\|^2=1\}$ is an ellipsoid, and we want to find the point $Ax$ on this ellipsoid closest to $b\in\mathbb R^n$.

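Your proposed solution reduces to $y = A^{-1}b$ and $x = y/\|y\|$. Then $Ax = b/\|A^{-1}b\|$; that is, your proposed closest point is obtained by simply scaling $b$ to lie on the ellipsoid. In general this is not the closest point to $b$. At the closest point, the residual $b - Ax$ must be normal to the ellipsoid, whereas here it is parallel to $b$, and the two directions agree only in special cases (for example, when $b$ lies along a principal axis of the ellipsoid).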
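Here is a small numerical check of this argument on a hypothetical 2-D instance with $A = \operatorname{diag}(1, 4)$ and $b = (2, 2)$. The proposed point is compared against a brute-force search over the unit circle $x = (\cos\theta, \sin\theta)$. The grid search is only for illustration; it is not a method for the general problem.

```julia
using LinearAlgebra

A = Diagonal([1.0, 4.0])    # hypothetical invertible A (m = n = 2, k = 1)
b = [2.0, 2.0]
f(x) = norm(A * x - b)^2    # objective ‖Ax - b‖²

# Proposed recipe for k = 1: y = A⁻¹b, then x = y/‖y‖, so that Ax = b/‖A⁻¹b‖
y = A \ b
x_prop = y / norm(y)
f_prop = f(x_prop)

# Brute-force search over the unit circle x = (cos θ, sin θ)
θs = range(0, 2π; length = 100_001)
vals = [f([cos(θ), sin(θ)]) for θ in θs]
f_best, i_best = findmin(vals)
x_best = [cos(θs[i_best]), sin(θs[i_best])]

println("proposed: f = ", f_prop, "   grid search: f = ", f_best)
```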

Sorry, I don't have a good answer for how to find the correct solution.

Related Solutions

Solve Large Scale Matrix Least Squares with Frobenius Regularization Problem efficiently

Here is a simple Julia script. If you translate it to another language, beware of the nested loops: Julia handles them efficiently, but they should be vectorized for MATLAB or Python.

The first time the script is run, it creates tab-separated-values (TSV) files for the $X$ and $W$ matrices. On subsequent runs, it reads the TSV files, executes up to $k_{max}$ iterations (stopping early once the objective no longer changes), updates the TSV files, and exits.

Thus you can intermittently refine the solution until you run out of patience.
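The save-and-resume step relies on `writedlm` and `readdlm` from the `DelimitedFiles` standard library; `writedlm` uses a tab as its default delimiter. Below is a minimal round-trip sketch with a made-up matrix and a temporary file (not the script's `problem.mat.X.tsv`).

```julia
using DelimitedFiles

X = [0.0 1.5 2.25; 3.0 0.125 4.0]   # hypothetical checkpoint matrix
path = tempname() * ".tsv"          # temporary file, not the script's TSV name
writedlm(path, X)                   # tab-separated, one matrix row per line
X2 = readdlm(path)                  # parsed back as a Matrix{Float64}
rm(path)
```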

```julia
#!/usr/bin/env julia

#  Sequential Coordinate-wise algorithm for Non-Negative Least-Squares
#  as described on pages 10-11 of
#     http://users.wfu.edu/plemmons/papers/nonneg.pdf
#
#  Convergence is painfully slow, but unlike most other NNLS
#  algorithms the objective function is reduced at each step.
#
#  The algorithm described in the PDF was modified from its
#  original vector form:  |Ax - b|²
#    to a regularized matrix form:  |LXKᵀ - M|²  +  λ|X|²

using LinearAlgebra, MAT, DelimitedFiles

function main()
  matfile = "problem.mat"
  Xfile   = "problem.mat.X.tsv"
  Wfile   = "problem.mat.W.tsv"

# read the matrices from the Matlab file
  f = matopen(matfile)
    K = read(f,"K1"); println("K: size = $(size(K)),\t rank = $(rank(K))")
    L = read(f,"K2"); println("L: size = $(size(L)),\t rank = $(rank(L))")
    M = read(f, "M"); println("M: size = $(size(M)),\t rank = $(rank(M))")
  # S = read(f,"S00");println("S: size = $(size(S)),\t rank = $(rank(S))")
  close(f)

# W holds the gradient of (|LXKᵀ - M|² + λ|X|²)/2, i.e. W = A*X*B + λ*X + C
  A = L'L
  B = K'K
  C = -L'M*K
  m,n = size(C)
  λ = 1/10     # regularization parameter
  kmax = 100   # maximum number of sweeps per run

# specify the size of the work arrays (X = 0, so the gradient W starts at C)
  X = 0*C
  W = 1*C
  H = A[:,1] * B[:,1]'

# resume from latest saved state ... or reset to initial conditions
  try
     X = readdlm(Xfile);  println("X: size = $(size(X)), extrema = $(extrema(X))")
     W = readdlm(Wfile);  println("W: size = $(size(W)), extrema = $(extrema(W))")
     size(X) == size(W) == size(C) || error("saved X,W do not match size(C) = $(size(C))")
     println()
  catch
     @warn "Could not read the saved X,W matrices; re-initializing."
     X = 0*C
     W = 1*C
  end

  fxn = (norm(L*X*K' - M)^2 + λ*norm(X)^2) / 2
  println("at step 0, fxn = $fxn")

  k = 0
  while k < kmax
     for i = 1:m
         for j = 1:n
             # H = A[:,i]*B[:,j]' + λ*e_i*e_jᵀ, so H[i,j] is the curvature along X[i,j]
             mul!(H, A[:,i], B[:,j]')
             H[i,j] += λ
             # exact minimizing step along X[i,j], clipped so that X[i,j] stays ≥ 0
             δ = min( X[i,j], W[i,j]/H[i,j] )
             X[i,j] -= δ
             # rank-one update of the gradient for the change in X[i,j]
             H .*= δ
             W .-= H
         end
     end
     k += 1
     fx2 = (norm(L*X*K' - M)^2 + λ*norm(X)^2) / 2
     println("after step $k, fxn = $fx2")

     # convergence check
     if fx2 ≈ fxn; break; end
     fxn = fx2
  end

# save the current state for the next run
  writedlm(Xfile, X)
  writedlm(Wfile, W)

# peek at the current solution
  println("\nsummary of current solution")
  println(" vector(X) = $(X[1:4]) ... $(X[end-3:end])")
  println("extrema(X) = $(extrema(X))")
end

# invoke the main function
main()
```
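Why can `W` be updated incrementally? It holds the gradient $AXB + \lambda X + C$ of the objective, with $A = L^TL$, $B = K^TK$ and $C = -L^TMK$. Changing the single entry $X_{ij}$ by $-\delta$ changes that gradient by $-\delta\,(A_{:,i}B_{:,j}^T + \lambda e_ie_j^T)$. The sketch below repeats sweeps of the same update on small made-up matrices. The function `sca_sweep!` and the data are hypothetical, and the file handling is left out. It checks that `W` still equals the gradient afterwards, that `X` stays non-negative, and that the objective went down.

```julia
using LinearAlgebra

# One sweep of the coordinate-wise update used in the script, without file I/O.
# Invariant: W == A*X*B + λ*X + C, the gradient of (‖L X Kᵀ - M‖² + λ‖X‖²)/2.
function sca_sweep!(X, W, A, B, λ)
    m, n = size(X)
    for i in 1:m, j in 1:n
        h = A[i, i] * B[j, j] + λ         # second derivative w.r.t. X[i,j]
        δ = min(X[i, j], W[i, j] / h)     # exact coordinate step, clipped at 0
        X[i, j] -= δ
        W .-= δ .* (A[:, i] * B[:, j]')   # rank-one gradient update ...
        W[i, j] -= δ * λ                  # ... plus the regularization term
    end
    return X, W
end

# Hypothetical small problem built from a non-negative "true" X
L = [1.0 0.2; 0.3 1.5; 0.4 0.1]
K = [2.0 0.1; 0.0 1.0; 0.5 0.5; 1.0 0.0]
M = L * [1.0 0.5; 0.2 2.0] * K'
λ = 1 / 10

A, B, C = L' * L, K' * K, -L' * M * K
obj(X) = (norm(L * X * K' - M)^2 + λ * norm(X)^2) / 2

X = zeros(size(C))
W = copy(C)        # gradient at X = 0
f0 = obj(X)
for _ in 1:20
    sca_sweep!(X, W, A, B, λ)
end
println("objective: ", f0, " -> ", obj(X))
```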

Related Question