Solve a Large-Scale Matrix Least-Squares Problem with Frobenius Regularization Efficiently

convex optimization · least squares · optimization

How to solve the following minimization problem: $$\min_{\mathbf{S}\ge 0} F(\mathbf{S}) = \frac{1}{2}\Vert \mathbf{M} - \mathbf{K_2SK_1^T}\Vert_F^2+\frac{1}{20}\Vert\mathbf{S}\Vert_F^2$$
where $\mathbf{S}\in R^{256 \times 256}$ has nonnegative elements, $\mathbf{M}\in R^{n \times m}$, $\mathbf{K_2} \in R^{n \times 256}$, and $\mathbf{K_1} \in R^{m \times 256}$. In most cases $3500 \lt m \lt 18000$ and $8 \lt n \lt 128$.

The data of a minimal case can be downloaded here. In this case $m=3788$, $n=16$. The following code helps load the data into the workspace:

MATLAB

load('problem.mat')

Python

import scipy.io
data = scipy.io.loadmat('/home/ubuntu/MATLAB/problem.mat')
K1 = data['K1']
K2 = data['K2']
M = data['M']
S_initial_guess = data['S00']

What I've tried

  1. Vectorize the problem using $\mathbf{K}=\mathrm{kron}(\mathbf{K_2},\mathbf{K_1})$.
    But $\mathbf{K}$ is too large for an ordinary PC, and any optimization strategy using the Hessian matrix would produce even larger matrices. (A matrix-free alternative that avoids forming $\mathbf{K}$ is sketched after this list.)

  2. Solving the matrix-form problem directly, which produces a fourth-order Hessian tensor.
    Without the Hessian, the algorithm (steepest descent with exact/inexact line search) converges too slowly.

  3. CVXPY, which runs out of memory:

    n = 256
    X = cp.Variable((n, n))
    constraints = [X >= 0]
    gamma = cp.Parameter(nonneg=True, value=1)
    obj = cp.Minimize(cp.norm(K2 @ X @ K1.transpose() - M, 'fro') + gamma*cp.norm(X, 'fro')**2)
    prob = cp.Problem(obj, constraints)
    prob.solve(verbose=True)
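
Regarding attempt 1, note that the Kronecker matrix never needs to be formed explicitly: since $(\mathbf{K_1}\otimes\mathbf{K_2})\,\mathrm{vec}(\mathbf{S})=\mathrm{vec}(\mathbf{K_2SK_1^T})$, the objective and its gradient can be evaluated matrix-free. Below is a minimal sketch (not the method in the accepted answer) that feeds this to SciPy's L-BFGS-B, with bound constraints enforcing $\mathbf{S}\ge 0$; S_initial_guess is the S00 loaded above.

Python

import numpy as np
from scipy.optimize import minimize, Bounds

def fun_and_grad(s, K1, K2, M):
    S = s.reshape(256, 256)
    R = K2 @ S @ K1.T - M                    # residual, n x m
    f = 0.5*np.sum(R*R) + 0.05*np.sum(S*S)   # F(S), with 1/20 = 0.05
    G = K2.T @ R @ K1 + 0.1*S                # gradient dF/dS
    return f, G.ravel()

res = minimize(fun_and_grad, S_initial_guess.ravel(), args=(K1, K2, M),
               jac=True, method='L-BFGS-B', bounds=Bounds(0, np.inf))
S_opt = res.x.reshape(256, 256)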

How to solve it?

How can I solve this large-scale minimization problem efficiently? Could you please give me a code snippet (Python or MATLAB) that solves the attached problem? Are there any off-the-shelf toolboxes I could use?

For Algorithm Testing

I've added a new mat file containing $K_1$, $K_2$, $M$, and a correct answer $Xtrue$ for testing. All matrices in this file are much smaller than in the original problem.

Best Answer

Here is a simple Julia script. If you translate it to another language, beware of the nested loops: Julia handles these efficiently, but they should be vectorized for MATLAB or Python. (A direct Python translation of the inner update is sketched after the script.)

The first time the script is run it will create tab-separated-values (TSV) files for the $X$ and $W$ matrices. On subsequent runs, the script will read the TSV files, execute $k_{max}$ iterations, update the TSV files, and exit.

Thus you can intermittently refine the solution until you run out of patience.

#!/usr/bin/env  julia

#  Sequential Coordinate-wise algorithm for Non-Negative Least-Squares
#  as described on pages 10-11 of
#     http://users.wfu.edu/plemmons/papers/nonneg.pdf
#
#  Convergence is painfully slow, but unlike most other NNLS
#  algorithms the objective function is reduced at each step.
#
#  The algorithm described in the PDF was modified from its
#  original vector form:  |Ax - b|²
#    to the matrix form:  |LXKᵀ - M|²  +  λ|X|²
#
#  and to include the regularization term.

using LinearAlgebra, MAT, DelimitedFiles

function main()
  matfile = "problem.mat"
  Xfile   = "problem.mat.X.tsv"
  Wfile   = "problem.mat.W.tsv"

# read the matrices from the Matlab file
  f = matopen(matfile)
    K = read(f,"K1"); println("K: size = $(size(K)),\t rank = $(rank(K))")
    L = read(f,"K2"); println("L: size = $(size(L)),\t rank = $(rank(L))")
    M = read(f, "M"); println("M: size = $(size(M)),\t rank = $(rank(M))")
  # S = read(f,"S00");println("S: size = $(size(S)),\t rank = $(rank(S))")
  close(f)

  A = L'L
  B = K'K
  C = -L'M*K
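  # A, B, C assemble the gradient of the objective,
  #   ∇F(X) = A*X*B + λ*X + C,
  # which the W work array below tracks as X is updated.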
  m,n = size(C)
  λ = 1/10     # regularization parameter
  kmax = 100   # maximum iterations


# specify the size of the work arrays
  X = 0*C
  W = 1*C
  H = A[:,1] * B[:,1]'
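# W holds the current gradient ∇F(X); at the initial X = 0 it equals C.
# H is a preallocated 256×256 buffer that mul! overwrites in place.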

# resume from latest saved state ... or reset to initial conditions
  try
     X = readdlm(Xfile);  println("X: size = $(size(X)), extrema = $(extrema(X))")
     W = readdlm(Wfile);  println("W: size = $(size(W)), extrema = $(extrema(W))")
     println()
  catch
     @warn "Could not read the saved X,W matrices; re-initializing."
     X = 0*C
     W = 1*C
  end

  fxn = (norm(L*X*K' - M)^2 + λ*norm(X)^2) / 2
  println("at step 0, fxn = $fxn")

  k = 0
  while k < kmax
     for i = 1:m
         for j = 1:n
             mul!(H, A[:,i], B[:,j]')
             H[i,j] += λ
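             # δ is the exact minimizer step along coordinate (i,j),
             # clipped so X[i,j] stays nonnegative (δ < 0 grows X[i,j])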
             δ = min( X[i,j], W[i,j]/H[i,j] )
             X[i,j] -= δ
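             # rank-one update keeps W equal to the gradient at the new X:
             #   W -= δ*(A[:,i]*B[:,j]' + λ*eᵢeⱼ')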
             H .*= δ
             W .-= H
         end
     end
     k += 1
     fx2 = (norm(L*X*K' - M)^2 + λ*norm(X)^2) / 2
     println("after step $k, fxn = $fx2")

     # convergence check
     if fx2 ≈ fxn; break; end
     fxn = fx2
  end

# save the current state for the next run
  writedlm(Xfile, X)
  writedlm(Wfile, W)

# peek at the current solution
  println("\nsummary of current solution")
  println(" vector(X) = $(X[1:4]) ... $(X[end-3:end])")
  println("extrema(X) = $(extrema(X))")
end

# invoke the main function                                           
main()
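
For reference, here is a direct (unvectorized) Python translation of one sweep of the inner update, assuming NumPy and the same conventions as the script (A = K2ᵀK2, B = K1ᵀK1, C = -K2ᵀMK1, W initialized to C at X = 0). It is a correctness sketch rather than a fast implementation; the loop is inherently sequential, because each (i, j) step uses the W produced by all previous steps.

Python

import numpy as np

def sca_sweep(X, W, A, B, lam=0.1):
    # one full pass of the sequential coordinate-wise update
    m, n = X.shape
    for i in range(m):
        for j in range(n):
            h_ij = A[i, i] * B[j, j] + lam        # curvature along (i, j)
            delta = min(X[i, j], W[i, j] / h_ij)  # clipped coordinate step
            X[i, j] -= delta
            H = np.outer(A[:, i], B[:, j])        # rank-one gradient change
            H[i, j] += lam
            W -= delta * H                        # W stays equal to the gradient
    return X, W

# usage, matching the script's initialization:
#   A = K2.T @ K2;  B = K1.T @ K1;  C = -K2.T @ M @ K1
#   X, W = np.zeros_like(C), C.copy()
#   X, W = sca_sweep(X, W, A, B)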