Solve a Large-Scale Matrix Least-Squares Problem with Frobenius Regularization Efficiently

convex optimization · least squares · optimization

How to solve the following minimization problem: $$\min_{\mathbf{S}\ge 0} F(\mathbf{S}) = \frac{1}{2}\Vert \mathbf{M} - \mathbf{K_2SK_1^T}\Vert_F^2+\frac{1}{20}\Vert\mathbf{S}\Vert_F^2$$
where $\mathbf{S}\in R^{256 \times 256}$ has nonnegative elements, $\mathbf{M}\in R^{n \times m}$, $\mathbf{K_2} \in R^{n \times 256}$, and $\mathbf{K_1} \in R^{m \times 256}$. In most cases $3500 \lt m \lt 18000$ and $8 \lt n \lt 128$.

The data of a minimal case can be downloaded here. In this case $m=3788$, $n=16$. The following code helps load the data into the workspace:

MATLAB

load('problem.mat')

Python

import scipy.io
data = scipy.io.loadmat('/home/ubuntu/MATLAB/problem.mat')
K1 = data['K1']
K2 = data['K2']
M = data['M']
S_initial_guess = data['S00']

What I've tried

  1. Vectorize the problem using $\mathbf{K}=\mathrm{kron}(\mathbf{K_2},\mathbf{K_1})$.
    But $\mathbf{K}$ is too large for an ordinary PC, and any optimization strategy using the Hessian matrix would produce even larger matrices. (A matrix-free alternative that avoids forming $\mathbf{K}$ is sketched after this list.)

  2. Solving the matrix-form problem directly, which produces a fourth-order Hessian tensor.
    Without the Hessian, the algorithm (steepest descent with exact/inexact line search) converges too slowly.

  3. CVXPY, which runs out of memory:

    n = 256
    X = cp.Variable((n, n))
    constraints = [X >= 0]
    gamma = cp.Parameter(nonneg=True, value=1)
    obj = cp.Minimize(cp.norm(K2 @ X @ K1.transpose() - M, 'fro') + gamma*cp.norm(X, 'fro')**2)
    prob = cp.Problem(obj, constraints)
    prob.solve(verbose=True)
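
Regarding attempt 1, note that the Kronecker matrix never needs to be formed explicitly: since $(\mathbf{K_1}\otimes\mathbf{K_2})\,\mathrm{vec}(\mathbf{S})=\mathrm{vec}(\mathbf{K_2SK_1^T})$, the objective and its gradient can be evaluated matrix-free. Below is a minimal sketch (not the method in the accepted answer) that feeds this to SciPy's L-BFGS-B, with bound constraints enforcing $\mathbf{S}\ge 0$; S_initial_guess is the S00 loaded above.

Python

import numpy as np
from scipy.optimize import minimize, Bounds

def fun_and_grad(s, K1, K2, M):
    S = s.reshape(256, 256)
    R = K2 @ S @ K1.T - M                    # residual, n x m
    f = 0.5*np.sum(R*R) + 0.05*np.sum(S*S)   # F(S), with 1/20 = 0.05
    G = K2.T @ R @ K1 + 0.1*S                # gradient dF/dS
    return f, G.ravel()

res = minimize(fun_and_grad, S_initial_guess.ravel(), args=(K1, K2, M),
               jac=True, method='L-BFGS-B', bounds=Bounds(0, np.inf))
S_opt = res.x.reshape(256, 256)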

How to solve it?

How can I solve this large-scale minimization problem efficiently? Could you please give me a code snippet (Python or MATLAB) that solves the attached problem? Are there any off-the-shelf toolboxes I could use?

For Algorithm Testing

I've added a new mat file containing $K_1$, $K_2$, $M$, and a correct answer $Xtrue$ for testing. All matrices in this file are much smaller than in the original problem.

Best Answer

Here is a simple Julia script. If you translate it to another language, beware of the nested loops: Julia handles these efficiently, but they should be vectorized for MATLAB or Python. (A direct Python translation of the inner update is sketched after the script.)

The first time the script is run it will create tab-separated-values (TSV) files for the $X$ and $W$ matrices. On subsequent runs, the script will read the TSV files, execute $k_{max}$ iterations, update the TSV files, and exit.

Thus you can intermittently refine the solution until you run out of patience.

#!/usr/bin/env  julia

#  Sequential Coordinate-wise algorithm for Non-Negative Least-Squares
#  as described on pages 10-11 of
#     http://users.wfu.edu/plemmons/papers/nonneg.pdf
#
#  Convergence is painfully slow, but unlike most other NNLS
#  algorithms the objective function is reduced at each step.
#
#  The algorithm described in the PDF was modified from its
#  original vector form:  |Ax - b|²
#    to the matrix form:  |LXKᵀ - M|²  +  λ|X|²
#
#  and to include the regularization term.

using LinearAlgebra, MAT, DelimitedFiles

function main()
  matfile = "problem.mat"
  Xfile   = "problem.mat.X.tsv"
  Wfile   = "problem.mat.W.tsv"

# read the matrices from the Matlab file
  f = matopen(matfile)
    K = read(f,"K1"); println("K: size = $(size(K)),\t rank = $(rank(K))")
    L = read(f,"K2"); println("L: size = $(size(L)),\t rank = $(rank(L))")
    M = read(f, "M"); println("M: size = $(size(M)),\t rank = $(rank(M))")
  # S = read(f,"S00");println("S: size = $(size(S)),\t rank = $(rank(S))")
  close(f)

  A = L'L
  B = K'K
  C = -L'M*K
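  # A, B, C assemble the gradient of the objective,
  #   ∇F(X) = A*X*B + λ*X + C,
  # which the W work array below tracks as X is updated.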
  m,n = size(C)
  λ = 1/10     # regularization parameter
  kmax = 100   # maximum iterations


# specify the size of the work arrays
  X = 0*C
  W = 1*C
  H = A[:,1] * B[:,1]'
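# W holds the current gradient ∇F(X); at the initial X = 0 it equals C.
# H is a preallocated 256×256 buffer that mul! overwrites in place.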

# resume from latest saved state ... or reset to initial conditions
  try
     X = readdlm(Xfile);  println("X: size = $(size(X)), extrema = $(extrema(X))")
     W = readdlm(Wfile);  println("W: size = $(size(W)), extrema = $(extrema(W))")
     println()
  catch
     @warn "Could not read the saved X,W matrices; re-initializing."
     X = 0*C
     W = 1*C
  end

  fxn = (norm(L*X*K' - M)^2 + λ*norm(X)^2) / 2
  println("at step 0, fxn = $fxn")

  k = 0
  while k < kmax
     for i = 1:m
         for j = 1:n
             mul!(H, A[:,i], B[:,j]')
             H[i,j] += λ
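             # δ is the exact minimizer step along coordinate (i,j),
             # clipped so X[i,j] stays nonnegative (δ < 0 grows X[i,j])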
             δ = min( X[i,j], W[i,j]/H[i,j] )
             X[i,j] -= δ
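             # rank-one update keeps W equal to the gradient at the new X:
             #   W -= δ*(A[:,i]*B[:,j]' + λ*eᵢeⱼ')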
             H .*= δ
             W .-= H
         end
     end
     k += 1
     fx2 = (norm(L*X*K' - M)^2 + λ*norm(X)^2) / 2
     println("after step $k, fxn = $fx2")

     # convergence check
     if fx2 ≈ fxn; break; end
     fxn = fx2
  end

# save the current state for the next run
  writedlm(Xfile, X)
  writedlm(Wfile, W)

# peek at the current solution
  println("\nsummary of current solution")
  println(" vector(X) = $(X[1:4]) ... $(X[end-3:end])")
  println("extrema(X) = $(extrema(X))")
end

# invoke the main function                                           
main()
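
For reference, here is a direct (unvectorized) Python translation of one sweep of the inner update, assuming NumPy and the same conventions as the script (A = K2ᵀK2, B = K1ᵀK1, C = -K2ᵀMK1, W initialized to C at X = 0). It is a correctness sketch rather than a fast implementation; the loop is inherently sequential, because each (i, j) step uses the W produced by all previous steps.

Python

import numpy as np

def sca_sweep(X, W, A, B, lam=0.1):
    # one full pass of the sequential coordinate-wise update
    m, n = X.shape
    for i in range(m):
        for j in range(n):
            h_ij = A[i, i] * B[j, j] + lam        # curvature along (i, j)
            delta = min(X[i, j], W[i, j] / h_ij)  # clipped coordinate step
            X[i, j] -= delta
            H = np.outer(A[:, i], B[:, j])        # rank-one gradient change
            H[i, j] += lam
            W -= delta * H                        # W stays equal to the gradient
    return X, W

# usage, matching the script's initialization:
#   A = K2.T @ K2;  B = K1.T @ K1;  C = -K2.T @ M @ K1
#   X, W = np.zeros_like(C), C.copy()
#   X, W = sca_sweep(X, W, A, B)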