[Math] Algorithm for solving sparse equality-constrained least squares

numerical linear algebra, numerical optimization, optimization

I have a diagonal, positive-definite inner product matrix $M$ and want to find a minimizer of

$$\min_q \frac{1}{2} \|q-q_0\|_M^2\qquad \text{s.t.}\qquad C^Tq+c_0 = 0,$$

where $q_0, c_0$, and $C$ are given and $C$ is very large and sparse.

What is the best algorithm for efficiently solving this problem? The straightforward approach is to solve the normal equations
$$\left[\begin{array}{cc}M & C\\C^T & 0\end{array}\right]\left[\begin{array}{c}q\\ \lambda\end{array}\right] = \left[\begin{array}{c} Mq_0 \\-c_0\end{array}\right]$$
for $q$; the matrix here is sparse and symmetric, but indefinite and possibly singular, so sparse iterative solvers like good old Conjugate Gradient cannot be used.
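
For concreteness, here is a small sketch of this direct approach in Python/SciPy. The data below is a made-up toy problem, and `spsolve` uses a sparse LU factorization rather than a symmetric indefinite one:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Toy data (made up for illustration): diagonal SPD M, sparse C with full column rank.
n, m = 5, 2
M = sp.diags(np.full(n, 2.0))
C = sp.csc_matrix(np.array([[1.0, 0.0],
                            [0.0, 1.0],
                            [1.0, 1.0],
                            [0.0, 0.0],
                            [0.0, 0.0]]))
q0 = np.ones(n)
c0 = np.array([0.5, -1.0])

# Assemble the sparse augmented (saddle-point) system [[M, C], [C^T, 0]].
K = sp.bmat([[M, C], [C.T, None]], format="csc")
rhs = np.concatenate([M @ q0, -c0])

# spsolve performs a sparse LU factorization; it only works when K is nonsingular
# (a symmetric indefinite LDL^T factorization would exploit the symmetry better).
sol = spla.spsolve(K, rhs)
q, lam = sol[:n], sol[n:]
```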

A little manipulation of the above gives an alternative two-step approach where you solve first for the Lagrange multipliers:
$$C^T M^{-1} C\lambda = C^Tq_0 + c_0$$
and then substitute $\lambda$ into
$$q = q_0 - M^{-1} C\lambda.$$
Since $M$ is diagonal it is trivial to invert, and $C^T M^{-1} C$ is symmetric positive-(semi)definite; however, it is dense, so this approach also becomes inefficient when the number of constraints is very large.
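
A sketch of this two-step approach, continuing the toy data from the snippet above:

```python
# Two-step (Schur complement) approach, reusing M, C, q0, c0 from the sketch above.
Minv = sp.diags(1.0 / M.diagonal())      # trivial to invert since M is diagonal

S = (C.T @ Minv @ C).tocsc()             # C^T M^{-1} C: SPD here, but can fill in badly
lam = spla.spsolve(S, C.T @ q0 + c0)     # solve for the Lagrange multipliers
q = q0 - Minv @ (C @ lam)                # back-substitute for the minimizer
```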

What is the standard approach to solving this problem?

Best Answer

J.M. is correct that the difficulty is that $C$ does not necessarily have full column rank, causing your augmented system to be singular. The standard approach in this case would be to regularize your problem, i.e., replace the constraint $C^T q + c_0 = 0$ with $C^T q + \delta r + c_0 = 0$ for some small value of $\delta > 0$ (e.g., between $10^{-6}$ and $10^{-8}$) and a new variable $r$ representing the residual of your constraints. Your constraint matrix is now $$ \begin{bmatrix} C^T & \delta I \end{bmatrix} $$ which always has full row rank. At the same time, you'll want to add a term $\delta^2 \|r\|^2$ to your objective so as to minimize this residual.
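
A sketch of the resulting regularized KKT system, continuing the toy data from the question and assuming the penalty is written with the same $\tfrac{1}{2}$ factor as the original objective:

```python
# Regularized KKT system in (q, r, lambda), reusing M, C, q0, c0, n from above.
# Objective: 1/2 ||q - q0||_M^2 + delta^2/2 ||r||^2, constraint: C^T q + delta r + c0 = 0.
delta = 1e-6
m = C.shape[1]
I_m = sp.identity(m, format="csc")

K_reg = sp.bmat([[M,    None,            C],
                 [None, delta**2 * I_m,  delta * I_m],
                 [C.T,  delta * I_m,     None]], format="csc")
rhs_reg = np.concatenate([M @ q0, np.zeros(m), -c0])

# The constraint block [C^T, delta*I] has full row rank, so K_reg is nonsingular
# even when C itself is rank deficient.
sol = spla.spsolve(K_reg, rhs_reg)
q, r, lam = np.split(sol, [n, n + m])
```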

If everything is nonsingular, you have several options to solve the problem. One is a symmetric indefinite factorization of the augmented system (the one you call the "normal equations"). This can be done very efficiently, and in parallel if the system is very large. In Matlab, look into the lbl function. In Fortran, look into the HSL library: http://www.hsl.rl.ac.uk/academic.html (the relevant packages are MA27, MA47, MA57 and MA86).

Regarding iterative methods, aside from MINRES, you can also try SYMMLQ if the system is nonsingular. See Mike Saunders' website for links to those codes: http://www.stanford.edu/group/SOL/software.html
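
As a stand-in for those codes, here is SciPy's `minres` applied to the toy augmented system assembled earlier:

```python
# MINRES on the (symmetric indefinite) augmented system K, rhs from the first sketch.
x, info = spla.minres(K, rhs, maxiter=1000)
q, lam = x[:n], x[n:]
assert info == 0, "MINRES did not converge"
```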

There is an alternative if you have a means to identify efficiently a particular solution $\bar{q}_0$ of $C^T q + c_0 = 0$ (by factorization or using some special structure of the problem). You can "shift" your linear system by defining $\bar{q} := q - \bar{q}_0$: $$ \begin{bmatrix} M & C \\ C^T & 0 \end{bmatrix} \begin{bmatrix} \bar{q} \\ \lambda \end{bmatrix} = \begin{bmatrix} M(q_0 - \bar{q}_0) \\ 0 \end{bmatrix}. $$ These are the optimality conditions of the (unconstrained) linear least-squares problem $$ \min_{\lambda} \ \tfrac{1}{2} \| C \lambda - M(q_0 - \bar{q}_0) \|_{M^{-1}}^2. $$ NOW you can use conjugate gradients, or better yet, LSQR (still from Michael Saunders' page). LSQR is a variant of CG that will be more accurate if $C$ is ill-conditioned. Once you have found $\lambda$, you can recover $\bar{q} = q_0 - \bar{q}_0 - M^{-1} C \lambda$ and your solution is $q = \bar{q} + \bar{q}_0$.
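
A sketch of this route using SciPy's `lsqr`, continuing the toy data. Computing `qbar0` with LSQR below is only to keep the sketch self-contained; it is not the cheap, structure-exploiting method assumed above:

```python
# Shifted least-squares route via LSQR, reusing M, C, q0, c0 from above.
# A particular solution qbar0 of C^T q + c0 = 0 is obtained with LSQR here just to
# make the sketch runnable; in practice use a factorization or problem structure.
qbar0 = spla.lsqr(C.T, -c0)[0]

# Fold the M^{-1} weighting into a diagonal scaling (easy because M is diagonal):
# min ||C lam - M(q0 - qbar0)||_{M^{-1}}  ==  min ||M^{-1/2} C lam - M^{1/2}(q0 - qbar0)||_2
d = np.sqrt(M.diagonal())
lam = spla.lsqr(sp.diags(1.0 / d) @ C, d * (q0 - qbar0))[0]

qbar = (q0 - qbar0) - (1.0 / M.diagonal()) * (C @ lam)
q = qbar + qbar0
```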

For more general (non-diagonal) $M$, there is a nice hybrid alternative. It consists of interpreting the shifted linear system as the optimality conditions of the quadratic program $$ \min_{\bar{q}} \ -(q_0 - \bar{q}_0)^T M \bar{q} + \tfrac{1}{2} \bar{q}^T M \bar{q} \quad \text{s.t.} \ C^T \bar{q} = 0. $$ The method is a hybrid between a direct factorization and CG: it factorizes the matrix $$ \begin{bmatrix} I & C \\ C^T & 0 \end{bmatrix} $$ once and applies CG to $M$. At each iteration, the search direction is projected onto the nullspace of $C^T$, ensuring that all iterates satisfy $C^T \bar{q} = 0$. The main reference is http://epubs.siam.org/sisc/resource/1/sjoce3/v23/i4/p1376_s1
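
A bare-bones sketch along these lines, continuing the toy data, with none of the preconditioning or residual-refinement safeguards from the reference:

```python
# Bare-bones projected CG, reusing M, C, q0, qbar0 from the sketches above.
# The projection matrix [[I, C], [C^T, 0]] is factorized once; each CG residual is
# projected onto null(C^T), so all iterates satisfy C^T qbar = 0.
def projected_cg(M, C, b, tol=1e-10, maxiter=200):
    n, m = C.shape
    K_proj = sp.bmat([[sp.identity(n), C], [C.T, None]], format="csc")
    lu = spla.splu(K_proj)                     # factorize the projection matrix once

    def project(v):                            # orthogonal projection onto null(C^T)
        return lu.solve(np.concatenate([v, np.zeros(m)]))[:n]

    x = np.zeros(n)                            # feasible start: C^T x = 0
    r = -b                                     # gradient of 1/2 x^T M x - b^T x at x = 0
    g = project(r)
    d = -g
    rg = r @ g
    for _ in range(maxiter):
        if rg < tol:
            break
        Md = M @ d
        alpha = rg / (d @ Md)
        x += alpha * d
        r += alpha * Md
        g = project(r)
        rg_new = r @ g
        d = -g + (rg_new / rg) * d
        rg = rg_new
    return x

qbar = projected_cg(M, C, M @ (q0 - qbar0))    # minimizes the quadratic program above
q = qbar + qbar0
```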

Clearly, for a diagonal $M$ there is no advantage over a direct factorization of your original system. But in many applications, replacing $M$ by $I$ in the matrix that gets factorized leads to a much sparser factorization. Also, $M$ no longer needs to be positive definite, only positive definite on the nullspace of $C^T$.
