Lagrange Multipliers and quasi-Newton methods

constraints, convex optimization, lagrange multiplier, newton raphson, optimization

Consider an optimisation problem of the form
$$
\begin{aligned}
&\min f(x)\\
&\text{s.t. } g(x) = 0
\end{aligned}
$$

with $f,g: \mathbb{R}^n \to \mathbb{R}$ convex and twice continuously differentiable. For small-scale problems (i.e. $n$ small), a simple solution method is to form the Lagrangian
$$L(x, \lambda) = f(x) + \lambda g(x)$$
and solve $\nabla_{x,\lambda} L(x,\lambda) = 0$ using Newton's method.
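For concreteness, here is a minimal NumPy sketch of this approach on a toy problem (the function names, starting point, and tolerance are illustrative, not from any particular library): minimising $\|x\|^2$ subject to $\sum_i x_i = 1$.

```python
import numpy as np

def newton_lagrange(grad_L, hess_L, z0, tol=1e-10, max_iter=50):
    """Newton's method on the stationarity conditions grad L(x, lam) = 0.

    z = (x_1, ..., x_n, lam) stacks the primal variables and the multiplier.
    """
    z = z0.astype(float)
    for _ in range(max_iter):
        g = grad_L(z)
        if np.linalg.norm(g) < tol:
            break
        z = z - np.linalg.solve(hess_L(z), g)  # solve the (n+1)x(n+1) system
    return z

# Illustrative toy problem: min ||x||^2  s.t.  sum(x) - 1 = 0.
n = 3

def grad_L(z):
    x, lam = z[:n], z[n]
    return np.concatenate([2 * x + lam * np.ones(n), [x.sum() - 1.0]])

def hess_L(z):
    K = np.zeros((n + 1, n + 1))
    K[:n, :n] = 2 * np.eye(n)      # nabla_x^2 L
    K[:n, n] = K[n, :n] = 1.0      # nabla_x g and its transpose
    return K

z = newton_lagrange(grad_L, hess_L, np.zeros(n + 1))
# stationary point: x_i = 1/n, lam = -2/n
```

Since the toy problem is quadratic with a linear constraint, Newton's method converges in a single step here.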

For larger-scale problems this becomes difficult because each step of Newton's method requires solving the system
$$\nabla_{x,\lambda}^2 L(x_k,\lambda_k)\,(\Delta x,\Delta\lambda) = -\nabla_{x,\lambda} L(x_k,\lambda_k)$$
where the Hessian $\nabla_{x,\lambda}^2 L(x_k,\lambda_k)$ has shape $(n+1, n+1)$. For an unconstrained problem it is then common to use a quasi-Newton method, such as BFGS, which iteratively builds an approximation of the inverse Hessian and thus avoids solving the large system.
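To make the BFGS part concrete, this is the standard inverse-Hessian update in a minimal sketch (the function name is mine):

```python
import numpy as np

def bfgs_inverse_update(Hinv, s, y):
    """One BFGS update of an inverse-Hessian estimate Hinv.

    s = x_{k+1} - x_k (the step), y = grad f(x_{k+1}) - grad f(x_k).
    The update keeps Hinv symmetric and, if y @ s > 0, positive definite.
    """
    rho = 1.0 / (y @ s)
    I = np.eye(len(s))
    V = I - rho * np.outer(s, y)
    return V @ Hinv @ V.T + rho * np.outer(s, s)
```

The curvature condition `y @ s > 0` is what forces the estimate to stay positive definite, which is exactly the property that becomes problematic in the constrained setting below.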

When I try to use the same approach for a problem with one constraint like the above, I run into the problem that most quasi-Newton methods can only find minima of the objective, since their estimates of the Hessian are positive definite. But the Lagrangian approach actually requires us to find a saddle point of the Lagrangian. If I'm not mistaken, the Hessian at the stationary point we are looking for has all but one positive eigenvalue, so it is indefinite.

Question

What quasi-Newton methods can find the stationary point of the above Lagrangian, even though the Hessian is not positive definite? And why does this seem to be an unpopular approach, judging by the fact that the most popular quasi-Newton methods maintain positive definite Hessian estimates?

I know the Symmetric Rank One (SR1) method does not guarantee a positive definite Hessian estimate, but this is often seen as a downside of that method. Would it be able to find the stationary point of the above Lagrangian? There is also Broyden's method, but that does not exploit the fact that the Hessian is symmetric.

Best Answer

Thanks to the discussion in the comments and the mentioned sources I figured out the answer. The Hessian of the Lagrangian has the following block structure:
$$ \begin{bmatrix} H & J\\ J^T & 0 \end{bmatrix} $$
where $H = \nabla^2_{x}L(x, \lambda)$ and $J = \nabla_x g(x)$. This is indeed not positive definite, but the block $H$ is positive definite because of the convexity of $f$ and $g$ (provided $\lambda \ge 0$). To exploit this, you can apply BFGS updates to this block only.

With this approach, the entire Hessian of the Lagrangian remains easy to invert: if we can compute (the action of) $H^{-1}$, the full Hessian can be inverted as
$$ \begin{bmatrix} H & J\\ J^T & 0 \end{bmatrix}^{-1} = \begin{bmatrix} H^{-1} - sH^{-1} JJ^TH^{-1} & sH^{-1} J\\ sJ^TH^{-1} & -s \end{bmatrix} $$
where the scalar $s$ is given by $s = \frac{1}{J^T H^{-1}J}$.
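A quick numerical check of this block-inverse formula (a sketch; the random test matrices are of course just illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
H = A @ A.T + n * np.eye(n)   # a positive definite H block
J = rng.standard_normal(n)    # gradient of the single constraint

Hinv = np.linalg.inv(H)
s = 1.0 / (J @ Hinv @ J)

# Assemble the inverse from the block formula
K_inv = np.zeros((n + 1, n + 1))
K_inv[:n, :n] = Hinv - s * Hinv @ np.outer(J, J) @ Hinv
K_inv[:n, n] = K_inv[n, :n] = s * Hinv @ J
K_inv[n, n] = -s

# Compare against inverting the full KKT matrix directly
K = np.zeros((n + 1, n + 1))
K[:n, :n] = H
K[:n, n] = K[n, :n] = J
assert np.allclose(K_inv, np.linalg.inv(K))
```

In practice one would of course apply the formula to $H^{-1}$ as maintained by the BFGS updates rather than compute it explicitly.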

If you have more than one equality constraint this does become more difficult, as computing $s$ then requires inverting the $m \times m$ matrix $J^T H^{-1} J$ as well. If that is too expensive, you can approximate $J$ using Broyden's method, combined with BFGS for $H$.
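For $m > 1$ the same formula goes through with the scalar $s$ replaced by the inverse of the $m \times m$ Schur complement $J^T H^{-1} J$. A sketch of the generalisation, with hypothetical names:

```python
import numpy as np

def kkt_inverse(Hinv, J):
    """Inverse of [[H, J], [J^T, 0]] given H^{-1}, for J of shape (n, m).

    For m > 1 the scalar s becomes the m x m matrix S = (J^T H^{-1} J)^{-1}.
    """
    n, m = J.shape
    S = np.linalg.inv(J.T @ Hinv @ J)   # m x m Schur-complement inverse
    K = np.zeros((n + m, n + m))
    K[:n, :n] = Hinv - Hinv @ J @ S @ J.T @ Hinv
    K[:n, n:] = Hinv @ J @ S
    K[n:, :n] = S @ J.T @ Hinv
    K[n:, n:] = -S
    return K
```

Since $m$ is usually much smaller than $n$, inverting the $m \times m$ block directly is often still cheap compared to a full $(n+m) \times (n+m)$ solve.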
