Optimization – Generalizing Lagrange Multipliers to Use the Subdifferential

multivariable-calculus, optimization

Background:
This is a followup to this question:
Lagrange multipliers with non-smooth constraints

Lagrange multipliers can be used for constrained optimization problems of the form

$\min_{\vec x} f(\vec x)$
s.t. $g(\vec x) = 0$.

Briefly, the method works by constructing the Lagrangian, $L(\vec x, \lambda) = f(\vec x) + \lambda g(\vec x)$, then finding points where $\frac{\partial L}{\partial x_i} = 0$ for all $i$ and $\frac{\partial L}{\partial \lambda} = 0$.

As was kindly pointed out in this answer, the method fails when $g$ is non-differentiable (but continuous), because the partial derivatives may not exist at points of optimality. For example, consider the problem: minimize $x_1$ subject to $g(x_1,x_2) = x_1 - |x_2| = 0$. The minimum is at $(0,0)$, where $\frac{\partial g}{\partial x_2}$ does not exist.
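
To see the failure concretely, the Lagrangian for this example is $L(x_1, x_2, \lambda) = x_1 + \lambda(x_1 - |x_2|)$, so $\frac{\partial L}{\partial x_1} = 1 + \lambda$, $\frac{\partial L}{\partial \lambda} = x_1 - |x_2|$, and, for $x_2 \neq 0$, $\frac{\partial L}{\partial x_2} = -\lambda\,\operatorname{sgn}(x_2)$. Stationarity in $x_1$ forces $\lambda = -1$, which makes $\frac{\partial L}{\partial x_2} = \operatorname{sgn}(x_2) \neq 0$ whenever $x_2 \neq 0$, and at $x_2 = 0$ that partial does not exist, so the smooth stationarity system has no solution.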

Question:
It seems that there should be a natural generalization of the method that uses subgradients and the subdifferential. Does the following work? Is there a reference that describes this in more detail?

Proposal: construct the Lagrangian as usual, but instead of seeking a point where all partial derivatives are 0, seek a point where 0 is in each partial subdifferential. So in the example above, the subdifferential with respect to $x_2$ when $x_2=0$ is the interval with endpoints $-\lambda$ and $\lambda$. Thus, if we were given the solution $x_1=0, x_2=0, \lambda=-1$, we could verify it is a critical point by noting that $\frac{\partial L}{\partial \lambda} = x_1 - |x_2| = 0$, $\frac{\partial L}{\partial x_1} = 1 + \lambda = 0$, and 0 is in the subdifferential of $L$ with respect to $x_2$.
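
As a sanity check on that verification, here is a minimal numerical sketch (my own code, not part of the original question): it approximates the one-sided derivatives of $L$ in $x_2$ at the candidate point and checks that $0$ lies between them, which for a one-dimensional convex kink like $|x_2|$ is exactly the condition $0 \in \partial_{x_2} L$.

```python
def lagrangian(x1, x2, lam):
    # L(x, lambda) = f(x) + lambda * g(x) for the example above:
    # f(x) = x1, g(x) = x1 - |x2|
    return x1 + lam * (x1 - abs(x2))

def one_sided_partials_x2(x1=0.0, x2=0.0, lam=-1.0, h=1e-6):
    """Left and right difference quotients of L in x2 at the candidate point."""
    L0 = lagrangian(x1, x2, lam)
    right = (lagrangian(x1, x2 + h, lam) - L0) / h
    left = (L0 - lagrangian(x1, x2 - h, lam)) / h
    return left, right

left, right = one_sided_partials_x2()
# With lambda = -1 these come out to roughly -1 and +1; the partial subdifferential
# at the kink is the interval between them, so check that it contains 0.
print(left, right, min(left, right) <= 0.0 <= max(left, right))

# The two smooth conditions can be checked directly:
# dL/dx1 = 1 + lambda and dL/dlambda = x1 - |x2|, both 0 at (0, 0, -1).
print(1 + (-1.0), 0.0 - abs(0.0))
```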

Is this argument correct? My intuitive justification is that for any value $f'(x)$ in some variable's subdifferential at $x$, we should be able to construct a smooth function that has $f'(x)$ as its partial derivative at $x$, then solve the smoothed problem with a standard application of Lagrange multipliers.
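
One concrete instance of that idea (my own illustration, not a proof of the general claim): for $|x_2|$ at $x_2 = 0$ and any chosen subgradient $t \in [-1,1]$, the linear function $h(x_2) = t\,x_2$ is smooth, agrees with $|x_2|$ at $0$, has derivative $t$ there, and minorizes $|x_2|$ everywhere; replacing $|x_2|$ by $h(x_2)$ in the constraint gives a smooth problem to which the standard Lagrange conditions apply.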

(Aside: my goal is actually not to find a method to optimize the function. I have a method for optimizing such functions, and I'm trying to develop some theoretical understanding of the solutions that it produces. In particular, I'm trying to understand if proving that a solution satisfies the condition described in the Proposal section is meaningful.)

Best Answer

I think the difficulty with the subdifferential approach you describe lies in finding critical points. For instance, with your example, solving the equations you get by setting partials to $0$ yields only $x_1 = |x_2|$ and $\lambda = -1$. This set of equations is too underdetermined to yield a solution. As you point out, if you were given a solution $x_1 = x_2 = 0$, $\lambda = -1$, you could verify that it is a critical point with the subdifferential approach, but that's quite different from deriving $x_1 = x_2 = 0$, $\lambda = -1$ as a critical point.

Remember, though, that the definition of a critical point of a function includes points at which the derivative fails to exist. So when you're constructing your set of critical points using Lagrange multipliers, look not only for points at which the partials are zero but also for points at which a partial does not exist. For instance, it's clear in your example that the only point at which $\frac{\partial L}{\partial x_2}$ fails to exist is $x_2 = 0$. (You also get $\lambda = 0$ from $\frac{\partial L}{\partial x_2}= 0$, but this is inconsistent with the $\lambda = -1$ from $\frac{\partial L}{\partial x_1}= 0$.) Including that with your equations from $\frac{\partial L}{\partial x_1} = 0$ and $\frac{\partial L}{\partial \lambda} = 0$ quickly gets you the solution $x_1 = x_2 = 0$, $\lambda = -1$.
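
If it helps to see that recipe mechanically, here is a small SymPy sketch (my own, with my own symbol names): take $x_2 = 0$ as the condition supplied by the nonexistent partial, substitute it, and solve the remaining smooth stationarity conditions.

```python
import sympy as sp

x1, x2, lam = sp.symbols('x1 x2 lam', real=True)
L = x1 + lam * (x1 - sp.Abs(x2))

# The only point where dL/dx2 fails to exist is x2 = 0, so add that to the
# candidate set, substitute it, and solve the remaining smooth conditions.
eqs = [sp.diff(L, x1), sp.diff(L, lam)]
print(sp.solve([e.subs(x2, 0) for e in eqs], [x1, lam], dict=True))
# expected: [{x1: 0, lam: -1}]
```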

In fact, finding points where a partial does not exist is generally not any harder (and is sometimes easier) than finding points where a partial is zero. There aren't many ways for a point in the domain of a continuous function to have a partial that fails to exist. (Besides the absolute value situation, you could have division by zero with the partial from an expression like $x^p$, with $0 < p < 1$, in the function. Perhaps there are some others I'm forgetting - maybe others reading this can supply them. There are also some pathological cases, like the continuous but nowhere differentiable Weierstrass function, but those don't generally show up in practice.)

And, of course, if you have more than just a few variables, you don't want to use Lagrange multipliers anyway: Solving a large number of nonlinear equations is just too difficult. In that case, you will probably want an iterative nonlinear optimization method.
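
For completeness, here is one hedged sketch of what that might look like on the toy problem above, using SciPy's general-purpose SLSQP solver. SLSQP assumes smooth constraints, so the kink in $|x_2|$ may limit how precisely it lands on the solution; this is only an illustration of the iterative approach, not a recommendation for nonsmooth problems in general.

```python
import numpy as np
from scipy.optimize import minimize

# Minimize f(x) = x1 subject to g(x) = x1 - |x2| = 0, starting away from the kink.
objective = lambda x: x[0]
constraint = {'type': 'eq', 'fun': lambda x: x[0] - abs(x[1])}

result = minimize(objective, x0=np.array([1.0, 0.5]),
                  method='SLSQP', constraints=[constraint])
print(result.x)  # should end up near (0, 0); the kink may limit the final accuracy
```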