How to Use Karush-Kuhn-Tucker (KKT) Conditions in Optimization

constraints, karush-kuhn-tucker, lagrange-multiplier

I am trying to understand how to use the Karush-Kuhn-Tucker conditions, similar to what was asked but not answered in this thread.

Assume the target function is given by $f(x)$, where $x$ is a vector. Let $g(x) \ge 0$ be an inequality constraint under which we wish to maximize $f$.

The Lagrange function is given by $L(x,\lambda)=f(x)+\lambda g(x)$. From a textbook I know that I have to maximize this function "with respect to the conditions":

\begin{align*} g(x) &\ge 0\\ \lambda &\ge 0\\ \lambda g(x) &= 0 \end{align*}

Now I do not fully understand what this means. Does this mean I have to maximize $L$ as if $g(x)=0$ and then I 'check' whether the three conditions are met?

An example. Let $f(x) = 1 - x_1^2 - x_2^2$ and $g(x) = 1 - x_1 - x_2$. We find

$$ \nabla_x L = -2x + \lambda
\begin{bmatrix}
-1 \\
-1 \\
\end{bmatrix} = 0
$$

$$ \nabla_{\lambda} L =
\begin{bmatrix}
-1 & -1
\end{bmatrix}
x + 1 = 0
$$

From the first equation, $x_1 = x_2 = -\lambda/2$; substituting into the second gives $\lambda = -1$ and

$$x=\begin{bmatrix}
1/2 \\
1/2 \\
\end{bmatrix}
$$

We can verify $g(x) = 0$ as well as $\lambda g(x) = 0$ as required. However, $\lambda = -1 < 0$, which violates the condition $\lambda \ge 0$. So it seems the optimization 'failed', but I am not sure what this means. Does it suggest we found a minimum? Please help me understand how to work with these constraints.
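For what it's worth, a minimal numerical sketch (assuming SciPy is available; SLSQP treats "ineq" constraints as $g(x) \ge 0$) suggests the maximizer is at the origin, where the constraint is inactive:

```python
import numpy as np
from scipy.optimize import minimize

# Maximize f(x) = 1 - x1^2 - x2^2 subject to g(x) = 1 - x1 - x2 >= 0
# by minimizing -f; SLSQP interprets "ineq" constraints as fun(x) >= 0.
f = lambda x: 1 - x[0]**2 - x[1]**2
g = lambda x: 1 - x[0] - x[1]

res = minimize(lambda x: -f(x), x0=np.array([0.5, 0.5]),
               method="SLSQP", constraints=[{"type": "ineq", "fun": g}])
print(res.x, g(res.x))  # -> approximately [0, 0], with g(x) = 1 > 0
```

So the solver lands at a point where $g(x) > 0$, i.e. the constraint seems to play no role there, but I do not see how the KKT conditions are supposed to tell me this.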

Best Answer

Think about what an inequality constraint means for optimality:

  • either the optimum is away from the boundary of the optimization domain, and so the constraint plays no role;
  • or the optimum is on the constraint boundary. In this case $f$ has a local minimum if the "downhill direction" is directly opposed to the constraint: $$\nabla f = \lambda \nabla g$$ for some positive $\lambda$. (If $\nabla f$ were merely parallel to $\nabla g$ but pointed in the opposite direction, you could lower $f$ by moving slightly inside the domain, away from the constraint boundary. And if $\nabla f$ and $\nabla g$ were not parallel at all, you could decrease $f$ either by sliding along the constraint boundary or by moving inside the domain; see the numerical sketch just after this list.)
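To make this concrete with the example from the question: at the boundary point $x = (1/2, 1/2)$ you can check the alignment numerically. Here is a minimal NumPy sketch, written in the question's convention $L = f + \lambda g$ for maximization, where dual feasibility likewise demands $\lambda \ge 0$ (the helper `multiplier` is just for illustration):

```python
import numpy as np

# f(x) = 1 - x1^2 - x2^2, g(x) = 1 - x1 - x2; with L = f + lambda * g,
# stationarity reads grad_f(x) + lambda * grad_g(x) = 0.
grad_f = lambda x: np.array([-2.0 * x[0], -2.0 * x[1]])
grad_g = lambda x: np.array([-1.0, -1.0])

def multiplier(x):
    # Least-squares lambda solving grad_f(x) + lambda * grad_g(x) = 0.
    gf, gg = grad_f(x), grad_g(x)
    return -(gf @ gg) / (gg @ gg)

x = np.array([0.5, 0.5])
print(multiplier(x))          # -> -1.0: the wrong sign for a maximizer
print(grad_f(x) @ grad_g(x))  # -> 2.0 > 0: f increases toward the interior
```

The negative multiplier and the positive inward derivative say the same thing: $f$ can still be improved by stepping off the boundary into the interior, so $(1/2, 1/2)$ must be rejected, which is exactly the failure observed in the question.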

This either-or condition is usually called complementarity, and the difficult part of solving an inequality-constrained minimization problem is figuring out the status of each constraint: classifying it either as inactive (the first case above, where $g(x) > 0$ at the optimum) or active (the second case above, where $g(x) = 0$).

The trio of conditions \begin{align*} g(x) &\geq 0\\ \lambda &\geq 0\\ g(x)\lambda &= 0 \end{align*} is just a clever way of encoding that every inequality constraint must be either active or inactive.
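In code, the trio is cheap to check for any candidate pair $(x, \lambda)$ once you know $g(x)$; a minimal sketch (the function name and tolerance are just illustrative):

```python
def kkt_trio_holds(g_val, lam, tol=1e-9):
    """Primal feasibility, dual feasibility, and complementarity
    for a single inequality constraint g(x) >= 0."""
    return (g_val >= -tol                  # g(x) >= 0
            and lam >= -tol                # lambda >= 0
            and abs(lam * g_val) <= tol)   # lambda * g(x) = 0

print(kkt_trio_holds(0.0, -1.0))  # False: the question's candidate, lambda < 0
print(kkt_trio_holds(1.0, 0.0))   # True: constraint inactive, multiplier zero
```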

As a practical matter, you can solve the optimization problem by trying all possible active sets: i.e., for a single constraint, try solving the problem assuming the constraint is active, and solve it again assuming the constraint is inactive. Check which of the solutions are valid, as you suggest in your post.
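For the single constraint in the question, that enumeration looks like this (a SymPy sketch, using the question's maximization convention $L = f + \lambda g$):

```python
import sympy as sp

x1, x2, lam = sp.symbols("x1 x2 lam", real=True)
f = 1 - x1**2 - x2**2
g = 1 - x1 - x2
L = f + lam * g

# Case "inactive": lambda = 0, solve grad f = 0, then require g >= 0.
for s in sp.solve([sp.diff(f, x1), sp.diff(f, x2)], [x1, x2], dict=True):
    print("inactive:", s, "| feasible:", g.subs(s) >= 0)

# Case "active": solve grad_x L = 0 together with g = 0, then require lambda >= 0.
for s in sp.solve([sp.diff(L, x1), sp.diff(L, x2), g], [x1, x2, lam], dict=True):
    print("active:  ", s, "| dual feasible:", s[lam] >= 0)
```

The inactive case returns $x = (0, 0)$ with $g(x) = 1 \ge 0$ (valid), while the active case reproduces $x = (1/2, 1/2)$ with $\lambda = -1 < 0$ (rejected), so the maximizer is the interior point.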

Of course, for a large number $n$ of inequality constraints this method doesn't scale well, since there are $2^n$ possible active sets. Sophisticated numerical algorithms exist that try to deal with this.

Keywords for further study: active set methods, interior point methods.
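As a taste of the interior-point idea: fold the constraint into a logarithmic barrier and shrink the barrier weight. A rough sketch, again on the question's maximization example (the `mu` schedule below is arbitrary):

```python
import numpy as np
from scipy.optimize import minimize

f = lambda x: 1 - x[0]**2 - x[1]**2
g = lambda x: 1 - x[0] - x[1]

def barrier(x, mu):
    # -f plus a log barrier; +inf outside g > 0 keeps iterates strictly feasible.
    return -f(x) - mu * np.log(g(x)) if g(x) > 0 else np.inf

x = np.array([0.25, 0.25])        # strictly feasible starting point
for mu in [1.0, 0.1, 0.01, 0.001]:
    x = minimize(lambda z, m=mu: barrier(z, m), x, method="Nelder-Mead").x
print(x)                          # -> approaches [0, 0] as mu shrinks
```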
