KKT Conditions – Understanding Karush-Kuhn-Tucker Conditions

convex optimization, Karush-Kuhn-Tucker, nonlinear optimization, optimization

Suppose we want to use the KKT conditions to find the optimal solution to:
$$
\begin{array}{cc}
\max & (\text { or } \min ) z=f\left(x_{1}, x_{2}, \ldots, x_{n}\right) \\
\text { s.t. } & g_{1}\left(x_{1}, x_{2}, \ldots, x_{n}\right) \leq b_{1} \\
& g_{2}\left(x_{1}, x_{2}, \ldots, x_{n}\right) \leq b_{2} \\
& \vdots \\
& g_{m}\left(x_{1}, x_{2}, \ldots, x_{n}\right) \leq b_{m} \\
& -x_{1} \leq 0 \\
& -x_{2} \leq 0 \\
& \vdots \\
& -x_{n} \leq 0
\end{array}
$$

then the theorem states the KKT conditions as:
[image: the book's statement of the KKT conditions, equations (36)–(40)]

I really don't understand this and eventually failed to apply it, since my book doesn't illustrate any example in detail. For the sake of clarity, let's pick one minimization problem:

$$
\begin{array}{ll}
\text { Minimize } & Z=2 x_{1}+3 x_{2}-x_{1}^{2}-2 x_{2}^{2} \\
\text { subject to } & x_{1}+3 x_{2} \leq 6 \\
& 5 x_{1}+2 x_{2} \leq 10 \\
& x_{i} \geq 0, \quad i=1,2 .
\end{array}
$$

After googling and watching a YouTube video, I found that they solve it as follows:

$$L(x_1,x_2,\lambda_1,\lambda_2)=2 x_{1}+3 x_{2}-x_{1}^{2}-2 x_{2}^{2}+\lambda_1(x_{1}+3 x_{2}-6)+\lambda_2(5 x_{1}+2 x_{2}-10)$$

$(1)$ Assuming both $g_1$ and $g_2$ active:

Solving $\frac{\partial L}{\partial x_1}=0,\frac{\partial L}{\partial x_2}=0,\frac{\partial L}{\partial \lambda_1}=0,\frac{\partial L}{\partial \lambda_2}=0 \implies \left(\frac{18}{13},\frac{20}{13},\frac{185}{169},-\frac{11}{169}\right)$, which we cannot accept since $\lambda_2<0$.

$(2)$ Assuming $g_1$ active:

Solving $\frac{\partial L}{\partial x_1}=0,\frac{\partial L}{\partial x_2}=0,\frac{\partial L}{\partial \lambda_1}=0 \implies \left(\frac{3}{2},\frac{3}{2},1,0\right)$, hence $z=\frac34$

$(3)$ Assuming $g_2$ active:

Solving $\frac{\partial L}{\partial x_1}=0,\frac{\partial L}{\partial x_2}=0,\frac{\partial L}{\partial \lambda_2}=0 \implies \left(\frac{89}{54},\frac{95}{108},0,\frac{7}{27}\right)$, hence $z=\frac{361}{216}$

$(4)$ Assuming none of the constraints is active:

Solving $\frac{\partial L}{\partial x_1}=0,\frac{\partial L}{\partial x_2}=0 \implies \left(1,\frac{3}{4}\right)$ hence $z=\frac{17}{8}$

Hence, $z_{\min}=\frac{3}{4}$
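For reference, the stationarity solves in the four cases above can be reproduced symbolically. The following is a minimal sympy sketch of my own (not taken from the video); it only reproduces the case-by-case solves, so the signs of the multipliers and the feasibility of all constraints still have to be checked afterwards, as the KKT conditions require.

```python
import sympy as sp

# Decision variables and multipliers for the two main constraints
x1, x2, l1, l2 = sp.symbols('x1 x2 lambda1 lambda2', real=True)

# Lagrangian of the example problem, same convention as above: f + lambda * (g - b)
L = 2*x1 + 3*x2 - x1**2 - 2*x2**2 + l1*(x1 + 3*x2 - 6) + l2*(5*x1 + 2*x2 - 10)
dLdx1, dLdx2 = sp.diff(L, x1), sp.diff(L, x2)
g1, g2 = x1 + 3*x2 - 6, 5*x1 + 2*x2 - 10

# Each case: a constraint is either active (g = 0) or its multiplier is set to 0
cases = {
    'both active': [dLdx1, dLdx2, g1, g2],
    'only g1 active': [dLdx1.subs(l2, 0), dLdx2.subs(l2, 0), g1],
    'only g2 active': [dLdx1.subs(l1, 0), dLdx2.subs(l1, 0), g2],
    'none active': [dLdx1.subs({l1: 0, l2: 0}), dLdx2.subs({l1: 0, l2: 0})],
}

for name, eqs in cases.items():
    print(name, sp.solve(eqs, dict=True))
```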


So, I am confused about how the second method satisfies Theorem 10. Did they do the same thing, i.e. check whether each inequality is binding or not? And could I write the second method in an exam as a solution via the KKT conditions?

Best Answer

First try to understand the following simplified version of KKT:

Suppose you are trying to solve the problem

$$\text{min } f(x)$$

$$\text{s.t. } g_j(x) \leq b_j$$

where (if I recall correctly) $f$ and the $g_j$ are convex. Then the KKT conditions tell you that if $x^*$ is a solution, then you can find $\lambda_j$ such that:

  1. $\lambda_j \geq 0$ for all $j$
  2. $\lambda_j [g_j(x^*)-b_j]=0$ for all $j$
  3. $\frac{\partial f}{\partial x_i}(x^*)+ \sum_j \lambda_j \frac{\partial g_j}{\partial x_i}(x^*)=0$ for all $i$ or equivalently $\nabla f + \sum_j \lambda_j \nabla g_j=0$

Note that the sets $g_j(x)=b_j$ form the boundary of your domain. Look at condition 2. It basically says: "either $x^*$ is in the part of the boundary given by $g_j(x^*)=b_j$, or $\lambda_j=0$." When $g_j(x^*)=b_j$, it is said that $g_j$ is active. So in this setting, the general strategy is to go through each constraint and consider whether it is active or not.

As you can see, condition 2. is independent of $f$; it is just about the location of $x^*$ in the domain. It answers the question: how many constraints does $x^*$ hit sharply? Note, for example, that if no $g_j$ is active, then $\lambda_j=0$ for all $j$ and you are looking for solutions to $\frac{\partial f}{\partial x_i}(x^*)=0$, i.e. extreme points of $f$.
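In your example this is exactly your case (4): with both multipliers set to zero, $\frac{\partial f}{\partial x_1}=2-2x_1=0$ and $\frac{\partial f}{\partial x_2}=3-4x_2=0$ give the candidate point $\left(1,\frac34\right)$.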

Now suppose there is only one constraint, call it $g$, and suppose it is active. Then you have something like this situation:

(Sorry for the handwritten picture, but I am so bad at drawing with the computer.)

So if $x^*$ is a minimum of $f$, it means that when you take points $x'$ inside the domain $A$, $f(x')$ is greater than $f(x^*)$. Now, since $g$ is convex, the vector $-\nabla g$ always points inwards (or parallel to the boundary), so you would like $f$ to increase in the direction of $-\nabla g$. This applies to all points of the boundary $g(x)=b$, so one would think that at a minimum $f$ increases "the most" in the inward direction $-\nabla g$. Since the direction in which $f$ increases is given by its gradient, you want $\nabla f(x^*)$ and $-\nabla g(x^*)$ to be proportional to one another by a positive constant. This exactly means that there exists some $\lambda \geq 0$ such that $\nabla f = - \lambda \nabla g$. These are precisely conditions 3. and 1., and this reasoning generalizes to more constraints.
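To make this concrete with the numbers from your case (3), where only $g_2$ is active: at $x^*=\left(\frac{89}{54},\frac{95}{108}\right)$ with $\lambda_2=\frac{7}{27}$,

$$\nabla f(x^*)=\begin{pmatrix}2-2x_1^*\\ 3-4x_2^*\end{pmatrix}=\begin{pmatrix}-\frac{35}{27}\\ -\frac{14}{27}\end{pmatrix}=-\frac{7}{27}\begin{pmatrix}5\\ 2\end{pmatrix}=-\lambda_2\,\nabla g_2(x^*),$$

so $\nabla f$ is indeed a nonnegative multiple of $-\nabla g_2$, exactly as conditions 1. and 3. require.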

(Note that here I am explaining things vaguely. The fact that this works relies strongly on $f$ and $g$ being convex, and it is precisely the statement of the KKT theorem, which is not trivial.)

Now, how does this translate to your example? In your example you have two types of restrictions: those given by some unknown functions $g_j$, and those given by $-x_i \leq 0$. Let's apply conditions 1.–3. to the problem $$\text{min } f(x)$$

$$\text{s.t. } g_j(x) \leq b_j \quad\text{and}\quad -x_i\leq 0 $$

Let's call $h_i$ the function sending $x$ to $-x_i$. We should have multipliers $\lambda_j$ (corresponding to the $g_j$) and $\mu_i$ (corresponding to the constraints $h_i \leq 0$) such that

  • (1)$\lambda_j, \mu_i \geq 0$ for all $i,j$.
  • (2a) $\lambda_j (g_j(x^*)-b_j)=0$ for all $j$.
  • (2b) $\mu_i(h_i(x^*)-0)=0$ or equivalently $\mu_i x_i=0$ for all $i$.
  • (3) $\nabla f(x^*) + \sum_j \lambda_j \nabla g_j(x^*) + \sum_i \mu_i \nabla h_i(x^*)=0$.

But $\nabla h_i = (0,\ldots , -1, \ldots, 0)$, with the $-1$ in the $i$-th position, so the conditions in (3) actually read as

$$\frac{\partial f}{\partial x_i}(x^*)+ \sum_j \lambda_j \frac{\partial g_j}{\partial x_i}(x^*) - \mu_i=0$$

Therefore, regarding your text, (39) and (40) are just (1), (36) is (3), (37) is (2a), and (38) is (2b) after substituting $\mu_i = \frac{\partial f}{\partial x_i}(x^*)+ \sum_j \lambda_j \frac{\partial g_j}{\partial x_i}(x^*)$ into the equation $x_i \mu_i=0$.
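Spelled out for your example (this is just my own specialization of conditions (1), (2a), (2b), (3) above, with $g_1(x)=x_1+3x_2$, $b_1=6$, $g_2(x)=5x_1+2x_2$, $b_2=10$), the conditions on $(x_1,x_2,\lambda_1,\lambda_2,\mu_1,\mu_2)$ read

$$
\begin{array}{l}
2-2x_1+\lambda_1+5\lambda_2-\mu_1=0 \\
3-4x_2+3\lambda_1+2\lambda_2-\mu_2=0 \\
\lambda_1(x_1+3x_2-6)=0, \qquad \lambda_2(5x_1+2x_2-10)=0 \\
\mu_1 x_1=0, \qquad \mu_2 x_2=0 \\
\lambda_1,\lambda_2,\mu_1,\mu_2\geq 0, \qquad x_1+3x_2\leq 6, \quad 5x_1+2x_2\leq 10, \quad x_1,x_2\geq 0.
\end{array}
$$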

The 'method for solving these problems' given in the YouTube video is just deciding, constraint by constraint, whether it is active or not, which is the same thing: if $\lambda_j (g_j(x^*)-b_j)=0$, then either $\lambda_j=0$ or $g_j(x^*)=b_j$. The only difference is that you don't split into cases for the constraints $\mu_i x_i=0$, because they are easy to handle without making new cases; otherwise you would have to consider $2^4=16$ possibilities...
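If you did want to go through all $2^4=16$ possibilities mechanically, a sketch like the following would do it (my own sympy code, not from the video or your book): it enumerates every active/inactive combination and keeps only the candidates that satisfy stationarity, complementary slackness, nonnegative multipliers and feasibility.

```python
from itertools import product
import sympy as sp

x1, x2 = sp.symbols('x1 x2', real=True)
mults = sp.symbols('lambda1 lambda2 mu1 mu2', real=True)

f = 2*x1 + 3*x2 - x1**2 - 2*x2**2
# All four constraints written as c(x) <= 0, nonnegativity included
cons = [x1 + 3*x2 - 6, 5*x1 + 2*x2 - 10, -x1, -x2]

# Stationarity: gradient of f + sum_j multiplier_j * c_j must vanish
lagrangian = f + sum(m*c for m, c in zip(mults, cons))
stationarity = [sp.diff(lagrangian, v) for v in (x1, x2)]

candidates = []
for active in product([True, False], repeat=len(cons)):
    # Complementary slackness: either the constraint is tight or its multiplier is 0
    eqs = stationarity + [c if a else m for a, m, c in zip(active, mults, cons)]
    for sol in sp.solve(eqs, [x1, x2, *mults], dict=True):
        if len(sol) < 6 or not all(v.is_number for v in sol.values()):
            continue  # skip degenerate or underdetermined cases
        feasible = all(c.subs(sol) <= 0 for c in cons)
        dual_ok = all(sol[m] >= 0 for m in mults)
        if feasible and dual_ok:
            candidates.append((sol[x1], sol[x2], f.subs(sol)))

for x1_star, x2_star, obj in candidates:
    print(x1_star, x2_star, obj)
```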