Derivative of the Lagrangian dual: Can I treat the optimal parameter $x^*$ in the dual $g(\lambda) = L(x^*, \lambda)$ as a constant w.r.t. $\lambda$?

convex optimization, lagrange multiplier

Suppose I have a standard constrained optimization problem
$$
\min_x f(x) \quad \text{s.t.} \quad h_i(x) \ge 0,
$$

where $f$ is convex and $h_1, \dots, h_m$ are concave functions $\mathbb{R}^n \to \mathbb{R}$ (so the feasible set is convex). The dual function is given by
$$
g(\lambda) = \min_x L(x, \lambda) = L(x^*, \lambda)
$$

where $L$ is the Lagrangian. To obtain the optimal Lagrange multipliers, I need to solve
$$
\lambda^* = \operatorname{argmax}_{\lambda \ge 0} \; g(\lambda).
$$

I would then go on and take the derivative w.r.t. $\lambda$. My question is:

Can I treat the optimal parameter $x^*$ in the dual $g(\lambda) = L(x^*, \lambda)$ as a constant w.r.t. $\lambda$?

My first thought: no, since it depends on $\lambda$. (If it didn't, we would already have solved the primal problem.) But I have found several examples where the derivative is taken without substituting $x^*$, treating it as a constant. So I was wondering whether that is a general rule. If it is, this would simplify the derivative a lot.

I'll give a concrete example:
Let's solve
$$
\min_x x^2 \quad \text{s.t.} \quad -x - 2 \le 0
$$

The resulting Lagrangian is $L(x, \lambda) = x^2 + \lambda(-x - 2)$.
Taking the derivative w.r.t. $x$ and setting it to $0$ yields
$2x - \lambda = 0 \implies x^* = \lambda / 2$.

The dual is therefore
$$
g(\lambda) = (x^*)^2 + \lambda(-x^* - 2) = \frac{1}{4} \lambda^2 + \lambda\left(-\frac{\lambda}{2} - 2\right).
$$
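Expanding the right-hand side, this simplifies to
$$
g(\lambda) = -\frac{1}{4}\lambda^2 - 2\lambda.
$$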

Now, when I take the "correct" derivative by differentiating the right-hand side w.r.t. $\lambda$, I get
$$
g'(\lambda) = -\frac{\lambda}{2} - 2.
$$

This is the same result as if I treat $x^{*}$ as a constant on the left-hand side and differentiate, giving $$-x^* - 2 = -\frac{\lambda}{2} - 2.$$
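For what it's worth, here is a small symbolic sketch (using sympy, which is not part of the original question) that checks, for this example, that both ways of differentiating agree:

```python
# Symbolic sanity check for the example: min x^2 s.t. -x - 2 <= 0.
import sympy as sp

x, lam = sp.symbols('x lambda', real=True)
L = x**2 + lam * (-x - 2)                    # Lagrangian from the example

x_star = sp.solve(sp.diff(L, x), x)[0]       # stationarity in x: x* = lambda/2
g = L.subs(x, x_star)                        # dual function g(lambda)

full = sp.simplify(sp.diff(g, lam))          # differentiate after substituting x*
shortcut = sp.diff(L, lam).subs(x, x_star)   # treat x as constant, then plug in x*

print(full)                                  # -lambda/2 - 2
print(sp.simplify(full - shortcut) == 0)     # True: both derivatives coincide
```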

Why does this work?
Any help is much appreciated!

Best Answer

In general, the Lagrangian is $$L(x,\lambda) = f(x) - \sum_i \lambda_i h_i(x),$$ and $x^*$ satisfies $f'(x^*) - \sum_i \lambda_i h_i'(x^*) = 0$. Assume $x(\lambda)$ is a function that satisfies this equation for every $\lambda$; then $$g(\lambda) = f(x(\lambda)) - \sum_i \lambda_i h_i(x(\lambda)).$$

By the chain rule and the product rule, the partial derivative with respect to $\lambda_j$ is \begin{align} \frac{\partial g}{\partial \lambda_j}(\lambda) &= f'(x(\lambda))\,\frac{\partial x}{\partial \lambda_j}(\lambda) - h_j(x(\lambda)) - \sum_i \lambda_i h_i'(x(\lambda))\,\frac{\partial x}{\partial \lambda_j}(\lambda) \\ &= \left( f'(x(\lambda)) - \sum_i \lambda_i h_i'(x(\lambda)) \right)\frac{\partial x}{\partial \lambda_j}(\lambda) - h_j(x(\lambda)) \\ &= - h_j(x(\lambda)), \end{align} since the bracketed term vanishes by the definition of $x(\lambda)$. This is exactly what you get if you take the derivative of $g$ while treating $x$ as a constant and then plug in $x(\lambda)$.
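To see this identity in action, here is a minimal numerical sketch; the particular $f$ and $h_i$ below are made up for illustration, and scipy.optimize.minimize is used as one convenient way to compute the inner minimization defining $g$:

```python
# Numerical check of dg/dlambda_j = -h_j(x(lambda)) on a small made-up problem.
import numpy as np
from scipy.optimize import minimize

def f(x):
    return x[0]**2 + x[1]**2                 # convex objective

def h(x):
    # constraints written as h_i(x) >= 0, matching the answer's convention
    return np.array([x[0] + x[1] - 1.0, x[0] - 0.5])

def g(lam):
    """Dual function: minimize the Lagrangian over x for fixed lambda."""
    res = minimize(lambda x: f(x) - lam @ h(x), x0=np.zeros(2))
    return res.fun, res.x

lam = np.array([0.3, 0.7])
_, x_star = g(lam)

# Central finite differences of g versus the envelope value -h(x(lambda))
eps = 1e-5
fd_grad = np.array([
    (g(lam + eps * np.eye(2)[j])[0] - g(lam - eps * np.eye(2)[j])[0]) / (2 * eps)
    for j in range(2)
])
print(fd_grad)       # approximately ...
print(-h(x_star))    # ... -h_j(x(lambda)), as derived above
```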