[Math] Dual ascent and gradient of convex conjugate

Tags: convex-optimization, convex-analysis, optimization

I am reading this manuscript and in section 2.1 (Dual Ascent) Boyd describes the following:

Minimize $f(x)$ subject to $Ax=b$

This gives the Lagrangian:

$$ L(x,y) = f(x) + \langle y, Ax-b\rangle$$

which leads to the dual function:

$$ g(y) = \inf\limits_{x} L(x,y) = -f^*(-A^* y) - \langle y,b\rangle$$

where $f^*$ is the convex conjugate of $f$.
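
For reference, the derivation uses $\langle y, Ax\rangle = \langle A^*y, x\rangle$ and pulls the constant $-\langle y,b\rangle$ out of the infimum:

$$g(y) = \inf_x \left(f(x) + \langle A^*y, x\rangle\right) - \langle y,b\rangle = -\sup_x \left(\langle -A^*y, x\rangle - f(x)\right) - \langle y,b\rangle = -f^*(-A^*y) - \langle y,b\rangle.$$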


I understand how the above is derived. What I don't understand is the following claim from Boyd:

Assuming that $g(y)$ is differentiable, we can find the gradient $\nabla g(y)$ in the following way: first find $x^+ = \arg\min\limits_{x}L(x,y)$, and then we have $\nabla g(y) = Ax^+-b$.
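
This claim can be sanity-checked numerically. The sketch below is my own, using the illustrative choice $f(x)=\tfrac12\|x\|^2$ (not from the manuscript), for which $x^+=-A^\top y$ and $g(y)=-\tfrac12\|A^\top y\|^2-\langle y,b\rangle$ are available in closed form:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 5
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
y = rng.standard_normal(m)

# With f(x) = 0.5*||x||^2, L(x, y) = 0.5*||x||^2 + y @ (A @ x - b)
# is minimized where x + A.T @ y = 0, i.e. at x_plus = -A.T @ y.
x_plus = -A.T @ y

# Closed-form dual function: g(y) = -0.5*||A.T @ y||^2 - y @ b,
# whose gradient is -A @ A.T @ y - b.
grad_direct = -A @ (A.T @ y) - b

# Boyd's formula: grad g(y) = A x^+ - b.
grad_formula = A @ x_plus - b

print(np.allclose(grad_direct, grad_formula))  # True
```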


When I try to find $\nabla g(y)$ myself I get the following:

$$\begin{aligned}
\nabla g(y) &= \nabla \left(-f^*(-A^* y)-\langle y,b\rangle\right)\\
&=-\nabla \left(f^*(-A^* y)\right)-\nabla\left(\langle y,b\rangle\right)\\
&=-\arg\max\limits_{x}\left(\langle -A^*y, x\rangle - f(x)\right)-b\\
&=-\arg\max\limits_{x}\left(-\langle y, Ax\rangle - f(x)\right)-b\\
&=-\left(\arg\min\limits_{x}L(x,y)\right)-b\\
&=-x^+-b
\end{aligned}$$

Can someone explain what I am doing wrong?


The solution was to assume differentiability somewhere. For example, if we assume that $f^*$ is differentiable, then $f$ is strictly convex and $L(\cdot,y)$ has a unique minimizer. Under this assumption, it holds that
$$\nabla f^*(u) = \arg\max\limits_{x}\left(\langle u, x\rangle -f(x)\right).$$
Composing $f^*$ with the linear map $y\mapsto -A^*y$ and using the chain rule gives
$$\nabla (f^*\circ(-A^*))(y) = -A\nabla f^*(-A^*y) = -Ax^+,$$
and therefore $\nabla g(y) = -\nabla (f^*\circ(-A^*))(y) - b = Ax^+ - b$.
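
Here is a rough numerical sketch of that argument, using an illustrative strictly convex quadratic $f(x)=\tfrac12 x^\top P x$ with $P\succ 0$ (my own choice, for which $\nabla f^*(u)=P^{-1}u$), and SciPy's generic minimizer standing in for the argmax/argmin steps:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
m, n = 3, 4
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
y = rng.standard_normal(m)

# Strictly convex f(x) = 0.5*x.T @ P @ x with P positive definite,
# so f*(u) = 0.5*u.T @ inv(P) @ u and grad f*(u) = inv(P) @ u.
M = rng.standard_normal((n, n))
P = M @ M.T + n * np.eye(n)
f = lambda x: 0.5 * x @ (P @ x)

# grad f*(u) = argmax_x (<u, x> - f(x)): compare a numerical maximizer
# with the closed-form inv(P) @ u.
u = -A.T @ y
x_max = minimize(lambda x: -(u @ x - f(x)), np.zeros(n)).x
print(np.allclose(x_max, np.linalg.solve(P, u), atol=1e-4))  # True

# x^+ = argmin_x L(x, y); the constant -<y, b> does not change the minimizer,
# so x^+ coincides with the argmax above, and hence grad g(y) = A x^+ - b.
x_plus = minimize(lambda x: f(x) + y @ (A @ x - b), np.zeros(n)).x
print(np.allclose(x_plus, x_max, atol=1e-4))  # True
```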

Best Answer

Let's start from the definition of a subgradient (since $g$ is concave, this is the concave-function convention, with the inequality reversed relative to the convex case):
$$v\in\partial g(y) \quad\Longleftrightarrow\quad g(z) \leq g(y) + \langle v, z - y \rangle \quad \forall z.$$
Pulling from the definition, we have
$$g(z) = \inf_x f(x) + \langle A x - b, z \rangle = \inf_x f(x) + \langle A x - b, y \rangle + \langle A x - b, z - y \rangle.$$
Let $x^+\in\mathop{\textrm{argmin}}_x L(x,y)$. Then
$$\begin{aligned} \inf_x f(x) + \langle A x - b, y \rangle + \langle A x - b, z - y \rangle &\leq f(x^+)+\langle Ax^+-b,y\rangle + \langle Ax^+-b, z-y\rangle \\ &= g(y) + \langle Ax^+-b, z-y\rangle. \end{aligned}$$
Therefore,
$$g(z) \leq g(y) + \langle Ax^+-b, z-y\rangle,$$
and hence $Ax^+-b\in\partial g(y)$. If $x^+$ is the unique minimizer, or if the value of $Ax-b$ is identical for all minimizers, then $g$ is differentiable at $y$ and its gradient is $\nabla g(y)=Ax^+-b$.
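
To see the key inequality hold numerically, here is a small check with the same illustrative quadratic $f(x)=\tfrac12\|x\|^2$ as above (my own example, not part of the answer):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 3, 5
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
y = rng.standard_normal(m)

# Illustrative choice f(x) = 0.5*||x||^2, so x^+ = -A.T @ y and
# g(y) = -0.5*||A.T @ y||^2 - y @ b in closed form.
def g(w):
    return -0.5 * np.dot(A.T @ w, A.T @ w) - w @ b

x_plus = -A.T @ y
v = A @ x_plus - b  # the candidate (super)gradient from the answer

# The inequality g(z) <= g(y) + <v, z - y> should hold for every z.
ok = all(g(z) <= g(y) + v @ (z - y) + 1e-12
         for z in rng.standard_normal((1000, m)))
print(ok)  # True
```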

EDIT: actually, the logic here isn't complete. What we haven't proven is the reverse implication: that if $v$ is a subgradient of $g$ at $y$, it must equal $Ax-b$ for some minimizer $x$ of $L(\cdot,y)$. Without this, we can't say that $Ax^+-b$ is the gradient, because we can't guarantee that it is the unique subgradient.
