Alternative form of Gradient Descent

Tags: convex-optimization, gradient-descent, optimization

We define the standard gradient descent as follows:

$$x^{(k+1)} = \arg\min_{x\in\mathbb{R}^n} f(x^{(k)}) + \langle \nabla f(x^{(k)}), x - x^{(k)}\rangle + \frac{1}{2\alpha_k} \|x - x^{(k)}\|_2^2$$

Here, $x^{(k)}$ is the $k$-th iterate of gradient descent. Setting the gradient of the objective on the right-hand side to zero gives $\nabla f(x^{(k)}) + \frac{1}{\alpha_k}(x - x^{(k)}) = 0$, so $x^{(k+1)} = x^{(k)} - \alpha_k \nabla f(x^{(k)})$, which is the usual gradient descent update.
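
As a quick sanity check (not part of the original question), one can minimize this model numerically and compare the result with the closed-form step. The sketch below assumes NumPy/SciPy; the quadratic objective, the iterate `x_k`, and the step size `alpha_k` are arbitrary choices for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Example objective (arbitrary choice): f(x) = 0.5 x^T A x - b^T x, strictly convex.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad_f = lambda x: A @ x - b

x_k = np.array([2.0, -3.0])   # current iterate (arbitrary)
alpha_k = 0.1                 # step size (arbitrary)
g_k = grad_f(x_k)

# The linearized-plus-proximal model minimized in the first formulation.
model = lambda x: f(x_k) + g_k @ (x - x_k) + np.sum((x - x_k) ** 2) / (2 * alpha_k)

x_model = minimize(model, x_k).x      # numerical minimizer of the model
x_step = x_k - alpha_k * g_k          # closed-form gradient step

print(np.allclose(x_model, x_step, atol=1e-5))   # expected: True
```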

How do we prove that this formulation of gradient descent is equivalent to the following version:

$$x^{(k+1)} = \arg\min_{x\in\mathbb{R}^n}\ \langle \nabla f(x^{(k)}), x - x^{(k)}\rangle \quad \text{subject to} \quad \|x - x^{(k)}\|_2 \leq \alpha_k\|\nabla f(x^{(k)})\|_2$$

Best Answer

Just find the solution of the latter problem.

Denote $x_k = x^{(k)}$ and $g_k = \nabla f(x^{(k)})$. We need to solve the following convex problem: $$\min\ \langle g_k, x\rangle\quad \mbox{s.t. } \|x-x_k\|^2 \le \alpha_k^2 \|g_k\|^2.$$

If $\|g_k\| = 0$, then obviously $x = x_k$. Suppose now that $\|g_k\| > 0$. Slater's condition holds because at $x = x_k$ we have $\|x - x_k\|^2 = 0 < \alpha_k^2 \|g_k\|^2$. Hence, the above problem is equivalent to its KKT system:
\begin{align}
g_k + 2\lambda(x - x_k) &= 0,\\
\lambda(\|x - x_k\|^2 - \alpha_k^2 \|g_k\|^2) &= 0,\\
\|x - x_k\|^2 &\le \alpha_k^2 \|g_k\|^2,\\
\lambda &\ge 0.
\end{align}
The first and last conditions yield $\lambda > 0$: if $\lambda = 0$, the first equation would force $g_k = 0$, contradicting $\|g_k\| > 0$. The first equation then gives $x = x_k - \frac{1}{2\lambda} g_k$. Plugging this into the second (complementary slackness) equation yields $\frac{1}{4\lambda^2}\|g_k\|^2 = \alpha_k^2 \|g_k\|^2$, hence $\frac{1}{2\lambda} = \alpha_k$. We conclude that the optimal solution is $x = x_k - \alpha_k g_k$, QED.
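
As a complementary check (again an illustration, not part of the original answer), one can hand the constrained problem to a generic solver and confirm it returns $x_k - \alpha_k g_k$. The sketch below reuses the arbitrary quadratic example from the earlier snippet and relies on SciPy's SLSQP; the objective is shifted by the constant $\langle g_k, x_k\rangle$ and the constraint is squared, neither of which changes the minimizer.

```python
import numpy as np
from scipy.optimize import minimize

# Same arbitrary example objective and iterate as in the sketch above.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
grad_f = lambda x: A @ x - b

x_k = np.array([2.0, -3.0])
alpha_k = 0.1
g_k = grad_f(x_k)
radius = alpha_k * np.linalg.norm(g_k)   # radius of the feasible ball

# min <g_k, x - x_k>  s.t.  ||x - x_k||_2^2 <= radius^2  (squared so it is smooth at x_k)
objective = lambda x: g_k @ (x - x_k)
ball = {
    "type": "ineq",
    "fun": lambda x: radius**2 - np.sum((x - x_k) ** 2),
    "jac": lambda x: -2.0 * (x - x_k),
}

res = minimize(objective, x_k, jac=lambda x: g_k, method="SLSQP",
               constraints=[ball], options={"ftol": 1e-12, "maxiter": 200})

print(np.allclose(res.x, x_k - alpha_k * g_k, atol=1e-5))   # expected: True
```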
