Problem with Proof Gradient Steepest Ascent

gradient descent, vector analysis

I am going through the properties of the gradient, and in particular I am trying to prove why the gradient points in the direction of steepest ascent. Here is what I’ve done so far:

$$
\partial_vf(\mathbf{a}) = \nabla f(\mathbf{a}) \cdot \mathbf{v}
$$

where $\partial_vf(\mathbf{a})$ is the directional derivative, $\nabla f(\mathbf{a})$ is the gradient, and $\mathbf{v}$ is a vector with $\|\mathbf{v}\| = 1$, assuming that all quantities exist and are of appropriate dimensions.

The direction of steepest ascent (s.a.) is the direction in which the directional derivative is largest, thus:

\begin{align}
\text{s.a.} &= \underset{v}{\operatorname{argmax}} \partial_vf(\mathbf{a}) \\
&= \underset{v}{\operatorname{argmax}} \nabla f(\mathbf{a}) \cdot \mathbf{v} \\
&= \underset{v}{\operatorname{argmax}} \|\nabla f(\mathbf{a})\|\ \|\mathbf{v}\| \cos(\phi) \\
&= \underset{\phi}{\operatorname{argmax}} \|\nabla f(\mathbf{a})\|\ \cos(\phi)
\end{align}

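Here $\phi \in [0, \pi]$ is the angle between $\nabla f(\mathbf{a})$ and $\mathbf{v}$ (assuming $\nabla f(\mathbf{a}) \ne 0$). Since $\|\nabla f(\mathbf{a})\|$ does not depend on $\phi$:

$$
\underset{\phi}{\operatorname{argmax}}\ \cos(\phi) = 0, \quad \text{since} \quad \cos(\phi) = 1 \iff \phi = 0.
$$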
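
As a quick sanity check of this reduction, here is a two-dimensional illustration. The gradient value $(3, 4)$ is a made-up example, not taken from any particular $f$, and $\theta$, $\theta_0$ are extra symbols for the polar angles of $\mathbf{v}$ and $\nabla f(\mathbf{a})$:

\begin{align}
\nabla f(\mathbf{a}) &= (3, 4), \qquad \|\nabla f(\mathbf{a})\| = 5, \qquad \mathbf{v} = (\cos\theta, \sin\theta) \\
\nabla f(\mathbf{a}) \cdot \mathbf{v} &= 3\cos\theta + 4\sin\theta = 5\left(\tfrac{3}{5}\cos\theta + \tfrac{4}{5}\sin\theta\right) = 5\cos(\theta - \theta_0), \qquad \cos\theta_0 = \tfrac{3}{5},\ \sin\theta_0 = \tfrac{4}{5}
\end{align}

The largest value, $5$, is reached at $\theta = \theta_0$, where the angle between the two vectors is zero. That gives $\mathbf{v} = (\tfrac{3}{5}, \tfrac{4}{5}) = \frac{\nabla f(\mathbf{a})}{\|\nabla f(\mathbf{a})\|}$, which is consistent with the claim.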

Now if I am not mistaken, I need to prove that:

$$
\phi = 0 \iff \mathbf{v} = \frac{\nabla f(\mathbf{a})}{\|\nabla f(\mathbf{a})\|}.
$$

$$
(\Longleftarrow):\
\mathbf{v} = \frac{\nabla f(\mathbf{a})}{\|\nabla f(\mathbf{a})\|} \implies \cos(\phi) = \frac{\mathbf{v} \cdot \nabla f(\mathbf{a})}{\|\mathbf{v}\| \ \|\nabla f(\mathbf{a})\|} = \frac{\|\nabla f(\mathbf{a})\|^2}{\|\nabla f(\mathbf{a})\|^2} = 1 \implies \phi = 0.
$$

With the second direction I am stuck:
$$
(\implies):\
\phi = 0 \implies \cos(\phi) = 1 = \frac{\mathbf{v}\cdot \nabla f(\mathbf{a})}{\|\mathbf{v}\| \ \|\nabla f(\mathbf{a})\|} \\
\implies \|\mathbf{v}\| \ \|\nabla f(\mathbf{a})\| = \mathbf{v}\cdot \nabla f(\mathbf{a})
$$

EDIT:

Since $\|\mathbf{v}\| = 1$, the last equation simplifies:

\begin{align}
\|\mathbf{v}\| \ \|\nabla f(\mathbf{a})\| &= \mathbf{v}\cdot \nabla f(\mathbf{a}) \\
\implies \|\nabla f(\mathbf{a})\|&= \mathbf{v}\cdot \nabla f(\mathbf{a})
\end{align}

clearly this is true if:
$$
\mathbf{v} = \frac{\nabla f(\mathbf{a})}{\|\nabla f(\mathbf{a})\|}
$$

then:
$$
\|\nabla f(\mathbf{a})\| = \frac{\nabla f(\mathbf{a})\cdot \nabla f(\mathbf{a})}{\|\nabla f(\mathbf{a})\|} = \frac{\|\nabla f(\mathbf{a})\|^2}{\|\nabla f(\mathbf{a})\|} = \|\nabla f(\mathbf{a})\|
$$

However, this solution is basically obtained by making an ansatz (an educated guess) and I am interested in a more general approach.

Best Answer

Let $g= \nabla f(\mathbf{a})$. We can assume that $g \ne 0$, since otherwise every directional derivative at $\mathbf{a}$ vanishes. For $v$ with $||v|| = 1$, the Cauchy-Schwarz inequality gives:

$$|\partial_vf(\mathbf{a})| = |g \cdot v| \le ||g|| \cdot ||v|| = ||g||.$$

Hence

$$-||g|| \le \partial_vf(\mathbf{a}) \le ||g||.$$

Now put $v_1= \frac{g}{||g||}$ and $v_2=-v_1.$ Then

$$\partial_{v_1}f(\mathbf{a})=||g||$$

and

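$$\partial_{v_2}f(\mathbf{a})=-||g||.$$

So the upper bound $||g||$ is attained in the direction $v_1 = \frac{g}{||g||}$, the direction of the gradient, and the lower bound $-||g||$ is attained in the opposite direction.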
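
To see that $v_1$ is the only maximizer without guessing it first, one possible route (a sketch under the same assumptions $g \ne 0$ and $||v|| = 1$) is to expand the squared distance between an arbitrary unit vector $v$ and $v_1$:

$$||v - v_1||^2 = ||v||^2 - 2\, v \cdot v_1 + ||v_1||^2 = 2 - \frac{2\, g \cdot v}{||g||} = 2 - \frac{2\, \partial_vf(\mathbf{a})}{||g||}.$$

The left-hand side is nonnegative and vanishes only for $v = v_1$, so $\partial_vf(\mathbf{a}) = ||g||$ holds exactly when $v = \frac{g}{||g||}$. In the notation of the question, this is the missing implication $\phi = 0 \implies \mathbf{v} = \frac{\nabla f(\mathbf{a})}{\|\nabla f(\mathbf{a})\|}$.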
