Does this proof that the normalized gradient is the direction of norm 1 of fastest increase show that this direction is unique?

multivariable-calculus real-analysis

This is from Fitzpatrick Advanced Calculus:

Corollary 13.18 Let $\mathcal{O}$ be an open subset of $\mathbb{R}^{n}$ that contains the point $\mathbf{x}$ and suppose that the function $f: \mathcal{O} \rightarrow \mathbb{R}$ is continuously differentiable. If $\nabla f(\mathbf{x}) \neq \mathbf{0}$, then the direction of norm 1 at the point $\mathbf{x}$ in which the function $f: \mathcal{O} \rightarrow \mathbb{R}$ is increasing the fastest is the direction $\mathbf{p}_{0}$ defined by
$$
\mathbf{p}_{0}=\frac{\nabla f(\mathbf{x})}{\|\nabla f(\mathbf{x})\|} .\tag{13.28}
$$

Proof
Using formula (13.24) and the Cauchy-Schwarz Inequality, it follows that if $\mathbf{p}$ is any point in $\mathbb{R}^{n}$ of norm 1, then
$$
\left|\frac{\partial f}{\partial \mathbf{p}}(\mathbf{x})\right|=|\langle\nabla f(\mathbf{x}), \mathbf{p}\rangle| \leq\|\nabla f(\mathbf{x})\| \cdot\|\mathbf{p}\|=\|\nabla f(\mathbf{x})\| \tag{13.29}
$$

On the other hand, if $\mathbf{p}_{0}$ is defined by (13.28), then $\mathbf{p}_{0}$ has norm 1, and using (13.24), it follows that
$$
\frac{\partial f}{\partial \mathbf{p}_{0}}(\mathbf{x})=\left\langle\nabla f(\mathbf{x}), \mathbf{p}_{0}\right\rangle=\left\langle\nabla f(\mathbf{x}), \frac{\nabla f(\mathbf{x})}{\|\nabla f(\mathbf{x})\|}\right\rangle=\|\nabla f(\mathbf{x})\| .
$$

This calculation, together with inequality (13.29), implies that if $\mathbf{p}$ has norm 1, then
$$
\frac{\partial f}{\partial \mathbf{p}}(\mathbf{x}) \leq \frac{\partial f}{\partial \mathbf{p}_{0}}(\mathbf{x}) . \tag*{$\blacksquare$}
$$

(Transcribed from Screenshots)
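The corollary's conclusion can be sanity-checked numerically. The sketch below is a minimal illustration that assumes a hypothetical example, $f(x,y)=x^2+3y$ at the point $(1,2)$, where $\nabla f = (2,3)$. It evaluates formula (13.24) on many unit vectors in $\mathbb{R}^2$. It checks that no sampled value exceeds $\|\nabla f\|$ and that the best sample lies right next to $\mathbf{p}_0$. Sampling only illustrates the result and does not prove uniqueness.

```python
import math

# Hypothetical example: f(x, y) = x**2 + 3*y has gradient (2x, 3), so at (1, 2) it is (2, 3).
grad = (2.0, 3.0)
norm_grad = math.hypot(*grad)
p0 = (grad[0] / norm_grad, grad[1] / norm_grad)  # normalized gradient, as in (13.28)


def dir_deriv(p):
    """Directional derivative via formula (13.24): <grad f(x), p>."""
    return grad[0] * p[0] + grad[1] * p[1]


# Sample unit vectors p = (cos t, sin t) on an evenly spaced grid of angles in [0, 2*pi).
n = 100_000
samples = [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n)) for k in range(n)]
values = [dir_deriv(p) for p in samples]

# The sampled unit vector with the largest directional derivative.
best_p = samples[max(range(n), key=values.__getitem__)]

print(max(values) <= norm_grad + 1e-12)  # inequality (13.29) holds on every sample
print(math.dist(best_p, p0))             # distance from the best sample to p0
print(dir_deriv((-p0[0], -p0[1])))       # -p0 gives -||grad f||, the minimum
```

A fixed grid of angles is used instead of random sampling so that the run is deterministic. The maximizer found this way is only accurate to about the grid spacing $2\pi/n$.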

This proof shows that the normalized gradient attains the maximum possible value of the directional derivative, but does it show that it is the unique direction that does so? Of course, the directional derivative in the direction of the negative normalized gradient has the same absolute value, but the normalized gradient should be the unique direction that gives the maximum in the positive sense.

Best Answer

The use of the Cauchy-Schwarz inequality is hiding some things here. Using $D_pf(x)$ to denote the directional derivative with unit direction $p\in\mathbb{R}^n$, we have

$$D_pf(x) = \langle \nabla f(x), p\rangle = \|\nabla f(x)\|\|p\|\cos\theta = \|\nabla f(x)\|\cos\theta,$$ where $\theta\in[0,\pi]$ is the angle between $p$ and $\nabla f(x)$ (well defined because $\nabla f(x)\neq 0$). Since $\|\nabla f(x)\|>0$, this is maximized exactly when $\cos\theta=1$, and on $[0,\pi]$ the only such angle is $\theta=0$. Thus the unique direction that maximizes $D_pf(x)$ is the unit vector pointing the same way as $\nabla f(x)$, namely $p=\nabla f(x)/\|\nabla f(x)\|$. The opposite direction, $\theta=\pi$, gives the minimum value $-\|\nabla f(x)\|$.
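The same conclusion can also be reached without angles. Here is a short sketch. Write $p_0=\nabla f(x)/\|\nabla f(x)\|$ and let $p$ be any unit vector. Expanding $\|p-p_0\|^2$ then gives an exact formula for how far $D_pf(x)$ falls below the maximum:

$$\begin{aligned}
\|p-p_0\|^2 &= \|p\|^2 - 2\langle p, p_0\rangle + \|p_0\|^2 = 2 - 2\langle p, p_0\rangle,\\
D_pf(x) &= \langle \nabla f(x), p\rangle = \|\nabla f(x)\|\,\langle p_0, p\rangle = \|\nabla f(x)\| - \tfrac12\|\nabla f(x)\|\,\|p-p_0\|^2 .
\end{aligned}$$

Because $\|\nabla f(x)\|>0$, this shows $D_pf(x)\le\|\nabla f(x)\|$, with equality exactly when $\|p-p_0\|=0$, that is, $p=p_0$. Taking $p=-p_0$ gives $\|p-p_0\|^2=4$ and hence the minimum value $-\|\nabla f(x)\|$.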