I am currently working through the old textbook Practical Optimization by Gill, Murray and Wright (c. 1982), in which the authors make some derivations that seem correct but that I am unable to duplicate. In the equations below, $g$ and $p$ are both vectors: $g$ is the gradient at the current iterate of an optimization procedure, and $p$ is the search direction we wish to solve for.
The authors inform the reader that the minimizer (over $p$) of
$\Large\frac{g'p}{||p||}$ is $p=-g$.
Alternatively, we might form a different norm for $p$ by considering a symmetric positive definite matrix $C$ in which case the minimizer of
$\Large\frac{g'p}{(p'Cp)^{1/2}}$
is
$p=-C^{-1}g$.
I am having trouble deriving these statements.
My question is: Does it take more than taking a derivative and setting it equal to zero to prove this? If so, how do I derive these formulas?
This question is similar to
Gradient descent with constraints
but not identical: the other question deals with solutions constrained to unit length, not with step directions normalized to unit length.
Best Answer
This is a simple application of Cauchy-Schwarz.
$(g^Tp)^2 \leq \|g\|^2 \|p\|^2.$
So $-\|g\| \leq \frac{g^T p}{\|p\|} \leq \|g\|$, and the lower bound is attained when $p = -g$ (or any positive multiple of it, since the ratio is invariant to positive scaling of $p$).
Similarly, writing $g^T p = \langle C^{-1/2} g,\, C^{1/2} p \rangle$, Cauchy-Schwarz gives $(g^T p)^2 \leq (g^T C^{-1} g)\,(p^T C p)$, so $\frac{g^T p}{(p^T C p)^{1/2}} \geq -(g^T C^{-1} g)^{1/2}$, with equality when $C^{1/2} p$ is a negative multiple of $C^{-1/2} g$, i.e. when $p = -C^{-1} g$ (up to positive scaling).
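If you want to convince yourself numerically, here is a quick NumPy sanity check of both claims (not from the book; the random $g$, the particular SPD construction of $C$, and the tolerances are my own choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
g = rng.standard_normal(n)

# Claim 1: p = -g minimizes g'p / ||p||, with minimum value -||g||.
def ratio(p):
    return g @ p / np.linalg.norm(p)

p_star = -g
assert np.isclose(ratio(p_star), -np.linalg.norm(g))
# No random direction should beat the claimed minimizer.
for _ in range(1000):
    p = rng.standard_normal(n)
    assert ratio(p) >= ratio(p_star) - 1e-12

# Claim 2: p = -C^{-1} g minimizes g'p / sqrt(p'Cp),
# with minimum value -sqrt(g' C^{-1} g).
A = rng.standard_normal((n, n))
C = A @ A.T + n * np.eye(n)  # symmetric positive definite by construction

def ratio_C(p):
    return g @ p / np.sqrt(p @ C @ p)

p_star_C = -np.linalg.solve(C, g)
bound = -np.sqrt(g @ np.linalg.solve(C, g))
assert np.isclose(ratio_C(p_star_C), bound)
for _ in range(1000):
    p = rng.standard_normal(n)
    assert ratio_C(p) >= bound - 1e-12

print("both minimizers verified")
```

Since both ratios are invariant to positive scaling of $p$, the check only needs to sample directions, not magnitudes.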