How do you take the derivative of the line search function with respect to the step length?

multivariable-calculus, nonlinear-optimization

I'm reading about line search methods in Numerical Optimization by Nocedal and Wright and am wondering how the derivative of the line search function is obtained. In exact line search methods with a real-valued function $f: \mathbb{R}^n \rightarrow \mathbb{R}$, iterate $x \in \mathbb{R}^n$, and search direction $d \in \mathbb{R}^n$ , we want to minimize the function $$\phi(\alpha) = f(x + \alpha d),$$ where $\alpha > 0$ to find the optimal step length. The derivative of $\phi(\alpha)$ is given as $$\phi'(\alpha) = \langle \nabla f(x + \alpha d), d \rangle,$$
but I am unsure how exactly this derivative is obtained. I understand the chain rule is involved, but I don't see how the gradient and inner product come in.

It's quite easy to verify for concrete functions. For example, if $f(x) = x_1^2 + x_2^2$ and $\phi(\alpha) = f(x+\alpha d) = (x_1 + \alpha d_1)^2 + (x_2+\alpha d_2)^2$, then
$$
\begin{aligned}
\phi'(\alpha) &= 2(x_1 + \alpha d_1) d_1 + 2(x_2 + \alpha d_2) d_2 \\
&= \left\langle \begin{bmatrix}2(x_1 + \alpha d_1) \\ 2(x_2 + \alpha d_2)\end{bmatrix}, \begin{bmatrix}d_1 \\ d_2 \end{bmatrix} \right\rangle \\
&=\langle \nabla f(x + \alpha d), d\rangle ,
\end{aligned}
$$

as required.
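If it helps, the same identity can also be checked numerically. Below is a minimal sanity check (just a sketch, using NumPy and arbitrarily chosen values for $x$, $d$, and $\alpha$) that compares a central finite-difference estimate of $\phi'(\alpha)$ with $\langle \nabla f(x + \alpha d), d \rangle$ for this quadratic:

```python
import numpy as np

# Concrete quadratic from above: f(x) = x_1^2 + x_2^2, so grad f(x) = 2x.
f = lambda x: x[0]**2 + x[1]**2
grad_f = lambda x: 2 * x

x = np.array([1.0, -2.0])   # current iterate (arbitrary choice)
d = np.array([0.5, 1.0])    # search direction (arbitrary choice)
alpha = 0.3                 # step length at which to check the derivative

phi = lambda a: f(x + a * d)

# Central finite-difference estimate of phi'(alpha)
h = 1e-6
fd_estimate = (phi(alpha + h) - phi(alpha - h)) / (2 * h)

# Analytic value from phi'(alpha) = <grad f(x + alpha d), d>
analytic = grad_f(x + alpha * d) @ d

print(fd_estimate, analytic)  # the two values agree up to finite-difference error
```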

General Case

In the general case I'm stuck on how to take the derivative, with respect to a scalar, of a function whose input is a vector.

$$
\begin{aligned}
\phi'(\alpha) &= \frac{\mathrm{d}}{\mathrm{d}\alpha} f(x+\alpha d) \\
&= \ ?.
\end{aligned}
$$

I've looked at a couple of old calculus notes and other sources online but can't find a definitive differentiation rule that takes this derivative and produces the gradient and inner product in $\langle \nabla f(x + \alpha d), d\rangle$.

Best Answer

Background knowledge: If $F:\mathbb R^n \to \mathbb R^m$, then $F'(x)$ is an $m \times n$ matrix which has the property that $$ F(x + \Delta x) \approx \underbrace{F(x)}_{m \times 1} + \underbrace{F'(x)}_{m \times n} \underbrace{\Delta x}_{n \times 1} $$ for any small vector $\Delta x \in \mathbb R^n$.

If $f:\mathbb R^n \to \mathbb R$, then $f'(x)$ is a $1 \times n$ row vector. If we use the convention that the gradient is a column vector, then $$ f'(x) = \nabla f(x)^T. $$
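As a quick illustration of this convention, using the quadratic from the question rather than anything from the book: for $f(x) = x_1^2 + x_2^2$,
$$
f'(x) = \begin{bmatrix} 2x_1 & 2x_2 \end{bmatrix}
= \begin{bmatrix} 2x_1 \\ 2x_2 \end{bmatrix}^T
= \nabla f(x)^T .
$$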


Solution: Note that $\phi(\alpha) = f(g(\alpha))$, where $g(\alpha) = x + \alpha d$. The derivative of $g$ is $$ g'(\alpha) = d. $$ By the chain rule, the derivative of $\phi$ is \begin{align} \phi'(\alpha) &= f'(g(\alpha)) g'(\alpha) \\ &= f'(x + \alpha d) d \\ &= \nabla f(x + \alpha d)^T d \\ &= \langle \nabla f(x + \alpha d), d \rangle. \end{align}
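Another way to see where the inner product comes from: $\phi'(\alpha)$ is just the directional derivative of $f$ at the point $x + \alpha d$ along $d$, which the limit definition produces directly (using differentiability of $f$ in the last step):
\begin{align}
\phi'(\alpha) &= \lim_{t \to 0} \frac{\phi(\alpha + t) - \phi(\alpha)}{t} \\
&= \lim_{t \to 0} \frac{f\big((x + \alpha d) + t d\big) - f(x + \alpha d)}{t} \\
&= \langle \nabla f(x + \alpha d), d \rangle.
\end{align}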
