I'm reading about line search methods in Numerical Optimization by Nocedal and Wright and am wondering how the derivative of the line search function is obtained. In exact line search methods with a real-valued function $f: \mathbb{R}^n \rightarrow \mathbb{R}$, iterate $x \in \mathbb{R}^n$, and search direction $d \in \mathbb{R}^n$, we want to minimize the function $$\phi(\alpha) = f(x + \alpha d)$$ over $\alpha > 0$ to find the optimal step length. The derivative of $\phi(\alpha)$ is given as $$\phi'(\alpha) = \langle \nabla f(x + \alpha d), d \rangle,$$
but I am unsure how exactly this derivative is obtained. I understand there is an application of the chain rule, but I don't see how the inner product and gradient come in.
It's quite easy to verify for concrete functions, for example if $f(x) = x_1^2 + x_2^2$, and $\phi(\alpha) = f(x+\alpha d) = (x_1 + \alpha d_1)^2 + (x_2+\alpha d_2)^2$ then
$$
\begin{aligned}
\phi'(\alpha) &= 2(x_1 + \alpha d_1) d_1 + 2(x_2 + \alpha d_2) d_2 \\
&= \left\langle \begin{bmatrix}2(x_1 + \alpha d_1) \\ 2(x_2 + \alpha d_2)\end{bmatrix}, \begin{bmatrix}d_1 \\ d_2 \end{bmatrix} \right\rangle \\
&=\langle \nabla f(x + \alpha d), d\rangle ,
\end{aligned}
$$
as required.
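A quick numerical sanity check of this concrete example (a sketch, with arbitrary test values for $x$, $d$, and $\alpha$): compare $\langle \nabla f(x+\alpha d), d\rangle$ against a central finite difference of $\phi$.

```python
import numpy as np

# Concrete example from above: f(x) = x_1^2 + x_2^2, so grad f(x) = 2x.
f = lambda x: np.dot(x, x)
grad_f = lambda x: 2 * x

# Arbitrary test point, direction, and step length.
x = np.array([1.0, -2.0])
d = np.array([0.5, 1.0])
phi = lambda a: f(x + a * d)

alpha, h = 0.3, 1e-6
analytic = np.dot(grad_f(x + alpha * d), d)             # <grad f(x+ad), d>
numeric = (phi(alpha + h) - phi(alpha - h)) / (2 * h)   # central difference
print(analytic, numeric)  # both approximately -2.25
```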
General Case
In the general case I'm stuck on how to differentiate a function that takes a vector input with respect to a scalar.
$$
\begin{aligned}
\phi'(\alpha) &= \frac{\mathrm{d}}{\mathrm{d}\alpha} f(x+\alpha d) \\
&= \ ?.
\end{aligned}
$$
I've looked at a couple of old calculus notes and other sources online, but can't find a differentiation rule that produces the gradient and inner product in $\langle \nabla f(x + \alpha d), d\rangle$.
Best Answer
Background knowledge: If $F:\mathbb R^n \to \mathbb R^m$, then $F'(x)$ is an $m \times n$ matrix which has the property that $$ F(x + \Delta x) \approx \underbrace{F(x)}_{m \times 1} + \underbrace{F'(x)}_{m \times n} \underbrace{\Delta x}_{n \times 1} $$ for any small vector $\Delta x \in \mathbb R^n$.
If $f:\mathbb R^n \to \mathbb R$, then $f'(x)$ is a $1 \times n$ row vector. If we use the convention that the gradient is a column vector, then $$ f'(x) = \nabla f(x)^T. $$
Solution: Note that $\phi(\alpha) = f(g(\alpha))$, where $g(\alpha) = x + \alpha d$. The derivative of $g$ is $$ g'(\alpha) = d. $$ By the chain rule, the derivative of $\phi$ is \begin{align} \phi'(\alpha) &= f'(g(\alpha)) g'(\alpha) \\ &= f'(x + \alpha d) d \\ &= \nabla f(x + \alpha d)^T d \\ &= \langle \nabla f(x + \alpha d), d \rangle. \end{align}
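The same chain-rule identity can be checked numerically for a non-quadratic $f$ as well (a sketch with an arbitrary test function and arbitrary values for $x$, $d$, $\alpha$): here $f(x) = \sin(x_1) + x_1 x_2^2$, so $\nabla f(x) = (\cos(x_1) + x_2^2,\; 2 x_1 x_2)$.

```python
import numpy as np

# Arbitrary non-quadratic test function and its gradient.
f = lambda x: np.sin(x[0]) + x[0] * x[1] ** 2
grad_f = lambda x: np.array([np.cos(x[0]) + x[1] ** 2, 2 * x[0] * x[1]])

x = np.array([0.2, 1.0])
d = np.array([1.0, -0.5])
phi = lambda a: f(x + a * d)  # phi = f(g(alpha)) with g(alpha) = x + alpha*d

alpha, h = 0.7, 1e-6
analytic = np.dot(grad_f(x + alpha * d), d)             # chain-rule formula
numeric = (phi(alpha + h) - phi(alpha - h)) / (2 * h)   # central difference
print(analytic, numeric)
```

The two printed values agree up to finite-difference error, which is exactly what the chain-rule derivation predicts.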