[Math] Show that the exact step length in the line search for a quadratic is $\alpha_k = \frac{-\nabla f(x^k)^T s^k}{(s^k)^T Q s^k}$

linear-algebra, optimization

Given a positive definite matrix Q, consider the steepest descent method with exact line search for the quadratic function:

$$f(x) = \frac{1}{2}x^TQx + q^Tx - \beta$$

with

$$x^{k+1} = x^k + \alpha_k s^k, \qquad s^k = -\nabla f(x^k), \qquad \alpha_k = \operatorname*{arg\,min}_{\alpha > 0} f(x^k + \alpha s^k)$$

Show that
$$\alpha_k = \frac {-\nabla f(x^k)^Ts^k}{(s^k)^TQs^k}$$

The only idea I have for solving this is to plug $x^k+\alpha s^k$ into $f$, take the derivative with respect to $\alpha$, and hope the minimizer matches the stated expression for $\alpha_k$.

So from my understanding I have to find the derivative, with respect to $\alpha$, of $$ \frac{1}{2}(x^k +\alpha s^k)^TQ(x^k +\alpha s^k) + q^T(x^k +\alpha s^k)-\beta $$

and then set it equal to $0$ and solve for $\alpha$. Can anyone give me tips on how to carry out the differentiation when matrices are involved?
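A small concrete sketch of exactly that plan (a hypothetical $2\times 2$ instance; assumes SymPy is available, and the specific numbers are made up): if $\alpha$ is the only symbol, the matrix expression collapses to an ordinary scalar polynomial in $\alpha$, so the differentiation is just a scalar derivative.

```python
import sympy as sp

# Hypothetical 2x2 instance: Q symmetric positive definite, alpha symbolic.
alpha = sp.symbols('alpha')
Q = sp.Matrix([[4, 1], [1, 3]])
q = sp.Matrix([1, -2])
x = sp.Matrix([2, 1])            # current iterate x^k
s = -(Q * x + q)                 # steepest descent direction -grad f(x^k)

# f along the ray x + alpha*s; indexing [0, 0] turns the 1x1 matrix into a scalar
g = sp.Rational(1, 2) * ((x + alpha * s).T * Q * (x + alpha * s))[0, 0] \
    + (q.T * (x + alpha * s))[0, 0]

alpha_star = sp.solve(sp.diff(g, alpha), alpha)[0]        # solve dg/dalpha = 0
closed_form = (-(Q * x + q).T * s)[0, 0] / (s.T * Q * s)[0, 0]
print(sp.simplify(alpha_star - closed_form))              # 0
```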

Best Answer

\begin{align}
f\big(\boldsymbol x^{k+1} \big) = f\big(\boldsymbol x^k + \alpha_k \boldsymbol s^{k}\big)
&= \frac12 \big(\boldsymbol x^k + \alpha_k \boldsymbol s^{k}\big)^T Q \big(\boldsymbol x^k + \alpha_k \boldsymbol s^{k}\big) + \boldsymbol q^T \big(\boldsymbol x^k + \alpha_k \boldsymbol s^{k}\big) -\beta\\
&= \frac12 \big(\boldsymbol x^k\big)^T Q \boldsymbol x^k + \frac12 \alpha_k^2 \big(\boldsymbol s^{k}\big)^T Q \boldsymbol s^{k} + \alpha_k \big( \boldsymbol s^k \big)^T Q \boldsymbol x^{k} + \boldsymbol q^T \boldsymbol x^k + \alpha_k\boldsymbol q^T \boldsymbol s^{k} -\beta \\
&= f\big(\boldsymbol x^k \big) + \frac12 \alpha_k^2 \big(\boldsymbol s^{k}\big)^T Q \boldsymbol s^{k}+ \alpha_k \big( \boldsymbol s^k \big)^T Q \boldsymbol x^{k} + \alpha_k \big( \boldsymbol s^{k}\big)^T \boldsymbol q\\
&= f\big(\boldsymbol x^k \big) + \frac12 \alpha_k^2 \big(\boldsymbol s^{k}\big)^T Q \boldsymbol s^{k}+ \alpha_k \big( \boldsymbol s^k \big)^T \Big[Q \boldsymbol x^{k} + \boldsymbol q\Big] =: g(\alpha_k)
\end{align}
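A quick numerical spot check of this expansion, on a hypothetical random instance and assuming NumPy: the last line of the display must agree with evaluating $f$ directly at $\boldsymbol x^k + \alpha_k \boldsymbol s^k$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
Q = A @ A.T + n * np.eye(n)          # symmetric positive definite
q = rng.standard_normal(n)
beta = 0.7
x = rng.standard_normal(n)
s = -(Q @ x + q)                     # steepest descent direction
alpha = 0.3                          # arbitrary step length

f = lambda z: 0.5 * z @ Q @ z + q @ z - beta

direct = f(x + alpha * s)
expanded = f(x) + 0.5 * alpha**2 * s @ Q @ s + alpha * s @ (Q @ x + q)
print(np.isclose(direct, expanded))  # True
```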

Here the two cross terms were combined using the symmetry of $Q$ (as a positive definite matrix, $Q$ is symmetric), so $\big(\boldsymbol x^k\big)^T Q \boldsymbol s^k = \big(\boldsymbol s^k\big)^T Q \boldsymbol x^k$. For a fixed $\boldsymbol x^{k}$, $g(\alpha_k)$ is a univariate, scalar parabola in $\alpha_k$; its leading coefficient $\frac12\big(\boldsymbol s^k\big)^T Q \boldsymbol s^k$ is positive by positive definiteness, so $g$ has a unique minimum (recall that this is what we want) at $g'(\alpha_k) = 0$. Taking the derivative, one obtains

\begin{align} 0 &\overset{!}{=} \alpha_k \big( \boldsymbol s^k \big)^T Q \boldsymbol s^{k}+ \big( \boldsymbol s^k \big)^T \Big[Q \boldsymbol x^{k} + \boldsymbol q\Big] \\ \Rightarrow \alpha_k &= -\frac{ \big( \boldsymbol s^k \big)^T \Big[Q \boldsymbol x^{k} + \boldsymbol q\Big]}{\big( \boldsymbol s^k \big)^T Q \boldsymbol s^{k}}\end{align}

By definition of $f\big(\boldsymbol x^k\big)$, $$\nabla f\big(\boldsymbol x^k\big) = Q \boldsymbol x^k + \boldsymbol q$$ and thus \begin{align} \alpha_k &= -\frac{ \big( \boldsymbol s^k \big)^T \nabla f\big(\boldsymbol x^k\big)}{\big( \boldsymbol s^k \big)^T Q \boldsymbol s^{k}} = -\frac{ \nabla f\big(\boldsymbol x^k\big)^T \boldsymbol s^k }{\big( \boldsymbol s^k \big)^T Q \boldsymbol s^{k}} \end{align} as desired.
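A short sketch that checks this final formula against a black-box 1-D minimization (hypothetical random data; assumes NumPy and SciPy): for $\boldsymbol s^k = -\nabla f(\boldsymbol x^k)$, the closed-form step should coincide with the numerically found minimizer of $f$ along the ray.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))
Q = A @ A.T + n * np.eye(n)                # symmetric positive definite
q = rng.standard_normal(n)
x = rng.standard_normal(n)

grad = Q @ x + q                           # gradient of f at x^k
s = -grad                                  # steepest descent direction
alpha_closed = -(grad @ s) / (s @ Q @ s)   # the formula derived above

# f restricted to the ray x + a*s; the constant -beta is omitted since it
# does not affect the minimizer
phi = lambda a: 0.5 * (x + a * s) @ Q @ (x + a * s) + q @ (x + a * s)
alpha_num = minimize_scalar(phi).x

print(np.isclose(alpha_closed, alpha_num))  # True
```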