Show the optimal step to minimize $ax^tQx + b^tx + c$ has $\lambda = -\frac{d^t\nabla q(x)}{d^t\nabla^2q(x)d}$

calculuslinear algebramaxima-minimamultivariable-calculusoptimization

If a method of descent direction with exact linear search is used to
minimize a quadratic function $q:\mathbb{R}^n\to\mathbb{R}$, show that
the optimal step is given by $$\lambda = -\frac{d^t\nabla
q(x)}{d^t\nabla^2q(x)d}$$ where $d$ is the direction used on point $x$

So I need to minimize a function of the form $$ax^tQx + b^tx + c$$

So I need to minimize $$f(x -\lambda d) = a(x -\lambda d)^tQ(x -\lambda d) + b^t(x -\lambda d) + c$$

for that I take the derivative with respect to $\lambda$ and set it equal to $0$ to get the $\lambda$ that minimizes the expression.

First let's open the expression so we can take the derivative

$$f(x -\lambda d) = a(x -\lambda d)^t(Qx -Q\lambda d) + b^tx -b^t\lambda d + c = \\ ax^tQx-a\lambda xQd-a\lambda d^tQx+a\lambda^2d^tQd + b^tx -b^t\lambda d + c $$

so $$f' = -axQd-ad^tQx + 2a\lambda d^tQd-b^td = 0\implies\\ 2a\lambda d^tQd = b^td+axQd + aQx$$

but this expression doesn't even have the gradient, neither the hessian.

UPDATE:

I found another book asking a similar exercise:

enter image description here

This one does not have $\nabla^2$ so I guess the first is wrong? This one can be more trusted. Anyways, I still can't arrive at the answer.

Best Answer

Assuming $Q$ to be symmetric positive definite, $a>0$.

$$q(x)=ax^TQx+b^Tx+c$$

then we have

$$\nabla q(x) = 2aQx + b$$

$$\nabla^2 q(x) = 2aQ$$

Given $x$ and $d$, we want to minimizie $q(x+\lambda d)$.

\begin{align} \frac{d}{d\lambda} q(x+\lambda d) &= \nabla q(x+\lambda d)^T d \\ &=[2a(x+\lambda d)^TQ + b^T]d \end{align}

We equate it to zero,

$$[2a(x+\lambda d)^TQ + b^T]d = 0$$ $$2ax^TQd+2a\lambda d^TQd + b^Td = 0$$

$$2a\lambda d^TQd+2ax^TQd + b^Td = 0$$

$$\lambda d^T\nabla^2q(x)d + (2aQx+b)^Td=0 $$

$$\lambda d^T\nabla^2q(x)d + \nabla q(x)^Td=0 $$

$$\lambda =-\frac{ \nabla q(x)^Td }{d^T\nabla^2q(x)d}$$

The expression should be positive since $d$ should be chosen such that $\nabla q(x)^Td <0$ to be a descent direction, and the negative should make the whole expression positive. The denominator is positive by the assumption that $Q$ is positive definite.

Remark:

  • The hessian does appear in the second source that you found, note that is it equal to $Q$.
  • For the second source, $d = -\nabla f(x)$.