How to compute the gradient of the norm for linear least squares

least squares, linear algebra, matrices, matrix-calculus

I am reading about the solution of the linear least squares problem. Given the function $$f(\theta) = \frac{1}{2} \lVert y - \Phi \theta \rVert_{2}^{2},$$ we need to compute the gradient of $f$ to find the minimizer. The text says that $$\nabla f(\theta^{*}) = 0 \Leftrightarrow \Phi^{T}\Phi\theta^{*} - \Phi^{T}y = 0.$$

Can someone explain how to compute the gradient of $f$? I don't understand why $$\nabla f(\theta^{*}) = \Phi^{T}\Phi\theta^{*} - \Phi^{T}y.$$

Best Answer

The squared magnitude of the vector $y-\Phi\theta$ is

$$\lVert y - \Phi \theta \rVert^2=(y-\Phi\theta)^T(y-\Phi\theta)=(y^T-\theta^T\Phi^T)(y-\Phi\theta)=y^Ty-y^T\Phi\theta-\theta^T\Phi^Ty+\theta^T\Phi^T\Phi\theta$$

But $y^T\Phi\theta$ is a scalar, so $y^T\Phi\theta=(y^T\Phi\theta)^T=\theta^T\Phi^Ty$. So the right-hand side is

$$y^Ty-2\theta^T\Phi^Ty+\theta^T\Phi^T\Phi\theta$$

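Differentiating with respect to $\theta$, using $\nabla_\theta(\theta^Tb)=b$ and $\nabla_\theta(\theta^TA\theta)=2A\theta$ for symmetric $A$ (here $b=\Phi^Ty$ and $A=\Phi^T\Phi$), gives $-2\Phi^Ty+2\Phi^T\Phi\theta$. Multiplying by the factor $1/2$ in the definition of $f$ gives $\nabla f(\theta)=\Phi^T\Phi\theta-\Phi^Ty$.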
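If it helps to see the formula confirmed numerically, here is a minimal sketch in Python with NumPy that compares $\Phi^T\Phi\theta-\Phi^Ty$ against a central finite-difference approximation of the gradient. The function names (`f`, `grad_f`, `numerical_grad`), the random test data, the problem sizes, and the step size `eps` are all illustrative assumptions, not part of the original question.

```python
import numpy as np

def f(theta, Phi, y):
    """Objective f(theta) = 0.5 * ||y - Phi @ theta||_2^2."""
    r = y - Phi @ theta
    return 0.5 * (r @ r)

def grad_f(theta, Phi, y):
    """Closed-form gradient derived above: Phi^T Phi theta - Phi^T y."""
    return Phi.T @ Phi @ theta - Phi.T @ y

def numerical_grad(func, theta, eps=1e-6):
    """Central finite-difference approximation of the gradient (hypothetical helper)."""
    g = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        g[i] = (func(theta + step) - func(theta - step)) / (2 * eps)
    return g

# Arbitrary test data; the sizes and seed are assumptions for illustration only.
rng = np.random.default_rng(0)
Phi = rng.standard_normal((20, 3))
y = rng.standard_normal(20)
theta = rng.standard_normal(3)

analytic = grad_f(theta, Phi, y)
numeric = numerical_grad(lambda t: f(t, Phi, y), theta)
max_abs_diff = np.max(np.abs(analytic - numeric))
print("max |analytic - numeric| =", max_abs_diff)

# Setting the gradient to zero gives the normal equations Phi^T Phi theta = Phi^T y.
theta_star = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)
grad_at_star = grad_f(theta_star, Phi, y)
print("gradient at theta_star =", grad_at_star)
```

The two gradients should agree up to floating-point error, and the gradient evaluated at the solution of the normal equations should be numerically zero, which is exactly the condition $\Phi^{T}\Phi\theta^{*} - \Phi^{T}y = 0$ quoted in the question.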
