Interpretation of gradient mapping

calculuslinear algebra

I have an objective function

$$f(x)=\|Ax-b\|^2, A \in \mathbb{R}^{3 \times 3}, x \in \mathbb{R}^3, b \in \mathbb{R}^3$$

I want to use gradient mapping to minimize this function. To do so, I know that

$$x_Q(\bar{x},\gamma)=arg min[f(\bar{x}+\langle f'(\bar{x}),x-\bar{x}\rangle+\frac{\gamma}{2}\|x-\bar{x}\|^2]$$
$$g_Q(\bar{x},\gamma)=\gamma(\bar{x}-x_Q(\bar{x},\gamma)$$

My questions is about the $f'(\bar{x})$ term. When finding this term, I let $$A=\begin{bmatrix}A_{11}&A_{12}&A_{13}\\A_{21}&A_{22}&A_{23}\\A_{31}&A_{32}&A_{33}\end{bmatrix}, x=\begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix}, b=\begin{bmatrix}b_1\\b_2\\b_3\end{bmatrix}$$

From here, I know that $$f(x)=\|\begin{bmatrix}A_{11}&A_{12}&A_{13}\\A_{21}&A_{22}&A_{23}\\A_{31}&A_{32}&A_{33}\end{bmatrix}\begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix}-\begin{bmatrix}b_1\\b_2\\b_3\end{bmatrix}\|^2 = \|\begin{bmatrix}A_{11}x_1+A_{12}x_2+A_{13}x_3-b_1\\A_{21}x_1+A_{22}x_2+A_{23}x_3-b_2\\A_{31}x_1+A_{32}x_2+A_{33}x_3-b_3\end{bmatrix}\|^2$$

Breaking apart the magnitude:
$$\sqrt{(A_{11}x_1+A_{12}x_2+A_{13}x_3-b_1)^2+(A_{21}x_1+A_{22}x_2+A_{23}x_3-b_2)^2+(A_{31}x_1+A_{32}x_2+A_{33}x_3-b_3)^2}^2=(A_{11}x_1+A_{12}x_2+A_{13}x_3-b_1)^2+(A_{21}x_1+A_{22}x_2+A_{23}x_3-b_2)^2+(A_{31}x_1+A_{32}x_2+A_{33}x_3-b_3)^2$$

Then, calculating the partial derivatives $$\frac{\partial f}{\partial x_1}=2A_{11}(A_{11}x_1+A_{12}x_2+A_{13}x_3-b_1)+2A_{21}(A_{21}x_1+A_{22}x_2+A_{23}x_3-b_2)+2A_{31}(A_{31}x_1+A_{32}x_2+A_{33}x_3-b_3)$$
$$\frac{\partial f}{\partial x_2}=2A_{12}(A_{11}x_1+A_{12}x_2+A_{13}x_3-b_1)+2A_{22}(A_{21}x_1+A_{22}x_2+A_{23}x_3-b_2)+2A_{32}(A_{31}x_1+A_{32}x_2+A_{33}x_3-b_3)$$
$$\frac{\partial f}{\partial x_3}=2A_{13}(A_{11}x_1+A_{12}x_2+A_{13}x_3-b_1)+2A_{23}(A_{21}x_1+A_{22}x_2+A_{23}x_3-b_2)+2A_{33}(A_{31}x_1+A_{32}x_2+A_{33}x_3-b_3)$$

Combining the partials, I arrive at
$$f'(x)=\begin{bmatrix}2A_{11}(A_{11}x_1+A_{12}x_2+A_{13}x_3-b_1)+2A_{21}(A_{21}x_1+A_{22}x_2+A_{23}x_3-b_2)+2A_{31}(A_{31}x_1+A_{32}x_2+A_{33}x_3-b_3)\\2A_{12}(A_{11}x_1+A_{12}x_2+A_{13}x_3-b_1)+2A_{22}(A_{21}x_1+A_{22}x_2+A_{23}x_3-b_2)+2A_{32}(A_{31}x_1+A_{32}x_2+A_{33}x_3-b_3)\\2A_{13}(A_{11}x_1+A_{12}x_2+A_{13}x_3-b_1)+2A_{23}(A_{21}x_1+A_{22}x_2+A_{23}x_3-b_2)+2A_{33}(A_{31}x_1+A_{32}x_2+A_{33}x_3-b_3)\end{bmatrix}$$
Can anyone verify if this is correct?

Best Answer

Your expression for $f'(x)$ is correct, and it can be written concisely in matrix notation as $\nabla f(x) = 2A^T(Ax-b)$.

The formula can also be derived easily using the multivariable chain rule. Note that $f(x) =g(h(x))$, where $h(x) = Ax-b$ and $g(u) = \|u\|^2$. The derivatives of $g$ and $h$ are $g'(u) = 2u^T$ (a row vector) and $h'(x) = A$. By the chain rule, the derivative of $f$ is \begin{align} f'(x) &= g'(h(x)) h'(x) \\ &= 2(Ax-b)^T A. \end{align} My notation differs slightly from yours because in my notation $f'(x)$ is a row vector. If we use the convention that the gradient is a column vector, then $$ \nabla f(x) = f'(x)^T = 2 A^T(Ax-b). $$

Related Question