[Math] Derivative of a vector times its transpose with respect to itself

calculus, vector analysis

I have tried to compute
$$\frac{d}{d\vec{x}}\left[\vec{x}^T\vec{x}\right].$$

One approach is to use a component-wise example in 3D.
$\begin{bmatrix}x_1 & x_2 & x_3\end{bmatrix}\cdot\begin{bmatrix}x_1 \\ x_2 \\ x_3\end{bmatrix} = x_1^2 + x_2 ^2 + x_3^2$

Differentiating this with respect to the vector $\vec{x}=\begin{bmatrix}x_1 \\ x_2 \\ x_3\end{bmatrix}$ should give
$$\begin{bmatrix}\frac{\partial }{\partial x_1}(x_1^2 + x_2 ^2 +x_3^2) \\ \frac{\partial}{\partial x_2}(x_1^2 + x_2 ^2 +x_3^2)\\ \frac{\partial}{\partial x_3}(x_1^2 + x_2 ^2 +x_3^2)\end{bmatrix}=\begin{bmatrix}2x_1\\2x_2\\2x_3\end{bmatrix}$$
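A quick numerical sanity check (a sketch using NumPy central differences; the helper name is my own) agrees with this componentwise result:

```python
import numpy as np

def f(x):
    # f(x) = x^T x, i.e. the sum of squares
    return x @ x

def numerical_gradient(f, x, h=1e-6):
    # Approximate each partial derivative with a central difference
    grad = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (f(x + e) - f(x - e)) / (2 * h)
    return grad

x = np.array([1.0, -2.0, 3.0])
print(numerical_gradient(f, x))  # close to [2, -4, 6], i.e. 2x
```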

On the other hand, using the product rule:
$$\frac{d}{d\vec{x}}\left[\vec{x}^T\vec{x}\right] = \frac{d}{d\vec{x}}\vec{x} + \vec{x}^T \frac{d}{d\vec{x}} = \vec{x}+\vec{x}^T$$
These cannot be added together because they have different dimensionalities. So what did I do wrong? And more importantly, what is the correct derivative of $\vec{x}^T\vec{x}$?

Best Answer

The easiest way is to use the implicit/external definition of the gradient (which can be obtained from the chain rule):

$$d F=dx^T\,\nabla F.$$

EDIT: Explanation of how to obtain the external definition of the gradient. Consider a function $F=F(x_1,\dots,x_n)$. Then the total derivative is given by

$$dF = \dfrac{\partial F}{\partial x_1}dx_1+\dots+\dfrac{\partial F}{\partial x_n}dx_n=dx_1\dfrac{\partial F}{\partial x_1}+\dots+dx_n\dfrac{\partial F}{\partial x_n}$$ $$=dx^T\begin{bmatrix}\dfrac{\partial F}{\partial x_1}\\\vdots\\\dfrac{\partial F}{\partial x_n} \end{bmatrix}=dx^T\,\nabla_\text{column} F=\nabla_\text{row}F\,dx $$
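The relation $dF=dx^T\,\nabla F$ can be checked numerically for $F(x)=x^Tx$, whose gradient $2x$ is derived below (a sketch using NumPy; the test values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)
dx = 1e-6 * rng.normal(size=4)   # a small perturbation

F = lambda v: v @ v              # F(x) = x^T x
grad = 2 * x                     # nabla F for this particular F

dF_exact = F(x + dx) - F(x)      # actual change in F
dF_linear = dx @ grad            # dx^T * nabla F
# The two agree up to a term of order |dx|^2
print(dF_exact, dF_linear)
```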

What we have to do is determine the total derivative of your expression:

$$d(x^Tx)=dx^T x+x^Tdx.$$

Note that both terms are scalars; hence we can transpose the second one to obtain the first expression:

$$d(x^Tx)=dx^T x+dx^Tx=dx^T\left[2x\right]$$

Comparing this expression with the implicit definition of the gradient we obtain

$$\dfrac{d\left(x^Tx\right)}{dx}=\nabla \left[x^Tx \right]=2x.$$


An alternative approach is to calculate the partial derivatives

$$\dfrac{\partial \sum_{j=1}^n x_j^2}{\partial x_i}=\sum_{j=1}^n\dfrac{\partial x_j^2}{\partial x_i}=2x_i$$

and then assemble the gradient as $2x$.
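The same componentwise computation can be reproduced symbolically (a sketch using SymPy with $n=3$; the symbol names are illustrative):

```python
import sympy as sp

xs = sp.symbols('x1:4')               # (x1, x2, x3)
F = sum(xj**2 for xj in xs)           # x^T x written componentwise
grad = [sp.diff(F, xi) for xi in xs]  # partial derivatives
print(grad)  # [2*x1, 2*x2, 2*x3], i.e. the gradient 2x
```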


Or using index notation (with summation over repeated indices):

$$\dfrac{\partial (x_jx_j)}{\partial x_i}=\dfrac{\partial x_j}{\partial x_i}x_j+x_j\dfrac{\partial x_j}{\partial x_i}=\delta_{ji}x_j+x_j\delta_{ji}=x_i+x_i=2x_i.$$

The symbol $\delta_{ij}=\delta_{ji}$ is the Kronecker delta, which equals $1$ when $i=j$ and $0$ otherwise.
