[Math] Jacobian of matrix product

linear-algebra, matrices, matrix-calculus

I understand from Wikipedia (https://en.wikipedia.org/wiki/Matrix_calculus#Matrix-by-scalar) that, given two matrices $M \in \mathbb{R}^{m\times k}$ and $N \in \mathbb{R}^{k\times t}$ whose elements are real functions over $\mathbb{R}^n$, i.e.,

$N_{ij}:\mathbb{R}^n\rightarrow \mathbb{R}$

and

$M_{ij}:\mathbb{R}^n\rightarrow \mathbb{R}$

for all $i,j$, the Jacobian of their product $MN$ is (SEE EDIT AT THE BOTTOM):

$$M\frac{\partial{N}}{\partial{\textbf{x}}}+\frac{\partial{M}}{\partial{\textbf{x}}}N\tag{1}$$

However, I have so far been unable to derive that formula. To keep the notation simple, I will write my attempt for the following particular case (I think the argument is similar in higher dimensions):

$$
A\textbf{v}=\begin{pmatrix}
a_{11} & a_{12} \\
a_{21} & a_{22}
\end{pmatrix}
\textbf{v}
$$
where $\textbf{v}\in \mathbb{R}^2$ is allowed to depend on $\textbf{x}\in\mathbb{R}^2$, and $a_{ij} : \mathbb{R}^2 \rightarrow \mathbb{R}$ for all $i,j$. We seek to compute the Jacobian
$$
\frac{\partial(A\textbf{v})}{\partial(\textbf{x})}\tag{2}
$$

Let $A_i$ denote row $i$ of $A$, and write

$$A\textbf{v}=\begin{pmatrix}A_1\textbf{v}\\A_2\textbf{v} \end{pmatrix}$$

Equation (2) becomes
$$
B=\begin{pmatrix}
\frac{\partial(A_1\textbf{v})}{\partial x_1} & \frac{\partial(A_1\textbf{v})}{\partial x_2} \\
\frac{\partial(A_2\textbf{v})}{\partial x_1} & \frac{\partial(A_2\textbf{v})}{\partial x_2}
\end{pmatrix}=
\begin{pmatrix}
\frac{\partial(A_1\textbf{v})}{\partial\textbf{x}}\\
\frac{\partial(A_2\textbf{v})}{\partial\textbf{x}}
\end{pmatrix}
\tag{3}
$$
and it is easy to demonstrate that
$$
\frac{\partial(A_i\textbf{v})}{\partial\textbf{x}}=A_i\frac{\partial\textbf{v}}{\partial\textbf{x}}+\textbf{v}^T\frac{\partial A_i^T}{\partial\textbf{x}}.
$$
Substituting in (3) results in

$$
A\frac{\partial{\textbf{v}}}{\partial{\textbf{x}}}+
\begin{pmatrix}
\textbf{v}^T\frac{\partial{A_1^T}}{\partial{\textbf{x}}}\\
\textbf{v}^T\frac{\partial{A_2^T}}{\partial{\textbf{x}}}
\end{pmatrix}
$$

which differs from (1). What am I missing? Thanks in advance.

Edit:
It turns out that $\frac{\partial{M}}{\partial{\textbf{x}}}N$ in (1) is not an ordinary matrix product but a component-wise one (see the answer provided by @muaddib).
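
To spell out what the component-wise reading means for the $2\times 2$ case above (a sketch, assuming $\frac{\partial A}{\partial x_j}$ denotes the matrix of entrywise partial derivatives $\frac{\partial a_{ik}}{\partial x_j}$), the $j$th entry of row $i$ of the second term I obtained is

$$
\left(\textbf{v}^T\frac{\partial A_i^T}{\partial \textbf{x}}\right)_j=\sum_{k=1}^{2}\frac{\partial a_{ik}}{\partial x_j}v_k=\left(\frac{\partial A}{\partial x_j}\textbf{v}\right)_i,
$$

so its $j$th column is $\frac{\partial A}{\partial x_j}\textbf{v}$, which is the $j$th component of $\frac{\partial A}{\partial\textbf{x}}\textbf{v}$ under the component-wise reading. Read that way, my result and (1) agree.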

Best Answer

I don't think differentiating a matrix applied to a vector is the best route to this expression. I'm not saying the above is wrong, but here is another derivation.

Since you mentioned the wiki article, I will assume we already know that if $A, B$ are matrices whose coefficients are functions $A_{ij},B_{ij} : \mathbb{R} \to \mathbb{R}$, then $$\frac{d}{dx}(AB) = \left(\frac{d}{dx}A\right)B + A \left(\frac{d}{dx}B\right)$$
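
For completeness, this single-variable rule follows entrywise from the scalar product rule. A short sketch, where the summation index $k$ is introduced here only for illustration:

$$\left(\frac{d}{dx}(AB)\right)_{ij} = \frac{d}{dx}\sum_k A_{ik}B_{kj} = \sum_k\left(\frac{dA_{ik}}{dx}B_{kj} + A_{ik}\frac{dB_{kj}}{dx}\right) = \left(\frac{dA}{dx}B\right)_{ij} + \left(A\frac{dB}{dx}\right)_{ij}$$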

Now suppose $C$ is a matrix whose coefficients are functions $C_{ij} : \mathbb{R}^n \to \mathbb{R}$. Then I'll define the Jacobian $\frac{\partial}{\partial\textbf{x}}$ of $C$ by $$\frac{\partial}{\partial\textbf{x}} C = \sum_{i=1}^n \left(\frac{\partial}{\partial x_i} C\right) e_i$$ where $e_i$ is the $i$th standard basis vector of $\mathbb{R}^n$. So $\frac{\partial C}{\partial \textbf{x}}$ is a vector whose components are matrices. Now

\begin{eqnarray*}
\frac{\partial}{\partial \textbf{x}}(AB) &= &\sum_{i=1}^n \left(\frac{\partial}{\partial x_i} (AB) \right) e_i \\
&= &\sum_{i=1}^n \left(\left(\frac{\partial}{\partial x_i}A\right)B + A \left(\frac{\partial}{\partial x_i}B\right) \right) e_i \\
&= &\sum_{i=1}^n \left(\left(\frac{\partial}{\partial x_i}A\right)B \right)e_i+ \sum_{i=1}^n \left(A\left(\frac{\partial}{\partial x_i}B\right) \right) e_i \\
&= & \left(\frac{\partial}{\partial \textbf{x}}A\right)B + A \left(\frac{\partial}{\partial \textbf{x}}B\right)
\end{eqnarray*}

Note that in the final expression, the product of the Jacobian of a matrix (a vector of matrices) with another matrix is performed component-wise.
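
The component-wise reading can be sanity-checked numerically. The sketch below is only an illustration: the functions `A` and `v` are arbitrary examples chosen here (they are not from the question), derivatives are approximated by central differences, and only the Python standard library is used. It compares the $j$th component of the Jacobian of $A\textbf{v}$, obtained by differentiating the product directly, with $A\frac{\partial \textbf{v}}{\partial x_j} + \frac{\partial A}{\partial x_j}\textbf{v}$.

```python
import math

def A(x):
    """Example 2x2 matrix with entries depending on x = (x1, x2); hypothetical choice."""
    x1, x2 = x
    return [[x1 * x2, math.sin(x1)],
            [x2 ** 2, x1 + 2.0 * x2]]

def v(x):
    """Example vector-valued function of x; hypothetical choice."""
    x1, x2 = x
    return [math.exp(x1), x1 * x2]

def matvec(M, u):
    """Ordinary matrix-vector product for nested lists."""
    return [sum(m * c for m, c in zip(row, u)) for row in M]

def partial(f, x, j, h=1e-6):
    """Central-difference estimate of df/dx_j, applied entrywise to a vector or matrix."""
    xp, xm = list(x), list(x)
    xp[j] += h
    xm[j] -= h

    def diff(p, m):
        if isinstance(p, list):
            return [diff(a, b) for a, b in zip(p, m)]
        return (p - m) / (2 * h)

    return diff(f(xp), f(xm))

def jacobian_components_direct(x):
    """Component j is d(A v)/dx_j, differentiating the product directly."""
    return [partial(lambda y: matvec(A(y), v(y)), x, j) for j in range(len(x))]

def jacobian_components_product_rule(x):
    """Component j is A (dv/dx_j) + (dA/dx_j) v: the product rule, one x_j at a time."""
    comps = []
    for j in range(len(x)):
        term1 = matvec(A(x), partial(v, x, j))
        term2 = matvec(partial(A, x, j), v(x))
        comps.append([a + b for a, b in zip(term1, term2)])
    return comps

x0 = [0.7, -1.3]
direct = jacobian_components_direct(x0)
by_rule = jacobian_components_product_rule(x0)

# Largest entrywise disagreement between the two computations.
max_gap = max(abs(a - b)
              for cd, cr in zip(direct, by_rule)
              for a, b in zip(cd, cr))
print(max_gap)
```

Each list in `direct` and `by_rule` is one component of the "vector of matrices" picture (here a vector of vectors, since $A\textbf{v}$ is a vector). If the two agree, `max_gap` should be on the order of the finite-difference error.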
