Derivative of matrix with respect to vector

derivatives, linear-algebra, matrices, matrix-calculus, partial-derivative

I need to calculate the derivative of a matrix with respect to a vector.

< Given Equation >

1)
$\mathbb Y = \mathbb A \mathbb X$

where
$\mathbb A$: (n$\times$n) matrix,
$\mathbb X$: (n$\times$1) vector.

2)
all elements of $\mathbb A$ and $\mathbb X$ are functions of the variables $z_i$, where
$\mathbb Z = [z_1\ z_2\ \cdots\ z_m]^\top$.
In other words,
$\mathbb Y(z)=\mathbb A(z) \mathbb X(z)$

< Problem definition >
I want to calculate the following partial derivative: $\frac{\partial \mathbb Y}{\partial \mathbb Z}$, which yields an (n$\times$m) matrix.
From the ordinary product rule for differentiation, it looks like the rule can be extended (with some modifications) to the matrix/vector case:

$\frac{\partial \mathbb Y}{\partial \mathbb Z}
=
\frac{\partial (\mathbb A \mathbb X)}{\partial \mathbb Z}
=
\frac{\partial \mathbb A}{\partial \mathbb Z}\mathbb X
+
\mathbb A \frac{\partial \mathbb X}{\partial \mathbb Z}$

However, the above rule seems wrong, as you can easily see that the first term's dimensions don't match (n$\times$m).

I want to compute the derivative without explicitly writing out all the elements of the output $\mathbb Y$.
How can I solve this problem?

Best Answer

Your formula is actually correct, provided it is interpreted properly.

Let's first investigate $\frac{\partial\mathbb{A}}{\partial\mathbb{Z}}$. $\mathbb{A}$ is an $n\times n$ matrix and $\mathbb{Z}$ is a vector with $m$ entries. This means that, to specify a derivative, you need three coordinates: $(i,j)$ for the entry of $\mathbb{A}$ and $k$ for the choice of variable for the derivative. Therefore, $\frac{\partial\mathbb{A}}{\partial\mathbb{Z}}$ is really a $3$-tensor, and a $3$-tensor times a vector is a matrix.
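For concreteness, here is a minimal NumPy sketch of that shape bookkeeping (the arrays are random placeholders of my own, not anything from the question): contracting the $3$-tensor with $\mathbb X$ over the column index leaves an $(n\times m)$ matrix.

```python
import numpy as np

n, m = 3, 2
dA = np.random.rand(n, n, m)   # dA[i, j, k] stands in for d a_ij / d z_k
X  = np.random.rand(n)         # stands in for X(z) at the current point

# Contract over j: the surviving indices are (i, k), i.e. an (n x m) matrix.
term1 = np.einsum('ijk,j->ik', dA, X)
print(term1.shape)             # (3, 2)
```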

Similarly, $\frac{\partial\mathbb{X}}{\partial\mathbb{Z}}$ is a matrix because there are two coordinates, $i$ for the entry of $\mathbb{X}$ and $j$ for the choice of derivative. Hence, $\mathbb{A}\frac{\partial\mathbb{X}}{\partial\mathbb{Z}}$ is a product of matrices, and is itself a matrix.
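The second term needs no special interpretation; continuing with placeholder arrays of the right shapes, it is just a matrix product with the same $(n\times m)$ result, so the two terms can be added entrywise:

```python
import numpy as np

n, m = 3, 2
A  = np.random.rand(n, n)      # stands in for A(z) at the current point
dX = np.random.rand(n, m)      # dX[j, k] stands in for d x_j / d z_k

term2 = A @ dX                 # (n x n) @ (n x m) -> (n x m)
print(term2.shape)             # (3, 2), same shape as the first term
```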

If you want to see the formula a little more explicitly, write $\mathbb{A}=(a_{ij}(z))$ and $\mathbb{X}=(x_j(z))$. Then $$ (\mathbb{Y})_i=(\mathbb{A}\mathbb{X})_i=\sum_j a_{ij}(z)x_j(z). $$ The partial derivative of this with respect to $z_k$ is $$ \frac{\partial}{\partial z_k}(\mathbb{Y})_i=\frac{\partial}{\partial z_k}\sum_j a_{ij}(z)x_j(z)=\sum_j\left(\frac{\partial}{\partial z_k}a_{ij}(z)\right)x_j(z)+\sum_j a_{ij}(z)\frac{\partial}{\partial z_k}x_j(z). $$ We can then combine all of these into a vector by dropping the $i$ to get $$ \frac{\partial\mathbb{Y}}{\partial z_k}=\frac{\partial\mathbb{A}}{\partial z_k}\mathbb{X}+\mathbb{A}\frac{\partial\mathbb{X}}{\partial z_k}. $$ This gives you the $k$-th column of the Jacobian; assembling these columns for $k=1,\dots,m$ yields the full $(n\times m)$ derivative.
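As a sanity check, here is a small numerical experiment (the particular $\mathbb A(z)$ and $\mathbb X(z)$ below are made-up examples, not from the question): the column formula above, with $\partial\mathbb A/\partial z_k$ and $\partial\mathbb X/\partial z_k$ approximated by central differences, reproduces the finite-difference Jacobian of $\mathbb Y(z)=\mathbb A(z)\mathbb X(z)$ itself.

```python
import numpy as np

def A(z):   # 2x2 matrix whose entries depend on z = (z1, z2, z3)
    return np.array([[z[0],         z[1] * z[2]],
                     [np.sin(z[0]), z[2] ** 2  ]])

def X(z):   # 2-vector depending on z
    return np.array([z[0] * z[1], np.cos(z[2])])

def Y(z):
    return A(z) @ X(z)

z0  = np.array([0.7, -1.3, 0.4])
eps = 1e-6
n, m = 2, 3

# Column k of the Jacobian via  dY/dz_k = (dA/dz_k) X + A (dX/dz_k).
J_formula = np.zeros((n, m))
for k in range(m):
    e = np.zeros(m); e[k] = eps
    dA_k = (A(z0 + e) - A(z0 - e)) / (2 * eps)   # dA/dz_k, an (n x n) matrix
    dX_k = (X(z0 + e) - X(z0 - e)) / (2 * eps)   # dX/dz_k, an n-vector
    J_formula[:, k] = dA_k @ X(z0) + A(z0) @ dX_k

# Direct finite-difference Jacobian of Y for comparison.
J_direct = np.zeros((n, m))
for k in range(m):
    e = np.zeros(m); e[k] = eps
    J_direct[:, k] = (Y(z0 + e) - Y(z0 - e)) / (2 * eps)

print(np.allclose(J_formula, J_direct, atol=1e-6))  # True
```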
