[Math] Finding a derivative of a dot product between matrices

linear-algebra, matrices, multivariable-calculus, partial-derivative

I'm trying to write a machine learning program that minimizes error by taking the derivative of an error function with respect to the matrices that represent its parameters, and adjusting those matrices accordingly.

I have four matrices: $\mathbf{L_2}\in\mathbb{R}^{4\times 1}, \mathbf{L_0}\in\mathbb{R}^{3\times 1}, \mathbf{w_0}\in\mathbb{R}^{5\times 3}, \mathbf{w_1}\in\mathbb{R}^{4\times 5}$. The latter two are parameter matrices, $\mathbf{L_0}$ is the input matrix, and $\mathbf{L_2}$ is the output of the algorithm (itself a matrix), described by $\mathbf{L_2} = \mathbf{w_1}\cdot\left(\mathbf{w_0}\cdot \mathbf{L_0}\right)$. Rather easily, I was able to treat the matrices symbolically to find: $$\frac{\partial \mathbf{L_2}}{\partial \mathbf{w_1}} = \left(\mathbf{w_0}\cdot \mathbf{L_0}\right)\in\mathbb{R}^{4\times 5}$$ And the output of my program regarding this calculation substantiates that this is the correct derivative. This is also confirmed by the shape of the matrix: the right derivative with respect to some matrix must yield a matrix with the same shape as that varying matrix.
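For concreteness, here's a minimal NumPy sketch of my setup (the variable names just mirror the notation above, filled with random values):

```python
import numpy as np

rng = np.random.default_rng(0)

# Shapes from above: w0 is 5x3, w1 is 4x5, L0 is 3x1
w0 = rng.standard_normal((5, 3))
w1 = rng.standard_normal((4, 5))
L0 = rng.standard_normal((3, 1))

L2 = w1 @ (w0 @ L0)   # forward pass: (4x5) @ ((5x3) @ (3x1))
print(L2.shape)       # (4, 1)
```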

However, I ran into issues calculating $\frac{\partial \mathbf{L_2}}{\partial \mathbf{w_0}}$ because, symbolically, the derivative looks like it should come out to be:

$$\frac{\partial \mathbf{L_2}}{\partial \mathbf{w_0}} = \mathbf{w_1}\cdot \mathbf{L_0}$$ but that doesn't work at all, because the shapes of those matrices are incompatible: $\mathbf{w_1}$ is $4\times 5$ while $\mathbf{L_0}$ is $3\times 1$, so the product isn't even defined. Trying to work it out by hand was difficult due to my limited knowledge of matrix calculus, so I was hoping I could find some help here.
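Indeed, trying to form that product numerically fails immediately (a tiny sketch using zero matrices of the right shapes):

```python
import numpy as np

w1 = np.zeros((4, 5))
L0 = np.zeros((3, 1))

try:
    w1 @ L0              # inner dimensions are 5 and 3, so the product is undefined
except ValueError as err:
    print(err)           # NumPy reports the dimension mismatch
```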

Best Answer

I'm going to use the name $x$ for $w_0$, $y$ for $w_1$, $P$ for $L_0$ and $Q$ for $L_2$, so that $Q = y \cdot (x \cdot P)$. Then the $ij$ entry of $Q$ is \begin{align} q_{ij} &= \sum_s y_{is} (x \cdot P)_{sj}\\ &= \sum_s y_{is} \sum_t x_{st} p_{tj}\\ &= \sum_{s,t} y_{is} x_{st} p_{tj}\\ \end{align} The derivative of this with respect to $x_{ab}$ involves derivatives of $y_{is}$ with respect to that (all zeroes!), derivatives of $p_{tj}$ with respect to that (all zeroes!), and derivatives of $x_{st}$ with respect to that, which are all zeroes unless $s = a$ and $t = b$. So we get \begin{align} \frac{\partial q_{ij} }{\partial x_{ab}} &= \frac{\partial}{\partial x_{ab}} \left( \sum_{s,t} y_{is} x_{st} p_{tj}\right)\\ &= y_{ia}\, p_{bj} \end{align}

If you apply this formula to every pair $i,j$ that constitutes a legal index of $L_2$, and every pair $a,b$ that constitutes a legal index of $w_0$, you'll get your answer. Notice that the result carries four subscripts, so it is naturally a four-dimensional array rather than a matrix.
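If it helps, here's a quick NumPy spot check of that formula; the names mirror the renaming above, and the finite-difference step size is just an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 3))   # plays the role of w0
y = rng.standard_normal((4, 5))   # plays the role of w1
P = rng.standard_normal((3, 1))   # plays the role of L0

def Q(x):
    return y @ (x @ P)            # Q = y·(x·P), i.e. L2

# The formula above: dq[i,j]/dx[a,b] = y[i,a] * p[b,j],
# assembled over all legal indices into a 4-index array.
analytic = np.einsum('ia,bj->ijab', y, P)   # shape (4, 1, 5, 3)

# Finite-difference spot checks against the analytic formula
eps = 1e-6
for _ in range(5):
    a, b = rng.integers(5), rng.integers(3)
    dx = np.zeros_like(x)
    dx[a, b] = eps
    numeric = (Q(x + dx) - Q(x)) / eps      # dQ/dx[a,b], shape (4, 1)
    assert np.allclose(numeric, analytic[:, :, a, b], atol=1e-4)
print("finite differences agree with y[i,a] * p[b,j]")
```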

By the way, your statement that "the right derivative with respect to some matrix must yield a matrix with the same shape as that varying matrix" doesn't seem entirely correct to me, at least not without some very clear description of what you think a derivative is. If you have a function from $\Bbb R$ to $M(n, k)$ (the set of $n \times k$ matrices), say $t \mapsto A(t)$, then the derivative with respect to the single variable $t$ is a matrix with the same shape as $A(t)$. But if you have a function $(x, y) \mapsto B(x, y)$ of two variables with values in $M(n, k)$, then the derivative with respect to each variable is an $n \times k$ matrix, so the derivative with respect to both $x$ and $y$ together must have at least $2nk$ components.
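To see the single-variable case numerically (the particular $A(t)$ here is made up purely for illustration):

```python
import numpy as np

# t -> A(t): a matrix-valued function of a single scalar variable
def A(t):
    return np.array([[t,         t**2],
                     [np.sin(t), 1.0]])

t, eps = 0.3, 1e-6
dA_dt = (A(t + eps) - A(t)) / eps   # entrywise derivative of A at t
print(dA_dt.shape)                  # (2, 2): same shape as A(t)

# For (x, y) -> B(x, y), the derivatives with respect to x and to y are
# each n x k, so the full derivative carries at least 2*n*k components.
```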
