Quickly compute matrix derivatives

derivatives, linear algebra, machine learning, matrices, partial derivative

I have been studying the mathematics behind autoencoders. In a proof, a minimization problem is rewritten several times by taking derivatives with respect to matrices/vectors.

Notation: $W_1$, $W_2$ are matrices. $b_1$, $b_2$ and $x$ are vectors.

The first problem in the example is:

$$\min_{W_1, b_1, W_2, b_2} \| x - (W_2(W_1x+b_1)+b_2)\|^2$$

Then, it is stated that we take the partial derivatives with respect to $b_1, b_2$ and set them to $0$. This yields:
$$\min_{W_1, W_2} \| x - W_2W_1x\|^2$$

My question is: how can I compute those derivatives as quickly as possible? My first approach was to multiply $\| x - (W_2(W_1x+b_1)+b_2)\|^2$ out, but that gave me an endlessly long expression in which I messed up the derivative.
Trying to take the derivative via "inner derivative times outer derivative" did not go much better either…

Does anyone have a tip on how I could proceed in such cases? Thanks a million in advance! 🙂

Best Answer

It's pretty quick to use the chain rule here. Note that $$ \frac{\partial }{\partial x} \|x\|^2 = 2 x^T. $$ With that, $$ \frac{\partial }{\partial b_1}\|x - (W_2(W_1x+b_1)+b_2)\|^2 \\= 2[x - (W_2(W_1x+b_1)+b_2)]^T \frac{\partial }{\partial b_1} [x - (W_2(W_1x+b_1)+b_2)] \\ = 2[x - (W_2(W_1x+b_1)+b_2)]^T (-W_2). $$
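The same pattern handles $b_2$: the inner map is affine in $b_2$ with derivative $-I$, so $\frac{\partial}{\partial b_2}\|x - (W_2(W_1x+b_1)+b_2)\|^2 = 2[x - (W_2(W_1x+b_1)+b_2)]^T(-I)$. If you want to double-check a hand-derived matrix gradient, comparing it against finite differences is a quick sanity test. Below is a minimal NumPy sketch of that check (the dimensions, variable names, and random test data are just assumptions for illustration); it compares the column-vector form of the gradient above, $-2\,W_2^T\,[x - (W_2(W_1x+b_1)+b_2)]$, with a central-difference approximation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small dimensions, chosen only for the check
n, k = 5, 3
x = rng.normal(size=n)
W1 = rng.normal(size=(k, n))
W2 = rng.normal(size=(n, k))
b1 = rng.normal(size=k)
b2 = rng.normal(size=n)

def f(b1):
    """Objective ||x - (W2(W1 x + b1) + b2)||^2 as a function of b1."""
    r = x - (W2 @ (W1 @ x + b1) + b2)
    return r @ r

# Analytic gradient as a column vector: -2 * W2^T r
# (the transpose of the row vector 2 r^T (-W2) from the answer)
r = x - (W2 @ (W1 @ x + b1) + b2)
grad_analytic = -2 * W2.T @ r

# Central finite differences as an independent check
eps = 1e-6
grad_numeric = np.zeros_like(b1)
for i in range(k):
    e = np.zeros(k)
    e[i] = eps
    grad_numeric[i] = (f(b1 + e) - f(b1 - e)) / (2 * eps)

print(np.allclose(grad_analytic, grad_numeric, atol=1e-5))  # True
```

The same kind of check works for the gradients with respect to $W_1$, $W_2$, and $b_2$; it won't replace the chain-rule derivation, but it catches sign and transpose mistakes quickly.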
