Confusion about matrix derivative/chain rule

chain-rule, derivatives, matrices, matrix-calculus

I have a vector $Z$ that depends on time, and I am looking to find $\ddot\sigma(Z)$, the second time derivative of $\sigma$.

$$Z = \begin{bmatrix} z_1\\ z_2\\ z_3\\ \end{bmatrix}$$

And $\sigma$ is defined as follows:

$$ \sigma(z_k) = \frac{\mathrm{1} }{\mathrm{1} + e^{-z_k} } $$

$$\sigma(Z) = \begin{bmatrix} \sigma(z_1)\\ \sigma(z_2)\\ \sigma(z_3)\\ \end{bmatrix} $$

It is my understanding that $\dot\sigma(Z)$ should equal $\frac{\partial{\sigma}}{\partial{Z}}\cdot\frac{\partial{Z}}{\partial{t}}$, which in our case evaluates to the Jacobian of $\sigma(Z)$ times the time derivative of $Z$:

$$\dot\sigma(Z) = \begin{bmatrix} \frac{\partial\sigma(z_1)}{\partial{z_1}}& 0& 0\\ 0& \frac{\partial\sigma(z_2)}{\partial{z_2}}& 0 \\ 0&0&\frac{\partial\sigma(z_3)}{\partial{z_3}}\\ \end{bmatrix} \begin{bmatrix} \dot z_1\\ \dot z_2\\ \dot z_3\\ \end{bmatrix}$$
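As a quick sanity check on this first step (a sketch using NumPy; the trajectory $z(t)$ below is made up purely for illustration), the Jacobian-times-$\dot z$ product does agree with a finite-difference time derivative of $\sigma(z(t))$:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A made-up smooth trajectory z(t), used only to test the formula.
def z(t):
    return np.array([np.sin(t), t**2, np.cos(2 * t)])

def zdot(t):
    return np.array([np.cos(t), 2 * t, -2 * np.sin(2 * t)])

t = 0.7
s = sigmoid(z(t))

# The Jacobian of sigma(Z) is diagonal, with entries sigma'(z_k) = sigma(1 - sigma).
J = np.diag(s * (1 - s))
analytic = J @ zdot(t)

# Central-difference approximation of d/dt sigma(z(t)).
h = 1e-6
numeric = (sigmoid(z(t + h)) - sigmoid(z(t - h))) / (2 * h)

print(np.allclose(analytic, numeric, atol=1e-8))  # True
```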

If the above is right, my question is: what would $\ddot\sigma(Z)$ be? I attempted to use the product rule, but the dimensions don't work out properly (and I'm not sure I'm applying the "chain rule" correctly here):

$$\ddot\sigma(Z) = \begin{bmatrix} \frac{\partial\sigma(z_1)}{\partial{z_1}}& 0& 0\\ 0& \frac{\partial\sigma(z_2)}{\partial{z_2}}& 0 \\ 0&0&\frac{\partial\sigma(z_3)}{\partial{z_3}}\\ \end{bmatrix} \begin{bmatrix} \ddot z_1\\ \ddot z_2\\ \ddot z_3\\ \end{bmatrix} +\begin{bmatrix} \frac{\partial^2\sigma(z_1)}{\partial{z_1}^2}& 0& 0\\ 0& \frac{\partial^2\sigma(z_2)}{\partial{z_2}^2}& 0 \\ 0&0&\frac{\partial^2\sigma(z_3)}{\partial{z_3}^2}\\ \end{bmatrix}\begin{bmatrix} \dot z_1\\ \dot z_2\\ \dot z_3\\ \end{bmatrix} \begin{bmatrix} \dot z_1\\ \dot z_2\\ \dot z_3\\ \end{bmatrix} $$

## Best Answer

For a scalar argument, the derivatives/differentials of the function are
$$\eqalign{ \sigma'(z_k) &= \frac{d\sigma}{dz_k}\quad\implies\quad d\sigma = \sigma'\,dz_k \\ \sigma''(z_k) &= \frac{d\sigma'}{dz_k}\quad\implies\quad d\sigma' = \sigma''\,dz_k \\ }$$

When applied element-wise to vectors, these evaluate to vectors
$$s = \sigma(z),\quad s'=\sigma'(z),\quad s''=\sigma''(z)$$
and the differentials must be written using the element-wise (a.k.a. Hadamard) product
$$ds = s'\odot dz,\qquad ds' = s''\odot dz$$

Since the time derivatives/differentials of $z$ are related by
$$dz = \dot z\,dt,\qquad d\dot z = \ddot z\,dt$$
the time derivatives of $s$ can be calculated as
$$\eqalign{ ds &= s'\odot\dot z\,dt \\ \dot s &= s'\odot\dot z \\\\ d\dot s &= ds'\odot\dot z + s'\odot d\dot z \\ &= (s''\odot dz)\odot\dot z + s'\odot(\ddot z\,dt) \\ &= (s''\odot\dot z\odot\dot z + s'\odot\ddot z)\,dt \\ \ddot s &= s''\odot\dot z\odot\dot z + s'\odot\ddot z \\ }$$

One final trick is that Hadamard multiplication by a vector can be replaced by first converting the vector into a diagonal matrix and then performing normal matrix multiplication, i.e.
$$\eqalign{ a\odot b &= {\rm Diag}(a)\,b \;=\; Ab \\ {\rm Diag}(a\odot b) &= AB = BA \quad \big({\rm diagonal\,matrices\,commute}\big) \\ }$$

So these results can be written using diagonal matrices as
$$\eqalign{ \dot S &= S'\dot Z \\ \ddot S &= S''\dot Z^2 + S'\ddot Z \\ }$$
or as a mixture of matrices and vectors
$$\eqalign{ \dot s &= \dot Zs' \\ \ddot s &= \dot Z^2s'' + \ddot Zs' \\ }$$

NB: In the case of the logistic function the derivatives are given by simple formulas
$$\eqalign{ \sigma' &= (\sigma-\sigma^2) &\implies S'=(S-S^2) &\implies s'=(I-S)s \\ \sigma'' &= (\sigma-3\sigma^2+2\sigma^3)\; &\implies S''=(S-3S^2+2S^3)\; &\implies s''=(I-3S+2S^2)s \\ }$$
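To confirm the final result numerically (a sketch with NumPy; the trajectory $z(t)$ is a hypothetical one chosen only for testing), we can compare $\ddot s = s''\odot\dot z\odot\dot z + s'\odot\ddot z$, using the closed-form logistic derivatives above, against a second central difference of $\sigma(z(t))$ in $t$:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical trajectory z(t) and its first two time derivatives.
z   = lambda t: np.array([np.sin(t), t**2, np.cos(2 * t)])
zd  = lambda t: np.array([np.cos(t), 2 * t, -2 * np.sin(2 * t)])
zdd = lambda t: np.array([-np.sin(t), 2.0, -4 * np.cos(2 * t)])

t = 0.3
s = sigmoid(z(t))
sp  = s - s**2                  # sigma'  = sigma - sigma^2
spp = s - 3 * s**2 + 2 * s**3   # sigma'' = sigma - 3 sigma^2 + 2 sigma^3

# ddot(s) = s'' ⊙ zdot ⊙ zdot + s' ⊙ zddot
analytic = spp * zd(t)**2 + sp * zdd(t)

# Second central difference of sigma(z(t)) with respect to t.
h = 1e-4
numeric = (sigmoid(z(t + h)) - 2 * sigmoid(z(t)) + sigmoid(z(t - h))) / h**2

print(np.allclose(analytic, numeric, atol=1e-6))  # True
```

Note how the product rule produces an extra Hadamard factor $\dot z\odot\dot z$ (elementwise square), not the ill-dimensioned vector-times-vector product from the question.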
