Derivative of matrix-diag matrix product

derivatives, linear algebra, matrices, matrix-calculus

I would like to take the derivative of the following expression with respect to the vector $x\in\mathbb{R}^d$:
$$
W\mathrm{diag}(f(Ax+b))
$$

where $f$ is some smooth element-wise function, $A\in \mathbb{R}^{K\times d}$, $b\in\mathbb{R}^K$ and $W\in\mathbb{R}^{m\times K}$. I realise the result might be a higher-dimensional tensor, but I do not know how to proceed.

Edit: I tried writing the expression using the Hadamard product:
$$\left\{\mathbf{1}(f(Ax+b))^\top\right\} \circ W$$
however this is still not something I would know how to work with.

Best Answer

Let $F(x):=W\mathrm{diag}(f(Ax+b))$. We can rewrite it as

$$F(x)=W\sum_{j=1}^Ke_je_j^Tf(e_j^TAx+b_j)$$

where $\{e_i\}$ is the natural basis for $\mathbb{R}^K$ and $b=[b_i]$. This is a simple rewriting of

  • the diagonal operator in terms of the matrices $e_1e_1^T,\ldots,e_Ke_K^T$ and of
  • the fact that the function $f$ acts component-wise, so we have the evaluations $f(e_1^TAx+b_1),\ldots,f(e_K^TAx+b_K)$.
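
As a quick sanity check of this rewriting, here is a minimal NumPy sketch. The sizes, the random data, and the vector `v` (standing in for $f(Ax+b)$) are illustrative assumptions, not part of the original answer.

```python
import numpy as np

rng = np.random.default_rng(0)
K, m = 4, 3                      # illustrative sizes (assumption)
W = rng.standard_normal((m, K))
v = rng.standard_normal(K)       # stands in for f(Ax + b)

E = np.eye(K)                    # column j is the basis vector e_j

# diag(v) written as sum_j e_j e_j^T v_j
diag_sum = sum(v[j] * np.outer(E[:, j], E[:, j]) for j in range(K))

same_diag = np.allclose(diag_sum, np.diag(v))
# W diag(v) simply scales column j of W by v_j
same_product = np.allclose(W @ diag_sum, W * v)
print(same_diag, same_product)
```

The last comparison also matches the Hadamard form from the question, since $W\mathrm{diag}(v)$ has entries $W_{ij}v_j$.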

The most convenient representation (at least to me) of the derivative with respect to $x$ is then

$$\dfrac{\partial F(x)}{\partial x_i}=W\sum_{j=1}^Ke_je_j^Ta_{ji}f'(e_j^TAx+b_j)$$

where $A=[a_{ij}]$, $f'$ is the derivative of $f$, and where we have applied the chain rule together with $\partial(e_j^TAx+b_j)/\partial x_i=a_{ji}$.

This can be rewritten in terms of the initial "diag" operator as

$$\dfrac{\partial F(x)}{\partial x_i}=WD_i\mathrm{diag}(f'(Ax+b))$$

where $D_i:=\mathrm{diag}(a_{1i},\ldots,a_{Ki})$.
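
The closed form can be checked numerically against central finite differences. The sketch below is only an illustration under stated assumptions: $f=\tanh$ is an arbitrary smooth choice, the sizes and random data are made up, and the helper names `F` and `dF_dxi` are hypothetical. Stacking the $d$ slices gives the full derivative as an $m\times K\times d$ tensor, the higher-dimensional object mentioned in the question.

```python
import numpy as np

def F(x, W, A, b, f):
    """F(x) = W diag(f(Ax + b)), an m x K matrix."""
    return W @ np.diag(f(A @ x + b))

def dF_dxi(x, i, W, A, b, fprime):
    """Closed form: W D_i diag(f'(Ax + b)), with D_i = diag of column i of A."""
    D_i = np.diag(A[:, i])
    return W @ D_i @ np.diag(fprime(A @ x + b))

rng = np.random.default_rng(0)
d, K, m = 3, 4, 2                          # illustrative sizes (assumption)
A = rng.standard_normal((K, d))
b = rng.standard_normal(K)
W = rng.standard_normal((m, K))
x = rng.standard_normal(d)

f = np.tanh                                # example smooth f (assumption)
fprime = lambda z: 1.0 - np.tanh(z) ** 2   # derivative of tanh

eps = 1e-6
max_err = 0.0
slices = []
for i in range(d):
    step = np.zeros(d)
    step[i] = eps
    # central finite difference in the direction of x_i
    fd = (F(x + step, W, A, b, f) - F(x - step, W, A, b, f)) / (2 * eps)
    exact = dF_dxi(x, i, W, A, b, fprime)
    max_err = max(max_err, float(np.abs(fd - exact).max()))
    slices.append(exact)

# Full derivative as a 3-way tensor: entry [p, j, i] = dF_{pj} / dx_i
dF = np.stack(slices, axis=-1)
print(dF.shape, max_err)
```

Equivalently, entry $(p,j,i)$ of this tensor is $W_{pj}\,a_{ji}\,f'(e_j^TAx+b_j)$.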
