Derivative of a diagonal matrix with respect to a vector whose entries appear in the matrix

linear-algebra, matrices, matrix-calculus

I have the following diagonal matrix
$$
\underset{d \times d}{\boldsymbol{W}} = \begin{bmatrix}
\left( y_1 +\dfrac{1 - y_1}{x_1} \right)^{-1} & 0 & \dots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \dots & \left( y_d +\dfrac{1 - y_d}{x_d} \right)^{-1}
\end{bmatrix}
$$

and I need to take the derivative of $\boldsymbol{W}$, $\ln \lvert \boldsymbol{W} \boldsymbol{A} \rvert$, and $\boldsymbol{B} \boldsymbol{W}^{-1} \boldsymbol{C}$ with respect to the vector $\boldsymbol{x} = (x_1, \dots, x_d)^\top$, where $\boldsymbol{A}, \boldsymbol{B}$, and $\boldsymbol{C}$ are all $d \times d$ matrices. I don't know whether this is feasible. Intuitively, if I just take the derivative of a single diagonal entry of $\boldsymbol{W}$, say $\left( y_1 +\dfrac{1 - y_1}{x_1} \right)^{-1}$, with respect to the corresponding $x_1$, I get
$$-\left( y_1 +\dfrac{1 - y_1}{x_1} \right)^{-2} \left( -\dfrac{1 - y_1}{x^2_1} \right),$$
but I can hardly see how to relate such a result to a matrix. Or maybe my linear algebra and matrix calculus skills are not there yet. If anyone has a clue, please help me. Thank you so much.
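The per-entry derivative above can be checked numerically with a central finite difference. This is a plain-Python sketch with no dependencies; the names `w_entry`, `dw_entry`, `central_difference` and the sample values are illustrative choices, not part of the question. Because each diagonal entry of $\boldsymbol{W}$ depends only on its own $x_i$, this scalar derivative is exactly what sits on the diagonal of the Jacobian of the vector of diagonal entries of $\boldsymbol{W}$ with respect to $\boldsymbol{x}$; all off-diagonal partials are zero.

```python
def w_entry(x, y):
    """One diagonal entry of W: (y + (1 - y)/x)^(-1)."""
    return 1.0 / (y + (1.0 - y) / x)


def dw_entry(x, y):
    """The question's derivative: -(y + (1-y)/x)^(-2) * (-(1-y)/x^2)."""
    return -((y + (1.0 - y) / x) ** -2) * (-(1.0 - y) / x ** 2)


def central_difference(f, x, h=1e-6):
    """Approximate f'(x) with a symmetric difference quotient."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


x1, y1 = 0.7, 0.3  # arbitrary sample values with x1 > 0
analytic = dw_entry(x1, y1)
numeric = central_difference(lambda t: w_entry(t, y1), x1)
print(analytic, numeric)  # the two numbers agree to many decimal places
```

Algebraically the same value can be written as $w_1^2(1-y_1)/x_1^2$, which is the form that generalizes to the matrix expressions below.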

Best Answer

$\def\c#1{\color{red}{#1}}\def\v{{\rm vec}}\def\d{{\rm diag}}\def\D{{\rm Diag}}\def\o{{\tt1}}\def\p#1#2{\frac{\partial #1}{\partial #2}}$Define the diagonal matrices $$\eqalign{ X &= \D(x), \qquad Y = \D(y) \\ }$$ Then the $W$ matrix is defined such that $$\eqalign{ W^{-1} &= Y + X^{-1}(I-Y) \\ }$$ Since $dW^{-1}=-W^{-1}dW\,W^{-1}$ and $dX^{-1}=-X^{-1}dX\,X^{-1}$, the minus signs cancel in the second line below, and the red factor is then rewritten using the definition of $W^{-1}$: $$\eqalign{ dW^{-1} &= dX^{-1}(I-Y) \\ W^{-1}dW\,W^{-1} &= X^{-1}dX\,X^{-1}(I-Y) \\ dW &= WX^{-1}dX\,\c{X^{-1}(I-Y)}W \\ &= WX^{-1}dX\,\c{(W^{-1}-Y)}W \\ &= WX^{-1}dX\,(I-YW) \\ }$$ Next, set $\,M=WA\;$ (assuming $A$ is nonsingular) and use Jacobi's formula for your middle function. $$\eqalign{ \lambda &= \log(\det(M)) \\ d\lambda &= M^{-T}:dM \\ &= (WA)^{-T}:dW\,A \\ &= W^{-1}A^{-T}A^T:dW \\ &= W^{-1}:dW \\ &= W^{-1}:WX^{-1}dX\,{(I-YW)} \\ &= X^{-1}(I-YW):dX \\ &= \d\Big(\big(I-YW\big)X^{-1}\Big):dx \\ \p{\lambda}{x} &= \d\Big(\big(I-YW\big)X^{-1}\Big) \;=\; \big(\o-y\odot w\big)\oslash x \\ }$$ Note that $A$ drops out of the gradient, as it must, since $\det(WA)=\det(W)\det(A)$. That was the simple function. The problem with the remaining functions is that a matrix-by-vector gradient is a third-order tensor, which cannot be represented in standard matrix notation.
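A finite-difference check of the log-determinant gradient $(\mathbf 1-y\odot w)\oslash x$ can be sketched in plain Python. It is only a sketch under stated assumptions: a hand-written determinant (Gaussian elimination with partial pivoting), randomly drawn $x$, $y$, $A$, a nonsingular $A$, and positive entries of $W$. It differentiates $\log\lvert\det(WA)\rvert$ so that the sign of $\det A$ does not matter. All function names are hypothetical.

```python
import math
import random


def det(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    n = len(M)
    a = [row[:] for row in M]  # work on a copy
    result = 1.0
    for k in range(n):
        pivot = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[pivot][k] == 0.0:
            return 0.0
        if pivot != k:
            a[k], a[pivot] = a[pivot], a[k]
            result = -result  # a row swap flips the sign
        result *= a[k][k]
        for i in range(k + 1, n):
            factor = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= factor * a[k][j]
    return result


def w_vec(x, y):
    """Diagonal of W: w_i = (y_i + (1 - y_i)/x_i)^(-1)."""
    return [1.0 / (yi + (1.0 - yi) / xi) for xi, yi in zip(x, y)]


def log_abs_det_WA(x, y, A):
    """lambda(x) = log|det(W A)|; W A just scales row i of A by w_i."""
    w = w_vec(x, y)
    n = len(A)
    WA = [[w[i] * A[i][j] for j in range(n)] for i in range(n)]
    return math.log(abs(det(WA)))


def grad_formula(x, y):
    """Closed form from the answer: (1 - y*w)/x, elementwise."""
    w = w_vec(x, y)
    return [(1.0 - yi * wi) / xi for xi, yi, wi in zip(x, y, w)]


def grad_fd(x, y, A, h=1e-6):
    """Central-difference gradient of log|det(W A)| with respect to x."""
    g = []
    for i in range(len(x)):
        xp = x[:]
        xp[i] += h
        xm = x[:]
        xm[i] -= h
        g.append((log_abs_det_WA(xp, y, A) - log_abs_det_WA(xm, y, A)) / (2.0 * h))
    return g


random.seed(0)
d = 4
x = [random.uniform(0.5, 2.0) for _ in range(d)]
y = [random.uniform(0.1, 0.9) for _ in range(d)]
A = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(d)]

print(grad_formula(x, y))  # does not involve A at all
print(grad_fd(x, y, A))    # should match the line above closely
```

Swapping in a different random $A$ leaves the numerical gradient unchanged up to rounding, which is consistent with $\det(WA)=\det(W)\det(A)$.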

However, since $W$ is diagonal, we can calculate the gradient of $w=\d(W)$ with respect to $x$. $$\eqalign{ dw &= \d(dW) \\ &= \d\Big(WX^{-1}dX\,(I-YW)\Big) \\ &= \left(WX^{-1}(I-YW)\right)dx \\ &= \big(W-YW^2\big)X^{-1}\,dx \\ \p{w}{x} &= \big(W-YW^2\big)X^{-1} \\\\ }$$ This Jacobian is itself diagonal, because each $w_i$ depends only on $x_i$. To tackle the final function, note that $$\eqalign{ Q &= W^{-1} \\ dQ &= -X^{-1}dX\,X^{-1}(I-Y) \\ dq &= \d(dQ) \;=\; (Y-I)X^{-2}\,dx \\ }$$ Then vectorize $P=BW^{-1}C$ using the Khatri-Rao product; because $Q$ is diagonal, the usual identity $\v(BQC)=(C^T\otimes B)\,\v(Q)$ collapses to a product with $q=\d(Q)$. $$\eqalign{ P &= BW^{-1}C \\&= BQC \\ p &= \v(P) \\ &= \left(C^T\boxtimes B\right) q \\ dp &= \left(C^T\boxtimes B\right)\,dq \\ &= \left(C^T\boxtimes B\right)(Y-I)X^{-2}\,dx \\ \p{p}{x} &= \left(C^T\boxtimes B\right)(Y-I)X^{-2} \\ }$$
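Both Jacobians in this step can be compared against central differences. A minimal stdlib-only sketch, assuming a column-major $\operatorname{vec}$ (columns stacked), $x_i>0$, and randomly drawn $y$, $B$, $C$; all helper names are hypothetical. The closed form for $\partial p/\partial x$ is applied by scaling column $j$ of the Khatri-Rao product by $(y_j-1)/x_j^2$, which is what right-multiplying by the diagonal matrix $(Y-I)X^{-2}$ does.

```python
import random


def q_vec(x, y):
    """Diagonal of Q = W^{-1}: q_i = y_i + (1 - y_i)/x_i."""
    return [yi + (1.0 - yi) / xi for xi, yi in zip(x, y)]


def w_vec(x, y):
    """Diagonal of W: w_i = 1/q_i."""
    return [1.0 / qi for qi in q_vec(x, y)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def vec(M):
    """Stack the columns of M (column-major)."""
    return [M[i][j] for j in range(len(M[0])) for i in range(len(M))]


def khatri_rao(CT, B):
    """Column-wise Kronecker product of square d x d inputs: column k is CT[:, k] kron B[:, k]."""
    n = len(B)
    return [[CT[i][k] * B[j][k] for k in range(n)] for i in range(n) for j in range(n)]


def p_vec(x, y, B, C):
    """p = vec(B W^{-1} C) = vec(B Diag(q) C)."""
    q = q_vec(x, y)
    n = len(q)
    BQ = [[B[i][k] * q[k] for k in range(n)] for i in range(n)]  # scales column k of B by q_k
    return vec(matmul(BQ, C))


def jac_w_formula(x, y):
    """(W - Y W^2) X^{-1}, a diagonal matrix."""
    w = w_vec(x, y)
    n = len(x)
    return [[(w[i] - y[i] * w[i] ** 2) / x[i] if i == j else 0.0 for j in range(n)]
            for i in range(n)]


def jac_p_formula(x, y, B, C):
    """(C^T kr B)(Y - I) X^{-2}: column j of the Khatri-Rao product times (y_j - 1)/x_j^2."""
    n = len(x)
    K = khatri_rao([list(r) for r in zip(*C)], B)
    s = [(yj - 1.0) / xj ** 2 for xj, yj in zip(x, y)]
    return [[row[j] * s[j] for j in range(n)] for row in K]


def jac_fd(f, x, h=1e-6):
    """Central-difference Jacobian with entry [r][j] = d f_r / d x_j."""
    cols = []
    for j in range(len(x)):
        xp = x[:]
        xp[j] += h
        xm = x[:]
        xm[j] -= h
        cols.append([(a - b) / (2.0 * h) for a, b in zip(f(xp), f(xm))])
    return [list(row) for row in zip(*cols)]


def max_abs_diff(P, Q):
    return max(abs(a - b) for rp, rq in zip(P, Q) for a, b in zip(rp, rq))


random.seed(1)
d = 3
x = [random.uniform(0.5, 2.0) for _ in range(d)]
y = [random.uniform(0.1, 0.9) for _ in range(d)]
B = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(d)]
C = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(d)]

print(max_abs_diff(jac_w_formula(x, y), jac_fd(lambda t: w_vec(t, y), x)))
print(max_abs_diff(jac_p_formula(x, y, B, C), jac_fd(lambda t: p_vec(t, y, B, C), x)))
```

The Jacobian of $p$ comes out as a $d^2\times d$ matrix, i.e. the third-order tensor mentioned above flattened along its first two indices.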


In the steps above, $\odot$ denotes the elementwise (Hadamard) product, $\oslash$ denotes elementwise (Hadamard) division, and a colon denotes the trace/Frobenius product, i.e. $$\eqalign{ A:B &= {\rm Tr}(AB^T) \;=\; \sum_{i=1}^m \sum_{j=1}^n A_{ij} B_{ij} \\ A:A &= \big\|A\big\|^2_F \\ }$$ The properties of the underlying trace function permit the terms in such a product to be rearranged in several equivalent ways: $$\eqalign{ A:B &= B:A = B^T:A^T \\ CA:B &= C:BA^T = A:C^TB \\ }$$ Note that the matrices on both sides of the colon must have the same dimensions.
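The colon-product rules can be spot-checked numerically with rectangular matrices, which also makes the shape requirement visible: with $C\in\mathbb R^{m\times p}$, $A\in\mathbb R^{p\times n}$ and $B\in\mathbb R^{m\times n}$, every pairing below is between equal-sized matrices. A plain-Python sketch; the helper names and sizes are illustrative only.

```python
import random


def frob(A, B):
    """A:B = sum_ij A_ij * B_ij, defined only for equal shapes."""
    if len(A) != len(B) or len(A[0]) != len(B[0]):
        raise ValueError("A:B needs matrices of the same dimensions")
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def rand(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(2)
m, p, n = 2, 3, 4
C, A, B = rand(m, p), rand(p, n), rand(m, n)

CA = matmul(C, A)
values = {
    "CA:B": frob(CA, B),
    "Tr(CA B^T)": trace(matmul(CA, transpose(B))),
    "C:BA^T": frob(C, matmul(B, transpose(A))),
    "A:C^T B": frob(A, matmul(transpose(C), B)),
    "B:CA": frob(B, CA),
    "B^T:(CA)^T": frob(transpose(B), transpose(CA)),
}
for name, value in values.items():
    print(f"{name:12s} {value: .12f}")  # all six agree up to rounding
```

Calling `frob(A, B)` here raises `ValueError`, since $A$ is $3\times4$ and $B$ is $2\times4$.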

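The Khatri-Rao product can be defined as $$\eqalign{ C^T\boxtimes B &= (C^T\otimes\o)\odot(\o\otimes B) \;\in\; {\mathbb R}^{d^2\times d} \\ }$$ where $\otimes$ denotes the Kronecker product and $\,\o=\d(I)\,$ is the all-ones vector of length $d$. Equivalently, column $k$ of $C^T\boxtimes B$ is the Kronecker product of column $k$ of $C^T$ with column $k$ of $B$.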
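The Kronecker/Hadamard definition of the Khatri-Rao product, together with the identity $\operatorname{vec}(B\,{\rm Diag}(q)\,C)=(C^T\boxtimes B)\,q$ used in the answer, can be sketched in plain Python as follows. A column-major $\operatorname{vec}$ is assumed, and the helper names and random test data are hypothetical.

```python
import random


def kron(A, B):
    """Kronecker product of an (m x n) and a (p x q) matrix, giving (mp x nq)."""
    return [[A[ia][ja] * B[ib][jb] for ja in range(len(A[0])) for jb in range(len(B[0]))]
            for ia in range(len(A)) for ib in range(len(B))]


def hadamard(A, B):
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def transpose(A):
    return [list(col) for col in zip(*A)]


def khatri_rao(CT, B):
    """(C^T kron 1) hadamard (1 kron B), with 1 the all-ones d x 1 column."""
    ones = [[1.0] for _ in range(len(B))]
    return hadamard(kron(CT, ones), kron(ones, B))


def vec(M):
    """Stack the columns of M (column-major)."""
    return [M[i][j] for j in range(len(M[0])) for i in range(len(M))]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


random.seed(3)
d = 3
B = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(d)]
C = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(d)]
q = [random.uniform(0.5, 2.0) for _ in range(d)]

K = khatri_rao(transpose(C), B)  # shape d^2 x d
Q = [[q[i] if i == j else 0.0 for j in range(d)] for i in range(d)]
lhs = vec(matmul(matmul(B, Q), C))
rhs = [sum(K[r][k] * q[k] for k in range(d)) for r in range(d * d)]
print(max(abs(a - b) for a, b in zip(lhs, rhs)))  # rounding-level difference
```

Building the full $d^2\times d^2$ Kronecker product $C^T\otimes B$ would also work, but only the $d$ columns that multiply the diagonal of $Q$ are ever used, which is why the Khatri-Rao form is the natural one here.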