Taking derivative with respect to a diagonal matrix

linear-algebra, matrices, matrix-calculus

I need help taking the derivative of some quantities with respect to a diagonal matrix. Let's say the diagonal matrix is $\boldsymbol{X}_d = \text{diag} \{ x_1, \dots, x_d \}$ (pardon my notation). I need to obtain the following derivatives:
$$\frac{\partial}{\partial \boldsymbol{X}_d} \text{Tr} \{ \boldsymbol{A} \boldsymbol{X}_d \boldsymbol{B} \} \quad \text{and} \quad \frac{\partial}{\partial \boldsymbol{X}_d} \ln | \boldsymbol{A} \boldsymbol{X}_d |.$$

Initially, I naively tried the formulas for a general matrix, but the resulting estimate of $\boldsymbol{X}_d$ was not a diagonal matrix, which does not make sense. So there must be something special about taking the derivative with respect to a diagonal matrix. I did not find many resources on this topic, so I am posting the question here. Any help would be appreciated. Thank you so much.

Best Answer

$\def\d{{\rm diag}}\def\D{{\rm Diag}}\def\p#1#2{\frac{\partial #1}{\partial #2}}$Let's use a colon to denote the trace/Frobenius product $$\eqalign{ A:B &= {\rm Tr}(A^TB) \;=\; \sum_{j=1}^m\sum_{k=1}^n A_{jk} B_{jk} \\ A:A &= \big\|A\big\|_F^2 \\ }$$ In the case that $(A,B)$ are vectors, this definition corresponds to the standard dot product. The key idea is that the matrix/vector on each side of the colon must have the same dimensions.

The Frobenius product has many interesting properties.
In particular, for dimensionally compatible matrices $(A,B,C)$ and vector $(v)$ $$\eqalign{ AB:C &= A:CB^T \\&= B:A^TC \\&= C:AB \\ A:\D(v) &= \d(A):v \\ }$$ ${\bf NB}\!:\,$ The diag operator with an uppercase 'D' creates a diagonal matrix from a vector, while the one with the lowercase 'd' creates a vector from the diagonal of a matrix.
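These rearrangement rules are easy to sanity-check numerically. The snippet below is a minimal sketch using NumPy with random matrices; the helper name `frob` and the matrix sizes are illustrative choices, not part of the answer itself.

```python
import numpy as np

def frob(P, Q):
    """Frobenius product P:Q = Tr(P^T Q), i.e. the sum of elementwise products."""
    return float(np.sum(P * Q))

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 5))
C = rng.standard_normal((3, 5))

# AB:C = A:CB^T = B:A^T C
lhs = frob(A @ B, C)
rearranged_1 = frob(A, C @ B.T)
rearranged_2 = frob(B, A.T @ C)

# M:Diag(v) = diag(M):v
# np.diag builds a diagonal matrix from a 1-D array (Diag)
# and extracts the diagonal from a 2-D array (diag).
M = rng.standard_normal((4, 4))
v = rng.standard_normal(4)
diag_lhs = frob(M, np.diag(v))
diag_rhs = float(np.diag(M) @ v)

print(lhs, rearranged_1, rearranged_2)
print(diag_lhs, diag_rhs)
```

Each printed row should show the same value up to floating-point rounding.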

Write the first function in terms of this product, using the cyclic property of the trace, ${\rm Tr}(AXB)={\rm Tr}(BAX)=(BA)^T:X$, with $X=\D(x)$.
Then calculate its differential and gradient. $$\eqalign{ \phi &= (BA)^T:\D(x) \\&= \d\big((BA)^T\big):x \;=\; \d(BA):x \\ d\phi &= \d(BA):dx \\ \p{\phi}{x} &= \d(BA) \\ \p{\phi}{X} &= \D\big(\d(BA)\big) = I\odot (BA) \\ }$$ where $\odot$ denotes the elementwise/Hadamard product and $I$ is the identity matrix.
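The gradient of the trace function can be checked against central finite differences. This is only a sketch: the helper names `phi` and `numerical_gradient`, the step size, and the random test matrices are assumptions made for illustration.

```python
import numpy as np

def phi(x, A, B):
    """phi(x) = Tr(A Diag(x) B)."""
    return np.trace(A @ np.diag(x) @ B)

def numerical_gradient(f, x, eps=1e-6):
    """Central finite-difference gradient of a scalar function f at x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (f(x + step) - f(x - step)) / (2.0 * eps)
    return grad

rng = np.random.default_rng(1)
d = 4
A = rng.standard_normal((3, d))   # A is 3 x d
B = rng.standard_normal((d, 3))   # B is d x 3, so BA is d x d
x = rng.standard_normal(d)

analytic = np.diag(B @ A)                     # d(phi)/dx = diag(BA)
numeric = numerical_gradient(lambda t: phi(t, A, B), x)
grad_X = np.eye(d) * (B @ A)                  # d(phi)/dX = I ⊙ (BA)

print(np.max(np.abs(analytic - numeric)))
```

The printed discrepancy should be at the level of rounding error, since $\phi$ is linear in $x$.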

For the second function, assume $A$ is square and invertible, let $\,Y=AX\;$ and use Jacobi's formula $$\eqalign{ \psi &= \log(\det(Y)) \\ d\psi &= Y^{-T}:dY \\ &= (AX)^{-T}:A\,dX \\ &= A^T(A^{-T}X^{-T}):dX \\ &= X^{-T}:dX \\ &= X^{-1}:\D(dx) \\ &= \d(X^{-1}):dx \\ \p{\psi}{x} &= \d(X^{-1}) \\ \p{\psi}{X} &= \D\big(\d(X^{-1})\big) = I\odot X^{-1} = X^{-1} \\ }$$ since $X$ (and hence $X^{-1}$) is already a diagonal matrix, so $X^{-T}=X^{-1}$ and the Diag operator has no effect.
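The log-determinant result says the gradient with respect to $x$ is simply the vector of reciprocals $1/x_i$, independent of $A$. A minimal finite-difference check is sketched below, assuming a square, invertible $A$ and nonzero $x_i$; the helper names, step size, and test data are illustrative assumptions.

```python
import numpy as np

def psi(x, A):
    """psi(x) = log|det(A Diag(x))|, computed stably via slogdet."""
    _sign, logabsdet = np.linalg.slogdet(A @ np.diag(x))
    return logabsdet

def numerical_gradient(f, x, eps=1e-6):
    """Central finite-difference gradient of a scalar function f at x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (f(x + step) - f(x - step)) / (2.0 * eps)
    return grad

rng = np.random.default_rng(2)
d = 4
# Assumption: shifting by d*I keeps this random A comfortably invertible.
A = rng.standard_normal((d, d)) + d * np.eye(d)
x = rng.uniform(0.5, 2.0, size=d)   # nonzero entries, so X is invertible

analytic = 1.0 / x                  # diag(X^{-1})
numeric = numerical_gradient(lambda t: psi(t, A), x)
grad_X = np.diag(analytic)          # d(psi)/dX = X^{-1}

print(np.max(np.abs(analytic - numeric)))
```

The discrepancy should be small (limited by the finite-difference step), and changing `A` to another invertible matrix should leave `numeric` essentially unchanged.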