Closed form matrix derivative of $\operatorname{tr}(A\exp(X))$

derivatives, matrix exponential, matrix-calculus, multivariable-calculus, scalar-fields

I am interested in finding a closed form expression for the following matrix derivative:

$\frac{\partial}{\partial \mathbf{X}}\operatorname{Tr}\left(\mathbf{A}\exp(\mathbf{X})\right)$

Here we assume that $\mathbf{A}$ and $\mathbf{X}$ are both symmetric $n\times n$ matrices. By applying the following identity:

$\frac{\partial}{\partial \mathbf{X}} \operatorname{Tr}\left(\mathbf{A} \mathbf{X}^{k}\right)=\sum_{r=0}^{k-1}\left(\mathbf{X}^{r} \mathbf{A} \mathbf{X}^{k-r-1}\right)^{T}$ (found in https://www.math.uwaterloo.ca/~hwolkowi/matrixcookbook.pdf)

I figured one can use the power series definition of the matrix exponential to obtain the following expression:

$\frac{\partial}{\partial \mathbf{X}}\operatorname{Tr}\left(\mathbf{A}\exp(\mathbf{X})\right)=\sum_{k=0}^\infty\frac{\partial}{\partial \mathbf{X}} \operatorname{Tr}\left(\frac{1}{k!}\mathbf{A}\mathbf{X}^{k}\right)=\sum_{k=1}^{\infty}\frac{1}{k!}\sum_{r=0}^{k-1}\left(\mathbf{X}^{r} \mathbf{A} \mathbf{X}^{k-r-1}\right)^{T}$

Here we skip over $k=0$ since $\frac{\partial}{\partial \mathbf{X}}\operatorname{Tr}\left(\mathbf{A}\right)=\mathbf{O}$.
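
As a sanity check on the series, here is a minimal NumPy sketch that truncates the double sum (with each $k$-th term weighted by $1/k!$ from the exponential series) and compares it against a central finite difference along a symmetric direction. The helper names `series_gradient` and `sym_expm`, the truncation at 30 terms, and the random test matrices are all illustrative assumptions, not part of the question.

```python
import numpy as np

def series_gradient(A, X, terms=30):
    """Truncated sum_{k=1}^{terms} (1/k!) sum_{r=0}^{k-1} (X^r A X^{k-r-1})^T."""
    n = X.shape[0]
    powers = [np.eye(n)]                      # powers[j] holds X^j
    for _ in range(terms - 1):
        powers.append(powers[-1] @ X)
    G = np.zeros((n, n))
    factorial = 1.0
    for k in range(1, terms + 1):
        factorial *= k                        # factorial == k!
        inner = sum(powers[r] @ A @ powers[k - r - 1] for r in range(k))
        G += inner.T / factorial
    return G

def sym_expm(X):
    """exp(X) for symmetric X via an eigendecomposition (assumed helper)."""
    w, Q = np.linalg.eigh(X)
    return (Q * np.exp(w)) @ Q.T

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
X = rng.standard_normal((n, n)); X = (X + X.T) / 2
E = rng.standard_normal((n, n)); E = (E + E.T) / 2   # symmetric test direction

G = series_gradient(A, X)
h = 1e-6
fd = (np.trace(A @ sym_expm(X + h * E)) - np.trace(A @ sym_expm(X - h * E))) / (2 * h)
print(np.sum(G * E), fd)   # directional derivative from the series vs. finite difference
```

The two printed numbers should agree to several digits if the truncated series is a good approximation; the check only probes symmetric perturbations, since `sym_expm` relies on `eigh`.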

I am not sure whether this expression can be reduced to a closed form. If you can suggest a way to do this, or a different approach entirely, I would greatly appreciate it.

Thanks in advance for your suggestions and help.

Best Answer

$ \def\LR#1{\left(#1\right)} \def\BR#1{\Big(#1\Big)} \def\bR#1{\big(#1\big)} \def\op#1{\operatorname{#1}} \def\Diag#1{\op{Diag}\LR{#1}} \def\trace#1{\op{Tr}\LR{#1}} \def\qiq{\quad\implies\quad} \def\b{\beta} \def\o{{\tt1}} \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} $Since $X$ is a real symmetric matrix, it can be diagonalized as follows $$\eqalign{ X &= QBQ^T, \qquad Q^TQ=I,\quad B=\Diag{\b_k},\\ }$$ Given a real differentiable function (and its derivative) $$f(z), \qquad f'(z) = \frac{df}{dz}$$ when this function is applied to a matrix argument, the $\sf Daleckii\:Krein\,$ theorem tells us $$\eqalign{ F &= f(X) \\ dF &= Q\BR{R\odot\LR{Q^TdX\:Q}}Q^T \\ R_{jk} &= \begin{cases} {\large\frac{f(\b_j)-f(\b_k)}{\b_j-\b_k}} \quad\quad {\rm if}\;\b_j\ne\b_k \\ \\ \quad f'(\b_k) \qquad\quad {\rm otherwise} \\ \end{cases} \\ }$$ where ${\odot}$ denotes the elementwise/Hadamard product.
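
The differential formula can be checked numerically. The following is a rough NumPy sketch, assuming a symmetric perturbation $dX$ and using `eigh` for the diagonalization; the function names and the tolerance `tol` used to decide when two eigenvalues count as equal are hypothetical choices made for this illustration.

```python
import numpy as np

def divided_differences(beta, f, fprime, tol=1e-12):
    """R[j, k] = (f(b_j) - f(b_k)) / (b_j - b_k), or f'(b_k) when b_j equals b_k."""
    fb = f(beta)
    diff = beta[:, None] - beta[None, :]
    close = np.abs(diff) < tol                 # eigenvalue pairs treated as equal
    safe = np.where(close, 1.0, diff)          # avoid dividing by zero
    return np.where(close, fprime(beta)[None, :], (fb[:, None] - fb[None, :]) / safe)

def daleckii_krein_differential(X, dX, f, fprime):
    """dF = Q (R * (Q^T dX Q)) Q^T for symmetric X, with * the Hadamard product."""
    beta, Q = np.linalg.eigh(X)
    R = divided_differences(beta, f, fprime)
    return Q @ (R * (Q.T @ dX @ Q)) @ Q.T

def apply_to_symmetric(f, X):
    """F = f(X) = Q f(B) Q^T for symmetric X."""
    beta, Q = np.linalg.eigh(X)
    return (Q * f(beta)) @ Q.T

rng = np.random.default_rng(1)
n = 4
X = rng.standard_normal((n, n)); X = (X + X.T) / 2
dX = rng.standard_normal((n, n)); dX = (dX + dX.T) / 2

h = 1e-6
dF = daleckii_krein_differential(X, dX, np.exp, np.exp)
dF_fd = (apply_to_symmetric(np.exp, X + h * dX)
         - apply_to_symmetric(np.exp, X - h * dX)) / (2 * h)
print(np.max(np.abs(dF - dF_fd)))   # should be small
```

Nearly equal (but not identical) eigenvalues make the divided difference prone to cancellation, so a more careful implementation might switch to $f'$ below a larger threshold; the fixed `tol` here is only a placeholder.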

In this particular problem, we conveniently have $${f'(z) = f(z) = \exp(z)}$$ Let's also introduce the Frobenius product, which is a handy notation for the trace $$\eqalign{ A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \trace{A^TB} \\ A:A &= \|A\|^2_F \qquad \{ {\rm Frobenius\;norm} \} \\ }$$ This is also called the double-dot or double contraction product.
When applied to vectors $(n=\o)$ it reduces to the standard dot product.

The properties of the underlying trace function allow the terms in a Frobenius product to be rearranged in many useful ways, e.g. $$\eqalign{ A:B &= B:A \\ A:B &= A^T:B^T \\ C:\LR{AB} &= \LR{CB^T}:A \;=\; \LR{A^TC}:B \\ }$$

The Frobenius and Hadamard products commute in the following sense $$\eqalign{ A:\LR{B\odot C} = \LR{A\odot B}:C \;=\; \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij}C_{ij} \\\\ }$$
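
These rearrangement rules are easy to spot-check numerically. Below is a small NumPy sketch on random rectangular matrices; the helper `frob` and the chosen shapes are assumptions made only for this illustration.

```python
import numpy as np

def frob(A, B):
    """Frobenius product A:B = sum_ij A_ij B_ij = Tr(A^T B)."""
    return np.sum(A * B)

rng = np.random.default_rng(2)
m, p, n = 3, 5, 4
A = rng.standard_normal((m, p))
B = rng.standard_normal((p, n))
C = rng.standard_normal((m, n))
D = rng.standard_normal((m, n))
E = rng.standard_normal((m, n))

# Definition: C:D = Tr(C^T D)
print(np.isclose(frob(C, D), np.trace(C.T @ D)))
# Symmetry and transpose rules: C:D = D:C = C^T:D^T
print(np.isclose(frob(C, D), frob(D, C)), np.isclose(frob(C, D), frob(C.T, D.T)))
# Product rules: C:(AB) = (C B^T):A = (A^T C):B
print(np.isclose(frob(C, A @ B), frob(C @ B.T, A)),
      np.isclose(frob(C, A @ B), frob(A.T @ C, B)))
# Hadamard rule: C:(D*E) = (C*D):E, where * is elementwise
print(np.isclose(frob(C, D * E), frob(C * D, E)))
```

Each comparison should come out `True` up to floating-point rounding.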


Use the above notation to rewrite the objective function and calculate its gradient. $$\eqalign{ \phi &= A:F \\ d\phi &= A:dF \\ &= A : Q\BR{R\odot\LR{Q^TdX\:Q}}Q^T \\ &= \LR{Q^TAQ} : \BR{R\odot\LR{Q^TdX\:Q}} \\ &= \BR{R\odot\LR{Q^TAQ}} : \LR{Q^TdX\:Q} \\ &= Q\BR{R\odot\LR{Q^TAQ}}Q^T : dX \\ \grad{\phi}{X} &= Q\BR{R\odot\LR{Q^TAQ}}Q^T \\ }$$
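
For readers who want to try the result, here is a minimal NumPy sketch of the final gradient with $f = f' = \exp$, together with a finite-difference check along a symmetric direction. The function names, the eigenvalue tolerance `tol`, and the random test data are assumptions for illustration and not part of the derivation.

```python
import numpy as np

def grad_trace_A_expX(A, X, tol=1e-12):
    """Q (R * (Q^T A Q)) Q^T with f = f' = exp, for symmetric X."""
    beta, Q = np.linalg.eigh(X)
    e = np.exp(beta)
    diff = beta[:, None] - beta[None, :]
    close = np.abs(diff) < tol                       # eigenvalue pairs treated as equal
    safe = np.where(close, 1.0, diff)                # avoid dividing by zero
    R = np.where(close, e[None, :], (e[:, None] - e[None, :]) / safe)
    return Q @ (R * (Q.T @ A @ Q)) @ Q.T

def sym_expm(X):
    """exp(X) for symmetric X via an eigendecomposition (assumed helper)."""
    beta, Q = np.linalg.eigh(X)
    return (Q * np.exp(beta)) @ Q.T

def phi(A, X):
    """Objective Tr(A exp(X))."""
    return np.trace(A @ sym_expm(X))

rng = np.random.default_rng(3)
n = 4
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
X = rng.standard_normal((n, n)); X = (X + X.T) / 2
E = rng.standard_normal((n, n)); E = (E + E.T) / 2   # symmetric test direction

G = grad_trace_A_expX(A, X)
h = 1e-6
fd = (phi(A, X + h * E) - phi(A, X - h * E)) / (2 * h)
print(np.sum(G * E), fd)   # G:E should match the finite-difference slope
```

Two special cases make useful extra checks: at $X = 0$ the formula should return $A$, and for $A = I$ it should return $\exp(X)$.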
