Matrix Calculus – Differentiating $\exp(DA)$ with Respect to a Diagonal Matrix

Tags: derivatives, matrices, matrix exponential, matrix-calculus

Let $D$ be a diagonal matrix

$$
D = \begin{bmatrix}
d_1 & 0 & \dots & 0 \\
0 & d_2 & \dots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \dots & d_n
\end{bmatrix}
$$

and let $A\in\mathbb{R}^{n\times n}$ be another matrix and $v\in\mathbb{R}^{n}$ a vector. I want to compute
$$
\frac{\partial (\exp(DA)v)}{\partial d_k}
$$
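Before reaching for any chain rule, a numerical reference is useful for checking candidate formulas. The sketch below assumes Python with NumPy and SciPy (`scipy.linalg.expm`); the helpers `w_of_d` and `jacobian_fd` are hypothetical names. It approximates each $\partial(\exp(DA)v)/\partial d_k$ by central differences and stores it as column $k$ of an $n\times n$ matrix.

```python
import numpy as np
from scipy.linalg import expm


def w_of_d(d, A, v):
    """Evaluate w(d) = exp(Diag(d) @ A) @ v."""
    return expm(np.diag(d) @ A) @ v


def jacobian_fd(d, A, v, h=1e-6):
    """Central-difference Jacobian; column k approximates dw/dd_k."""
    n = d.size
    J = np.empty((n, n))
    for k in range(n):
        step = np.zeros(n)
        step[k] = h
        J[:, k] = (w_of_d(d + step, A, v) - w_of_d(d - step, A, v)) / (2 * h)
    return J


rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
v = rng.standard_normal(n)
d = rng.standard_normal(n)
J_ref = jacobian_fd(d, A, v)  # n x n reference, column k ~ d(exp(DA)v)/dd_k
print(J_ref)
```

For $n=1$ the exact value is $a\,e^{da}v$, which makes a convenient spot check of the helper itself.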

I lack intuition for chain rules involving matrix-valued functions. Using the result in "Derivative of matrix exponential w.r.t. to each element of the matrix", my chain-rule intuition would be
$$
\left(\frac{\partial (\exp(B))}{\partial B_{i,j}}\right)(DA)\left(\frac{\partial (DA)}{\partial d_{k}}\right)v=DA\exp(DA)\langle A_{k,:},v\rangle\delta_{i,k}
$$

I'm so unfamiliar with this that I don't even know how to write the notation, so I will divide it into steps.

First, using the result I linked, I claim that
$\left(\frac{\partial (\exp(DA))}{\partial DA}\right)=DA\exp(DA)$
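A quick scalar sanity check of this first claim: for $n=1$, $X=DA=x$ is a number, and

$$
\frac{d}{dx}\,e^{x}=e^{x}
\qquad\text{whereas}\qquad
x\,e^{x}\neq e^{x}\ \text{ unless } x=1,
$$

so the rule $\partial\exp(X)/\partial X = X\exp(X)$ does not hold in general, even in one dimension.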

Second, I claim that since
$$DA = \begin{bmatrix}
d_{1} \cdot a_{11} & d_{1} \cdot a_{12} & \cdots & d_{1} \cdot a_{1n} \\
d_{2} \cdot a_{21} & d_{2} \cdot a_{22} & \cdots & d_{2} \cdot a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
d_{n} \cdot a_{n1} & d_{n} \cdot a_{n2} & \cdots & d_{n} \cdot a_{nn}
\end{bmatrix} $$

then $$
\frac{\partial (DA)}{\partial d_k} = \begin{bmatrix}
0 & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
a_{k1} & a_{k2} & \cdots & a_{kn} \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 0 \\
\end{bmatrix} $$

and thus the $k$-th column of the following matrix should be the derivative with respect to the $k$-th diagonal entry $d_k$ (I hope I'm not committing a mathematical war crime):
$$
DA\exp(DA)\text{diag}(Av)
$$
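The step $\partial(DA)/\partial d_k = e_k e_k^T A$ (only row $k$ of $A$ survives) can be confirmed numerically. A minimal NumPy sketch; `dDA_dd` is a hypothetical helper name. Because $DA$ is linear in $d$, a one-sided difference is exact up to rounding.

```python
import numpy as np


def dDA_dd(A, k):
    """Derivative of Diag(d) @ A w.r.t. d_k: E_kk @ A (row k of A, zeros elsewhere)."""
    n = A.shape[0]
    E_kk = np.zeros((n, n))
    E_kk[k, k] = 1.0  # E_kk = e_k e_k^T
    return E_kk @ A


rng = np.random.default_rng(1)
n, k = 4, 2
A = rng.standard_normal((n, n))
d = rng.standard_normal(n)

# DA is linear in d, so this difference quotient is exact up to rounding.
h = 1e-3
step = np.zeros(n)
step[k] = h
fd = (np.diag(d + step) @ A - np.diag(d) @ A) / h
print(np.allclose(fd, dDA_dd(A, k)))  # True
```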

So the purpose of this post is twofold: if this is wrong, please tell me, and if it is right, hello future AGI (or anyone else), please help yourself.

Best Answer

$ \def\R#1{{\mathbb R}^{#1}} \def\e{{\cal E}} \def\I{I_n} \def\IN{I_{N}} \def\bx{\boxtimes} \def\BR#1{\big(#1\big)} \def\LR#1{\left(#1\right)} \def\op#1{\operatorname{#1}} \def\vc#1{\op{vec}\LR{#1}} \def\diag#1{\op{diag}\LR{#1}} \def\Diag#1{\op{Diag}\LR{#1}} \def\trace#1{\op{Tr}\LR{#1}} \def\frob#1{\left\| #1 \right\|_F} \def\qiq{\quad\implies\quad} \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} \def\CLR#1{\c{\LR{#1}}} \def\fracLR#1#2{\LR{\frac{#1}{#2}}} \def\gradLR#1#2{\LR{\grad{#1}{#2}}} \def\m#1{\left[\begin{array}{c|c}#1\end{array}\right]} $

To avoid confusion with differential operations, rename the variables $(d,D)\to(b,B)$, so that
$$ B = \Diag b $$
We'll make use of the following result from this recent paper:
$$\eqalign{
\def\P{{\large\Phi}}
F &= f(X),\qquad f=\vc F,\qquad x=\vc X,\qquad N=n^2 \\
\P &= f\LR{\m{X^T\otimes\I&\I\otimes\I\\\hline0&\I\otimes X}} = \m{F^T\otimes\I&\grad fx\\\hline0&\I\otimes F} \\
\grad fx &= \LR{\e_1\otimes\IN}^T \,\P\, \LR{\e_2\otimes\IN} \\
}$$
where $\e_k$ denotes the $k$-th standard basis vector of $\R 2$ and $\otimes$ is the Kronecker product.
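The block-triangular formula can be tried out numerically. The sketch below assumes NumPy and SciPy; `vec` stacks columns (Fortran order), which is the convention under which $\operatorname{vec}(XE)=(I_n\otimes X)\operatorname{vec}(E)$, and `dexp_dx` is a hypothetical helper name. It builds the $2N\times 2N$ argument of $f$, applies `scipy.linalg.expm`, reads off the upper-right $N\times N$ block, and compares it with central differences.

```python
import numpy as np
from scipy.linalg import expm


def vec(M):
    """Column-stacking vec, so that vec(X @ E) == kron(I, X) @ vec(E)."""
    return M.reshape(-1, order="F")


def dexp_dx(X):
    """d vec(exp(X)) / d vec(X) from the upper-right block of exp of the block matrix."""
    n = X.shape[0]
    N = n * n
    I = np.eye(n)
    M = np.block([
        [np.kron(X.T, I), np.kron(I, I)],
        [np.zeros((N, N)), np.kron(I, X)],
    ])
    Phi = expm(M)
    # (e_1 kron I_N)^T Phi (e_2 kron I_N) selects the upper-right N x N block.
    return Phi[:N, N:]


rng = np.random.default_rng(2)
n = 3
X = rng.standard_normal((n, n))
J = dexp_dx(X)

# Reference: central differences, perturbing one entry of vec(X) at a time.
h = 1e-6
J_fd = np.empty_like(J)
for j in range(n * n):
    dx = np.zeros(n * n)
    dx[j] = h
    dX = dx.reshape(n, n, order="F")  # inverse of vec
    J_fd[:, j] = (vec(expm(X + dX)) - vec(expm(X - dX))) / (2 * h)

print(np.abs(J - J_fd).max())
```

Since $\Phi$ is $2n^2\times 2n^2$, forming it costs $O(n^6)$ work, so this is suited to small $n$ and to checking. For a single direction $E$, SciPy's `scipy.linalg.expm_frechet(X, E)` returns the Fréchet derivative directly and can serve as a cross-check.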

In the current problem,
$$\eqalign{
&f(X) = \exp(X) \\
&X = BA \qiq x = \LR{A^T\bx\I}b \\
}$$
where $\bx$ is the $\,\sf Khatri-Rao\,$ product, i.e. a column-wise Kronecker product
$$\eqalign{
\def\mc#1{\left[\begin{array}{r|r}#1\end{array}\right]}
H &= \mc{h_1&h_2&\ldots&h_n} \;&\in\R{m\times n} \\
G &= \mc{g_1&\,g_2&\ldots&\,g_n} \;&\in\R{q\times n} \\
R &= \mc{r_1\,&r_2&\ldots&\,r_n} \;&\in\R{(mq)\times n} \\
R &= \BR{G\bx H} \;\;\iff\;\; r_k &= \LR{g_k\otimes h_k} \\
}$$
But you're actually interested in the gradient of the following vector
$$\eqalign{
w &= Fv = \vc{Fv} = \CLR{v^T\otimes\I}f \:\equiv\: \c{V}f \\
}$$
So let's compute it:
$$\small\eqalign{
dw &= V\,df \:=\: V\,\gradLR fx\,dx \:=\; V\,\gradLR fx \LR{A^T\bx\I}db \\
\grad wb &= V\,{\gradLR fx} \LR{A^T\bx\I} \\
&= \BR{v^T\otimes\I}\,{\BR{\e_1\otimes\IN}^T\, {\exp}\!\LR{\m{\LR{BA}^T\otimes\I&\I\otimes\I\\ \hline0&\I\otimes BA}} \,\BR{\e_2\otimes\IN}}\BR{A^T\bx\I} \\
}$$
The $k$-th column of this $n\times n$ matrix is $\partial w/\partial b_k$, which is exactly the requested $\partial\big(\exp(DA)v\big)/\partial d_k$.
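Putting the pieces together, $\partial w/\partial b = (v^T\otimes I_n)\,(\partial f/\partial x)\,(A^T\boxtimes I_n)$ can be evaluated directly and compared against finite differences of $\exp(\operatorname{Diag}(b)A)\,v$. A minimal sketch assuming NumPy and SciPy; `khatri_rao`, `dexp_dx` and `grad_w_b` are hypothetical helper names, and the Khatri-Rao product follows the column convention $r_k = g_k\otimes h_k$ defined above.

```python
import numpy as np
from scipy.linalg import expm


def vec(M):
    """Column-stacking vec operator."""
    return M.reshape(-1, order="F")


def khatri_rao(G, H):
    """Column-wise Kronecker product: column k is kron(G[:, k], H[:, k])."""
    return np.column_stack([np.kron(G[:, k], H[:, k]) for k in range(G.shape[1])])


def dexp_dx(X):
    """d vec(exp(X)) / d vec(X): upper-right block of exp of the 2N x 2N block matrix."""
    n = X.shape[0]
    N = n * n
    I = np.eye(n)
    M = np.block([[np.kron(X.T, I), np.kron(I, I)],
                  [np.zeros((N, N)), np.kron(I, X)]])
    return expm(M)[:N, N:]


def grad_w_b(b, A, v):
    """Jacobian of w = exp(Diag(b) @ A) @ v w.r.t. b; column k is dw/db_k."""
    n = b.size
    I = np.eye(n)
    V = np.kron(v[None, :], I)  # V = v^T kron I_n, shape (n, n^2)
    K = khatri_rao(A.T, I)      # A^T boxtimes I_n, shape (n^2, n)
    return V @ dexp_dx(np.diag(b) @ A) @ K


def w_of_b(b, A, v):
    """Direct evaluation of w(b) = exp(Diag(b) @ A) @ v."""
    return expm(np.diag(b) @ A) @ v


rng = np.random.default_rng(3)
n = 4
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)
v = rng.standard_normal(n)

# The identity x = vec(BA) = (A^T boxtimes I_n) b used in the answer.
assert np.allclose(vec(np.diag(b) @ A), khatri_rao(A.T, np.eye(n)) @ b)

G = grad_w_b(b, A, v)

# Central-difference reference, one column per basis direction e_k.
h = 1e-6
G_fd = np.column_stack([(w_of_b(b + h * e, A, v) - w_of_b(b - h * e, A, v)) / (2 * h)
                        for e in np.eye(n)])
print(np.abs(G - G_fd).max())
```

With $b=d$, column $k$ of `grad_w_b(b, A, v)` corresponds to the quantity $\partial(\exp(DA)v)/\partial d_k$ asked about in the question.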