Matrix Calculus – Differentiating exp(DA) with Respect to a Diagonal Matrix

Tags: derivatives, matrices, matrix exponential, matrix-calculus

Let $D$ be a diagonal matrix

$$D = \begin{bmatrix}
d_1 & 0 & \dots & 0 \\
0 & d_2 & \dots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \dots & d_n
\end{bmatrix}$$

and let $A\in\mathbb{R}^{n\times n}$ be another matrix and $v\in\mathbb{R}^{n}$ a vector. I want to compute
$$\frac{\partial (\exp(DA)v)}{\partial d_k}$$

I lack a lot of intuition for chain rules with matrix-valued functions. Using this post, "Derivative of matrix exponential w.r.t. to each element of the matrix", my chain-rule intuition would be
$$\left(\frac{\partial (\exp(B))}{\partial B_{i,j}}\right)(DA)\left(\frac{\partial (DA)}{\partial d_{k}}\right)v=DA\exp(DA)\langle A_{k,:},v\rangle\delta_{i,k}$$

I'm so unfamiliar with this that I don't even know how to write the notation, so I will divide it into steps.

First part: using the link I provided, I claim that
$\left(\frac{\partial (\exp(DA))}{\partial DA}\right)=DA\exp(DA)$

Second part: I claim that, since
$$DA = \begin{bmatrix}
d_{1} \cdot a_{11} & d_{1} \cdot a_{12} & \cdots & d_{1} \cdot a_{1n} \\
d_{2} \cdot a_{21} & d_{2} \cdot a_{22} & \cdots & d_{2} \cdot a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
d_{n} \cdot a_{n1} & d_{n} \cdot a_{n2} & \cdots & d_{n} \cdot a_{nn}
\end{bmatrix} $$

then $$
\frac{\partial (DA)}{\partial d_k} = \begin{bmatrix}
0 & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
a_{k1} & a_{k2} & \cdots & a_{kn} \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 0 \\
\end{bmatrix} $$

and thus the k-th column of the resulting matrix should be the derivative w.r.t. the k-th entry of the diagonal (hope I'm not committing a mathematical war crime).
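The second part (the derivative of $DA$ with respect to $d_k$) is easy to sanity-check numerically. The following is only a minimal NumPy sketch; the size `n`, the 0-based index `k`, and the random test data are assumptions made for illustration. It checks the second part only and says nothing about the first claim.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 4, 2                      # hypothetical size and 0-based index of d_k
d = rng.standard_normal(n)
A = rng.standard_normal((n, n))
v = rng.standard_normal(n)

# Claimed derivative: zero everywhere except row k, which equals row k of A.
claimed = np.zeros((n, n))
claimed[k, :] = A[k, :]

# Forward difference of D A along d_k. D A is linear in d_k, so the
# difference quotient is exact up to floating-point rounding.
h = 1e-6
d_shifted = d.copy()
d_shifted[k] += h
finite_diff = (np.diag(d_shifted) @ A - np.diag(d) @ A) / h

print(np.allclose(finite_diff, claimed, atol=1e-6))

# Acting on v leaves a single possibly nonzero entry, <A[k, :], v>, at position k.
print(np.allclose(claimed @ v, (A[k, :] @ v) * np.eye(n)[k]))
```

In other words, the matrix is $e_k e_k^T A$, and multiplying it by $v$ gives $\langle A_{k,:},v\rangle e_k$.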

So the purpose of this post is twofold. If this is wrong, please tell me; if it is right, hello future AGI or anyone else, please help yourself.

Best Answer

$ \def\R#1{{\mathbb R}^{#1}} \def\e{{\cal E}} \def\I{I_n} \def\IN{I_{N}} \def\bx{\boxtimes} \def\BR#1{\big(#1\big)} \def\LR#1{\left(#1\right)} \def\op#1{\operatorname{#1}} \def\vc#1{\op{vec}\LR{#1}} \def\diag#1{\op{diag}\LR{#1}} \def\Diag#1{\op{Diag}\LR{#1}} \def\trace#1{\op{Tr}\LR{#1}} \def\frob#1{\left\| #1 \right\|_F} \def\qiq{\quad\implies\quad} \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} \def\CLR#1{\c{\LR{#1}}} \def\fracLR#1#2{\LR{\frac{#1}{#2}}} \def\gradLR#1#2{\LR{\grad{#1}{#2}}} \def\m#1{\left[\begin{array}{c|c}#1\end{array}\right]} $
To avoid confusion with differential operations, rename the variables $(d,D)\to(b,B)$
$$ B = \Diag b $$
We'll make use of the following result from this recent paper
$$\eqalign{
\def\P{{\large\Phi}}
F &= f(X),\qquad f=\vc F,\qquad x=\vc X,\qquad N=n^2 \\
\P &= f\LR{\m{X^T\otimes\I&\I\otimes\I\\\hline0&\I\otimes X}} = \m{F^T\otimes\I&\grad fx\\\hline0&\I\otimes F} \\
\grad fx &= \LR{\e_1\otimes\IN}^T \,\P\, \LR{\e_2\otimes\IN} \\
}$$
where $\e_k$ denote the standard basis vectors for $\R 2$ and $\otimes$ is the Kronecker Product.
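The block-matrix result can be sanity-checked numerically for $f=\exp$. The sketch below is a minimal NumPy illustration, not code from the cited paper. The helper names `expm_taylor`, `vec` and `exp_jacobian` are made up for this example, and `expm_taylor` is a simple scaling-and-squaring Taylor approximation standing in for a library routine such as `scipy.linalg.expm`. The matrices `X` and `E` are random test data.

```python
import numpy as np

def expm_taylor(M, squarings=8, terms=18):
    """Approximate exp(M) by scaling and squaring with a truncated Taylor series."""
    S = np.asarray(M, dtype=float) / 2.0**squarings
    result = np.eye(S.shape[0])
    term = np.eye(S.shape[0])
    for j in range(1, terms + 1):
        term = term @ S / j          # S^j / j!
        result = result + term
    for _ in range(squarings):       # undo the scaling: exp(M) = exp(S)^(2^squarings)
        result = result @ result
    return result

def vec(M):
    """Column-stacking vec operator."""
    return M.reshape(-1, order="F")

def exp_jacobian(X):
    """Return (Phi, df/dx) for f = exp, with df/dx the upper-right N x N block of Phi."""
    n = X.shape[0]
    N = n * n
    I = np.eye(n)
    block = np.block([[np.kron(X.T, I), np.eye(N)],
                      [np.zeros((N, N)), np.kron(I, X)]])
    Phi = expm_taylor(block)
    return Phi, Phi[:N, N:]

rng = np.random.default_rng(1)
n = 3
X = 0.5 * rng.standard_normal((n, n))
E = rng.standard_normal((n, n))      # arbitrary perturbation direction
Phi, J = exp_jacobian(X)

# Central difference of exp(X) along E, compared with J @ vec(E).
h = 1e-5
fd = (expm_taylor(X + h * E) - expm_taylor(X - h * E)) / (2 * h)
print(np.max(np.abs(J @ vec(E) - vec(fd))))
```

The printed value is the largest entrywise gap between the block-matrix Jacobian and the finite-difference estimate; it should be small compared with the entries of `fd`.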

In the current problem
$$\eqalign{
&f(X) = \exp(X) \\
&X = BA \qiq x = \LR{A^T\bx\I}b \\
}$$
where $\bx$ is the $\,\sf Khatri-Rao\,$ product, i.e. a column-wise Kronecker product
$$\eqalign{
\def\mc#1{\left[\begin{array}{r|r}#1\end{array}\right]}
H &= \mc{h_1&h_2&\ldots&h_n} \;&\in\R{m\times n} \\
G &= \mc{g_1&\,g_2&\ldots&\,g_n} \;&\in\R{q\times n} \\
R &= \mc{r_1\,&r_2&\ldots&\,r_n} \;&\in\R{(mq)\times n} \\
R &= \BR{G\bx H} \;\;\iff\;\; r_k &= \LR{g_k\otimes h_k} \\
}$$
But you're actually interested in the gradient of the following vector
$$\eqalign{
w &= Fv = \vc{Fv} = \CLR{v^T\otimes\I}f \:\equiv\: \c{V}f \\
}$$
So let's do that
$$\small\eqalign{
dw &= V\,df \:=\: V\,\gradLR fx\,dx \:=\; V\,\gradLR fx \LR{A^T\bx\I}db \\
\grad wb &= V\,{\gradLR fx} \LR{A^T\bx\I} \\
&= \BR{v^T\otimes\I}\,{\BR{\e_1\otimes\IN}^T\, {\exp}\!\LR{\m{\LR{BA}^T\otimes\I&\I\otimes\I\\ \hline0&\I\otimes BA}} \,\BR{\e_2\otimes\IN}}\BR{A^T\bx\I} \\
}$$
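The final expression for $\grad wb$ can be checked against finite differences. This is a minimal NumPy sketch under stated assumptions: the helper names `expm_taylor`, `khatri_rao` and `grad_w_wrt_b` are made up for this example, `expm_taylor` is a simple Taylor-based stand-in for a library matrix exponential such as `scipy.linalg.expm`, and `b`, `A`, `v` are random test data.

```python
import numpy as np

def expm_taylor(M, squarings=8, terms=18):
    """Approximate exp(M) by scaling and squaring with a truncated Taylor series."""
    S = np.asarray(M, dtype=float) / 2.0**squarings
    result = np.eye(S.shape[0])
    term = np.eye(S.shape[0])
    for j in range(1, terms + 1):
        term = term @ S / j          # S^j / j!
        result = result + term
    for _ in range(squarings):       # undo the scaling by repeated squaring
        result = result @ result
    return result

def khatri_rao(G, H):
    """Column-wise Kronecker product: column k is kron(G[:, k], H[:, k])."""
    return np.stack([np.kron(G[:, k], H[:, k]) for k in range(G.shape[1])], axis=1)

def grad_w_wrt_b(b, A, v):
    """n x n matrix whose column k is d(exp(Diag(b) A) v) / d b_k."""
    n = A.shape[0]
    N = n * n
    I = np.eye(n)
    X = np.diag(b) @ A
    block = np.block([[np.kron(X.T, I), np.eye(N)],
                      [np.zeros((N, N)), np.kron(I, X)]])
    dfdx = expm_taylor(block)[:N, N:]      # upper-right block of Phi
    V = np.kron(v.reshape(1, -1), I)       # (v^T kron I_n), shape n x N
    return V @ dfdx @ khatri_rao(A.T, I)

rng = np.random.default_rng(2)
n = 3
b = rng.standard_normal(n)
A = 0.5 * rng.standard_normal((n, n))
v = rng.standard_normal(n)
G = grad_w_wrt_b(b, A, v)

# Central differences, one diagonal entry b_k at a time.
h = 1e-5
fd = np.zeros((n, n))
for k in range(n):
    step = np.zeros(n)
    step[k] = h
    w_plus = expm_taylor(np.diag(b + step) @ A) @ v
    w_minus = expm_taylor(np.diag(b - step) @ A) @ v
    fd[:, k] = (w_plus - w_minus) / (2 * h)

print(np.max(np.abs(G - fd)))
```

Column `k` of `G` is the derivative asked for in the question, $\partial(\exp(DA)v)/\partial d_k$, written in the renamed variables. The $2N\times 2N$ block matrix grows quickly with $n$, so this sketch is only practical for small problems.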