Partial derivative of a diagonal matrix w.r.t. a vector

matrices, matrix-calculus, partial-derivative, tridiagonal-matrices

I am trying to find the second partial derivative of the function

$$Y=\operatorname{diag}(\boldsymbol S)\,\mathbb P\,\operatorname{diag}(\boldsymbol\beta)\,\operatorname{diag}^{-1}\!\left(\mathbb P^{T}\boldsymbol S +\mathbb P^{T}\boldsymbol E + \mathbb P^{T}\boldsymbol I +\mathbb P^{T}\boldsymbol R \right)\mathbb P^{T}\boldsymbol I$$

where $\mathbb P=[p_{ij}]_{n\times n}$ is an $n\times n$ matrix and $\boldsymbol S$, $\boldsymbol E$, $\boldsymbol I$, and $\boldsymbol R$ are $n\times 1$ vectors,

with respect to $\boldsymbol S$. I am not sure how to approach this. I was thinking of using the product rule, whereby I would take
$$ X=\underbrace{\operatorname{diag}(\boldsymbol S)\,\mathbb P\,\operatorname{diag}(\boldsymbol\beta)}_{Y}\underbrace{\operatorname{diag}^{-1}\!\left(\mathbb P^{T}\boldsymbol S +\mathbb P^{T}\boldsymbol E + \mathbb P^{T}\boldsymbol I +\mathbb P^{T}\boldsymbol R \right)\mathbb P^{T}\boldsymbol I}_{Z}$$ so that $$ \frac{\partial X}{\partial \boldsymbol S}=\frac{\partial Y}{\partial \boldsymbol S}Z+Y\frac{\partial Z}{\partial\boldsymbol S}$$

This is proving difficult to achieve, as I am not sure whether what I am doing is correct. For example, I was thinking of taking $D=\operatorname{diag}(\mathbb P^{T}\boldsymbol S +\mathbb P^{T}\boldsymbol E + \mathbb P^{T}\boldsymbol I +\mathbb P^{T}\boldsymbol R )$ so that $$\frac{\partial Z}{\partial \boldsymbol S}=-D^{-1}\frac{\partial D}{\partial \boldsymbol S}D^{-1}\mathbb P^{T}\boldsymbol I$$ where $$\frac{\partial D}{\partial \boldsymbol S}=\mathbb P^{T}$$ Is this process correct? I am imagining that this method (for the derivative of the inverse of a matrix) can only be applied when differentiating w.r.t. a matrix, not w.r.t. a vector. Could somebody help me obtain $\frac{\partial X}{\partial\boldsymbol S}$ and subsequently $\frac{\partial^{2} X}{\partial\boldsymbol S^{2}}$ and $\frac{\partial^{2} X}{\partial\boldsymbol I\partial \boldsymbol E}$? I would appreciate it very much.

Best Answer

$ \def\e{\varepsilon} \def\l{\left} \def\r{\right} \def\lr#1{\l(#1\r)} \def\d#1{\operatorname{diag}\lr{#1}\,} \def\D#1{\operatorname{Diag}\lr{#1}\,} \def\v#1{\operatorname{vec}\lr{#1}\,} \def\o{{\tt1}} \def\p{{\partial}} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\hess#1#2#3{\frac{\p^2 #1}{\p #2\,\p #3^T}} \def\c#1{\color{red}{#1}} \def\E{{\cal E}} $Notation is going to be very important in answering this question. So first, let's use a convention in which uppercase letters denote matrices and lowercase letters vectors.

Let's further stipulate that an uppercase letter denotes the diagonal matrix generated by the corresponding lowercase vector, e.g. $$\eqalign{ S = \D{s}, \quad E = \D{e}, \quad \text{etc.} \\ }$$ We'll also reserve $I$ to denote the identity matrix.

So let's rename your variables as follows $$({\mathbb P},\beta,S,E,I,R,X) \to (P,b,s,e,a,r,x)$$ and for typing convenience, introduce a new vector $$w = P^T(s+e+a+r)\quad\implies\quad dw = P^Tds$$ The following commutativity relationship will be key $$\D{a}b=\D{b}a$$ Write the function, then calculate the differential and thence the gradient (with respect to $s$). $$\eqalign{ x &= SPBW^{-1}P^Ta \\ dx &= dS\,PBW^{-1}P^Ta + SPB\,dW^{-1}P^Ta \\ &= dS\,PBW^{-1}P^Ta + SPB\c{\lr{-W^{-2}dW}}P^Ta \\ &= \D{ds}PBW^{-1}P^Ta - SPBW^{-2}\D{P^Tds}P^Ta \\ &= \D{PBW^{-1}P^Ta}ds - SPBW^{-2}\D{P^Ta}P^Tds \\ \grad{x}{s} &= \D{PBW^{-1}P^Ta} - SPBW^{-2}\D{P^Ta}P^T \\ }$$ This is a matrix-valued gradient, so the higher order derivatives will be tensors. I'm not sure how you want to handle that.
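As a sanity check, the gradient formula above can be verified numerically with finite differences. The sketch below is my own, not part of the answer: it uses random data in NumPy (the sizes, the seed, and the helper name `x_of` are assumptions), with `np.diag` playing the role of $\operatorname{Diag}(\cdot)$ and $W^{-1}$ formed elementwise since $W$ is diagonal.

```python
import numpy as np

# Finite-difference check of  ∂x/∂s = Diag(P B W⁻¹ Pᵀa) − S P B W⁻² Diag(Pᵀa) Pᵀ
# on random positive data (so that W is safely invertible).
rng = np.random.default_rng(0)
n = 5
P = rng.random((n, n))
b, s, e, a, r = (rng.random(n) + 1.0 for _ in range(5))

def x_of(s):
    # x = S P B W⁻¹ Pᵀ a   with   W = Diag(Pᵀ(s + e + a + r))
    S, B = np.diag(s), np.diag(b)
    Winv = np.diag(1.0 / (P.T @ (s + e + a + r)))  # W is diagonal
    return S @ P @ B @ Winv @ P.T @ a

# analytic gradient from the answer
S, B = np.diag(s), np.diag(b)
Winv = np.diag(1.0 / (P.T @ (s + e + a + r)))
G = np.diag(P @ B @ Winv @ P.T @ a) \
    - S @ P @ B @ Winv @ Winv @ np.diag(P.T @ a) @ P.T

# central finite differences, one column per component of s
h = 1e-6
G_fd = np.column_stack([(x_of(s + h * np.eye(n)[:, j])
                         - x_of(s - h * np.eye(n)[:, j])) / (2 * h)
                        for j in range(n)])
assert np.allclose(G, G_fd, atol=1e-6)
```

A check like this catches sign and transpose mistakes cheaply before any tensor machinery is brought in.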

There are lots of options: you can use tensors, component-wise index notation, or Kronecker products to flatten the matrix gradient into a long vector.


UPDATE

The calculation of the gradient wrt $a$ is similar, and even a bit easier $$\eqalign{ x &= SPBW^{-1}P^Ta \\ dx &= SPBW^{-1}P^Tda + SPB\,dW^{-1}P^Ta \\ &= SPBW^{-1}P^Tda + SPB\c{\lr{-W^{-2}dW}}P^Ta \\ &= SPBW^{-1}P^Tda - SPBW^{-2}\D{P^Tda}P^Ta \\ &= SPBW^{-1}P^Tda - SPBW^{-2}\D{P^Ta}P^Tda \\ \grad{x}{a} &= SPBW^{-1}P^T - SPBW^{-2}\D{P^Ta}P^T \\ }$$ Now calculate the differential of $\,G=\lr{\grad{x}{a}}\,$ with respect to $e$ as the first step in the cross-hessian $$\eqalign{ dG &= SPB\,\c{ \lr{dW^{-1}} }\,P^T - SPB\,\D{P^Ta}\c{ \lr{dW^{-2}} }P^T \\ &= SPB\,\c{ \lr{-W^{-2}dW} }\,P^T - SPB\,\D{P^Ta}\c{ \lr{-2\,W^{-3}dW} }P^T \\ &= 2\,SPB\,\D{P^Ta}W^{-3}\,\D{P^Tde}\,P^T - SPBW^{-2}\,\D{P^Tde}\,P^T \\ }$$ At this point we need the vectorization function $(\v{G}\!)$, the Khatri-Rao product $(\boxtimes)$ defined in terms of all-ones vectors $(\o)$, and the Kronecker $(\otimes)$ and Hadamard $(\odot)$ products, plus the obscure $\c{\rm relationship}$ $$\eqalign{ &\c{ {\rm vec}(F\,\D{g}\,H) = \left(H^T\boxtimes F\right)g } \\ &H^T\boxtimes F = (H^T\otimes{\o_m})\odot({\o_p}\otimes F) \;\in {\mathbb R}^{(mp)\times n} \\ &F \in {\mathbb R}^{m\times n},\quad g \in {\mathbb R}^{n},\quad H \in {\mathbb R}^{n\times p},\quad \o_m \in {\mathbb R}^{m} \\ }$$ This yields $$\eqalign{ dg &= \bigg( P\boxtimes\Big( 2\,SPB\,\D{P^Ta}W^{-3} - SPBW^{-2} \Big) \bigg)\,P^T\,de \\ \grad{g}{e} &= \bigg( P\boxtimes\Big( 2\,SPB\,\D{P^Ta}W^{-3} - SPBW^{-2} \Big) \bigg)\,P^T \\ }$$ So that's the vectorized version of $\lr{\hess{x}{e}{a}}$.
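Both pieces of the update, the vec/Khatri-Rao identity and the final vectorized cross-Hessian, can also be checked numerically. The sketch below is my own (random data; the helper names `vec`, `kr`, and `grad_x_a` are illustrative, not from the answer). Note that $\operatorname{vec}$ here is column-stacking, i.e. `order='F'` in NumPy, which is what the identity $\operatorname{vec}(F\,\operatorname{Diag}(g)\,H)=(H^T\boxtimes F)\,g$ requires.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
P = rng.random((n, n))
b, s, e0, a, r = (rng.random(n) + 1.0 for _ in range(5))
S, B = np.diag(s), np.diag(b)

def vec(M):
    return M.reshape(-1, order="F")          # column-stacking vec

def kr(A, C):
    # column-wise Kronecker product (Khatri-Rao), A ⊠ C
    return np.column_stack([np.kron(A[:, j], C[:, j]) for j in range(A.shape[1])])

# (1) the "obscure relationship":  vec(F Diag(g) H) = (Hᵀ ⊠ F) g
F, g, H = rng.random((3, n)), rng.random(n), rng.random((n, 2))
assert np.allclose(vec(F @ np.diag(g) @ H), kr(H.T, F) @ g)

def grad_x_a(e):
    # G = ∂x/∂a = S P B W⁻¹ Pᵀ − S P B W⁻² Diag(Pᵀa) Pᵀ, with W depending on e
    Wi = np.diag(1.0 / (P.T @ (s + e + a + r)))
    return S @ P @ B @ Wi @ P.T - S @ P @ B @ Wi @ Wi @ np.diag(P.T @ a) @ P.T

# (2) analytic vectorized cross-Hessian vs. finite differences
Wi = np.diag(1.0 / (P.T @ (s + e0 + a + r)))
M = 2 * S @ P @ B @ np.diag(P.T @ a) @ Wi @ Wi @ Wi - S @ P @ B @ Wi @ Wi
Hx = kr(P, M) @ P.T                          # (P ⊠ M) Pᵀ, shape n² × n
h = 1e-6
H_fd = np.column_stack([(vec(grad_x_a(e0 + h * np.eye(n)[:, j]))
                         - vec(grad_x_a(e0 - h * np.eye(n)[:, j]))) / (2 * h)
                        for j in range(n)])
assert np.allclose(Hx, H_fd, atol=1e-6)
```

The same pattern extends to the other second derivatives the question asks for: differentiate the matrix-valued gradient, pull the $\D{\cdot}$ factor out with the Khatri-Rao identity, and check against finite differences.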

At this point my questions are:
$\;$ Why do you need complicated tensor quantities like these?
$\;$ What will they be used for?