Chain rule: deriving matrix wrt matrix and then matrix wrt scalar

calculus, derivatives, matrix-calculus, partial-derivative, statistics

Let $\boldsymbol{U} = f(x)$, where $\boldsymbol{U}$ is a matrix and $x$ is a scalar, and let $g(\boldsymbol{U})$ be a scalar-valued function. The goal is to calculate the derivative of $g$ with respect to $x$:
$$\frac{\partial g(\boldsymbol{U})}{\partial x}=\frac{\partial g(f(x))}{\partial x}$$

I already calculated and verified numerically $\frac{\partial g(\boldsymbol{U})}{\partial \boldsymbol{U}}$ and $\frac{\partial \boldsymbol{U}}{\partial x}$.

However, if I apply rule (137) from the Matrix Cookbook (the chain rule for matrices) (link), the result is wrong.
How do I apply the chain rule correctly in this case?

Here is my code, in case anyone is interested in verifying this.

Edit: I am calculating the derivative of the normal distribution with respect to a parameter $s$ that is part of its variance.

$$\frac{\partial \phi(X\beta;\boldsymbol{L}^{-1}(s))}{\partial s}=\text{Tr}\Bigg[\Bigg(\frac{\partial \phi(X\beta;\boldsymbol{L}^{-1}(s))}{\partial \boldsymbol{L}^{-1}(s)}\Bigg)' \frac{\partial \boldsymbol{L}^{-1}(s)}{\partial s}\Bigg],$$
where $\boldsymbol{L}$ is a $p \times p$ matrix, $\boldsymbol{D}$ is a $q \times q$ matrix, $\boldsymbol{Z}$ is a $p\times q$ matrix and $\boldsymbol{X\beta}$ is a $p\times 1$ vector:
\begin{eqnarray}
\boldsymbol{L}&=&I-\boldsymbol{Z}_{i}\boldsymbol{M}^{-1} \boldsymbol{Z}'_i\\
\boldsymbol{M}&=&\boldsymbol{D}^{-1}+\boldsymbol{Z}'_i\boldsymbol{Z}_i
\end{eqnarray}

$s$ is the value of the first row, first column of matrix $\boldsymbol{D}$.

I calculated $\frac{\partial \phi(\boldsymbol{X\beta};\boldsymbol{L}^{-1}(s))}{\partial \boldsymbol{L}^{-1}(s)}$ with the help of this post (link), and it equals $$\phi(X\beta;\boldsymbol{L}^{-1}(s))\Bigg[-(\boldsymbol{L} - \boldsymbol{L} \boldsymbol{X\beta} (\boldsymbol{X\beta})' \boldsymbol{L})-\text{Diag}(-0.5 (\boldsymbol{L} - \boldsymbol{L} \boldsymbol{X\beta} (\boldsymbol{X\beta})' \boldsymbol{L}))\Bigg]$$

In addition, computing $\frac{\partial \boldsymbol{L}^{-1}(s)}{\partial s}$ is straightforward by repeatedly applying the rule that the partial derivative of the inverse of a matrix equals minus that inverse, multiplied by the derivative of the original matrix, multiplied by the inverse again:
$$\frac{\partial K^{-1}}{\partial s}=-K^{-1}\frac{\partial K}{\partial s}K^{-1}.$$
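As a sanity check, this inverse-derivative rule is easy to verify numerically. Below is a small NumPy sketch (not part of the original R code); the symmetric matrix $K$ and the convention that only its $(1,1)$ entry depends on $s$ are arbitrary choices for illustration:

```python
import numpy as np

K = np.array([[4.0, 1.0], [1.0, 3.0]])
dK = np.zeros((2, 2))
dK[0, 0] = 1.0  # dK/ds = E_11: only K[1,1] depends on s

# analytic rule: dK^{-1}/ds = -K^{-1} (dK/ds) K^{-1}
analytic = -np.linalg.inv(K) @ dK @ np.linalg.inv(K)

# central finite difference on the inverse
h = 1e-6
Kp, Km = K.copy(), K.copy()
Kp[0, 0] += h
Km[0, 0] -= h
numeric = (np.linalg.inv(Kp) - np.linalg.inv(Km)) / (2 * h)

assert np.allclose(analytic, numeric, atol=1e-8)
```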


```
#prespecified matrix and vector
library(mvtnorm)  # for dmvnorm
D=matrix(c(5,1,1,5),nrow=2,ncol=2)
zb2=matrix(rnorm(nrow(D)*2),nrow=2,ncol=nrow(D))
xb2=rnorm(2)

#f(g(x))
f=function(s){
  D[1, 1] = s
  Ds=solve(D)
  Ls=solve(diag(nrow=length(xb2))-zb2%*%solve(Ds+t(zb2)%*%(zb2))%*%t(zb2))
  ff <- dmvnorm(xb2, sigma=Ls)
  return(ff)
}

#trace(df(u)/du*du/dx)
f_grad=function(s){
  D[1, 1] = s
  Ds=solve(D)
  Ls=solve(diag(nrow=length(xb2))-zb2%*%solve(Ds+t(zb2)%*%(zb2))%*%t(zb2))
  ff <- dmvnorm(xb2,sigma= Ls)
  
  Derivs = matrix(0, nrow = nrow(D), ncol = nrow(D))  # dD/ds = E_11
  Derivs[1, 1] = 1

  Ms=solve(Ds+t(zb2)%*%(zb2))
  Msacc=Ms%*%(Ds%*%Derivs%*%Ds)%*%Ms
  Lacc=-(zb2)%*%Msacc%*%t(zb2)
  Lsacc=-Ls%*%Lacc%*%Ls
  
  A <- -1/2 * (solve(Ls) - solve(Ls) %*% (xb2) %*% t(xb2) %*% solve(Ls))
  B2 <- 2 * A - diag(diag(A))
  wrtLs=B2*ff
  return(sum(diag(t(wrtLs)%*%Lsacc)))  # trace, avoids needing a Trace() helper
}

#function to calculate the error
errorf=function(x){
  return((f(x+0.0000001)-f(x-0.0000001))/0.0000002-f_grad(x))
}

errorf(15)
```
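For reference, the same pipeline can be reproduced in NumPy (a translation, not the original R), with arbitrary stand-ins for `zb2` and `xb2`. Note that this sketch contracts the plain full-matrix gradient $-\frac{\phi}{2}(\Sigma^{-1}-\Sigma^{-1}xx'\Sigma^{-1})$ with $\partial\Sigma/\partial s$, without the diagonal adjustment `B2`:

```python
import numpy as np

Z = np.array([[0.3, -0.2], [0.5, 0.1]])   # stand-in for zb2
x = np.array([0.7, -1.2])                 # stand-in for xb2
E11 = np.zeros((2, 2))
E11[0, 0] = 1.0                           # dD/ds

def sigma(s):
    D = np.array([[s, 1.0], [1.0, 5.0]])
    M = np.linalg.inv(D) + Z.T @ Z
    L = np.eye(2) - Z @ np.linalg.inv(M) @ Z.T
    return np.linalg.inv(L)               # Sigma = L^{-1}(s)

def phi(s):
    S = sigma(s)
    quad = x @ np.linalg.solve(S, x)
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** 2 * np.linalg.det(S))

def grad(s):
    D = np.array([[s, 1.0], [1.0, 5.0]])
    Ds = np.linalg.inv(D)
    Minv = np.linalg.inv(Ds + Z.T @ Z)
    S = sigma(s)
    dMinv = Minv @ Ds @ E11 @ Ds @ Minv   # -M^{-1} dM M^{-1}, dM = -Ds E11 Ds
    dL = -Z @ dMinv @ Z.T
    dS = -S @ dL @ S                      # d(L^{-1}) = -L^{-1} dL L^{-1}
    Sinv = np.linalg.inv(S)
    G = phi(s) * (-0.5) * (Sinv - Sinv @ np.outer(x, x) @ Sinv)
    return np.sum(G * dS)                 # Frobenius product Tr(G' dS)

s, h = 15.0, 1e-6
fd = (phi(s + h) - phi(s - h)) / (2 * h)
assert abs(fd - grad(s)) < 1e-8
```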

Best Answer

It is much simpler to use differentials and forget the Matrix Cookbook (?). Simply write $$ dg = \frac{\partial g}{\partial \mathbf{U}} :d\mathbf{U} = \frac{\partial g}{\partial \mathbf{U}} :\frac{\partial \mathbf{U}}{\partial x} dx $$ Thus the gradient you are looking for is the (Frobenius) inner product (denoted with the double dot '$:$') of the two matrices you have apparently already computed: $$ \frac{\partial g}{\partial x} = \frac{\partial g}{\partial \mathbf{U}} :\frac{\partial \mathbf{U}}{\partial x} $$ Let us know if things fall into place with this formula.
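This Frobenius-product chain rule is easy to check numerically. Here is a NumPy sketch with an arbitrary example, $g(\mathbf{U})=\log\det\mathbf{U}$ and $\mathbf{U}(x)=\mathbf{U}_0+x\mathbf{W}$, so that $\partial g/\partial\mathbf{U}=\mathbf{U}^{-T}$ and $\partial\mathbf{U}/\partial x=\mathbf{W}$:

```python
import numpy as np

rng = np.random.default_rng(0)
U0 = np.eye(3) * 5 + rng.normal(size=(3, 3)) * 0.1  # well-conditioned base
W = rng.normal(size=(3, 3))                         # direction dU/dx

g = lambda U: np.log(np.linalg.det(U))
U = lambda x: U0 + x * W

x = 0.3
dg_dU = np.linalg.inv(U(x)).T    # gradient of log det w.r.t. U
dU_dx = W
# Frobenius inner product: sum of elementwise products = Tr((dg/dU)' dU/dx)
grad = np.sum(dg_dU * dU_dx)

h = 1e-6
fd = (g(U(x + h)) - g(U(x - h))) / (2 * h)
assert abs(grad - fd) < 1e-7
```

Note that `np.sum(A * B)` is exactly `Tr(A' B)`, which is the trace formula from the question.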

UPDATE In the following, I use the covariance $\mathbf{\Sigma}$ of the MVN. It holds $$ d\phi = \frac{\partial \phi}{\partial \mathbf{\Sigma}} : d\mathbf{\Sigma} $$ where, with $\mathbf{S}=(\mathbf{X\beta})(\mathbf{X\beta})^T$, $$ \frac{\partial \phi}{\partial \mathbf{\Sigma}} = -\frac{\phi}{2} \left[ \mathbf{\Sigma}^{-1}- \mathbf{\Sigma}^{-1}\mathbf{S}\mathbf{\Sigma}^{-1} \right] $$

Using your parameterization (?) $ \mathbf{\Sigma} = \mathbf{I}-\mathbf{Z}\mathbf{M}^{-1}\mathbf{Z}^T, \mathbf{M}= \mathbf{D}^{-1}(s)+\mathbf{Z}^T \mathbf{Z} $, we get $ d\mathbf{\Sigma}= -\mathbf{A}^T(d\mathbf{D})\mathbf{A} $ where $\mathbf{A}=\mathbf{D}^{-1} \mathbf{M}^{-1} \mathbf{Z}^T$. (Here I assumed that $\mathbf{D}$ is symmetric, so both $\mathbf{D}$ and $\mathbf{M}$ are symmetric matrices.)

Plugging this back into $d\phi$ gives the differential $$ d\phi = -\mathbf{A}\frac{\partial \phi}{\partial \mathbf{\Sigma}}\mathbf{A}^T : d\mathbf{D} = \left( -\mathbf{A}\frac{\partial \phi}{\partial \mathbf{\Sigma}}\mathbf{A}^T : \mathbf{E}_{11} \right) ds, $$ since only the $(1,1)$ element of $\mathbf{D}$ depends on the scalar $s$. The (scalar) derivative you are looking for is thus the $(1,1)$ element of the matrix $-\mathbf{A}\frac{\partial \phi}{\partial \mathbf{\Sigma}}\mathbf{A}^T$.
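This final formula can be checked numerically under the answer's parameterization $\mathbf{\Sigma}=\mathbf{I}-\mathbf{Z}\mathbf{M}^{-1}\mathbf{Z}^T$; the values of $\mathbf{Z}$, $\mathbf{X\beta}$ and $\mathbf{D}$ below are arbitrary small examples (NumPy sketch):

```python
import numpy as np

Z = np.array([[0.3, -0.2], [0.5, 0.1]])
x = np.array([0.7, -1.2])                 # plays the role of X*beta

def phi(s):
    D = np.array([[s, 1.0], [1.0, 5.0]])  # s sits in D[1,1]
    M = np.linalg.inv(D) + Z.T @ Z
    Sigma = np.eye(2) - Z @ np.linalg.inv(M) @ Z.T
    quad = x @ np.linalg.solve(Sigma, x)
    return np.exp(-0.5 * quad) / np.sqrt(
        (2 * np.pi) ** 2 * np.linalg.det(Sigma)
    )

def dphi_ds(s):
    D = np.array([[s, 1.0], [1.0, 5.0]])
    Dinv = np.linalg.inv(D)
    Minv = np.linalg.inv(Dinv + Z.T @ Z)
    Sigma = np.eye(2) - Z @ Minv @ Z.T
    Sinv = np.linalg.inv(Sigma)
    # dphi/dSigma = -(phi/2) (Sigma^{-1} - Sigma^{-1} S Sigma^{-1})
    G = phi(s) * (-0.5) * (Sinv - Sinv @ np.outer(x, x) @ Sinv)
    A = Dinv @ Minv @ Z.T
    return (-A @ G @ A.T)[0, 0]           # (1,1) element of -A G A'

s, h = 5.0, 1e-6
fd = (phi(s + h) - phi(s - h)) / (2 * h)
assert abs(fd - dphi_ds(s)) < 1e-8
```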
