[Math] Matrix Calculus and Matrix Derivatives

Consider a map $f : \mathbb R^{n\times m} \to \mathbb R^{p \times l}$ between matrix spaces. What is the differential of such a mapping? I looked at a really simple example, $\operatorname{id} : \mathbb R^{n\times n} \to \mathbb R^{n\times n}$ given by $\operatorname{id}(X) = X$. Then (in analogy to the case $f : \mathbb R \to \mathbb R$ or $f : \mathbb R^n \to \mathbb R^n$) we should have $d\operatorname{id}(A) = I$ for all matrices $A$, where $I$ is the identity matrix (and $d\operatorname{id}$ denotes the differential, i.e. the best linear approximation map).

Now I read about matrix derivatives. For example, on Wikipedia the derivative of a mapping $F : M(n,m) \to M(p,q)$ between matrix spaces is said to be:
$$
\frac{\partial\mathbf{F}} {\partial\mathbf{X}}=
\begin{bmatrix}
\frac{\partial\mathbf{F}}{\partial X_{1,1}} & \cdots & \frac{\partial \mathbf{F}}{\partial X_{n,1}}\\
\vdots & \ddots & \vdots\\
\frac{\partial\mathbf{F}}{\partial X_{1,m}} & \cdots & \frac{\partial \mathbf{F}}{\partial X_{n,m}}\\
\end{bmatrix}
$$
And also in the Matrix Cookbook the basic formula (on page 8) is written as
$$
\frac{\partial X_{kl}}{\partial X_{ij}} = \delta_{ik}\delta_{lj}
$$
(where $\delta_{ij}$ denotes the Kronecker delta), and this, I guess, is essentially the differentiation formula for the identity map. So if I apply this to the above map $\operatorname{id} : \mathbb R^{n\times n} \to \mathbb R^{n\times n}$ with $n = 2$, I get a $4\times 4$ matrix
$$
\begin{pmatrix}
\frac{\partial X}{\partial x_{11}} & \frac{\partial X}{\partial x_{21}} \\
\frac{\partial X}{\partial x_{12}} & \frac{\partial X}{\partial x_{22}}
\end{pmatrix}
=
\begin{pmatrix}
\frac{\partial \begin{pmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{pmatrix}}{\partial x_{11}} &
\frac{\partial \begin{pmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{pmatrix}}{\partial x_{21}} \\
\frac{\partial \begin{pmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{pmatrix}}{\partial x_{12}} &
\frac{\partial \begin{pmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{pmatrix}}{\partial x_{22}}
\end{pmatrix}
= \begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 \end{pmatrix}
$$
(where in the last step I have not drawn the block boundaries). But this result is quite different from what I would intuitively expect, so what did I do wrong? Maybe I am interpreting all these matrix derivatives wrongly; could someone please explain?

Best Answer

The issue is that you first need to pick a basis before you write out the matrix representation of the derivative. The matrix above doesn't make sense as a derivative.

In general, I find it a little easier to avoid indices, if possible. Dealing with indices and bases can add unnecessary clutter.

In the above case, we have $F(X) = X$, so $F(X+H) = X+H$, and so we see that $DF(X)(H) = H$.
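
To spell out the step above, recall that $DF(X)$ is the linear map $L$ for which the remainder in the first-order expansion is $o(\|H\|)$. The symbol $L$ and the (arbitrary) matrix norm $\|\cdot\|$ are introduced here only for illustration:
$$
F(X+H) = F(X) + L(H) + o(\|H\|) \quad (H \to 0).
$$
For $F(X) = X$ we have $F(X+H) - F(X) = H$ exactly, so $L(H) = H$ satisfies this with zero remainder, for every $X$.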

Note that $DF(X)$ is a linear map $\mathbb{R}^{n \times n} \to \mathbb{R}^{n \times n}$, so if you want to express the derivative as a matrix, you need to pick a basis first. Using the same basis for the domain and the codomain, the resulting matrix will be an $n^2 \times n^2$ matrix, and it will necessarily be the identity matrix, since $DF(X)(B_k) = B_k$ for the basis elements $B_k$.
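
As a rough numerical check of this, the following pure-Python sketch builds the matrix of $DF(X)$ for $n = 2$ by applying a finite-difference approximation of $DF(X)$ to each basis element. The helper names (`basis`, `vec`, `directional_derivative`, `jacobian`, `block_layout`), the column-by-column ordering of the basis, and the test point `X` are all assumptions made for this illustration, not part of the original answer.

```python
def basis(n):
    """Matrix units E_ij of R^{n x n}, ordered column by column."""
    mats = []
    for j in range(n):
        for i in range(n):
            E = [[0.0] * n for _ in range(n)]
            E[i][j] = 1.0
            mats.append(E)
    return mats


def vec(M):
    """Coordinates of M in the basis above (columns stacked into one list)."""
    return [M[i][j] for j in range(len(M[0])) for i in range(len(M))]


def F(X):
    """The identity map on matrices, F(X) = X."""
    return [row[:] for row in X]


def directional_derivative(F, X, H, eps=1e-6):
    """Central finite-difference approximation of DF(X)(H)."""
    n = len(X)
    plus = F([[X[i][j] + eps * H[i][j] for j in range(n)] for i in range(n)])
    minus = F([[X[i][j] - eps * H[i][j] for j in range(n)] for i in range(n)])
    return [[(plus[i][j] - minus[i][j]) / (2 * eps) for j in range(n)]
            for i in range(n)]


def jacobian(F, X):
    """Matrix of DF(X), using the same basis for input and output."""
    cols = [vec(directional_derivative(F, X, B)) for B in basis(len(X))]
    size = len(cols)
    # Column k holds the coordinates of DF(X)(B_k); round off float noise.
    return [[round(cols[k][r], 6) for k in range(size)] for r in range(size)]


def block_layout(n):
    """Layout from the question: block (a, b) is dX/dx_{ba} (0-based)."""
    M = [[0.0] * (n * n) for _ in range(n * n)]
    for a in range(n):
        for b in range(n):
            # dX/dx_{ba} is the matrix unit with a single 1 at row b, column a.
            M[a * n + b][b * n + a] = 1.0
    return M


X = [[1.0, 2.0], [3.0, 4.0]]
J = jacobian(F, X)
for row in J:
    print(row)
```

With a consistent basis, `J` comes out as the $4 \times 4$ identity. `block_layout(2)` reproduces the matrix from the question: it contains the same partial derivatives, but arranged in an order that does not correspond to one fixed basis on both sides, which is why it appears as a permutation of the identity.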

If one uses the inner product induced by the Frobenius norm, then one can write $\nabla F(X) = I$.

If $\phi(X) = [X]_{kl}$, a similar analysis shows that $D \phi(X)(H) = [H]_{kl}$, and to obtain the component along the $E_{ij} = e_ie_j^T$ direction, we look at $D \phi(X)(E_{ij}) = [E_{ij}]_{kl} = \delta_{ik}\delta_{jl}$.
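
The Kronecker-delta formula can be checked numerically in the same spirit. This is a minimal sketch for $n = 2$, assuming 0-based indices and a finite-difference approximation of $D\phi(X)$; the names `matrix_unit`, `phi`, `d_phi`, `kronecker` and the test point `X` are hypothetical and chosen only for this example.

```python
def matrix_unit(n, i, j):
    """E_ij = e_i e_j^T: an n x n matrix with a single 1 at position (i, j)."""
    E = [[0.0] * n for _ in range(n)]
    E[i][j] = 1.0
    return E


def phi(X, k, l):
    """The coordinate function phi(X) = [X]_kl."""
    return X[k][l]


def d_phi(X, H, k, l, eps=1e-6):
    """Central finite-difference approximation of D phi(X)(H)."""
    n = len(X)
    plus = [[X[r][c] + eps * H[r][c] for c in range(n)] for r in range(n)]
    minus = [[X[r][c] - eps * H[r][c] for c in range(n)] for r in range(n)]
    return (phi(plus, k, l) - phi(minus, k, l)) / (2 * eps)


def kronecker(a, b):
    """Kronecker delta."""
    return 1.0 if a == b else 0.0


n = 2
X = [[1.0, 2.0], [3.0, 4.0]]

# Evaluate D phi(X)(E_ij) for every combination of (i, j) and (k, l).
results = {}
for i in range(n):
    for j in range(n):
        for k in range(n):
            for l in range(n):
                value = d_phi(X, matrix_unit(n, i, j), k, l)
                results[(i, j, k, l)] = round(value, 6)  # drop float noise

# Compare against delta_ik * delta_jl.
matches = all(
    results[(i, j, k, l)] == kronecker(i, k) * kronecker(j, l)
    for (i, j, k, l) in results
)
print(matches)
```

The derivative in the $E_{ij}$ direction is nonzero only when $(i, j) = (k, l)$, which is exactly the Matrix Cookbook formula quoted in the question.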
