How is the Jacobian defined for a function of a matrix?

Let $f: \mathbb{R}^{m\times n} \rightarrow \mathbb{R}^m$ where $f(W)=Wx$.

Question 1: How is the Jacobian matrix defined for a vector-valued function whose variable is a matrix?

Question 2: Using the answer to the above, how can one generalize the Jacobian formula for a composition of functions to the case where the variables are matrices? Concretely, take $g(x)=h(u(x))$ with $x \in \mathbb{R}^n$, $u: \mathbb{R}^n \rightarrow \mathbb{R}^l$, and $h: \mathbb{R}^l \rightarrow \mathbb{R}^m$. Its Jacobian is

$$ J_x(g) = J_u(h) J_x(u) $$

where $J_x(g)$ is the Jacobian of $g$ with respect to $x$. How can this be handled when $x$ becomes a matrix?

The formula above simply says that the Jacobian of a composition is the product of the Jacobians. How can one mimic this when the variable is a matrix?

My thoughts:
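For the given function $f(W)=Wx$ one can write the following:

$$ f(W) = Wx = \left[\begin{matrix} W_{1\bullet}\,x \\ \vdots \\ W_{m\bullet}\,x \end{matrix}\right] $$

where $W_{m\bullet}$ is the $m$-th row of $W$. Now the Jacobian could be either

$$ J_W(f) \in \mathbb{R}^{m \times n} $$

or

$$ J_W(f) \in \mathbb{R}^{n \times m}. $$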

Best Answer

As peek-a-boo commented, the concept doesn't change whether the input is a matrix or a vector. If the notation is confusing, I suggest using a mapping that "flattens" the matrix into a vector.

For $f: \mathbb{R}^n \to \mathbb{R}^m$, the Jacobian is $J(f)\in\mathbb{R}^{m\times n}$. So if $f: \mathbb{R}^{m\times n} \to \mathbb{R}^m$, it follows that $J(f) \in \mathbb{R}^{m\times m\times n}$. What was missing from your attempted solution? You only differentiated each component $W_i x$ with respect to its own row $W_i$, while a Jacobian requires differentiating every output with respect to every input, e.g. $W_1 x$ must also be differentiated with respect to $W_2, W_3,\dots$.

For your function $f(W)=Wx$, the Jacobian is

$$ J_W(f) = \frac{\partial}{\partial W}f(W)=\frac{\partial}{\partial W}Wx = I_m\otimes x^T $$

where $I_m$ is the identity matrix of size $m\times m$ and $\otimes$ is the Kronecker product. Here $W$ is flattened row by row, so the Jacobian is an $m\times mn$ matrix. If you have to express this in terms of matrix notation, then I guess it will look like this:

$$ J_W(f) =\left[\begin{matrix} \left[\begin{matrix}\frac{\partial}{\partial W_1}W_1\cdot x \\ \frac{\partial}{\partial W_2}W_1\cdot x \\ \dots\end{matrix}\right]\\ \left[\begin{matrix}\frac{\partial}{\partial W_1}W_2\cdot x \\ \frac{\partial}{\partial W_2}W_2\cdot x \\ \dots\end{matrix}\right] \\ \dots \end{matrix}\right] =\left[\begin{matrix} \left[\begin{matrix}x^T \\ \mathbf{0}^T \\ \dots\end{matrix}\right]\\ \left[\begin{matrix}\mathbf{0}^T \\ x^T \\ \dots\end{matrix}\right] \\ \dots \end{matrix}\right] $$

This is going to be very cumbersome to write, so it's probably easier to express it in terms of indices instead:

$$ [J_W(f)]_{ijk} = \begin{cases}x_k & (i=j)\\0 & (i\neq j)\end{cases} $$

Or perhaps with the Kronecker delta:

$$ [J_W(f)]_{ijk} = \delta_{ij}x_k $$
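If you want to convince yourself of the index formula, a quick numerical check may help. The following is only a sketch using NumPy: the sizes `m = 3`, `n = 4`, the random inputs, and the helper `numerical_jacobian` are my own choices for illustration, and the axis order (output index first, then the two indices of $W$) is an assumption matching $[J_W(f)]_{ijk}$ above.

```python
import numpy as np

def f(W, x):
    """The map from the question: f(W) = W x."""
    return W @ x

def numerical_jacobian(func, W, eps=1e-5):
    """Central-difference Jacobian with shape output_shape + input_shape."""
    out = func(W)
    J = np.zeros(out.shape + W.shape)
    for idx in np.ndindex(W.shape):
        step = np.zeros_like(W)
        step[idx] = eps  # perturb a single entry of W
        J[(...,) + idx] = (func(W + step) - func(W - step)) / (2 * eps)
    return J

m, n = 3, 4  # illustrative sizes, not from the question
rng = np.random.default_rng(0)
W = rng.standard_normal((m, n))
x = rng.standard_normal(n)

# Index formula: J[i, j, k] = delta_ij * x_k
J_formula = np.einsum("ij,k->ijk", np.eye(m), x)
J_numeric = numerical_jacobian(lambda M: f(M, x), W)

# Flattening W row by row turns the 3-index array into an m x (m*n) matrix,
# which should coincide with the Kronecker product of I_m and x^T.
J_flat = J_formula.reshape(m, m * n)
J_kron = np.kron(np.eye(m), x[None, :])

print(J_formula.shape)                    # (3, 3, 4)
print(np.allclose(J_formula, J_numeric))  # True
print(np.allclose(J_flat, J_kron))        # True
```

The reshape at the end is the "flattening" idea in practice: the three-index array and the $m\times mn$ matrix hold the same numbers, only arranged differently.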

Regarding your second question about $g = h\circ u$, you were close, except that the second term should have $x$ as its subscript:

$$ J_x(g) = J_u(h) J_x(u) $$

Just as we earlier had $J(f)\in\mathbb{R}^{m\times n}$ for $f:\mathbb{R}^n \to \mathbb{R}^m$, the dimensions for $u$ and $h$ can be determined similarly. Let's say $u$ is a mapping from a matrix to a matrix, e.g. $u:\mathbb{R}^{m\times n}\to\mathbb{R}^{k\times l}$. Then $J(u)\in\mathbb{R}^{k\times l\times m\times n}$, and its components, with the output indices first as in $[J_W(f)]_{ijk}$, are written as

$$ [J_X(u)]_{cdab} = \frac{\partial}{\partial X_{ab}}u_{cd}(X) $$

When writing $J(h)$ you probably want to use a placeholder matrix $Y$ to represent the input of $h$ (which, again, is assumed to be a matrix-to-matrix mapping, but feel free to change it), so that you have

$$ [J_Y(h)]_{pqcd} = \frac{\partial}{\partial Y_{cd}}h_{pq}(Y)\big|_{Y=u(X)} $$

Summing over the intermediate indices $c$ and $d$, which plays the role of the matrix product in the vector case, we finally have

$$ \begin{aligned}{} [J_X(g)]_{pqab} &= \sum_{c,d}[J_Y(h)]_{pqcd}[J_X(u)]_{cdab} \\ &= \sum_{c,d}\frac{\partial}{\partial Y_{cd}}h_{pq}(Y)\big|_{Y=u(X)}\frac{\partial}{\partial X_{ab}}u_{cd}(X) \end{aligned} $$
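To see the matrix-variable chain rule at work, here is a rough NumPy sketch. Everything concrete in it is an assumption made for illustration: the maps $u(X)=AXB$ and $h(Y)=Y\odot Y$ (elementwise square), the matrices `A` and `B`, the sizes, and the helper `numerical_jacobian`. The Jacobian arrays are stored with the output indices first and the input indices last, and the chain rule is the contraction over the two intermediate indices.

```python
import numpy as np

def numerical_jacobian(func, X, eps=1e-5):
    """Central-difference Jacobian with shape output_shape + input_shape."""
    out = func(X)
    J = np.zeros(out.shape + X.shape)
    for idx in np.ndindex(X.shape):
        step = np.zeros_like(X)
        step[idx] = eps  # perturb a single entry of X
        J[(...,) + idx] = (func(X + step) - func(X - step)) / (2 * eps)
    return J

m, n, k, l = 3, 2, 2, 4  # illustrative sizes
rng = np.random.default_rng(0)
A = rng.standard_normal((k, m))  # hypothetical fixed matrices defining u
B = rng.standard_normal((n, l))
X = rng.standard_normal((m, n))

def u(X):
    return A @ X @ B  # R^{m x n} -> R^{k x l}

def h(Y):
    return Y * Y      # elementwise square, R^{k x l} -> R^{k x l}

Y = u(X)

# J_u[c, d, a, b] = d u_cd / d X_ab = A_ca * B_bd
J_u = np.einsum("ca,bd->cdab", A, B)
# J_h[p, q, c, d] = d h_pq / d Y_cd = 2 * Y_pq * delta_pc * delta_qd
J_h = np.einsum("pq,pc,qd->pqcd", 2 * Y, np.eye(k), np.eye(l))

# Chain rule: contract over the intermediate indices c and d.
J_g = np.einsum("pqcd,cdab->pqab", J_h, J_u)

# After flattening, the same contraction is an ordinary matrix product.
J_g_flat = J_h.reshape(k * l, k * l) @ J_u.reshape(k * l, m * n)

J_g_numeric = numerical_jacobian(lambda Z: h(u(Z)), X)

print(J_g.shape)                                         # (2, 4, 3, 2)
print(np.allclose(J_g, J_g_numeric))                     # True
print(np.allclose(J_g.reshape(k * l, m * n), J_g_flat))  # True
```

The last check is the answer to "how can one mimic the product of Jacobians": once both arrays are flattened consistently, the contraction over the intermediate indices is exactly the usual matrix product $J_u(h)\,J_x(u)$.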