Jacobian matrix dimension problem

Tags: calculus, jacobian, matrices, matrix-calculus

Given matrices
$$
\begin{array}{ll}
x & \in \mathbb{R}^{n \times 1} \\
A & \in \mathbb{R}^{m \times n} \\
b & \in \mathbb{R}^{m \times 1} \\
C & \in \mathbb{R}^{k \times m} \\
d & \in \mathbb{R}^{k \times 1} \\
\end{array}
$$

and

$$
\begin{array}{ll}
z = A x + b & \in \mathbb{R}^{m \times 1} \\
y = C z + d & \in \mathbb{R}^{k \times 1} \\
\end{array}
$$

When I calculate $\frac{d y}{d x}$, I get a matrix in $\mathbb{R}^{n \times k}$:

$$
\begin{array}{ll}
\frac{d y}{d x} = \frac{d y}{d z} \frac{d z}{d x} \\
\frac{d y}{d z} = C^T & \in \mathbb{R}^{m \times k} \\
\frac{d z}{d x} = A^T & \in \mathbb{R}^{n \times m} \\
\frac{d y}{d x} = A^T C^T & \in \mathbb{R}^{n \times k}
\end{array}
$$

but if $y(x): \mathbb{R}^{n \times 1} \to \mathbb{R}^{k \times 1}$, then the Jacobian $J_y$ should be in $\mathbb{R}^{k \times n}$, not $\mathbb{R}^{n \times k}$:

$$
\newcommand{\d}[2]{\frac{\partial #1}{\partial #2}}
\begin{array}{ll}
J_y =
\begin{bmatrix}
\d{y_1}{x_1} & \d{y_1}{x_2} & \dots & \d{y_1}{x_n} \\
\d{y_2}{x_1} & \d{y_2}{x_2} & \dots & \d{y_2}{x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\d{y_k}{x_1} & \d{y_k}{x_2} & \dots & \d{y_k}{x_n} \\
\end{bmatrix} & \in \mathbb{R}^{k \times n} \\
\end{array}
$$

Can anyone explain why this happens, or is my $\frac{d y}{d x}$ wrong?

Best Answer

To calculate $\frac{dy}{dx}$, use the chain rule:

$$\frac{dy(z(x))}{dx} = \frac{dy(z(x))}{dz(x)}\frac{dz(x)}{dx}$$

Now notice that since $z \in \mathbb{R}^m$ and $y = Cz+d$ is a function of $z$, we have $y: \mathbb{R}^m \rightarrow \mathbb{R}^k$, so $\frac{dy}{dz} \in \mathbb{R}^{k \times m}$: the Jacobian has one row per output and one column per input.

Similarly, since $x \in \mathbb{R}^n$ and $z = Ax+b$ is a function of $x$, we have $z: \mathbb{R}^n \rightarrow \mathbb{R}^m$, so $\frac{dz}{dx} \in \mathbb{R}^{m \times n}$.

Lastly, the product of a $k \times m$ matrix and an $m \times n$ matrix is $k \times n$, so the Jacobian $\frac{dy}{dz}\frac{dz}{dx} \in \mathbb{R}^{k \times n}$.
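The dimension bookkeeping above can be checked mechanically. This is a minimal sketch in which shapes are plain `(rows, cols)` tuples; `product_shape` is a hypothetical helper written for illustration, and the sizes `k, m, n` are arbitrary distinct values:

```python
def product_shape(left, right):
    """Return the (rows, cols) shape of left @ right, given both shapes."""
    (r1, c1), (r2, c2) = left, right
    if c1 != r2:
        raise ValueError(f"inner dimensions differ: {left} @ {right}")
    return (r1, c2)


k, m, n = 4, 3, 2          # arbitrary, distinct sizes for illustration
dy_dz = (k, m)             # Jacobian of y = Cz + d: one row per output y_i
dz_dx = (m, n)             # Jacobian of z = Ax + b: one row per output z_q
print(product_shape(dy_dz, dz_dx))     # (4, 2), i.e. k x n

# The transposed factors from the question, A^T (n x m) and C^T (m x k),
# only multiply in the order A^T C^T and give the transposed shape n x k.
print(product_shape((n, m), (m, k)))   # (2, 4)
```

Picking distinct sizes matters: with $k = n$ the two layouts would have the same shape and the mismatch would be invisible.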

You can also see this by explicitly writing out the elements of the Jacobian matrix. Namely, since $y: \mathbb{R}^m \rightarrow \mathbb{R}^k$ and $z: \mathbb{R}^n \rightarrow \mathbb{R}^m$:

$$\frac{dy(z(x))}{dx} = \begin{bmatrix} \frac{dy_1(z(x))}{dx} \\ \frac{dy_2(z(x))}{dx} \\ \vdots \\ \frac{dy_k(z(x))}{dx} \end{bmatrix} = \begin{bmatrix} \frac{\partial y_1(z(x))}{\partial x_1} & \dots & \frac{\partial y_1(z(x))}{\partial x_n}\\ \vdots & \ddots & \vdots\\ \frac{\partial y_k(z(x))}{\partial x_1} & \dots & \frac{\partial y_k(z(x))}{\partial x_n} \end{bmatrix} \in \mathbb{R}^{k \times n} $$

Applying the chain rule to each entry of the Jacobian gives:

$$ \frac{dy(z(x))}{dx} = \begin{bmatrix} \sum_{q=1}^{m}{\frac{\partial y_1}{\partial z_q} \frac{\partial z_q}{\partial x_1}} & \dots & \sum_{q=1}^{m}{\frac{\partial y_1}{\partial z_q} \frac{\partial z_q}{\partial x_n}}\\ \vdots & \ddots & \vdots\\ \sum_{q=1}^{m}{\frac{\partial y_k}{\partial z_q} \frac{\partial z_q}{\partial x_1}} & \dots & \sum_{q=1}^{m}{\frac{\partial y_k}{\partial z_q} \frac{\partial z_q}{\partial x_n}}\end{bmatrix}$$

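Entry $(i, j)$ is the row $\left(\frac{\partial y_i}{\partial z_q}\right)_{q=1}^{m}$ times the column $\left(\frac{\partial z_q}{\partial x_j}\right)_{q=1}^{m}$, so the whole matrix factors as a matrix product: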

$$\frac{dy(z(x))}{dx} = \begin{bmatrix} \frac{\partial y_1}{\partial z_1} & \dots & \frac{\partial y_1}{\partial z_m}\\ \vdots & \ddots & \vdots\\ \frac{\partial y_k}{\partial z_1} & \dots & \frac{\partial y_k}{\partial z_m} \end{bmatrix} \begin{bmatrix} \frac{\partial z_1}{\partial x_1} & \dots & \frac{\partial z_1}{\partial x_n} \\ \vdots & \ddots & \vdots\\ \frac{\partial z_m}{\partial x_1}& \dots & \frac{\partial z_m}{\partial x_n} \end{bmatrix} = \frac{dy}{dz} \frac{dz}{dx} \in \mathbb{R}^{k \times n}$$
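The factorization does not depend on the maps being affine, so it can be sanity-checked numerically on a nonlinear composition. This is a sketch under stated assumptions: the maps `z_of_x` ($\mathbb{R}^3 \to \mathbb{R}^2$) and `y_of_z` ($\mathbb{R}^2 \to \mathbb{R}^4$) are hypothetical examples, and `numerical_jacobian` (central differences) and `matmul` are helpers written here, not library functions:

```python
import math


def numerical_jacobian(f, x, h=1e-6):
    """Central-difference Jacobian; entry [i][j] approximates d f_i / d x_j."""
    k, n = len(f(x)), len(x)
    J = [[0.0] * n for _ in range(k)]          # k rows (outputs), n columns (inputs)
    for j in range(n):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += h                         # perturb only input coordinate j
        x_minus[j] -= h
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(k):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J


def matmul(P, Q):
    """(P Q)[i][j] = sum over q of P[i][q] * Q[q][j]."""
    return [[sum(P[i][q] * Q[q][j] for q in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]


# Hypothetical nonlinear maps: z: R^3 -> R^2 (n = 3, m = 2)
# and y: R^2 -> R^4 (m = 2, k = 4).
def z_of_x(x):
    return [x[0] * x[1] + x[2], math.sin(x[0]) * math.exp(x[2])]


def y_of_z(z):
    return [z[0] * z[1], z[0] ** 2, math.sin(z[1]), z[0] + z[1]]


x0 = [0.5, -0.2, 0.3]
J_direct = numerical_jacobian(lambda x: y_of_z(z_of_x(x)), x0)  # 4 x 3
J_chain = matmul(numerical_jacobian(y_of_z, z_of_x(x0)),         # 4 x 2
                 numerical_jacobian(z_of_x, x0))                  # 2 x 3

max_diff = max(abs(a - b)
               for row_a, row_b in zip(J_direct, J_chain)
               for a, b in zip(row_a, row_b))
print(len(J_direct), len(J_direct[0]), max_diff < 1e-6)  # 4 3 True
```

Both sides come out $k \times n = 4 \times 3$, and they agree to within finite-difference error.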

So we know the dimensions. Let's work out each derivative:

Consider $\frac{\partial y_i}{\partial z_j}$:

$$ \frac{\partial y_i}{\partial z_j} = \frac{\partial}{\partial z_j} \left( \sum_{p=1}^{m} C_{ip} z_p + d_i \right) = \sum_{p=1}^{m} C_{ip} \frac{\partial z_p}{\partial z_j} = \sum_{p=1}^{m} C_{ip} \delta_{pj} = C_{ij}$$

where $\delta_{pj}$ is the Kronecker delta ($1$ if $p = j$, $0$ otherwise).
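For a concrete instance, take $m = 2$ (a size chosen only for illustration) and the entry $i = 1$, $j = 2$. Writing the sum out term by term shows the Kronecker delta keeping exactly one coefficient:

$$
\frac{\partial y_1}{\partial z_2} = \frac{\partial}{\partial z_2}\left( C_{11} z_1 + C_{12} z_2 + d_1 \right) = C_{11} \cdot 0 + C_{12} \cdot 1 + 0 = C_{12}
$$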

Compiling all these partial derivatives into the Jacobian gives $\frac{dy}{dz} = C$. In exactly the same way, $\frac{dz}{dx} = A$, and so

$$ \frac{dy(z(x))}{dx} = \frac{dy}{dz}\frac{dz}{dx} = CA$$

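which is in $\mathbb{R}^{k \times n}$, as required. The $A^T C^T$ in the question is exactly $(CA)^T$: writing $\frac{dy}{dz} = C^T$ and $\frac{dz}{dx} = A^T$ is the transposed (denominator-layout) convention, in which the chain-rule factors also multiply in reverse order, so it produces the $n \times k$ transpose of the Jacobian rather than the $k \times n$ Jacobian itself.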
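Both conclusions, $\frac{dy}{dx} = CA$ and $A^T C^T = (CA)^T$, can be confirmed on small integer matrices. This sketch assumes arbitrary hypothetical values with $n = 3$, $m = 2$, $k = 4$, and uses the fact that for an affine map column $j$ of the Jacobian is exactly $y(x + e_j) - y(x)$, so integer arithmetic gives an exact comparison:

```python
def matvec(M, v):
    """Matrix-vector product for nested lists."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


def matmul(P, Q):
    """(P Q)[i][j] = sum over q of P[i][q] * Q[q][j]."""
    return [[sum(P[i][q] * Q[q][j] for q in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]


def transpose(M):
    return [list(col) for col in zip(*M)]


# Hypothetical example data: n = 3, m = 2, k = 4.
A = [[1, 2, 0], [0, -1, 3]]             # m x n
b = [1, -2]                             # m
C = [[2, 1], [0, 1], [-1, 4], [3, 0]]   # k x m
d = [0, 1, 2, -1]                       # k


def y_of_x(x):
    z = [zi + bi for zi, bi in zip(matvec(A, x), b)]      # z = Ax + b
    return [yi + di for yi, di in zip(matvec(C, z), d)]   # y = Cz + d


n = 3
x = [5, -1, 2]
y0 = y_of_x(x)
# For an affine map, column j of the Jacobian is y(x + e_j) - y(x), exactly.
columns = []
for j in range(n):
    x_step = list(x)
    x_step[j] += 1
    columns.append([a - c for a, c in zip(y_of_x(x_step), y0)])
J = transpose(columns)   # k x n: rows = outputs, columns = inputs

print(J == matmul(C, A))                                   # True
print(transpose(J) == matmul(transpose(A), transpose(C)))  # True
```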
