Differentiating a Vector and a Matrix w.r.t. a Vector [Matrix Calculus]

Tags: calculus, derivatives, matrix-calculus, multivariable-calculus, vectors

I am studying matrix calculus for linear regression and machine learning, and I would like to know whether the following calculations are correct:

Let $y=\sin(x+yz)$ and $r=\begin{bmatrix}x\\y\\z\end{bmatrix}$

Then the following is the gradient of $y$, i.e., the derivative of $y$ with respect to the vector $r$ in denominator layout: $$\frac{\partial y}{\partial r}=\frac{\partial\sin(x+yz)}{\partial r}=\begin{bmatrix}\frac{\partial \sin(x+yz)}{\partial x}\\\frac{\partial\sin(x+yz)}{\partial y}\\\frac{\partial \sin(x+yz)}{\partial z}\end{bmatrix}=\begin{bmatrix} \cos(x+yz)\\z\cos(x+yz)\\y\cos(x+yz)\end{bmatrix}$$

In numerator layout it would be:

$$[\cos(x+yz), z\cos(x+yz), y\cos(x+yz)]$$
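The gradient above can be sanity-checked numerically. The following is a minimal Python sketch that compares the analytic entries with central finite differences. The helper names (`numerical_gradient`, `analytic_gradient`), the test point, and the step size `h` are illustrative assumptions, not part of the original post.

```python
import math

def f(x, y, z):
    # Scalar function from the question: sin(x + y*z)
    return math.sin(x + y * z)

def numerical_gradient(func, point, h=1e-6):
    # Central differences, perturbing one coordinate at a time
    grad = []
    for i in range(len(point)):
        up = list(point)
        down = list(point)
        up[i] += h
        down[i] -= h
        grad.append((func(*up) - func(*down)) / (2 * h))
    return grad

def analytic_gradient(x, y, z):
    # Entries claimed in the question: cos(.), z*cos(.), y*cos(.)
    c = math.cos(x + y * z)
    return [c, z * c, y * c]

point = (0.3, -1.2, 0.7)  # arbitrary test point (assumption)
num = numerical_gradient(f, point)
ana = analytic_gradient(*point)
max_err = max(abs(a - b) for a, b in zip(num, ana))
print("largest absolute difference:", max_err)
```

The same three numbers appear in both layouts; only their arrangement (column versus row) differs.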


Now, let $$\mathbf{y}=\begin{bmatrix} e^{xyz}\\x^2z\\yx\end{bmatrix}$$

So in numerator layout: $$\frac{\partial\mathbf{y}}{\partial r}=\begin{bmatrix} \frac{\partial e^{xyz}}{\partial x} & \frac{\partial e^{xyz}}{\partial y} & \frac{\partial e^{xyz}}{\partial z}\\ \frac{\partial x^2z}{\partial x} & \frac{\partial x^2z}{\partial y} & \frac{\partial x^2z}{\partial z}\\ \frac{\partial yx}{\partial x} & \frac{\partial yx}{\partial y} & \frac{\partial yx}{\partial z}\end{bmatrix}\\ =\begin{bmatrix} yze^{xyz} & xze^{xyz} & xye^{xyz}\\2xz& 0 & x^2 \\y & x & 0\end{bmatrix} $$

In denominator layout, would it be the transpose of the above matrix?
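Under the usual convention, the denominator-layout form is the transpose of the numerator-layout Jacobian. Here is a rough Python sketch that checks the matrix above against central finite differences and then builds its transpose. The function names, the test point, and the step size are hypothetical choices made for illustration.

```python
import math

def y_vec(x, y, z):
    # Vector-valued function from the question
    return [math.exp(x * y * z), x**2 * z, y * x]

def numerical_jacobian(func, point, h=1e-6):
    # Numerator layout: row i is output i, column j is input j
    m = len(func(*point))
    n = len(point)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        up = list(point)
        down = list(point)
        up[j] += h
        down[j] -= h
        f_up = func(*up)
        f_down = func(*down)
        for i in range(m):
            J[i][j] = (f_up[i] - f_down[i]) / (2 * h)
    return J

def analytic_jacobian(x, y, z):
    # Matrix claimed in the question (numerator layout)
    e = math.exp(x * y * z)
    return [
        [y * z * e, x * z * e, x * y * e],
        [2 * x * z, 0.0, x**2],
        [y, x, 0.0],
    ]

def transpose(M):
    return [list(col) for col in zip(*M)]

point = (0.5, 1.5, -0.8)  # arbitrary test point (assumption)
J_num = numerical_jacobian(y_vec, point)
J_ana = analytic_jacobian(*point)
max_err = max(
    abs(a - b) for row_n, row_a in zip(J_num, J_ana) for a, b in zip(row_n, row_a)
)
J_denominator = transpose(J_ana)  # denominator layout
print("largest absolute difference:", max_err)
```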


Now, let

$$Y=\begin{bmatrix} x^2yz & xy^2z \\xyz^2 & \ln(xyz) \end{bmatrix}=\begin{bmatrix} Y_{11} & Y_{12} \\Y_{21} & Y_{22} \end{bmatrix}$$

Then $$\frac{\partial Y}{\partial r}=\begin{bmatrix} \frac{\partial Y_{11}}{\partial x} & \frac{\partial Y_{12}}{\partial x} & \frac{\partial Y_{11}}{\partial y} & \frac{\partial Y_{12}}{\partial y} & \frac{\partial Y_{11}}{\partial z} & \frac{\partial Y_{12}}{\partial z} \\ \frac{\partial Y_{21}}{\partial x} & \frac{\partial Y_{22}}{\partial x} & \frac{\partial Y_{21}}{\partial y} & \frac{\partial Y_{22}}{\partial y} & \frac{\partial Y_{21}}{\partial z} & \frac{\partial Y_{22}}{\partial z}\end{bmatrix}$$

Would this be correct? I'm worried that I mixed up the shape and/or the positions of the entries of the matrices $\frac{\partial\mathbf{y}}{\partial r}$ and $\frac{\partial Y}{\partial r}$.

Best Answer

As I have mentioned in the comments, the derivative of a matrix $Y$ (as a function of a vector $r$) wrt $r$ is a 3d-tensor (cf. Derivative of a Matrix with respect to a vector). Since 3d-tensors are difficult to work with, it is convenient to use vectorization. The vectorization operator $\mathrm{vec}$ is the natural isomorphism between $\mathbb R^{n\times k}$ and $\mathbb R^{nk}$. That is, $\mathrm{vec}$ transforms an $n\times k$ matrix into an $(nk)\times 1$ vector by stacking the columns of the matrix. Thus, if we are interested in the derivative of $Y$ wrt $r$, we consider the derivative of $\mathrm{vec}(Y)$ wrt $r'$, where $r'$ denotes the transpose of $r$. This derivative is a matrix. Thus, for the derivative of $$\mathrm{vec}(Y) = \begin{pmatrix} Y_{11} \\ Y_{21}\\ Y_{12} \\ Y_{22} \end{pmatrix} $$ wrt $$r = \begin{pmatrix} x \\ y \\ z \end{pmatrix},$$ we obtain $$\frac{\partial}{\partial r'}\mathrm{vec}(Y) = \begin{pmatrix} \frac{\partial}{\partial x}Y_{11} & \frac{\partial}{\partial y}Y_{11} & \frac{\partial}{\partial z}Y_{11} \\ \frac{\partial}{\partial x}Y_{21} & \frac{\partial}{\partial y}Y_{21} & \frac{\partial}{\partial z}Y_{21} \\ \frac{\partial}{\partial x}Y_{12} & \frac{\partial}{\partial y}Y_{12} & \frac{\partial}{\partial z}Y_{12} \\ \frac{\partial}{\partial x}Y_{22} & \frac{\partial}{\partial y}Y_{22} & \frac{\partial}{\partial z}Y_{22} \end{pmatrix}.$$
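For the specific $Y$ in the question, with $Y_{11}=x^2yz$, $Y_{21}=xyz^2$, $Y_{12}=xy^2z$ and $Y_{22}=\ln(xyz)$, the entries can be written out explicitly. This worked form assumes $xyz>0$ so that the logarithm is defined: $$\frac{\partial}{\partial r'}\mathrm{vec}(Y) = \begin{pmatrix} 2xyz & x^2z & x^2y \\ yz^2 & xz^2 & 2xyz \\ y^2z & 2xyz & xy^2 \\ \frac{1}{x} & \frac{1}{y} & \frac{1}{z} \end{pmatrix}.$$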

Now this result looks slightly different from your expression. Heuristically, this is because in the video the 3d-tensor is "vectorized" after taking the derivative (actually, the third dimension is essentially stacked, resulting in a 2d-tensor, i.e. a matrix). In my answer the vectorization is done before taking the derivative, which I believe is the easier and more commonly used way. In the end, the difference boils down to the dimensions of the derivative and where the individual partial derivatives are located. Both answers are correct in the sense that the two results are isomorphic (to make this explicit: apply the $\mathrm{vec}$ operator to your result and to mine; both give the same $12\times 1$ vector). It is only important to be aware of which approach was chosen and to stick with it in order to avoid confusion.
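The claim about applying $\mathrm{vec}$ to both results can be checked directly. Below is a minimal Python sketch that builds the $2\times 6$ arrangement from the question and the $4\times 3$ arrangement from this answer out of the same partial derivatives, then compares their vectorizations. The helper names (`partials`, `block_layout`, `vec_first_layout`) and the test point are hypothetical, and the point is chosen with $xyz>0$.

```python
def partials(x, y, z):
    # Analytic dY/dx, dY/dy, dY/dz as 2x2 matrices for
    # Y = [[x^2*y*z, x*y^2*z], [x*y*z^2, ln(x*y*z)]]
    dx = [[2 * x * y * z, y**2 * z], [y * z**2, 1 / x]]
    dy = [[x**2 * z, 2 * x * y * z], [x * z**2, 1 / y]]
    dz = [[x**2 * y, x * y**2], [2 * x * y * z, 1 / z]]
    return [dx, dy, dz]

def vec(M):
    # Stack the columns of M into one flat list
    rows, cols = len(M), len(M[0])
    return [M[i][j] for j in range(cols) for i in range(rows)]

def block_layout(slices):
    # Question's 2x6 form: [dY/dx | dY/dy | dY/dz] side by side
    result = []
    for i in range(2):
        row = []
        for s in slices:
            row.extend(s[i])
        result.append(row)
    return result

def vec_first_layout(slices):
    # Answer's 4x3 form: column k is vec(dY/dr_k)
    cols = [vec(s) for s in slices]
    return [[cols[k][i] for k in range(3)] for i in range(4)]

point = (1.2, 0.7, 2.0)  # arbitrary test point with x*y*z > 0 (assumption)
slices = partials(*point)
A = block_layout(slices)      # 2x6, as in the question
B = vec_first_layout(slices)  # 4x3, as in this answer
same = vec(A) == vec(B)
print("vec of both layouts agree:", same)
```

The two matrices hold the same twelve partial derivatives; stacking columns puts them in the same order, which is what makes the two conventions interchangeable.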