Calculus – How to Define the Derivative of a Vector with Respect to a Matrix


Given the equation $$ Ax = b $$ where $A$ is a matrix, and $b$ and $x$ are vectors, I'm trying to make sense of $ \frac{\partial b}{\partial A} $. By "opening up" the matrix equation into several 1D equations, $b_i=\sum_k A_{ik}x_k$, I managed to get to $$ \frac{\partial b_{i}}{\partial A_{jk}}=\begin{cases}
x_{k} & \text{ if } i=j \\
0 & \text{ if } i \neq j
\end{cases} $$

I was wondering: how can I define this partial derivative for all terms at once, without resorting to "opening up" the matrix equation into several 1D equations? Ideally, this definition of the partial derivative of a vector with respect to a matrix should also obey something resembling the differentiation rules that I'm used to in 1D.

The Wikipedia article on matrix calculus doesn't define this partial derivative (as far as I understood, the furthest it goes is vector with respect to vector), and this question is the closest I could find on math.se, but the answer provided isn't really helpful for what I'm trying to achieve.

Any help would be appreciated.

Best Answer

Let us write $[a_{ij}:i=1,...,m;\,j=1,...,n]$ to mean the $m\times n$ matrix which has $a_{ij}$ in the $i$th row and $j$th column. When each $A_{ij}\,$ ($i=1,...,m;\,j=1,...,n$) is a $p\times q$ matrix, we accordingly write $[A_{ij}:i=1,...,m;\,j=1,...,n]$ for the $m\times n$ block matrix whose $(i,j)$th (block) entry is $A_{ij}$; this can also be interpreted as an $mp\times nq$ matrix with scalar entries. We treat a (column) $n$-vector $\pmb x=(x_1,...,x_n)$ as an $n\times 1$ matrix, and the transpose of such a vector is a $1\times n$ (row) matrix $\pmb x\!^\top=[x_1,...,x_n]$.

The notation works best if the derivative of a scalar $y$ with respect to a vector variable $\pmb x=(x_1,...,x_n)$ is taken to be a row matrix: $$\frac{\partial y}{\partial\pmb x}=\left[\frac{\partial y}{\partial x_1},...,\frac{\partial y}{\partial x_n}\right].$$ The derivative with respect to the corresponding row matrix $\pmb x\!^\top$ is defined dually as the (column) vector $$\frac{\partial y}{\partial\pmb x^{\!\top}}=\left(\frac{\partial y}{\partial x_1},...,\frac{\partial y}{\partial x_n}\right).$$

Accordingly, the derivative with respect to an $m\times n$ matrix $X=[\pmb x_1,...,\pmb x_n]$, where $\pmb x_i$ is the $i$th column of $X$, i.e. the $m$-vector $(x_{1i},...,x_{mi})$, is $$\frac{\partial y}{\partial X}=\left(\frac{\partial y}{\partial\pmb x_1},...,\frac{\partial y}{\partial\pmb x_n}\right),$$ namely the $n\times m$ matrix $$\frac{\partial y}{\partial X}=\left[\frac{\partial y}{\partial x_{ji}}:i=1,...,n;\,j=1,...,m\right].$$

Finally, the corresponding matrix derivative of a $p\times q$ matrix $Y$ is $$\frac{\partial Y}{\partial X}=\left[\frac{\partial Y}{\partial x_{ji}}:i=1,...,n;\,j=1,...,m\right],$$ which is an $n\times m$ block matrix with $p\times q$ blocks (i.e. an $np\times mq$ matrix).
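To make the block layout concrete, here is a minimal numerical sketch in Python with NumPy. It is only an illustration under stated assumptions: the helper name `matrix_derivative` and the step size `eps` are hypothetical, the derivative is approximated by central finite differences, and `f` is assumed to return a 2-D array of shape $p\times q$.

```python
import numpy as np

def matrix_derivative(f, X, eps=1e-6):
    """Approximate dY/dX in the block layout used in this answer.

    f maps an (m, n) array X to a 2-D (p, q) array Y.
    The result has shape (n*p, m*q): block (i, j) is the (p, q)
    array of partial derivatives dY/dx_{ji}.
    (Hypothetical helper, central finite differences.)
    """
    X = np.asarray(X, dtype=float)
    m, n = X.shape
    p, q = f(X).shape
    D = np.zeros((n * p, m * q))
    for i in range(n):        # block row index = column index of X
        for j in range(m):    # block column index = row index of X
            E = np.zeros_like(X)
            E[j, i] = eps     # perturb only the entry x_{ji}
            block = (f(X + E) - f(X - E)) / (2 * eps)
            D[i * p:(i + 1) * p, j * q:(j + 1) * q] = block
    return D

# Scalar example: y = sum of squared entries of X, so dy/dx_{ji} = 2 x_{ji}
# and the n x m derivative in this layout should equal 2 X^T.
X = np.arange(6.0).reshape(2, 3)
D = matrix_derivative(lambda M: np.array([[np.sum(M * M)]]), X)
print(D.shape)
print(np.allclose(D, 2 * X.T, atol=1e-5))
```

Note that the result is laid out like $X^\top$ rather than $X$: this is a consequence of taking the derivative with respect to a column vector to be a row.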

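In the case of your question, $A= [a_{ij}:i=1,...,m;\,j=1,...,n]$, $\pmb x=(x_1,...,x_n)$, $Y=A\pmb x$, and we are treating $A=X$ as the derivative variable, with $\pmb x$ constant. Here $Y$ is an $m\times 1$ matrix, so $p=m$ and $q=1$, and the $(i,j)$th block of the derivative is $\frac{\partial(A\pmb x)}{\partial a_{ji}}=x_i\,\pmb e_j$, where $\pmb e_j$ is the $j$th standard basis vector of $\mathbb R^m$. The $i$th block row is therefore $x_i\,\mathrm I_m$, and plugging this into the above expression for the matrix derivative gives the $mn\times m$ matrix $$\frac{\partial(A\pmb x)}{\partial A}=\pmb x\otimes\mathrm I_m,$$ where $\mathrm I_m$ is the $m\times m$ identity matrix. See the Wikipedia articles on the Kronecker (tensor) product and matrix calculus.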
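As a sanity check, the Kronecker-product formula can be compared against a derivative assembled block by block. The sketch below is illustrative only: the sizes `m`, `n` and the random `x` are arbitrary choices, and it relies on the fact that $A\pmb x$ is linear in $A$, so the partial derivative with respect to $a_{ji}$ is exactly $E_{ji}\,\pmb x$, where $E_{ji}$ is the matrix with a single 1 in row $j$, column $i$.

```python
import numpy as np

m, n = 2, 3                        # A is m x n, x is an n-vector (arbitrary sizes)
rng = np.random.default_rng(0)
x = rng.standard_normal((n, 1))    # constant column vector

# Assemble d(Ax)/dA as an n x m block matrix with (m x 1) blocks,
# where block (i, j) is d(Ax)/da_{ji} = E_{ji} x.
D = np.zeros((n * m, m))
for i in range(n):
    for j in range(m):
        E = np.zeros((m, n))
        E[j, i] = 1.0              # matrix unit for the entry a_{ji}
        D[i * m:(i + 1) * m, j:j + 1] = E @ x

K = np.kron(x, np.eye(m))          # the claimed closed form, x (x) I_m
print(D.shape, K.shape)
print(np.allclose(D, K))
```

Each nonzero entry of `D` sits at row `i*m + j`, column `j`, and holds `x[i]`, which is the componentwise statement that $b_j$ depends on $a_{ji}$ with coefficient $x_i$ and does not depend on any other row of $A$.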