Understand the derivative of a vector-valued function with respect to a matrix

derivatives, matrix-calculus

function: $ f = A\cdot b$

gradient: $ \frac{\partial f}{\partial A} = b^\top \otimes \mathbb{I} $

$A$ is a matrix e.g. shape 3×3

$b$ is a vector e.g. shape 3×1

$ \otimes $ is Kronecker product

$ \mathbb{I} $ is identity matrix

My questions are:

  1. What is the shape of the gradient $ \frac{\partial f}{\partial A}$?
  2. What is the definition of the derivative of a vector-valued function with respect to a matrix? The Matrix Calculus article on Wikipedia doesn't cover this case.

Thanks a lot for helping.

Best Answer

The gradient ${\cal G}$ is a third-order tensor, so its shape is $(3\times 3\times 3)$.
This is easiest to see using index notation, with summation implied over the repeated index $j$.
$$\eqalign{ f_i &= A_{ij}b_j \\ df_i &= dA_{ij}b_j \\ \frac{\partial f_i}{\partial A_{mn}} &= \bigg(\frac{\partial A_{ij}}{\partial A_{mn}}\bigg)\,b_j \\ &= \big(\delta_{im}\delta_{jn}\big)\,b_j \\ &= \delta_{im}b_n \\ &= {\cal G}_{imn} \\ }$$
This assumes that the elements of $A$ are independent, so that $\Big(\frac{\partial A_{ij}}{\partial A_{mn}}\Big)$ equals one when $i=m$ and $j=n$ and zero otherwise, which is exactly the condition enforced by the delta symbols.
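
As a quick numerical sanity check of ${\cal G}_{imn} = \delta_{im}b_n$, here is a minimal NumPy sketch. The helper names `gradient_tensor` and `numerical_gradient` and the sample values of `A` and `b` are my own, chosen only for illustration. Because $f$ is linear in $A$, perturbing a single entry $A_{mn}$ by one changes $f$ by exactly the corresponding partial derivative.

```python
import numpy as np

def gradient_tensor(b, rows):
    """Closed form G[i, m, n] = delta_{im} * b_n for f = A @ b."""
    return np.einsum("im,n->imn", np.eye(rows), b)

def numerical_gradient(A, b):
    """Perturb each A[m, n] by 1 and record the change in f.

    f is linear in A, so this difference equals the partial derivative.
    """
    rows, cols = A.shape
    G = np.zeros((rows, rows, cols))
    for m in range(rows):
        for n in range(cols):
            E = np.zeros_like(A)
            E[m, n] = 1.0
            G[:, m, n] = (A + E) @ b - A @ b
    return G

# Illustrative values only.
A = np.arange(9, dtype=float).reshape(3, 3)
b = np.array([1.0, 2.0, 3.0])

G = gradient_tensor(b, A.shape[0])
print(G.shape)                                   # (3, 3, 3)
print(np.allclose(G, numerical_gradient(A, b)))  # True
```

Each slice `G[i]` is zero except for row `i`, which holds $b^\top$.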

In order to write this without resorting to tensor/index notation, many authors flatten the $(3\times 3)$ $A$ matrix into a $(9\times 1)$ vector using the Kronecker-vec relationship ${\rm vec}(XYZ) = (Z^T\otimes X)\,{\rm vec}(Y)$.

Their derivation goes like so, where $a={\rm vec}(A)$ stacks the columns of $A$ and ${\rm vec}(df)=df$ because $df$ is already a column vector.
$$\eqalign{ f &= A\,b \\ df &= dA\,b \\ {\rm vec}(df) &= {\rm vec}(I\,dA\,b) \\ df &= (b^T\otimes I)\,{\rm vec}(dA) \\ &= (b^T\otimes I)\,da \\ \frac{\partial f}{\partial a} &= (b^T\otimes I) \;= G \in {\mathbb R}^{3\times 9} \\ }$$
Then they call the $G$ matrix "the gradient", but it's really a flattened representation of the ${\cal G}$ tensor.
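
The flattening can also be checked numerically. The sketch below uses illustrative values for `A` and `b` and assumes that ${\rm vec}$ stacks columns, which corresponds to column-major (`order="F"`) flattening in NumPy. It confirms that $(b^T\otimes I)\,{\rm vec}(A) = A\,b$ and that the $3\times 9$ matrix is the tensor $\delta_{im}b_n$ with its last two indices merged.

```python
import numpy as np

# Illustrative values only.
A = np.arange(9, dtype=float).reshape(3, 3)
b = np.array([1.0, 2.0, 3.0])

# vec(A): stack the columns of A (column-major order).
a = A.flatten(order="F")

# Flattened gradient (b^T kron I), shape (3, 9).
G_flat = np.kron(b.reshape(1, -1), np.eye(3))
print(G_flat.shape)                    # (3, 9)
print(np.allclose(G_flat @ a, A @ b))  # True

# Third-order tensor G[i, m, n] = delta_{im} * b_n.
G = np.einsum("im,n->imn", np.eye(3), b)

# Merging (m, n) in column-major order recovers the flattened matrix:
# G_flat[i, m + 3*n] == G[i, m, n].
print(np.array_equal(G_flat, G.reshape(3, 9, order="F")))  # True
```

If a row-major `vec` were used instead, the flattened matrix would be $I\otimes b^T$ rather than $b^T\otimes I$, so the convention matters when comparing results across references.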