function: $ f = A\cdot b$
gradient: $ \frac{\partial f}{\partial A} = b^\top \otimes \mathbb{I} $
$A$ is a matrix e.g. shape 3×3
$b$ is a vector e.g. shape 3×1
$ \otimes $ is Kronecker product
$ \mathbb{I} $ is identity matrix
My questions are:
- what is the shape of the gradient $ \frac{\partial f}{\partial A}$?
- what is the definition of the derivative of a vector-valued function with respect to a matrix? The Matrix Calculus article on Wikipedia doesn't cover this case.
Thanks a lot for helping.
Best Answer
The gradient ${\cal G}$ is a third-order tensor, so its shape is $(3\times 3\times 3)$.
This is easiest to see using index notation. $$\eqalign{ f_i &= A_{ij}b_j \\ df_i &= dA_{ij}b_j \\ \frac{\partial f_i}{\partial A_{mn}} &= \bigg(\frac{\partial A_{ij}}{\partial A_{mn}}\bigg)\,b_j \\ &= \big(\delta_{im}\delta_{jn}\big)\,b_j \\ &= \delta_{im}b_n \\ &= {\cal G}_{imn} \\ }$$ This assumes that the elements of $A$ are independent, so that $\Big(\frac{\partial A_{ij}}{\partial A_{mn}}\Big)$ equals zero
unless $i=m$ and $j=n$, which are exactly the conditions enforced by the delta symbols.
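As a quick numerical sanity check of ${\cal G}_{imn}=\delta_{im}b_n$, here is a minimal NumPy sketch. The random test values, the step size `eps`, and the variable names are illustrative assumptions, not part of the derivation above.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))  # arbitrary test matrix
b = rng.standard_normal(3)       # arbitrary test vector

# Closed form from the index derivation: G[i, m, n] = delta_{im} * b_n
G = np.einsum("im,n->imn", np.eye(3), b)

# Finite-difference check: perturb one entry A[m, n] at a time.
# f = A b is linear in A, so the quotient is exact up to rounding error.
eps = 1e-6
G_fd = np.zeros((3, 3, 3))
for m in range(3):
    for n in range(3):
        dA = np.zeros((3, 3))
        dA[m, n] = eps
        G_fd[:, m, n] = ((A + dA) @ b - A @ b) / eps

print(G.shape)                            # shape of the third-order tensor
print(np.allclose(G, G_fd, atol=1e-6))    # closed form vs. finite differences
```

The first axis indexes the components of $f$ and the last two axes index the entries of $A$, which is where the $(3\times 3\times 3)$ shape comes from.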
In order to write this without resorting to tensor/index notation, many authors flatten the $(3\times 3)$ matrix $A$ into a $(9\times 1)$ vector $a={\rm vec}(A)$ by stacking its columns, and then apply the Kronecker-vec identity ${\rm vec}(XYZ) = (Z^T\otimes X)\,{\rm vec}(Y)$.
Their derivation goes as follows, with $da={\rm vec}(dA)$ and ${\rm vec}(df)=df$ because $f$ is already a vector. $$\eqalign{ f &= A\,b \\ df &= dA\,b \\ {\rm vec}(df) &= {\rm vec}(I\,dA\,b) \\ df &= (b^T\otimes I)\,{\rm vec}(dA) \\ &= (b^T\otimes I)\,da \\ \frac{\partial f}{\partial a} &= (b^T\otimes I) \;= G \in {\mathbb R}^{3\times 9} \\ }$$ Then they call the $G$ matrix "the gradient", but it's really a flattened representation of the ${\cal G}$ tensor.
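The link between the flattened matrix and the tensor can also be checked numerically. The sketch below assumes the column-stacking convention for ${\rm vec}$ (NumPy's `order="F"`); the test values and variable names are again illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))  # arbitrary test matrix
b = rng.standard_normal(3)       # arbitrary test vector
I = np.eye(3)

# a = vec(A): stack the columns of A (column-major / Fortran order)
a = A.reshape(-1, order="F")

# Flattened gradient G = b^T (x) I, a 3x9 matrix
G_flat = np.kron(b.reshape(1, -1), I)

# Third-order tensor from the index derivation: G[i, m, n] = delta_{im} * b_n
G = np.einsum("im,n->imn", I, b)

# Flatten the (m, n) axes in column-major order so that column k = m + 3*n
G_from_tensor = G.reshape(3, 9, order="F")

print(G_flat.shape)                          # shape of the flattened gradient
print(np.allclose(G_flat @ a, A @ b))        # f = (b^T (x) I) vec(A)
print(np.allclose(G_flat, G_from_tensor))    # same numbers, different layout
```

With a row-stacking ${\rm vec}$ the Kronecker factors would appear in the other order, so the flattened matrix depends on the chosen convention while the tensor ${\cal G}$ does not.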