Differentiate a matrix equation with respect to a vector

calculus, derivatives, linear algebra, matrices

I have a matrix equation that yields a scalar:

$$f(M) = MAM^T - 2 \sum_{i=1}^{N} \log(M_i)$$

where $M$ is a $1 \times N$ row vector and $A$ is an $N \times N$ matrix, so the result $f$ is a scalar.

How does one take the derivative of $f$ w.r.t. $M$? I have seen the Matrix Cookbook define derivatives of matrices w.r.t. specific index values, but I couldn't find a definition of differentiation w.r.t. vectors.

My intuition is something like

$$\frac{\partial f}{\partial M} = 2(AM - M^{\circ -1})$$

with my reasoning being the two $M$'s in the first term yield $2AM$ once differentiated, and the log term yields $M$ where each element is raised to the $-1$ power ($\frac{d \log x}{dx} = x^{-1}$).

Again, I'm not sure if I've done this correctly, and the fact that $f$ is a scalar makes it a bit more confusing.

Best Answer

We use the linearity of differentiation and first consider \begin{align*} g(M)=MAM^T\tag{1} \end{align*} with $M=(M_i)_{1\leq i\leq N}$ a $(1\times N)$-matrix.

We obtain \begin{align*} dg(M)&=dM\,AM^T+MA\,dM^T\tag{2}\\ \mathrm{vec}(dg(M))&=\mathrm{vec}(dM\,AM^T)+\mathrm{vec}(MA\,dM^T)\tag{3}\\ &=\left(MA^T\otimes I_1\right)\mathrm{vec}(dM)+\left(I_1\otimes MA\right)\mathrm{vec}\left(dM^T\right)\tag{4}\\ &=MA^T\mathrm{vec}(dM)+MA\,I_N\mathrm{vec}(dM)\tag{5}\\ &=\left(MA^T+MA\right)\mathrm{vec}(dM)\\ \color{blue}{\frac{\partial g(M)}{\partial M}}&=\frac{\partial\,\mathrm{vec}(g(M))}{\partial\,\mathrm{vec}(M)}=\color{blue}{M\left(A^T+A\right)}\tag{6} \end{align*}

Comment:

  • In (2) we start by calculating the differential.

  • In (3) we vectorize the equation.

  • In (4) we use the Kronecker product identity $\mathrm{vec}(AXB)=\left(B^T\otimes A\right)\mathrm{vec}(X)$ to factor out $\mathrm{vec}(dM)$ and $\mathrm{vec}(dM^T)$, respectively.

  • In (5) we simplify and use $\mathrm{vec}(dM^T)=C\,\mathrm{vec}(dM)$, noting that the commutation matrix is $C=I_N$ because $dM$ is a row vector.

  • In (6) we take the gradient.
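Steps (3) to (5) can be sanity-checked numerically. The following is a minimal sketch, assuming NumPy, random test data, and the column-stacking convention for $\mathrm{vec}$; the helper `vec` and all variable names are illustrative and not part of the derivation.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4
M = rng.normal(size=(1, N))    # 1 x N row vector
A = rng.normal(size=(N, N))    # N x N matrix, not assumed symmetric
dM = rng.normal(size=(1, N))   # an arbitrary perturbation of M

def vec(X):
    """Stack the columns of X into a 1-D array (column-major order)."""
    return X.reshape(-1, order="F")

# First term of (4): vec(dM A M^T) = (M A^T kron I_1) vec(dM)
lhs1 = vec(dM @ A @ M.T)
rhs1 = np.kron(M @ A.T, np.eye(1)) @ vec(dM)

# Second term of (4): vec(M A dM^T) = (I_1 kron M A) vec(dM^T)
lhs2 = vec(M @ A @ dM.T)
rhs2 = np.kron(np.eye(1), M @ A) @ vec(dM.T)

# Step (5): for a row vector, vec(dM^T) and vec(dM) hold the same entries
# in the same order, so the commutation matrix is the identity.
same_vec = np.array_equal(vec(dM.T), vec(dM))

print(np.allclose(lhs1, rhs1), np.allclose(lhs2, rhs2), same_vec)
```

Because $M$ has a single row, both Kronecker factors $I_1$ are the $1\times 1$ identity, which is why they disappear in (5).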

We can check the result (6) by writing $g$ out in components:

\begin{align*} g(M)&=MAM^T\\ &=\left(M_i\right)_{1\leq i\leq N}\left(A_{ij}\right)_{1\leq i,j\leq N}\left(M_j\right)^T_{1\leq j\leq N}\\ &=\left(\sum_{i=1}^N M_iA_{ij}\right)_{1\leq j\leq N}\left(M_j\right)^T_{1\leq j\leq N}\\ &=\sum_{i=1}^N\sum_{j=1}^N M_iM_jA_{ij} \end{align*}

We obtain

\begin{align*} \color{blue}{\frac{\partial g(M)}{\partial M}}&=\frac{\partial}{\partial\left(M_1,\ldots,M_N\right)}\left(\sum_{i=1}^N\sum_{j=1}^NM_iM_jA_{ij}\right)\\ &=\left(\frac{\partial}{\partial M_k}\sum_{i=1}^N\sum_{j=1}^N M_iM_jA_{ij}\right)_{1\leq k\leq N}\\ &=\left(\sum_{{j=1}\atop{j\ne k}}^N M_jA_{kj}+\sum_{{i=1}\atop{i\ne k}}^NM_iA_{ik}+2M_kA_{kk}\right)_{1\leq k\leq N}\\ &\,\,\color{blue}{=\left(\sum_{j=1}^NM_j\left(A_{kj}+A_{jk}\right)\right)_{1\leq k\leq N}} \end{align*}

in accordance with (6).
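As a small worked instance of this check (an illustration for $N=2$, not part of the original derivation), the double sum and its partial derivatives read

\begin{align*} g(M)&=A_{11}M_1^2+\left(A_{12}+A_{21}\right)M_1M_2+A_{22}M_2^2\\ \frac{\partial g(M)}{\partial M_1}&=2A_{11}M_1+\left(A_{12}+A_{21}\right)M_2\\ \frac{\partial g(M)}{\partial M_2}&=\left(A_{12}+A_{21}\right)M_1+2A_{22}M_2 \end{align*}

which are exactly the two entries of $M\left(A^T+A\right)$.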

Finally, considering $f$ and using (6), we obtain \begin{align*} \frac{\partial f(M)}{\partial M}&=M\left(A^T+A\right)-2\frac{\partial}{\partial (M_1,\ldots,M_N)}\sum_{i=1}^N\log(M_i)\\ &=M\left(A^T+A\right)-2\left(\frac{\partial}{\partial M_k}\sum_{i=1}^N\log(M_i)\right)_{1\leq k\leq N}\\ &\,\,\color{blue}{=M\left(A^T+A\right)-2\left(\frac{1}{M_k}\right)_{1\leq k\leq N}} \end{align*} When $A$ is symmetric this reduces to $2\left(MA-M^{\circ -1}\right)$, so the guess in the question is correct for symmetric $A$, up to transposition.
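The final formula can also be compared against a finite-difference approximation. This is a minimal sketch, assuming NumPy, a random non-symmetric $A$, and strictly positive entries of $M$ so that the logarithm is defined; the function names `f`, `grad_f`, and `numerical_grad` are made up for this illustration.

```python
import numpy as np

def f(M, A):
    """f(M) = M A M^T - 2 * sum_i log(M_i) for a 1 x N row vector M."""
    return (M @ A @ M.T).item() - 2.0 * np.sum(np.log(M))

def grad_f(M, A):
    """Analytic gradient from the answer: M (A^T + A) - 2 / M_k, as a 1 x N row."""
    return M @ (A.T + A) - 2.0 / M

def numerical_grad(func, M, eps=1e-6):
    """Central finite differences, one coordinate of M at a time."""
    G = np.zeros_like(M)
    for k in range(M.shape[1]):
        E = np.zeros_like(M)
        E[0, k] = eps
        G[0, k] = (func(M + E) - func(M - E)) / (2.0 * eps)
    return G

rng = np.random.default_rng(0)
N = 5
M = rng.uniform(0.5, 2.0, size=(1, N))  # positive entries so log(M_i) exists
A = rng.normal(size=(N, N))             # deliberately not symmetric

analytic = grad_f(M, A)
numeric = numerical_grad(lambda X: f(X, A), M)
max_err = np.max(np.abs(analytic - numeric))
print("largest absolute difference:", max_err)
```

Using a non-symmetric `A` matters here: with a symmetric matrix the check could not distinguish $M\left(A^T+A\right)$ from $2MA$.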
