[Math] Matrix Differentiation using Kronecker operator issue

calculus, derivatives, kronecker product, matrices

Let $X$ be an $n\times n$ variable matrix, and let the vectors and matrices $p_1$ ($1\times n$), $p_2$ ($n\times 1$), and $\Omega$ ($n\times n$) be given. What is the derivative of the function $f(X)=p_{1}X^{-1}\Omega Xp_2$?

I used the following three matrix differentiation rules found here:

$\frac{d(AXB)}{dX}=B^T\otimes A$, $\frac{dX^{-1}}{dX}=-(X^{-T}\otimes X^{-1})$ and the multiplication rule $D[f(x)^Tg(x)]=g(x)^Tf'(x)+f(x)^Tg'(x)$.

I broke $f$ down into a product of two functions: $g_1(X)=p_1X^{-1}\Omega$ and $g_2(X)=Xp_2$. Then I found their derivatives to be $\frac{dg_1(X)}{dX}=-(\Omega^T\otimes p_1)(X^{-T}\otimes X^{-1})$ and $\frac{dg_2(X)}{dX}=p_2^T\otimes I_n$, where $\otimes$ denotes the Kronecker product. However, when I apply the product rule, the second component $$\Omega^TX^{-T}p_1^T(p_2^T\otimes I_n)$$ has dimensions that do not work out, unless we DO NOT take the transpose of $g_1$. I know the result should be a $1\times n^2$ matrix. What am I missing?

Best Answer

Dimensionally, since $f$ is a scalar and $X$ is an $n\times n$ matrix, the derivative $\frac {\partial f} {\partial X}$ will be an $n\times n$ matrix.

The simplest way to get there is to use differentials. First let $P = p_1^T p_2^T$, then you can write the function as $$f = P:X^{-1}\Omega\,X$$ where $A:B$ denotes the Frobenius (inner) product between matrices $A,B$.

Now take the differential and rearrange it until $dX$ is isolated on the RHS, using the Frobenius product rules $A:BC = B^TA:C = AC^T:B$ and the identity $dX^{-1} = -X^{-1}\,dX\,X^{-1}$. $$ \eqalign { df &= P:d\,(X^{-1}\Omega\,X) \cr &= P:(X^{-1}\Omega\,dX) + P:(dX^{-1}\Omega\,X) \cr &= (\Omega^TX^{-T}P):dX + (PX^T\Omega^T):dX^{-1} \cr &= (\Omega^TX^{-T}P):dX + (PX^T\Omega^T):(-X^{-1} dX\,X^{-1}) \cr &= (\Omega^TX^{-T}P):dX - (X^{-T}PX^T\Omega^T X^{-T}):dX \cr &= \big(\Omega^TX^{-T}P - X^{-T}PX^T\Omega^T X^{-T}\big):dX \cr } $$ Passing from the differential to the derivative gives $$ \eqalign { \frac {\partial f} {\partial X} &= \big(\Omega^TX^{-T}P - X^{-T}PX^T\Omega^T X^{-T}\big) \cr } $$
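One way to sanity-check the closed form is to compare it against central finite differences. The following is only a minimal sketch in NumPy, not part of the original derivation: the helper names (`f`, `grad_f`, `numerical_grad`), the random test data, and the step size `h` are all assumptions made for illustration.

```python
import numpy as np

def f(X, p1, p2, Omega):
    """Scalar f(X) = p1 X^{-1} Omega X p2, with p1 of shape (1, n) and p2 of shape (n, 1)."""
    return (p1 @ np.linalg.solve(X, Omega @ X) @ p2).item()

def grad_f(X, p1, p2, Omega):
    """Closed form: Omega^T X^{-T} P - X^{-T} P X^T Omega^T X^{-T}, with P = p1^T p2^T."""
    P = p1.T @ p2.T                      # n x n
    Xinv_T = np.linalg.inv(X).T          # X^{-T}
    return Omega.T @ Xinv_T @ P - Xinv_T @ P @ X.T @ Omega.T @ Xinv_T

def numerical_grad(X, p1, p2, Omega, h=1e-6):
    """Central finite differences, perturbing one entry of X at a time."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = h
            G[i, j] = (f(X + E, p1, p2, Omega) - f(X - E, p1, p2, Omega)) / (2 * h)
    return G

# Arbitrary test data (assumed, for illustration only).
rng = np.random.default_rng(0)
n = 4
X = n * np.eye(n) + 0.5 * rng.standard_normal((n, n))  # diagonal shift keeps X comfortably invertible
p1 = rng.standard_normal((1, n))
p2 = rng.standard_normal((n, 1))
Omega = rng.standard_normal((n, n))

max_err = np.max(np.abs(grad_f(X, p1, p2, Omega) - numerical_grad(X, p1, p2, Omega)))
print(max_err)  # should be small, limited by finite-difference error
```

If the two gradients agree up to finite-difference error, that supports (but of course does not prove) the formula above.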

Update

The problem with your initial approach is that $g_1$, as you defined it, is already transposed: it is a $1\times n$ row vector, so it already plays the role of $f(x)^T$ in your product rule.

So $f \ne g_1^T g_2$ but instead $f = g_1 g_2$.

Now all the dimensions work out as expected.
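To see the two approaches agree, the Kronecker-form product rule with $f = g_1 g_2$ can be evaluated numerically and compared with the vectorized $n\times n$ gradient. This is a rough sketch under the assumption that the quoted rules use the column-major (column-stacking) $\operatorname{vec}$ convention; the variable names and the random test data are made up for the example.

```python
import numpy as np

# Arbitrary test data (assumed, for illustration only).
rng = np.random.default_rng(1)
n = 3
X = n * np.eye(n) + 0.5 * rng.standard_normal((n, n))  # diagonal shift keeps X invertible
p1 = rng.standard_normal((1, n))
p2 = rng.standard_normal((n, 1))
Omega = rng.standard_normal((n, n))
Xinv = np.linalg.inv(X)

g1 = p1 @ Xinv @ Omega   # 1 x n row vector (already "transposed")
g2 = X @ p2              # n x 1 column vector

# Derivatives from the question, each of shape n x n^2.
Dg1 = -np.kron(Omega.T, p1) @ np.kron(Xinv.T, Xinv)
Dg2 = np.kron(p2.T, np.eye(n))

# Product rule for f = g1 g2: no extra transpose on g1 in the second term.
Df = g2.T @ Dg1 + g1 @ Dg2   # 1 x n^2

# Matrix-form gradient from the differential approach, stacked column by column.
P = p1.T @ p2.T
G = Omega.T @ Xinv.T @ P - Xinv.T @ P @ X.T @ Omega.T @ Xinv.T
vecG_row = G.reshape(1, -1, order="F")   # vec(G)^T, column-major

kron_err = np.max(np.abs(Df - vecG_row))
print(Df.shape, kron_err)
```

Under that convention, the $1\times n^2$ row vector from the Kronecker approach is just $\operatorname{vec}\big(\frac{\partial f}{\partial X}\big)^T$, so the $1\times n^2$ and $n\times n$ answers describe the same derivative in different layouts.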
