Taking the matrix derivative of $\| \left| \mathbf{X}\mathbf{W}\right|-\mathbf{1}_{n \times K} \| ^2_F$ with respect to $\mathbf{W}$

Tags: derivatives, linear-algebra, matrices, matrix-calculus, multivariable-calculus

I am trying to take the matrix derivative of the following function with respect to $\bf W$:

\begin{equation}
\| \left| \mathbf{X}\mathbf{W}\right|-\mathbf{1}_{n \times K} \| ^2_F
\end{equation}

where $\mathbf{X}$ is $n \times d$, $\mathbf{W}$ is $d \times K$, and $\mathbf{1}_{n \times K}$ is a matrix with all elements equal to one. $\| \cdot \|_F$ is the Frobenius norm and $\left| \mathbf{X}\mathbf{W}\right|$ is the element-wise absolute value of $\mathbf{X}\mathbf{W}$.

Any help is highly appreciated.

Best Answer

For typing convenience, define the matrices

$$\eqalign{ Y &= XW \\ J &= 1_{n\times K} \qquad&({\rm all\,ones\,matrix}) \\ S &= {\rm sign}(Y) \\ A &= S\odot Y \qquad&({\rm absolute\,value\,of\,}Y) \\ B &= A-J \\ Y &= S\odot A \qquad&({\rm sign\,property}) \\ }$$

where $\odot$ denotes the elementwise/Hadamard product and the sign function is applied element-wise.

Use these new variables to rewrite the function, then calculate its gradient. Wherever no element of $Y$ is zero, $S$ is locally constant, so $dA = S\odot dY$. The step from $S\odot(A-J)$ to $(Y-S)$ uses $S\odot A = Y$ and $S\odot J = S$.

$$\eqalign{ \phi &= \|B\|_F^2 \\&= B:B \\ d\phi &= 2B:dB \\ &= 2(A-J):dA \\ &= 2(A-J):S\odot dY \\ &= 2S\odot(A-J):dY \\ &= 2(Y-S):dY \\ &= 2(Y-S):X\,dW \\ &= 2X^T(Y-S):dW \\ \frac{\partial\phi}{\partial W} &= 2X^T(Y-S) \\ }$$

Here a colon denotes the trace/Frobenius product, i.e.

$$\eqalign{ A:B = {\rm Tr}(A^TB) = {\rm Tr}(AB^T) = B:A }$$

The cyclic property of the trace allows such products to be rearranged in various ways

$$\eqalign{ A:BC &= B^TA:C \\ &= AC^T:B \\ }$$

Finally, when $(A,B,C)$ are all the same size, their Hadamard and Frobenius products commute with each other

$$\eqalign{ A:B\odot C &= A\odot B:C \\ }$$

NB: When an element of $\,Y$ equals zero, the gradient is undefined. This behavior is similar to the derivative of $\,|x|\,$ in the scalar case.
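
As a sanity check, the closed form $2X^T(Y-S)$ can be compared against central finite differences. The following is only a minimal sketch in plain Python, not part of the original answer: the helper names (`matmul`, `phi`, `grad_phi`, `numeric_grad`), the matrix sizes, and the random test data are all assumptions chosen for illustration. It also assumes no entry of $Y = XW$ lies close enough to zero for a perturbation to cross the kink of $|x|$.

```python
import random

def matmul(A, B):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def sign(v):
    """Scalar sign function: -1, 0, or 1."""
    return (v > 0) - (v < 0)

def phi(X, W):
    """Objective phi(W) = || |XW| - J ||_F^2, with J the all-ones matrix."""
    return sum((abs(y) - 1.0) ** 2 for row in matmul(X, W) for y in row)

def grad_phi(X, W):
    """Closed-form gradient 2 X^T (Y - sign(Y)), where Y = XW."""
    Y = matmul(X, W)
    R = [[y - sign(y) for y in row] for row in Y]   # Y - S
    Xt = [list(col) for col in zip(*X)]             # X^T
    return [[2.0 * g for g in row] for row in matmul(Xt, R)]

def numeric_grad(X, W, h=1e-6):
    """Central finite differences, perturbing one entry of W at a time."""
    G = [[0.0] * len(W[0]) for _ in W]
    for i in range(len(W)):
        for j in range(len(W[0])):
            Wp = [row[:] for row in W]
            Wm = [row[:] for row in W]
            Wp[i][j] += h
            Wm[i][j] -= h
            G[i][j] = (phi(X, Wp) - phi(X, Wm)) / (2 * h)
    return G

# Hypothetical test problem: sizes and data are arbitrary.
random.seed(0)
n, d, K = 5, 3, 2
X = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
W = [[random.gauss(0, 1) for _ in range(K)] for _ in range(d)]

G_exact = grad_phi(X, W)
G_num = numeric_grad(X, W)
max_err = max(abs(a - b) for ra, rb in zip(G_exact, G_num) for a, b in zip(ra, rb))
print("largest entrywise difference:", max_err)
```

Because $\phi$ is piecewise quadratic in $W$, the central difference should match the closed form up to rounding error as long as no perturbation changes the sign of an entry of $Y$. A large discrepancy would most likely mean some entry of $Y$ sits near zero, which is exactly the case flagged in the NB above.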
