Derivative of trace involving the Hadamard product

linear-algebra, matrices, matrix-calculus, optimization

I am trying to find a low-rank factorisation of a co-occurrence matrix

$$\mathbf{Y} \approx \mathbf{U} \mathbf{V}^\intercal,$$ where $\mathbf{Y} \in \mathbb{R}^{I \times J}$, $\mathbf{U} \in \mathbb{R}^{I \times K}$ and $\mathbf{V} \in \mathbb{R}^{J \times K}$.

To find this, I am going to minimise the error (note I have added regularisation terms to prevent overfitting)

$$\xi(\mathbf{U},\mathbf{V}) = ||\mathbf{Y} - \mathbf{U} \mathbf{V}^\intercal ||^2_F + \lambda_u || \mathbf{U} ||^2_F + \lambda_v || \mathbf{V} ||^2_F. $$

Then to minimise this I will find the derivatives with respect to both $\mathbf{U}$ and $ \mathbf{V} $:

$$\nabla_\mathbf{U} = -2\left( \mathbf{Y} - \mathbf{U}\mathbf{V}^\intercal \right)\mathbf{V} + 2\lambda_u \mathbf{U}$$
$$\nabla_\mathbf{V} = -2\left( \mathbf{Y} - \mathbf{U}\mathbf{V}^\intercal \right)^\intercal\mathbf{U} + 2\lambda_v \mathbf{V}.$$

These have been calculated using the identities:

$$ ||\mathbf{A}||^2_F = \text{tr}(\mathbf{A}\mathbf{A}^\intercal) $$ and $$ \frac{\partial}{\partial \mathbf{Y}} \text{tr}((\mathbf{A}\mathbf{Y} + \mathbf{C})(\mathbf{A}\mathbf{Y} + \mathbf{C})^\intercal) = 2\mathbf{A}^\intercal(\mathbf{A}\mathbf{Y} + \mathbf{C}).$$
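As a sanity check on the unmasked case, here is a minimal NumPy sketch that compares closed-form gradients against central finite differences. It assumes the regularisers are $\lambda_u||\mathbf{U}||^2_F$ and $\lambda_v||\mathbf{V}||^2_F$ (which is what the $2\lambda_u\mathbf{U}$ and $2\lambda_v\mathbf{V}$ terms correspond to) and writes the gradients as $-2(\mathbf{Y}-\mathbf{U}\mathbf{V}^\intercal)\mathbf{V}$ and $-2(\mathbf{Y}-\mathbf{U}\mathbf{V}^\intercal)^\intercal\mathbf{U}$ so that the shapes conform. The function names, sizes and $\lambda$ values are illustrative choices, not part of the problem.

```python
import numpy as np

def loss(Y, U, V, lam_u, lam_v):
    # ||Y - U V^T||_F^2 + lam_u ||U||_F^2 + lam_v ||V||_F^2
    R = Y - U @ V.T
    return np.sum(R**2) + lam_u * np.sum(U**2) + lam_v * np.sum(V**2)

def grads(Y, U, V, lam_u, lam_v):
    # Closed-form gradients with respect to U (I x K) and V (J x K)
    R = Y - U @ V.T
    grad_U = -2 * R @ V + 2 * lam_u * U
    grad_V = -2 * R.T @ U + 2 * lam_v * V
    return grad_U, grad_V

def numerical_grad(f, X, eps=1e-6):
    # Entry-by-entry central finite differences of the scalar function f at X
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        E = np.zeros_like(X)
        E[idx] = eps
        G[idx] = (f(X + E) - f(X - E)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
I, J, K = 5, 4, 3                      # arbitrary small sizes
Y = rng.standard_normal((I, J))
U = rng.standard_normal((I, K))
V = rng.standard_normal((J, K))
lam_u, lam_v = 0.1, 0.2                # arbitrary regularisation weights

gU, gV = grads(Y, U, V, lam_u, lam_v)
num_gU = numerical_grad(lambda U_: loss(Y, U_, V, lam_u, lam_v), U)
num_gV = numerical_grad(lambda V_: loss(Y, U, V_, lam_u, lam_v), V)

print("grad_U matches:", np.allclose(gU, num_gU, atol=1e-5))
print("grad_V matches:", np.allclose(gV, num_gV, atol=1e-5))
```

Both gradients come out with the same shape as the variable they differentiate ($I \times K$ and $J \times K$), which is a quick way to catch a misplaced factor.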

I have no issue with these derivations. The problem is that I am now trying to solve the same problem with an added $\textit{masking matrix}$ (for reasons I won't go into here), which is a randomly populated binary matrix $\mathbf{M} \in \{0,1\}^{I \times J}$. This changes the error function to

$$\xi(\mathbf{U},\mathbf{V}) = ||\mathbf{M}\odot(\mathbf{Y} - \mathbf{U} \mathbf{V}^\intercal) ||^2_F + \lambda_u || \mathbf{U} ||^2_F + \lambda_v || \mathbf{V} ||^2_F. $$

I'm unsure how to find the derivative of this new error function, as the Hadamard product complicates things. If anyone could shed some light on this it would be much appreciated!

Thank you.

Best Answer

$\def\p{\partial}$ The binary matrix $M$ has the property that $\;M\odot M = M$

Let a colon denote the trace/Frobenius product, i.e. $$B:C={\rm Tr}(B^TC)={\rm Tr}(C^TB)=C:B$$

The Hadamard and Frobenius products commute in the following senses: $$\eqalign{ A:B &= B:A \\ A\odot B &= B\odot A \\ C:(A\odot B) &= (C\odot A):B \\ }$$

For typing convenience define the matrices $$\eqalign{ X &\doteq UV^T-Y \\ W &\doteq M\odot X \\ }$$

Notice that $$\eqalign{ M\odot W &= M\odot M\odot X \\&= M\odot X \\&= W }$$

Write the error function in terms of these (with the regularisers taken as $\|U\|_F^2 = U:U$ and $\|V\|_F^2 = V:V$) and calculate its differential, using $dW = M\odot dX$ and $dX = dU\,V^T+U\,dV^T$. $$\eqalign{ {\cal E} &= W:W + \lambda_uU:U + \lambda_vV:V \\ \\ d{\cal E} &= 2W:dW + 2\lambda_uU:dU + 2\lambda_vV:dV \\ &= 2W:(M\odot dX) + 2\lambda_uU:dU + 2\lambda_vV:dV \\ &= 2(M\odot W):dX + 2\lambda_uU:dU + 2\lambda_vV:dV \\ &= 2W:(dU\,V^T+U\,dV^T) + 2\lambda_uU:dU + 2\lambda_vV:dV \\ &= 2WV:dU + 2U^TW:dV^T + 2\lambda_uU:dU + 2\lambda_vV:dV \\ &= 2(WV+\lambda_uU):dU \;+\; 2(W^TU+\lambda_vV):dV \\ }$$

From which the gradients can be identified as $$\eqalign{ \frac{\p{\cal E}}{\p U} &= 2(M\odot X)V + 2\lambda_uU \\ \frac{\p{\cal E}}{\p V} &= 2(M\odot X)^TU + 2\lambda_vV \\ }$$

Note that setting $M=J\,$ (here $J$ denotes the all-ones matrix, not the column dimension) recovers the gradients in the original (non-Hadamard) case, since for any matrix $A$ one has $\;J\odot A = A$.
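The masked gradients can be checked numerically. The sketch below is a minimal NumPy illustration, not a reference implementation: it uses the relation $d{\cal E} = \frac{\p{\cal E}}{\p U}:dU + \frac{\p{\cal E}}{\p V}:dV$ by comparing that Frobenius product against a central finite difference along a random direction, and then confirms that an all-ones mask reproduces the unmasked gradients. The function names, matrix sizes, mask density and $\lambda$ values are assumptions made for the example.

```python
import numpy as np

def masked_loss(Y, M, U, V, lam_u, lam_v):
    # E = W:W + lam_u U:U + lam_v V:V  with  W = M o (U V^T - Y)
    W = M * (U @ V.T - Y)
    return np.sum(W**2) + lam_u * np.sum(U**2) + lam_v * np.sum(V**2)

def masked_grads(Y, M, U, V, lam_u, lam_v):
    # dE/dU = 2 (M o X) V + 2 lam_u U,  dE/dV = 2 (M o X)^T U + 2 lam_v V
    W = M * (U @ V.T - Y)
    return 2 * (W @ V + lam_u * U), 2 * (W.T @ U + lam_v * V)

rng = np.random.default_rng(1)
I, J, K = 6, 5, 3                                  # arbitrary small sizes
Y = rng.standard_normal((I, J))
U = rng.standard_normal((I, K))
V = rng.standard_normal((J, K))
M = (rng.random((I, J)) < 0.5).astype(float)       # random binary mask
lam_u, lam_v = 0.1, 0.2                            # arbitrary weights

# Random perturbation directions dU, dV
dU = rng.standard_normal(U.shape)
dV = rng.standard_normal(V.shape)

# Analytic directional derivative: grad_U:dU + grad_V:dV
gU, gV = masked_grads(Y, M, U, V, lam_u, lam_v)
analytic = np.sum(gU * dU) + np.sum(gV * dV)

# Central finite difference along the same direction
eps = 1e-6
finite_diff = (
    masked_loss(Y, M, U + eps * dU, V + eps * dV, lam_u, lam_v)
    - masked_loss(Y, M, U - eps * dU, V - eps * dV, lam_u, lam_v)
) / (2 * eps)

print("directional derivative matches:",
      np.isclose(analytic, finite_diff, rtol=1e-5, atol=1e-6))

# With an all-ones mask the unmasked gradients are recovered
ones = np.ones((I, J))
gU_ones, gV_ones = masked_grads(Y, ones, U, V, lam_u, lam_v)
R = Y - U @ V.T
gU_plain = -2 * R @ V + 2 * lam_u * U
gV_plain = -2 * R.T @ U + 2 * lam_v * V
print("all-ones mask recovers unmasked case:",
      np.allclose(gU_ones, gU_plain) and np.allclose(gV_ones, gV_plain))
```

In NumPy the Hadamard product is plain elementwise `*`, so the mask costs one extra multiplication on the residual and nothing else changes in the gradient code.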
