Matrix differentiation involving exponential term

derivatives, matrices, matrix-calculus, scalar-fields

Let the scalar field $\Phi : \mathbb{R}^{m \times r} \times \mathbb{R}^{r \times n} \to \mathbb{R}$ be defined by

$$\Phi(X,Y) := \sum_{i=1}^m \sum_{j=1}^n \exp \left( - \left( \frac{XY-A}{\gamma} \right)_{ij}^2 \right) $$

where $A\in \mathbb{R}^{m\times n}$ and $\gamma \in \mathbb{R} \setminus \{0\}$ are given. I would like to compute the gradients $\nabla_X \Phi$ and $\nabla_Y \Phi$.

Could anyone please help me with this differentiation? I would appreciate it a lot.

Best Answer

Define some auxiliary matrices and calculate their differentials

$$\eqalign{ B &= \tfrac{1}{\gamma}(XY-A) &\implies dB=\tfrac{1}{\gamma}(X\,dY+dX\,Y) \cr C &= -B\odot B &\implies dC=-2B\odot dB \cr E &= \exp(C) &\implies dE=E\odot dC \cr F &= -\tfrac{2}{\gamma}B\odot E \cr }$$

where the symbol $\odot$ represents the elementwise/Hadamard product, and the $\exp()$ function is understood to be applied elementwise.
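As a rough sanity check on these differentials, here is a minimal NumPy sketch. The sizes, the value of `gamma`, the random test data, and the helper name `aux` are all assumptions made for illustration; none of them come from the question. It compares each first-order differential with the actual change produced by a small perturbation of $X$ and $Y$.

```python
import numpy as np

# Illustrative sizes and data (assumptions, not taken from the question).
rng = np.random.default_rng(0)
m, r, n = 4, 3, 5
gamma = 1.7
X = rng.standard_normal((m, r))
Y = rng.standard_normal((r, n))
A = rng.standard_normal((m, n))

def aux(X, Y):
    """Return the auxiliary matrices B, C, E for given X and Y."""
    B = (X @ Y - A) / gamma      # B = (XY - A) / gamma
    C = -B * B                   # C = -B (Hadamard) B
    E = np.exp(C)                # E = exp(C), elementwise
    return B, C, E

B, C, E = aux(X, Y)

# Small perturbations playing the role of dX and dY.
eps = 1e-6
dX = eps * rng.standard_normal((m, r))
dY = eps * rng.standard_normal((r, n))

# First-order differentials as written in the answer.
dB = (X @ dY + dX @ Y) / gamma
dC = -2 * B * dB
dE = E * dC

# Actual changes after perturbing both X and Y.
B2, C2, E2 = aux(X + dX, Y + dY)
err_B = np.max(np.abs((B2 - B) - dB))
err_C = np.max(np.abs((C2 - C) - dC))
err_E = np.max(np.abs((E2 - E) - dE))

# The mismatches are second order in the perturbation,
# so they should be far smaller than eps itself.
print(err_B, err_C, err_E)
```

The remaining mismatch comes from the neglected second-order terms (for example $dX\,dY/\gamma$ in $dB$), which is why it shrinks quadratically as `eps` decreases.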

Write the objective function in terms of these new variables, then calculate its differential and gradients.

$$\eqalign{ \Phi &= J:E \cr d\Phi &= J:dE \cr &= J:(E\odot dC) \cr &= E:dC \cr &= E:(-2B\odot dB) \cr &= -2(B\odot E):dB \cr &= \gamma F:\tfrac{1}{\gamma}(X\,dY+dX\,Y) \cr &= F:(X\,dY+dX\,Y) \cr &= X^TF:dY + FY^T:dX \cr \frac{\partial\Phi}{\partial Y} &= X^TF, \quad \frac{\partial\Phi}{\partial X} = FY^T \cr }$$

Here $J\in{\mathbb R}^{m\times n}$ is the matrix of all ones, and the colon $(:)$ represents the trace/Frobenius product, i.e.

$$\eqalign{ A:B = {\rm Tr}(A^TB)}$$

In this identity and the ones that follow, $A$, $B$, $C$ stand for arbitrary matrices of compatible dimensions, not the specific matrices defined earlier.

The cyclic property of the trace allows the terms to be rearranged in various ways.

$$\eqalign{ A:BC = AC^T:B = B^TA:C }$$

Finally, the Hadamard and Frobenius products are each commutative, and they commute with each other:

$$\eqalign{ A:B &= B:A \cr B\odot C &= C\odot B \cr A:(B\odot C) &= (A\odot B):C \cr }$$
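The resulting gradients can be checked numerically. The following is a minimal NumPy sketch, assuming random test data and illustrative sizes; the function names `phi`, `grad_phi`, and `numerical_grad` are hypothetical and not part of the original answer. It compares $FY^T$ and $X^TF$ against central finite differences of $\Phi$.

```python
import numpy as np

def phi(X, Y, A, gamma):
    """Objective: sum over all entries of exp(-((XY - A)/gamma)^2)."""
    B = (X @ Y - A) / gamma
    return np.sum(np.exp(-B * B))

def grad_phi(X, Y, A, gamma):
    """Closed-form gradients from the derivation: (F Y^T, X^T F)."""
    B = (X @ Y - A) / gamma
    E = np.exp(-B * B)
    F = -(2.0 / gamma) * B * E
    return F @ Y.T, X.T @ F

def numerical_grad(f, M, h=1e-6):
    """Central finite-difference gradient of scalar f with respect to matrix M."""
    G = np.zeros_like(M)
    for idx in np.ndindex(*M.shape):
        P = M.copy()
        Q = M.copy()
        P[idx] += h
        Q[idx] -= h
        G[idx] = (f(P) - f(Q)) / (2 * h)
    return G

# Illustrative sizes and data (assumptions, not taken from the question).
rng = np.random.default_rng(1)
m, r, n = 4, 3, 5
gamma = 1.7
X = rng.standard_normal((m, r))
Y = rng.standard_normal((r, n))
A = rng.standard_normal((m, n))

gX, gY = grad_phi(X, Y, A, gamma)
gX_num = numerical_grad(lambda M: phi(M, Y, A, gamma), X)
gY_num = numerical_grad(lambda M: phi(X, M, A, gamma), Y)

# Largest absolute disagreement between analytic and numerical gradients.
print(np.max(np.abs(gX - gX_num)), np.max(np.abs(gY - gY_num)))
```

Note that the gradient with respect to $X$ has the shape of $X$ ($m\times r$) and the gradient with respect to $Y$ has the shape of $Y$ ($r\times n$), which is a quick consistency check on the placement of the transposes.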