Partial derivative of a matrix of euclidean distances

calculusderivativesmatrices

I'm trying to compute the gradient of a cost function wrt. to some matrices. During the computations, I reached an expression for which I wasn't sure about its derivative.
Here's my problem :

Let $X = (x_1,x_2,\cdots,x_n) \in \mathbb(R)_+^{M\times N}$, $F = (f_1, f_2,\cdots,f_k)\in \mathbb(R)_+^{M\times K}$ is the matrix of cluster centroids, $G = (g_1, g_2,\cdots,g_k)\in \mathbb(R)_+^{K\times N}$ indicates the matrix clustering indicators, and $D \in \mathbb(R)_+^{K\times N}$ is the matrix of the euclidean distances between each data point $X$ and the set of centroids $F$, more precisely $d_{kn} = \Vert x_n – k \Vert$.

What is the derivative of $Tr(G\circ D.(G\circ D)^T)$ wrt. $F$ ? $st. Tr$ is the trace, and $\circ$ is the Hadamard Product.

The solution I found, using the matrix cookbook formulas, is :
$\frac{\partial}{\partial F} Tr(G\circ D.(G\circ D)^T) = 2(G\cdot F\circ G – G\cdot X\circ G)$

But I'm not sure about it. In case my solution is wrong, can you please provide a step by step explanation?
Thanks

Best Answer

In the answer from Greg, you can simplify things since $\mathbf{G} \circ \mathbf{G}=\mathbf{G}$.

The gradient vanishes when \begin{equation} \mathbf{F} = \frac{\mathbf{X} \mathbf{G}^T}{\mathbf{J}_X \mathbf{G}^T} \end{equation} (elementwise division). which will be the natural way to compute a centroid given the appartenance of each example to a cluster contained in the binary matrix $\mathbf{G}$.