[Math] the derivative of kernel function

derivativesgraph theorymatricesnormed-spacesreal-analysis

The heat kernel function is as follow:

$$f(x_i,x_j)= \exp\left(-\frac{||x_i-x_j||^2_2}{\sigma}\right)$$

where $x_i$ and $x_j$ are two column vectors of matrix $X$. $\sigma$ is a nonzero constant. What is the derivative of kernel function? What about the second partial derivative?

###################################################

In addition, we know that $L=D-S$, where $L$ is a Laplacian matrix. $D$ is a degree matrix. $S$ is an affinity matrix. Here, $S$ can be composed by the above $f(x_i,x_j)$.

Therefore, what is the derivative of $Tr(QLQ^T)$ with respect to $X$?

where $Q \in \mathbb{R}^{d\times n}$ is a constant matrix, $L \in \mathbb{R}^{n\times n}$ is a variable that is related to $X$.

Nobody?

Best Answer

This answer finds a nice way to write the distance matrix in terms of the Gram matrix, i.e. $$\eqalign{ G &= X^TX,\quad &g={\rm diag}(G) \\ A_{ij} &= \|x_i-x_j\|^2 &\implies A = g{\tt 1}^T + {\tt 1}g^T-2G \\ }$$ Define analogous quantities based on $Q$ instead of $X$ $$\eqalign{ H &= Q^TQ,\quad &h={\rm diag}(H) \\ B_{ij} &= \|q_i-q_j\|^2 &\implies B = h{\tt 1}^T + {\tt 1}h^T-2H \\ }$$ plus a few more matrices for later convenience $$\eqalign{ R &= -\frac{1}{2\sigma}S\odot B \\ M &= \Big({\rm Diag}(R{\tt 1}) - R\Big) \;=\; {\rm Laplacian}(R)\\ }$$ This problem defines two additional matrices and a scalar function.
(NB: The exp() function is applied element-wise and $\odot$ is the Hadamard product) $$\eqalign{ S &= \exp\left(\frac{-A}{\sigma}\right) \quad\implies dS = -\frac{1}{\sigma} S\odot dA \\ L &= {\rm Diag}(S{\tt 1}) - S \\ \phi &= {\rm Tr}(Q^TQL) \\ &= Q^TQ:\big({\rm Diag}(S{\tt 1}) - S\big) \\ &= \tfrac{1}{2}B:S \\ }$$ Calculate the differential and gradient of the scalar function. $$\eqalign{ d\phi &= \tfrac{1}{2}B:dS \\ &= -\frac{1}{2\sigma}S\odot B:dA \\ &= R:(dg\,{\tt 1}^T + {\tt 1}\,dg^T-2\,dG) \\ &= R{\tt 1}:dg + R^T{\tt 1}:dg - 2R:dG \\ &= 2\Big({\rm Diag}(R{\tt 1}) - R\Big):dG \\ &= 2M:(X^TdX+dX^TX) \\ &= 4M:X^TdX \\ &= 4XM:dX \\ \frac{\partial\phi}{\partial X} &= 4XM \\ }$$

In the above, the diag() function creates a vector from the diagonal of a matrix, while the Diag() function creates a diagonal matrix from a vector.

And the colon is a convenient product notation for the trace, i.e. $\;X:Y={\rm Tr}(X^TY)$.

The matrices $(A,B,G,H,L,M,R,S)$ are all symmetric.

Related Question