This answer finds a nice way to write the distance matrix in terms of the Gram matrix, i.e.
$$\eqalign{
G &= X^TX,\quad &g={\rm diag}(G) \\
A_{ij} &= \|x_i-x_j\|^2 &\implies A = g{\tt 1}^T + {\tt 1}g^T-2G \\
}$$
Define analogous quantities based on $Q$ instead of $X$
$$\eqalign{
H &= Q^TQ,\quad &h={\rm diag}(H) \\
B_{ij} &= \|q_i-q_j\|^2 &\implies B = h{\tt 1}^T + {\tt 1}h^T-2H \\
}$$
plus a few more matrices for later convenience
$$\eqalign{
R &= -\frac{1}{2\sigma}S\odot B \\
M &= \Big({\rm Diag}(R{\tt 1}) - R\Big) \;=\; {\rm Laplacian}(R)\\
}$$
This problem defines two additional matrices and a scalar function.
(NB: The exp() function is applied element-wise and $\odot$ is the Hadamard product)
$$\eqalign{
S &= \exp\left(\frac{-A}{\sigma}\right) \quad\implies
dS = -\frac{1}{\sigma} S\odot dA \\
L &= {\rm Diag}(S{\tt 1}) - S \\
\phi &= {\rm Tr}(Q^TQL) \\
&= Q^TQ:\big({\rm Diag}(S{\tt 1}) - S\big) \\
&= \tfrac{1}{2}B:S \\
}$$
Calculate the differential and gradient of the scalar function.
$$\eqalign{
d\phi &= \tfrac{1}{2}B:dS \\
&= -\frac{1}{2\sigma}S\odot B:dA \\
&= R:(dg\,{\tt 1}^T + {\tt 1}\,dg^T-2\,dG) \\
&= R{\tt 1}:dg + R^T{\tt 1}:dg - 2R:dG \\
&= 2\Big({\rm Diag}(R{\tt 1}) - R\Big):dG \\
&= 2M:(X^TdX+dX^TX) \\
&= 4M:X^TdX \\
&= 4XM:dX \\
\frac{\partial\phi}{\partial X} &= 4XM \\
}$$
In the above, the diag() function creates a vector from the diagonal of a matrix, while the Diag() function creates a diagonal matrix from a vector.
And the colon is a convenient product notation for the trace, i.e.
$\;X:Y={\rm Tr}(X^TY)$.
The matrices $(A,B,G,H,L,M,R,S)$ are all symmetric.
We will use the following facts:
\begin{align}
\int_{\mathbb{R}} \exp(-\tfrac{1}{2}x^2)\mathrm{d} x &= \sqrt{2\pi}, \\
\int_{\mathbb{R}} x^2 \exp(-\tfrac{1}{2}x^2)\mathrm{d} x &= \sqrt{2\pi}, \\
\int_{\mathbb{R}} x^4\exp(-\tfrac{1}{2}x^2)\mathrm{d} x &= 3\sqrt{2\pi}, \\
\int_{\mathbb{R}} x^{2k+1} \exp(-\tfrac{1}{2}x^2)\mathrm{d} x &= 0
\end{align}
where $k\ge 0$ is an integer.
It is easy to get
$$I_1 = \int_{\mathbb{R}^d} yy^{\mathsf{T}}\exp(-\tfrac{1}{2}y^{\mathsf{T}}y)\mathrm{d} y
= \sqrt{2\pi}^d I_d$$
and
$$I_2 = \int_{\mathbb{R}^d} y(y^{\mathsf{T}}y)\exp(-\tfrac{1}{2}y^{\mathsf{T}}y)\mathrm{d} y = 0.$$
Let us deal with the third part. Denote $Q = \Sigma^{1/2}\nabla^2 f(\mu)\Sigma^{1/2}$.
Let $Q = U\mathrm{diag}(\lambda_1, \lambda_2, \cdots, \lambda_d)U^{\mathsf{T}}$ be
the eigendecomposition of $Q$ where $\lambda_1, \lambda_2, \cdots, \lambda_d$
are the eigenvalues of $Q$, and $U$ is an orthogonal matrix
whose columns are the eigenvectors of $Q$. We have
\begin{align}
I_3 &= \int_{\mathbb{R}^d} yy^{\mathsf{T}} y^{\mathsf{T}}Q y \exp(-\tfrac{1}{2}y^{\mathsf{T}}y)\mathrm{d} y\\
&= \int_{\mathbb{R}^d} yy^{\mathsf{T}} y^{\mathsf{T}}
U\mathrm{diag}(\lambda_1, \lambda_2, \cdots, \lambda_d)U^{\mathsf{T}} y \exp(-\tfrac{1}{2}y^{\mathsf{T}}y)\mathrm{d} y \\
&= U \left(\int_{\mathbb{R}^d} zz^{\mathsf{T}} z^{\mathsf{T}}\mathrm{diag}(\lambda_1, \lambda_2, \cdots, \lambda_d)z \exp(-\tfrac{1}{2}z^{\mathsf{T}}z)\mathrm{d} z\right) U^{\mathsf{T}}\\
&= U \left(\int_{\mathbb{R}^d} zz^{\mathsf{T}} (\sum\nolimits_k \lambda_k z_k^2) \exp(-\tfrac{1}{2}z^{\mathsf{T}}z)\mathrm{d} z\right) U^{\mathsf{T}}\\
&= \sqrt{2\pi}^d U \left(2\mathrm{diag}(\lambda_1, \lambda_2, \cdots, \lambda_d) + (\sum\nolimits_k \lambda_k)I_d \right) U^{\mathsf{T}}\\
&= 2 \sqrt{2\pi}^d Q + \mathrm{Tr}(Q)\sqrt{2\pi}^d I_d
\end{align}
where we have used the substitution $z = U^{\mathsf{T}} y$ (about the change of variables, see [1], and [2] Ch. 2.1).
Reference
[1] J. Schwartz, "The Formula for Change in Variables in a Multiple Integral",
The American Mathematical Monthly, Vol. 61, No. 2 (Feb., 1954), pp. 81-85
[2] Robb J. Muirhead, "Aspects of multivariate statistical theory", 2005.
Best Answer
This is a matter of interchanging of derivative and integration. Let me simplify the expressions a bit, say you want to calculate $$ \frac\partial{\partial p} \mathbb E[ g(p,\cdot)] =\frac\partial{\partial p} \int_\Omega g(p,X) d\mu(X) $$ for some real parameter $p$. Perturb $p$ by some $h$, then $$ \frac1t \left( \int_\Omega g(p+th,X) -g(p) d\mu(X) \right) =\int_\Omega \frac{g(p+th,X) -g(p)}t d\mu(X) =\int_\Omega \int_0^1 \frac{\partial g}{\partial p}(p+sth,X)h d\mu(X). $$ Now one needs to pass to the limit for $t\searrow0$. Dominated convergence will help. Hope this gives an idea, how to tackle this problem.