For the first question alone (without context), I'm going to prove a more general result first (then check the $\boxed{\textbf{EDIT}}$ for what was actually asked):
Suppose we have three matrices $A,X,B$ that are $n\times p$, $p\times r$, and $r\times m$ respectively. Any element $w_{ij}$ of their product $W=AXB$ is expressed by:
$$w_{ij}=\sum_{h=1}^r\sum_{t=1}^pa_{it}x_{th}b_{hj}$$
Then we can show that: $$s=\frac {\partial w_{ij}}{\partial x_{dc}}=a_{id}b_{cj}$$
(because all terms, except the one multiplied by $x_{dc}$, vanish)
From this one can deduce (in an almost straightforward way) that the matrix $S$ of all such partial derivatives is the Kronecker product of $B^T$ and $A$, so that: $$\frac {\partial AXB}{\partial X}=B^T\otimes A$$
Replacing either $A$ or $B$ with the appropriate identity matrix, gives you the derivative you want.
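As a quick numerical sanity check (not part of the proof; the NumPy code and the random test dimensions are my own), the identity can be verified under the column-major `vec` convention:

```python
import numpy as np

# Check that vec(AXB) = (B^T kron A) vec(X), i.e. the Jacobian
# d(AXB)/dX is B^T kron A under column-major vectorization.
# The dimensions below are arbitrary test values.
rng = np.random.default_rng(0)
n, p, r, m = 2, 3, 4, 5
A = rng.standard_normal((n, p))
X = rng.standard_normal((p, r))
B = rng.standard_normal((r, m))

lhs = (A @ X @ B).flatten(order="F")          # vec(AXB), column-major
rhs = np.kron(B.T, A) @ X.flatten(order="F")  # (B^T kron A) vec(X)
assert np.allclose(lhs, rhs)
```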
$$\boxed{\textbf{EDIT}}$$
Upon reading the article you added (and after some sleep!), I've noticed that $dD$ is not $\partial D$ in their notation, but rather $\dfrac {\partial f}{\partial D}$ where $f$ is a certain function of $W$ and $X$ while $D=WX$. This means that the first expression you're having problems with is $$\frac{\partial f}{\partial W}=\frac{\partial f}{\partial D}X^T$$
(The author stated at the beginning that he would use the loose expression "gradient on" something to mean "partial derivative" with respect to that same thing.)
So any element of $\partial f/\partial W$ can be written as $\partial f/\partial W_{ij}$. And any element of $D$:
$$D_{ij}=\sum_{k=1}^qW_{ik}X_{kj}$$
We can write $$df=\sum_i\sum_j \frac{\partial f}{\partial D_{ij}}dD_{ij}$$
$$\frac{\partial f}{\partial W_{dc}}=\sum_{i,j} \frac{\partial f}{\partial D_{ij}}\frac{\partial D_{ij}}{\partial W_{dc}}=\sum_j \frac{\partial f}{\partial D_{dj}}\frac{\partial D_{dj}}{\partial W_{dc}}$$
This last equality is true since all terms with $i\neq d$ drop off.
Due to the product $D=WX$, we have $$\frac{\partial D_{dj}}{\partial W_{dc}}=X_{cj}$$ and so $$\frac{\partial f}{\partial W_{dc}}=\sum_j \frac{\partial f}{\partial D_{dj}}X_{cj}$$
or, using $X_{cj}=X^T_{jc}$, $$\frac{\partial f}{\partial W_{dc}}=\sum_j \frac{\partial f}{\partial D_{dj}}X_{jc}^T$$
This means that the matrix $\partial f/\partial W$ is the product of $\partial f/\partial D$ and $X^T$. I believe this is what you're trying to grasp, and what's asked of you in the last paragraph of the screenshot. As the paragraph after the screenshot hints, you could also have started with small matrices, worked this out, noticed the pattern, and then generalized, as I attempted to do directly in the proof above. The same reasoning proves the second expression as well.
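A quick way to convince yourself of this is a finite-difference check. Here $f(D)=\tfrac12\|D\|_F^2$ is an assumed test function (so $\partial f/\partial D = D$), not anything from the article:

```python
import numpy as np

# Finite-difference check of df/dW = (df/dD) X^T for the assumed test
# function f(D) = 0.5 * ||D||_F^2 with D = W X, so df/dD = D = WX.
rng = np.random.default_rng(1)
W = rng.standard_normal((3, 4))
X = rng.standard_normal((4, 5))

f = lambda W_: 0.5 * np.sum((W_ @ X) ** 2)
analytic = (W @ X) @ X.T  # (df/dD) X^T with df/dD = WX

eps = 1e-6
numeric = np.zeros_like(W)
for d in range(W.shape[0]):
    for c in range(W.shape[1]):
        Wp = W.copy(); Wp[d, c] += eps
        Wm = W.copy(); Wm[d, c] -= eps
        numeric[d, c] = (f(Wp) - f(Wm)) / (2 * eps)

assert np.allclose(analytic, numeric, atol=1e-6)
```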
For ease of typing let's use the notations
$$\eqalign{
X &= \Sigma \cr
A:X &= {\rm \,tr\,}(A^TX) \quad\text{(trace/Frobenius product)} \cr
}$$
Now we can write the original scalar function and find its differential and gradient
$$\eqalign{
\phi &= A:X^{-1} \cr
d\phi &= A:dX^{-1} = -A:X^{-1}\,dX\,X^{-1} = -X^{-1}AX^{-1}:dX \cr
G=\frac{\partial\phi}{\partial X} &= -X^{-1}AX^{-1} \cr
}$$
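As a sketch of a numerical check (the random symmetric test matrices below are my own assumption; a positive definite $X$ keeps the inversion safe), the gradient formula can be compared against central differences:

```python
import numpy as np

# Finite-difference check of G = -X^{-1} A X^{-1} for phi = tr(A^T X^{-1}),
# with A and X symmetric (X positive definite). Random test matrices,
# assumed for illustration only.
rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n)); A = A + A.T
X = rng.standard_normal((n, n)); X = X @ X.T + n * np.eye(n)

phi = lambda X_: np.trace(A.T @ np.linalg.inv(X_))
Xi = np.linalg.inv(X)
G = -Xi @ A @ Xi

eps = 1e-6
numeric = np.zeros_like(X)
for i in range(n):
    for j in range(n):
        Xp = X.copy(); Xp[i, j] += eps
        Xm = X.copy(); Xm[i, j] -= eps
        numeric[i, j] = (phi(Xp) - phi(Xm)) / (2 * eps)

assert np.allclose(G, numeric, atol=1e-5)
```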
To proceed to the Hessian, let's introduce the 4th order tensor ${\mathcal H}$ with components
$$\eqalign{
{\mathcal H}_{ijkl} = \delta_{ik}\,\delta_{jl} \cr
}$$
Now we can calculate the differential and gradient of $G$ as
$$\eqalign{
dG
&= -dX^{-1}\,AX^{-1} -X^{-1}A\,dX^{-1} \cr
&= X^{-1}\,dX\,X^{-1}AX^{-1} + X^{-1}AX^{-1}\,dX\,X^{-1} \cr
&= -(X^{-1}\,dX\,G + G\,dX\,X^{-1}) \cr
&= -(X^{-1}{\mathcal H}G + G{\mathcal H}X^{-1}):dX \cr
\frac{\partial^2\phi}{\partial X^2} = \frac{\partial G}{\partial X} &= -(X^{-1}{\mathcal H}G + G{\mathcal H}X^{-1}) \cr\cr
}$$
If you are not comfortable with higher-order tensors, you can use vectorization instead:
$$\eqalign{
{\rm vec}(dG) &= -{\rm vec}(X^{-1}\,dX\,G + G\,dX\,X^{-1}) \cr
dg &= -(G\otimes X^{-1} + X^{-1}\otimes G)\,dx \cr
\frac{\partial g}{\partial x} &= -(G\otimes X^{-1} + X^{-1}\otimes G) \cr\cr
}$$
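The vectorized Hessian can likewise be checked to first order: for a small symmetric perturbation $dX$, the change in $G$ should match $-(G\otimes X^{-1} + X^{-1}\otimes G)\,{\rm vec}(dX)$. The random test matrices below are assumptions for illustration, with column-major `vec`:

```python
import numpy as np

# First-order check of vec(dG) = -(G kron X^{-1} + X^{-1} kron G) vec(dX)
# for G = -X^{-1} A X^{-1}, using symmetric A, X (as in the NB) and a
# small symmetric perturbation dX. Random test matrices, assumed.
rng = np.random.default_rng(3)
n = 3
A = rng.standard_normal((n, n)); A = A + A.T
X = rng.standard_normal((n, n)); X = X @ X.T + n * np.eye(n)

def grad(X_):
    Xi_ = np.linalg.inv(X_)
    return -Xi_ @ A @ Xi_

Xi = np.linalg.inv(X)
G = grad(X)
H = -(np.kron(G, Xi) + np.kron(Xi, G))  # vectorized Hessian

dX = rng.standard_normal((n, n)); dX = 1e-6 * (dX + dX.T)
lhs = (grad(X + dX) - G).flatten(order="F")  # actual change in G
rhs = H @ dX.flatten(order="F")              # predicted first-order change
assert np.allclose(lhs, rhs, atol=1e-9)
```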
NB: In some of these steps, I made use of the fact that $(X,A,G)$ are symmetric matrices.
In index notation, the function can be written as $$F_{ik} = W_{ij} X_{jk}$$ The indices $\{i,k\}$ are not repeated and are called "free" indices,
but $\{j\}$ is a repeated "dummy" index and is implicitly summed over.
Now calculate the derivative with respect to the component $W_{qr}$ $$\eqalign{ \frac{\partial F_{ik}}{\partial W_{qr}} &= \frac{\partial W_{ij}}{\partial W_{qr}}\;X_{jk} \\ &= \delta_{iq}\delta_{rj}\;X_{jk} \\ &= \delta_{iq}\;X_{rk} \\ }$$ The symbol $\delta_{iq}$ is called a Kronecker delta. When $i=q$ it equals ${\tt 1}$; otherwise it equals ${\tt 0}$.
Since the derivative has 4 free indices, it is a 4th order tensor; for $W\in{\mathbb R}^{m\times n}$ and $X\in{\mathbb R}^{n\times p}$ its dimensions are $(m\times p\times m\times n)$
Since higher order tensors are awkward to work with, most texts flatten the matrices $(F,W)$ into the vectors $(f,w)$ and then calculate the derivative using ordinary matrix notation. $$\eqalign{ {\rm vec}(F) &= {\rm vec}(IWX) = (X^T\otimes I)\,{\rm vec}(W) \\ f &= (X^T\otimes I)\,w \\ df &= (X^T\otimes I)\,dw \\ \frac{\partial f}{\partial w} &= (X^T\otimes I) \\ }$$ This result is a matrix, not a tensor; the symbol $\otimes$ represents the Kronecker product.
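One can confirm the vectorization identity numerically (the sizes $m,n,p$ below are illustrative, with column-major `vec`):

```python
import numpy as np

# Check that vec(WX) = (X^T kron I) vec(W) under column-major
# vectorization. The dimensions are arbitrary test values.
rng = np.random.default_rng(4)
m, n, p = 3, 4, 5
W = rng.standard_normal((m, n))
X = rng.standard_normal((n, p))

lhs = (W @ X).flatten(order="F")                   # vec(F) = vec(WX)
rhs = np.kron(X.T, np.eye(m)) @ W.flatten(order="F")  # (X^T kron I) vec(W)
assert np.allclose(lhs, rhs)
```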