I think it is more appropriate in this case to work exclusively in matrix notation. Let me explain.
You have a function $f : \mathrm{Mat}_{n \times p}(\mathbb R) \times \mathrm{Mat}_{p \times m}(\mathbb R) \to \mathrm{Mat}_{n \times m}(\mathbb R)$ sending a pair of matrices $(X,Y)$ to their product $f(X,Y) \overset{def}=XY$. In terms of differential geometry, if we are given a "point" in $\mathrm{Mat}_{n \times p}(\mathbb R) \times \mathrm{Mat}_{p \times m}(\mathbb R)$ (i.e. two matrices), the tangent space is canonically isomorphic to the space itself (since it is a linear manifold) and tangent vectors are just pairs of matrices. We also have a canonical basis consisting of the $(E_{i,j},0)$ and $(0, E_{k,\ell})$ where the indices $(i,j)$ range over $(1,1),\cdots,(n,p)$ and similarly, the indices $(k,\ell)$ range over $(1,1),\cdots,(p,m)$. Using the standard definition of directional derivative,
$$
\frac{\partial f}{\partial (E_{i,j},0)} = \lim_{\varepsilon \to 0} \frac{(X+\varepsilon E_{i,j})Y - XY}{\varepsilon} = \lim_{\varepsilon \to 0} \frac{\varepsilon E_{i,j}Y}{\varepsilon} = E_{i,j}Y.
$$
(Feel free to skip the differential geometry blabla if you agree with the last equation.) Similarly, you can deduce that $\frac{\partial f}{\partial (0,E_{k,\ell})} = XE_{k,\ell}$.
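These two partial-derivative formulas are easy to sanity-check numerically; here is a small sketch with NumPy (the dimensions, indices and tolerances are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, m = 3, 4, 2                      # arbitrary dimensions for the sketch
X = rng.standard_normal((n, p))
Y = rng.standard_normal((p, m))

def f(X, Y):
    return X @ Y

def E(shape, i, j):
    """Basis matrix with a 1 in position (i, j) and zeros elsewhere."""
    M = np.zeros(shape)
    M[i, j] = 1.0
    return M

eps = 1e-6
E_ij = E((n, p), 1, 2)                 # direction (E_{i,j}, 0)
E_kl = E((p, m), 0, 1)                 # direction (0, E_{k,l})

# Directional derivatives via finite differences vs. the closed forms
num_X = (f(X + eps * E_ij, Y) - f(X, Y)) / eps
num_Y = (f(X, Y + eps * E_kl) - f(X, Y)) / eps
print(np.allclose(num_X, E_ij @ Y, atol=1e-8))   # E_{i,j} Y
print(np.allclose(num_Y, X @ E_kl, atol=1e-8))   # X E_{k,l}
```

Since $f$ is linear in each argument separately, the finite-difference quotients here agree with the closed forms up to round-off, not just to first order.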
In the same way that the Jacobian matrix of a function $g : \mathbb R^n \to \mathbb R^m$ gives you an $m \times n$ matrix, the Jacobian matrix of the function $f$ gives us an $nm \times (np+pm)$ matrix ($np+pm$ coordinates in the domain, $nm$ in the codomain), something quite discouraging. To make sense of it, we use the fact that our function $f$ is quadratic in the coefficients of $X$ and $Y$. Let us use the following formula to compute the "Taylor expansion" of this function at a pair of matrices $(X_0,Y_0)$:
$$
XY - X_0Y_0 = (X-X_0 + X_0)(Y-Y_0 + Y_0) - X_0 Y_0 \\
= \underset{\text{Jacobian (linear) term}}{\underbrace{(X-X_0)Y_0 + X_0 (Y-Y_0)}} + \underset{\text{Hessian (quadratic) term}}{\underbrace{\frac 12 \left( \phantom{\int}\hspace{-9 pt}2(X-X_0)(Y-Y_0) \right)}}
$$
This suggests that $J_f(X_0,Y_0)(X,Y) = XY_0 + X_0Y$ and $H_f(X_0,Y_0)((X,Y),(X,Y)) = 2XY$ (note that the Hessian takes two pairs of matrices as arguments, since it is a symmetric bilinear form; we kept the two arguments equal for the moment!). This is simply an application of the standard formula for vectors $x_0,x \in \mathbb R^n$, where $g : \mathbb R^n \to \mathbb R^m$:
$$
g(x) = g(x_0) + J_g(x_0)(x-x_0) + \frac 12 H_g(x_0)(x-x_0,x-x_0)
$$
The tensor here $H_g(x_0)$ is of order $3$; think of each coordinate of $\mathbb R^m$ as having its own Hessian matrix, and $H_g$ is those $m$ matrices patched together. If for some reason you are interested in the Hessian, note that for vectors $x_0,x, x' \in \mathbb R^n$ and $g : \mathbb R^n \to \mathbb R^m$,
$$
H_g(x_0)(x,x') = \sum_{i,j=1}^n x_i x_j' \frac{\partial^2 g}{\partial x_i \partial x_j}(x_0),
$$
so if we repeat the same idea for our function, where the only nonvanishing second-order partial derivatives are the mixed ones $\frac{\partial^2 f}{\partial x_{ij}\,\partial y_{k\ell}} = \frac{\partial^2 f}{\partial y_{k\ell}\,\partial x_{ij}} = E_{ij} E_{k\ell}$, we get the formula
$$
H_f(X_0,Y_0)((X,Y),(X',Y')) = \sum_{(i,j),(k,\ell)} \left( x_{ij} y'_{k\ell} + x'_{ij} y_{k\ell} \right) E_{ij} E_{k\ell} = XY' + X'Y.
$$
Setting $(X',Y') = (X,Y)$ recovers the formula $H_f(X_0,Y_0)((X,Y),(X,Y)) = 2XY$ obtained via Taylor expansion (there we had only dealt with the case $(X,Y) = (X',Y')$). In particular, the Hessian is a constant tensor of total order $3$ (i.e. it does not depend on $X_0$ or $Y_0$), which is a characteristic property of quadratic functions.
To understand what "of total order $3$" means, consider this idea: if you have a vector, taking the inner product with a vector of the same length (a tensor of order $1$) gives you a number. If you have a matrix, taking the products with two vectors, one for each dimension of the matrix, gives you back a scalar. In the case of our Hessian above, taking two vectors $(X,Y)$ and $(X',Y')$ and taking the appropriate products gives another vector (in the vector space $\mathrm{Mat}_{n \times m}(\mathbb R)$). See this for more on tensors.
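The exact Taylor expansion above, and the fact that the second derivative is constant, can both be checked numerically; here is a NumPy sketch (dimensions, directions and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, m = 3, 4, 2
X0, X = rng.standard_normal((2, n, p))
Y0, Y = rng.standard_normal((2, p, m))

# Exact expansion: XY - X0 Y0 = (X - X0) Y0 + X0 (Y - Y0) + (X - X0)(Y - Y0)
lhs = X @ Y - X0 @ Y0
jac = (X - X0) @ Y0 + X0 @ (Y - Y0)          # Jacobian (linear) term
half_hess = (X - X0) @ (Y - Y0)              # half the Hessian (quadratic) term
print(np.allclose(lhs, jac + half_hess))

# Mixed second derivative along (U, 0) and (0, V) is U V,
# independent of the base point (X0, Y0): the Hessian is constant.
U = rng.standard_normal((n, p))
V = rng.standard_normal((p, m))
eps = 1e-4
mixed = ((X0 + eps * U) @ (Y0 + eps * V) - (X0 + eps * U) @ Y0
         - X0 @ (Y0 + eps * V) + X0 @ Y0) / eps**2
print(np.allclose(mixed, U @ V, atol=1e-6))
```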
But now that this trick is dealt with, back to the original question: what is $\frac{\partial XY}{\partial X}(X_0,Y_0)$? Let us come back for a moment to the definition of the directional derivative. What we do in that context is consider a function $g(x,y)$ of two variables as a function of the single variable $x$ in order to evaluate the partial derivative at $(x_0,y_0)$. We can do the same here using the concept of the Fréchet derivative instead! This also means that $\frac{\partial f}{\partial X}$ is not a number or a matrix, but a linear operator on the space of matrices corresponding to $X$. For instance, $f(X,Y) = XY$ satisfies $\frac{\partial f}{\partial X}(X_0,Y_0)(X) = XY_0$ since
$$
\lim_{Z \to 0} \frac{\|(X_0+Z)Y_0 - X_0Y_0 - \frac{\partial f}{\partial X}(X_0,Y_0)(Z)\|}{\|Z\|} = \lim_{Z \to 0} \frac{\|(X_0+Z)Y_0 - X_0Y_0 - ZY_0\|}{\|Z\|} = 0.
$$
(The latter is true under any choice of matrix norms.) Similarly, $\frac{\partial f}{\partial Y}(X_0,Y_0)(Y) = X_0Y$.
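Since $f$ is linear in $X$ for fixed $Y$ (and vice versa), the remainder in this Fréchet quotient is not merely small but identically zero; a quick NumPy sketch of both partials (arbitrary sizes and seed):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, m = 3, 4, 2
X0 = rng.standard_normal((n, p))
Y0 = rng.standard_normal((p, m))

# Remainder (X0 + Z) Y0 - X0 Y0 - Z Y0 vanishes identically,
# so the quotient in the limit is 0 (up to floating-point round-off).
Z = 1e-3 * rng.standard_normal((n, p))
ratio_X = np.linalg.norm((X0 + Z) @ Y0 - X0 @ Y0 - Z @ Y0) / np.linalg.norm(Z)

# Same for the partial in Y, with candidate derivative W -> X0 W
W = 1e-3 * rng.standard_normal((p, m))
ratio_Y = np.linalg.norm(X0 @ (Y0 + W) - X0 @ Y0 - X0 @ W) / np.linalg.norm(W)

print(ratio_X, ratio_Y)   # both ~0 up to round-off
```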
You can still apply the chain rule with this partial derivative, but you need to be careful: when you had a composition of functions before, you multiplied the Jacobian matrices. In this case, you need to compose the linear operators, which might mean something a bit different in this context. For instance, if $X,Y$ are both functions of a real variable $t$, then
$$
\frac{\partial X(t)Y(t)}{\partial t}(t_0) = \frac{\partial XY}{\partial X}(X(t_0),Y(t_0)) \left( \frac{\partial X(t)}{\partial t}(t_0) \right) + \frac{\partial XY}{\partial Y}(X(t_0),Y(t_0)) \left( \frac{\partial Y(t)}{\partial t}(t_0) \right) \\
= X'(t_0) Y(t_0) + X(t_0) Y'(t_0).
$$
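This product rule is easy to test with a finite-difference check; the matrix-valued curves below are arbitrary smooth choices for the sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, m = 3, 4, 2
A, B = rng.standard_normal((2, n, p))
C, D = rng.standard_normal((2, p, m))

X  = lambda t: A + t * B + t**2 * A        # X'(t) = B + 2t A
dX = lambda t: B + 2 * t * A
Y  = lambda t: C + np.sin(t) * D           # Y'(t) = cos(t) D
dY = lambda t: np.cos(t) * D

t0, eps = 0.7, 1e-6

# Central difference of t -> X(t) Y(t) vs. X'(t0) Y(t0) + X(t0) Y'(t0)
numeric = (X(t0 + eps) @ Y(t0 + eps) - X(t0 - eps) @ Y(t0 - eps)) / (2 * eps)
exact = dX(t0) @ Y(t0) + X(t0) @ dY(t0)
print(np.allclose(numeric, exact, atol=1e-6))
```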
(Note that this is what you would expect!) As an exercise, using this answer, you could prove that if $X_0,Y_0$ are matrices such that $X_0Y_0$ is a well-defined square invertible matrix, then
$$
\frac{\partial \det(XY)}{\partial X}(X_0,Y_0)(X) = \mathrm{tr}(\mathrm{adj}(X_0Y_0)(XY_0)) = \det(X_0Y_0) \, \mathrm{tr}( (XY_0)(X_0Y_0)^{-1}).
$$
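The exercise's answer can also be spot-checked numerically. A sketch with random $X_0, Y_0$ (which are generically such that $X_0Y_0$ is invertible), using $\mathrm{adj}(A) = \det(A)\,A^{-1}$ for the adjugate:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 3, 4                                # X0 Y0 is n x n
X0 = rng.standard_normal((n, p))
Y0 = rng.standard_normal((p, n))
X  = rng.standard_normal((n, p))           # direction for the partial in X

A = X0 @ Y0                                # generically invertible
detA = np.linalg.det(A)

# d/dt det((X0 + t X) Y0) at t = 0, by central differences
eps = 1e-6
numeric = (np.linalg.det((X0 + eps * X) @ Y0)
           - np.linalg.det((X0 - eps * X) @ Y0)) / (2 * eps)

# Closed form: det(X0 Y0) tr((X Y0)(X0 Y0)^{-1}), using adj(A) = det(A) A^{-1}
exact = detA * np.trace(X @ Y0 @ np.linalg.inv(A))
print(np.isclose(numeric, exact, rtol=1e-4, atol=1e-5))
```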
Hope that helps,
Best Answer
Since $$Xy = \mathrm{vec}(Xy) = \mathrm{vec}(IXy) = (y\otimes I)'\mathrm{vec}(X)$$ take the derivative wrt $\mathrm{vec}(X)$ to obtain $y\otimes I$. This is consistent with the comment of Ben Grossmann as it is the "vectorization" of said fourth order tensor.
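This vectorization identity can be verified directly with NumPy; note that $\mathrm{vec}$ stacks columns, which is `order='F'` flattening, and that a 1-D array in `np.kron` behaves like the row vector $y'$, so `np.kron(y, I)` already equals $(y\otimes I)'$:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 3, 4
X = rng.standard_normal((n, p))
y = rng.standard_normal(p)

vecX = X.flatten(order='F')        # column-stacking vectorization vec(X)
I = np.eye(n)

# X y = (y kron I)' vec(X); with 1-D y, np.kron(y, I) is the n x np
# block row [y_1 I, ..., y_p I], i.e. (y kron I) already transposed.
lhs = X @ y
rhs = np.kron(y, I) @ vecX
print(np.allclose(lhs, rhs))
```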