When dealing with functions of this sort, I find it easier to deal with the derivative applied to some perturbation.
Let $F(B) = ABC$. Since $F$ is linear, we have $F(B+\Delta) = F(B) + A\Delta C$, so it follows that $DF(B)(\Delta) = A\Delta C$. However, since $DF(B)$ is a map $\mathbb{R}^{n \times n} \to \mathbb{R}^{n \times n}$ (or $\mathbb{C}$, as the case may be), there is no particularly convenient matrix representation.
Note: Each coordinate of $DF(B)$ has a convenient representation. To see, let $\phi_{ij}(B) = e_i^T F(B) e_j = e_i^T ABC e_j$. Then, as above, we have $D\phi_{ij}(B)(\Delta) = e_i^T A\Delta C e_j$.
This gives:
\begin{eqnarray}
D\phi_{ij}(B)(\Delta) &=& e_i^T A \Delta C e_j \\
&=& \operatorname{tr} (e_i^T A \Delta C e_j) \\
&=& \operatorname{tr} (e_j e_i^T A \Delta C ) \\
&=& \operatorname{tr} (C e_j e_i^T A \Delta ) \\
&=& \operatorname{tr} ((A^T e_i e_j^TC^T)^T \Delta ) \\
&=& \langle A^T e_i e_j^TC^T, \Delta \rangle_F
\end{eqnarray}
Where $\langle X, Y \rangle_F = \operatorname{tr}(X^TY)$ is the inner product induced by the Frobenius norm. With this inner product, the gradient is the given by $\nabla \phi_{ij}(B) = A^T e_i e_j^TC^T $.
Another note:
If $G(B) = \operatorname{tr} ((ABC)(ABC)^T) $, then you can write $G(B) = \langle ABC, ABC \rangle_F$. Then $G(B+\Delta ) = G(B) + \langle A \Delta C, ABC \rangle_F + \langle ABC, A \Delta C \rangle_F + \langle A \Delta C, A \Delta C \rangle_F$, from which you can see that
\begin{eqnarray}
DG(B) (\Delta) &=& \langle A \Delta C, ABC \rangle_F + \langle ABC, A \Delta C \rangle_F \\
&=& 2 \langle ABC, A \Delta C \rangle_F \\
&=& 2 \langle A^TABC C^T, \Delta \rangle_F
\end{eqnarray}
From which it follows that $\nabla G(B) = 2A^TABC C^T$.
Replace the trace with the double-dot (aka Frobenius) product
$$\operatorname{tr}(A^TB)=A:B$$
in your masked function.
Then finding the differential and gradient is simple
$$\eqalign{
L &= (M\circ F):(M\circ F) \cr\cr
dL &= 2\,(M\circ F):(M\circ dF) \cr
&= 2\,(M\circ M\circ F):dF \cr
&= 2\,(M\circ F):dF \cr\cr
\frac{\partial L}{\partial F} &= 2\,M\circ F \cr\cr
}$$
In the 3rd line, I made use of the fact that Frobenius and Hadamard products are mutually commutative, i.e. $$A\circ B:X = A:B\circ X$$
Setting the gradient to zero, yields little useful info. Elements in $X$ corresponding to the zero elements of $M$ are unbounded, while the elements corresponding to the unity elements of $M$ are zero.
$$\eqalign{
\cr
}$$
Best Answer
Recall that if $A,B \in \mathbb{R}^{m \times n}$ then \begin{equation} \langle A, B \rangle = \text{Tr}(A^T B) \end{equation} and \begin{align*} \|A\|_F^2 &= \langle A,A \rangle \\ &= \text{Tr}(A^T A) \\ &= \text{Tr}(A A^T). \end{align*}
Let $f:\mathbb{R}^{m \times n} \to \mathbb{R}$ such that \begin{align*} f(X) &= \frac12 \| X A^T \|_F^2 \\ &= \frac12 \text{Tr}(X A^T A X^T). \end{align*} Let $J$ be the $m \times n$ matrix whose entries are all $0$ except $J_{ij}$ which is equal to $1$. Let $\Delta X = \epsilon J$, where $\epsilon > 0$ is tiny.
Then
\begin{align*} f(X + \Delta X) &= \frac12 \text{Tr}((X + \Delta X)A^T A (X + \Delta X)^T) \\ &= \frac12 \text{Tr}(X A^T A X^T) + \frac12 \text{Tr}(\Delta X A^T A X^T) + \frac12 \text{Tr}(X A^T A \Delta X^T) \\ & \qquad + \frac12 \text{Tr}(\Delta X A^T A \Delta X^T) \\ &\approx \frac12 \text{Tr}(X A^T A X^T) + \frac12 \text{Tr}(\Delta X A^T A X^T) + \frac12 \text{Tr}(X A^T A \Delta X^T) \\ &= \frac12 \text{Tr}(X A^T A X^T) + \text{Tr}(X A^T A \Delta X^T) \\ &= f(X) + \left\langle X A^T A,\Delta X \right\rangle \\ &= f(X) + \epsilon \left \langle X A^T A,J \right\rangle. \end{align*}
Comparing this result with the equation \begin{equation} f(X + \epsilon J) \approx f(X) + \epsilon \frac{\partial f(X)}{\partial X_{ij}} \end{equation} we see that \begin{equation} \frac{\partial f(X)}{\partial X_{ij}} = \left \langle X A^T A,J \right\rangle. \end{equation}