The easiest way to calculate the derivatives of matrix-valued functions is to go back to the definition of the derivative in the usual sense of limits, so if
$$f(A,B,C) = (A + BC)^{T}(A+BC),$$
then for small $t > 0$ and some $E \in M_{3\times 1}(\mathbb{R})$, we calculate the directional derivative in the "direction" $E$:
$$\begin{aligned}
\frac{\partial f(A,B,C)}{\partial C}(E) &= \lim_{t\rightarrow 0}\frac{f(A,B,C+tE) - f(A,B,C)}{t}\\
&= \lim_{t\rightarrow 0}\frac{1}{t}\bigg[ \big(A + B(C+tE)\big)^{T}\big(A+B(C+tE)\big) - (A + BC)^{T}(A+BC)\bigg]\\
&= \lim_{t\rightarrow 0}\frac{1}{t}\bigg[ (A + BC)^{T}(A+BC) + t(BE)^{T}(A+BC) + t(A+BC)^{T}(BE)\\
&\qquad\qquad + t^{2}(BE)^{T}(BE) - (A+BC)^{T}(A+BC)\bigg]\\
&= \lim_{t\rightarrow 0}\bigg[(BE)^{T}(A+BC) + (A+BC)^{T}(BE) + t(BE)^{T}(BE)\bigg]\\
&= (BE)^{T}(A+BC) + (A+BC)^{T}(BE),
\end{aligned}$$
i.e.
$$\frac{\partial f(A,B,C)}{\partial C}(E) = (BE)^{T}(A+BC) + (A+BC)^{T}(BE)
$$
which is similar to your answer, except that now the $3\times 1$ matrix $E \in M_{3 \times 1}(\mathbb{R})$ sorts out the dimension problem you were having.
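If you want to convince yourself numerically, here is a minimal NumPy sketch (not from the original answer; the shapes of $A$ and $B$ are my own arbitrary choices) comparing the difference quotient with the closed form:

```python
import numpy as np

# Finite-difference sanity check of the directional derivative formula.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 1))
B = rng.standard_normal((4, 3))
C = rng.standard_normal((3, 1))
E = rng.standard_normal((3, 1))   # direction, E in M_{3x1}(R)

def f(C):
    return (A + B @ C).T @ (A + B @ C)

t = 1e-6
fd = (f(C + t * E) - f(C)) / t                        # difference quotient
exact = (B @ E).T @ (A + B @ C) + (A + B @ C).T @ (B @ E)
print(np.allclose(fd, exact, atol=1e-4))              # True
```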
$
\def\b{\bullet}
\def\e{\varepsilon}
\def\m#1{\left[\begin{array}{c}#1\end{array}\right]}
\def\p#1#2{\frac{\partial #1}{\partial #2}}
$
I really like The Matrix Cookbook, but the section on structured matrices is not very good, so here's a different approach to the subject.
Given a vector of parameters $\{p\}$ and matrix basis $\{B_i\}$
$$\eqalign{
p &= \m{\alpha \\ \beta},\qquad
B_1 = \m{1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0},\qquad
B_2 = \m{0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0}
\\
}$$
create a structured matrix $\{A\}$ and cost function $\{\phi\}$
$$\eqalign{
A &= \sum_{i=1}^2\;p_iB_i
\;=\; \m{\alpha & \alpha & 0 & 0 \\ 0 & \beta & 0 & 0 \\ 0 & \beta & \beta & 0},\qquad
&\phi = \tfrac 12\Big\|AX-Y\Big\|_F^2 \\
}$$
Note that $(\alpha,\beta)$ are the only independent variables in the entire problem.
When $A$ is unconstrained it's easy to calculate the gradient/differential of the cost
$$\eqalign{
G = \p{\phi}{A} = (AX-Y)X^T \quad\implies\quad d\phi = G\b dA \\
}$$
where the bullet denotes the matrix inner product, i.e.
$$\eqalign{
G\b dA
&= \sum_{i=1}^3\sum_{j=1}^4 G_{ij}\;dA_{ij} \;=\; {\rm Tr}(G^TdA) \\
}$$
Because of the structure which was imposed on $A$,
its differential is also structured
$$dA = \sum_{i=1}^2 B_i\,dp_i$$
Substituting this expression leads to the parametric gradient
$$\eqalign{
d\phi &= \sum_{i=1}^2\;G\b(B_i\,dp_i)
= \sum_{i=1}^2\left(\p{\phi}{p_i}\right)dp_i \quad\implies\quad
\p{\phi}{p_i} = G\b B_i \\
}$$
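Here is a short NumPy sketch of this identity for the example above, checking $G\b B_i$ against central finite differences of $\phi$ (the sizes of $X$ and $Y$ are my own assumptions):

```python
import numpy as np

# Verify dphi/dp_i = G . B_i for the example basis.
rng = np.random.default_rng(1)
B1 = np.array([[1., 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]])
B2 = np.array([[0., 0, 0, 0], [0, 1, 0, 0], [0, 1, 1, 0]])
X = rng.standard_normal((4, 5))
Y = rng.standard_normal((3, 5))

def phi(p):
    A = p[0] * B1 + p[1] * B2
    return 0.5 * np.linalg.norm(A @ X - Y, 'fro') ** 2

p = rng.standard_normal(2)
A = p[0] * B1 + p[1] * B2
G = (A @ X - Y) @ X.T                                  # unstructured gradient
grad_p = np.array([np.sum(G * B1), np.sum(G * B2)])    # G . B_i

h = 1e-6                                               # central differences
fd = np.array([(phi(p + h * e) - phi(p - h * e)) / (2 * h)
               for e in np.eye(2)])
print(np.allclose(grad_p, fd))                         # True
```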
At this point, one would do all further calculations
in terms of the $p$-vector.
Now comes the weird part...
Every basis $\{B_i\}$ has a dual basis $\{B_i^\delta\}$ which spans the same subspace $\cal S$ and is biorthogonal to it with respect to the inner product
$$B_i\b B_j^\delta \;=\; \delta_{ij}$$
Some bases are self-dual, such as the canonical vector basis
$\{\e_i\}$, but in general determining the dual basis requires a pseudoinverse calculation
$$\eqalign{
&\;b_k = {\rm vec}(B_k) \qquad &\;b_k^\delta = {\rm vec}(B_k^\delta) \\
&\m{b_1 & b_2 &\ldots & b_p}^+ = &\m{b_1^\delta & b_2^\delta &\ldots & b_p^\delta}^T \\
}$$
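In code, the recipe is a single `pinv` call; a minimal sketch for the two-matrix basis above (any fixed vectorization convention works, I use row-major `ravel`):

```python
import numpy as np

# Compute the dual basis via the pseudoinverse of the stacked vec's.
B1 = np.array([[1., 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]])
B2 = np.array([[0., 0, 0, 0], [0, 1, 0, 0], [0, 1, 1, 0]])
basis = [B1, B2]

M = np.column_stack([B.ravel() for B in basis])      # columns are b_k
duals = [row.reshape(B1.shape) for row in np.linalg.pinv(M)]

# check the biorthogonality condition  B_i . B_j^delta = delta_ij
gram = np.array([[np.sum(Bi * Dj) for Dj in duals] for Bi in basis])
print(np.allclose(gram, np.eye(2)))                  # True
```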
In the vector case, the gradient with respect to the $p$-vector can be written as the sum of each component multiplied by the corresponding vector from the dual basis, i.e.
$$\eqalign{
\p{\phi}{p}
&= \sum_{i=1}^2 \left(\p{\phi}{p_i}\right)\e_i \\
}$$
Many authors extend this idea and define the structured gradient
as the matrix
$$\eqalign{
\left(\p{\phi}{A}\right)_S
&= \sum_{i=1}^2\left( \p{\phi}{p_i} \right) B_i^\delta \\
&= \sum_{i=1}^2\left(G\b B_i\right) B_i^\delta \\
&= G\b\left(\sum_{i=1}^2 B_i B_i^\delta \right) \\
&= G\b{\cal B} \\
}$$
where $\cal B$ is a fourth-order tensor with components
$${\cal B}_{jk\ell m}
= \sum_{i=1}^2\;\left(B_i\right)_{jk}\,\left(B_i^\delta\right)_{\ell m}$$
The $\cal B$ tensor is a projector into the subspace
$\big(\,{\cal B}\b X\in{\cal S}\;\;{\rm for}\;X\in{\mathbb R}^{3\times 4}\big)$
where it also acts as an identity tensor for the subspace
$\big({\cal B}\b M=M\b{\cal B} = M\;\;{\rm for}\;M\in{\cal S}\big)$ .
If the basis spans the whole space $\,{\cal S}\equiv{\mathbb R}^{3\times 4}\,$, then $\cal B$ becomes the true identity tensor $\cal I$ and the structured gradient reduces to the full unstructured gradient $G$, as expected:
$$\eqalign{
{\cal B}_{jk\ell m} \;&\to\; {\cal I}_{jk\ell m}
= \delta_{j\ell}\delta_{km} \\
(G\b{\cal B}) \;&\to\; (G\b{\cal I}) = G \\
}$$
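The tensor and its projector/identity properties can be checked directly with `einsum`; a sketch recomputing the basis and its duals from above (the index letters are just my own choice):

```python
import numpy as np

# Build B_{jklm} = sum_i (B_i)_{jk} (B_i^delta)_{lm} and test it.
B1 = np.array([[1., 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]])
B2 = np.array([[0., 0, 0, 0], [0, 1, 0, 0], [0, 1, 1, 0]])
basis = [B1, B2]
M = np.column_stack([B.ravel() for B in basis])
duals = [row.reshape(3, 4) for row in np.linalg.pinv(M)]

calB = sum(np.einsum('jk,lm->jklm', B, D) for B, D in zip(basis, duals))

rng = np.random.default_rng(2)
X = rng.standard_normal((3, 4))
PX = np.einsum('jklm,lm->jk', calB, X)       # B . X lands in the subspace S
print(np.allclose(np.einsum('jklm,lm->jk', calB, PX), PX))  # projector
print(np.allclose(np.einsum('jklm,lm->jk', calB, B1), B1))  # identity on S
```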
As a concrete example, let's examine a symmetrically constrained $2\times 2$ matrix.
$$\eqalign{
p &= \m{\alpha \\ \beta \\ \lambda},\qquad
B_1 = \m{1 & 0 \\ 0 & 0},\qquad
B_2 = \m{0 & 0 \\ 0 & 1},\qquad
B_3 = \m{0 & 1 \\ 1 & 0} \\
A &= \m{\alpha & \lambda \\ \lambda & \beta}
\quad=\quad \alpha B_1 + \beta B_2 + \lambda B_3,\qquad
B_k^\delta = \frac{B_k}{B_k\b B_k} \\
}$$
The structured gradient calculation then goes as follows
$$\eqalign{
\left(\p{\phi}{A}\right)_S
&= \frac{(G\b B_1)B_1}{B_1\b B_1}
+ \frac{(G\b B_2)B_2}{B_2\b B_2}
+ \frac{(G\b B_3)B_3}{B_3\b B_3} \\
&= G_{11}\,B_1 +G_{22}\,B_2 +\tfrac 12(G_{12}+G_{21})\,B_3 \\
&= \m{G_{11} & \tfrac 12(G_{12}+G_{21}) \\ \tfrac 12(G_{12}+G_{21}) & G_{22}} \\
&= \left(\frac{G+G^T}{2}\right) \;\doteq\; {\rm Sym}(G) \\
}$$
But The Matrix Cookbook uses the regular basis instead of the dual basis, which leads to the following miscalculation
$$\eqalign{
\left(\p{\phi}{A}\right)_{S^*}
&= \left(G\b B_1\right)B_1
+ \left(G\b B_2\right)B_2
+ \left(G\b B_3\right)B_3 \\
&= G_{11}\,B_1 +G_{22}\,B_2 +(G_{12}+G_{21})\,B_3 \\
&= \m{G_{11} & (G_{12}+G_{21}) \\ (G_{12}+G_{21}) & G_{22}} \\
&= G+G^T-{\rm Diag}(G) \\
}$$
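A quick numeric contrast of the two candidates (with a made-up $G$): contracting with $B_3$ should recover $\p{\phi}{p_3}=G\b B_3$, and only the dual-basis version does.

```python
import numpy as np

# Compare Sym(G) with the Cookbook's G + G^T - Diag(G).
rng = np.random.default_rng(3)
G = rng.standard_normal((2, 2))
B3 = np.array([[0., 1], [1, 0]])

sym = 0.5 * (G + G.T)                        # dual-basis structured gradient
star = G + G.T - np.diag(np.diag(G))         # regular-basis version

print(np.isclose(np.sum(sym * B3), np.sum(G * B3)))   # True:  Sym(G) . B3 = G . B3
print(np.isclose(np.sum(star * B3), np.sum(G * B3)))  # False: off by a factor of 2
```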
The skew-symmetric case is similar but is seldom mentioned.
There is only one parameter and one matrix in the basis
$$\eqalign{
p &= \m{\alpha},\qquad B = \m{0 & 1 \\ -1 & 0},\qquad
B^\delta = \frac{B}{B\b B} \\
A &= \m{0 & \alpha \\ -\alpha & 0} \;\;=\;\; \alpha B \\
}$$
and the structured gradient is
$$\eqalign{
\left(\p{\phi}{A}\right)_S
&= \frac{(G\b B)B}{B\b B} \\
&= \tfrac 12(G_{12}-G_{21})\,B \\
&= \m{0 & \tfrac 12(G_{12}-G_{21}) \\ \tfrac 12(G_{21}-G_{12}) & 0} \\
&= \left(\frac{G-G^T}{2}\right) \;\doteq\; {\rm Skew}(G) \\
}$$
If you use $B$ instead of $B^\delta$ in this case, the gradient has the right direction but the wrong length, i.e.
$$\eqalign{
\left(\p{\phi}{A}\right)_{S^*}
&= \left(G-G^T\right) \;=\; 2\;{\rm Skew}(G) \\
}$$
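And the same spot-check for the skew case, where the overshoot is exactly the factor $B\b B = 2$:

```python
import numpy as np

# Skew(G) recovers G . B, while G - G^T overshoots by 2.
rng = np.random.default_rng(4)
G = rng.standard_normal((2, 2))
B = np.array([[0., 1], [-1, 0]])

skew = 0.5 * (G - G.T)
print(np.isclose(np.sum(skew * B), np.sum(G * B)))        # True
print(np.isclose(np.sum((G - G.T) * B), np.sum(G * B)))   # False: 2x too long
```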
Best Answer
$ \def\LR#1{\left(#1\right)} \def\BR#1{\Big(#1\Big)} \def\bR#1{\big(#1\big)} \def\op#1{\operatorname{#1}} \def\Diag#1{\op{Diag}\LR{#1}} \def\trace#1{\op{Tr}\LR{#1}} \def\qiq{\quad\implies\quad} \def\b{\beta} \def\o{{\tt1}} \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} $Since $X$ is a real symmetric matrix, it can be diagonalized as follows
$$\eqalign{
X &= QBQ^T, \qquad Q^TQ=I, \quad B=\Diag{\b_k} \\
}$$
Given a real differentiable function (and its derivative)
$$f(z), \qquad f'(z) = \frac{df}{dz}$$
when this function is applied to a matrix argument, the $\sf Daleckii\:Krein\,$ theorem tells us
$$\eqalign{
F &= f(X) \\
dF &= Q\BR{R\odot\LR{Q^TdX\:Q}}Q^T \\
R_{jk} &= \begin{cases}
{\large\frac{f(\b_j)-f(\b_k)}{\b_j-\b_k}} \quad &{\rm if}\;\b_j\ne\b_k \\
\quad f'(\b_k) &{\rm otherwise}
\end{cases} \\
}$$
where ${\odot}$ denotes the elementwise/Hadamard product.
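Here is a minimal NumPy sketch of the theorem for $f=\exp$, checked against a central finite difference (the helper `expm_sym` is my own and is valid for symmetric arguments only):

```python
import numpy as np

# Daleckii-Krein differential of exp at a symmetric matrix X.
def expm_sym(M):
    w, V = np.linalg.eigh(M)
    return (V * np.exp(w)) @ V.T

rng = np.random.default_rng(5)
n = 4
S = rng.standard_normal((n, n)); X = 0.5 * (S + S.T)    # symmetric X
S = rng.standard_normal((n, n)); dX = 0.5 * (S + S.T)   # symmetric step

beta, Q = np.linalg.eigh(X)                 # X = Q B Q^T
fb = np.exp(beta)                           # f(b_k) = f'(b_k) = exp(b_k)
den = beta[:, None] - beta[None, :]
R = np.where(np.isclose(den, 0.0), fb[None, :],
             (fb[:, None] - fb[None, :]) / np.where(np.isclose(den, 0.0), 1.0, den))

dF = Q @ (R * (Q.T @ dX @ Q)) @ Q.T         # theorem's differential

t = 1e-5                                    # central finite difference
fd = (expm_sym(X + t * dX) - expm_sym(X - t * dX)) / (2 * t)
print(np.allclose(dF, fd, atol=1e-6))       # True
```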
In this particular problem, we conveniently have
$$f'(z) = f(z) = \exp(z)$$
Let's also introduce the Frobenius product, which is a handy notation for the trace
$$\eqalign{
A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \trace{A^TB} \\
A:A &= \|A\|^2_F \qquad \{ {\rm Frobenius\;norm} \} \\
}$$
This is also called the double-dot or double-contraction product.
When applied to vectors $(n=\o)$ it reduces to the standard dot product.
The properties of the underlying trace function allow the terms in a Frobenius product to be rearranged in many useful ways, e.g.
$$\eqalign{
A:B &= B:A \\
A:B &= A^T:B^T \\
C:\LR{AB} &= \LR{CB^T}:A \;=\; \LR{A^TC}:B \\
}$$
The Frobenius and Hadamard products commute in the following sense
$$A:\LR{B\odot C} = \LR{A\odot B}:C \;=\; \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij}C_{ij}$$
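Both the rearrangement rules and the Hadamard rule are easy to spot-check numerically (`frob` is a small helper of mine, not notation from the answer):

```python
import numpy as np

# Spot-checks of the Frobenius-product rules with random matrices.
rng = np.random.default_rng(6)
frob = lambda P, Q: np.sum(P * Q)                      # P : Q

A = rng.standard_normal((3, 2))
B = rng.standard_normal((2, 4))
C = rng.standard_normal((3, 4))
print(np.isclose(frob(C, A @ B), frob(C @ B.T, A)))    # C:(AB) = (CB^T):A
print(np.isclose(frob(C, A @ B), frob(A.T @ C, B)))    # C:(AB) = (A^T C):B

D, E, F = rng.standard_normal((3, 3, 4))
print(np.isclose(frob(D, E * F), frob(D * E, F)))      # A:(B.C) = (A.B):C
```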
Use the above notation to rewrite the objective function and calculate its gradient; the key step moves both $Q$ factors and the Hadamard factor $R$ across the Frobenius product using the rules above.
$$\eqalign{
\phi &= A:F \\
d\phi &= A:dF \\
&= A : Q\BR{R\odot\LR{Q^TdX\:Q}}Q^T \\
&= \LR{Q^TAQ} : R\odot\LR{Q^TdX\:Q} \\
&= \BR{R\odot\LR{Q^TAQ}} : Q^TdX\:Q \\
&= Q\BR{R\odot\LR{Q^TAQ}}Q^T : dX \\
\grad{\phi}{X} &= Q\BR{R\odot\LR{Q^TAQ}}Q^T \\
}$$
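Finally, a sketch confirming that this expression really is the gradient, by testing it against a finite difference along a random symmetric direction (since $X$ is symmetric, only symmetric perturbations are admissible; `expm_sym` is the same hypothetical helper as before):

```python
import numpy as np

# Check grad(phi) for phi = A : exp(X) along a symmetric direction dX.
def expm_sym(M):
    w, V = np.linalg.eigh(M)
    return (V * np.exp(w)) @ V.T

rng = np.random.default_rng(7)
n = 4
S = rng.standard_normal((n, n)); X = 0.5 * (S + S.T)
A = rng.standard_normal((n, n))

beta, Q = np.linalg.eigh(X)
fb = np.exp(beta)
den = beta[:, None] - beta[None, :]
R = np.where(np.isclose(den, 0.0), fb[None, :],
             (fb[:, None] - fb[None, :]) / np.where(np.isclose(den, 0.0), 1.0, den))
grad = Q @ (R * (Q.T @ A @ Q)) @ Q.T          # Q (R . (Q^T A Q)) Q^T

phi = lambda M: np.sum(A * expm_sym(M))       # phi = A : exp(M)
S = rng.standard_normal((n, n)); dX = 0.5 * (S + S.T)
t = 1e-5
fd = (phi(X + t * dX) - phi(X - t * dX)) / (2 * t)
print(np.isclose(np.sum(grad * dX), fd))      # grad : dX matches dphi -> True
```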