$\def\d{{\rm diag}}\def\D{{\rm Diag}}\def\p#1#2{\frac{\partial #1}{\partial #2}}$Let's
use a colon to denote the trace/Frobenius product
$$\eqalign{
A:B &= {\rm Tr}(A^TB) \;=\; \sum_{j=1}^m\sum_{k=1}^n A_{jk} B_{jk} \\
A:A &= \big\|A\big\|_F^2 \\
}$$
In the case that $(A,B)$ are vectors, this definition corresponds to the standard dot product. The key idea is that the matrix/vector on each side of the colon must have the same dimensions.
The Frobenius product has many interesting properties.
In particular, for dimensionally compatible matrices $(A,B,C)$ and vector $(v)$
$$\eqalign{
AB:C &= A:CB^T \\&= B:A^TC \\&= C:AB \\
A:\D(v) &= \d(A):v \\
}$$
${\bf NB}\!:\,$ The diag operator with an uppercase 'D' creates a diagonal matrix from a vector, while the one with the lowercase 'd' creates a vector from the diagonal of a matrix.
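These identities are easy to check numerically. Here is a minimal NumPy sketch (the dimensions, seed, and the helper `frob` are arbitrary choices for illustration, not anything from the original problem):

```python
import numpy as np

def frob(A, B):
    """Frobenius product  A:B = Tr(A^T B) = sum of elementwise products."""
    return np.sum(A * B)

rng = np.random.default_rng(0)
m, p, n = 3, 4, 5
A = rng.standard_normal((m, p))
B = rng.standard_normal((p, n))
C = rng.standard_normal((m, n))
v = rng.standard_normal(n)
M = rng.standard_normal((n, n))

# AB:C = A:CB^T = B:A^TC = C:AB
print(np.allclose(frob(A @ B, C), frob(A, C @ B.T)))
print(np.allclose(frob(A @ B, C), frob(B, A.T @ C)))
print(np.allclose(frob(A @ B, C), frob(C, A @ B)))

# A:Diag(v) = diag(A):v
# (np.diag plays both roles: vector -> diagonal matrix, matrix -> diagonal vector)
print(np.allclose(frob(M, np.diag(v)), np.dot(np.diag(M), v)))
```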
Write the first function in terms of this product.
Then calculate its differential and gradient.
$$\eqalign{
\phi &= (BA):\D(x) \\&= \d(BA):x \\
d\phi &= \d(BA):dx \\
\p{\phi}{x} &= \d(BA) \\
\p{\phi}{X} &= \D\big(\d(BA)\big) = I\odot BA \\
}$$
where $\odot$ denotes the elementwise/Hadamard product and $I$ is the identity matrix.
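As a sanity check, here is a short NumPy sketch comparing these results against finite differences. It assumes the first function is $\phi={\rm Tr}(BAX)$ with $X=\D(x)$, which is consistent with the derivation above but is only my guess at the unstated original; the sizes and names are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
x = rng.standard_normal(n)

# assumed form of the first function: phi = Tr(B A Diag(x))
phi = lambda x: np.trace(B @ A @ np.diag(x))

# closed-form gradients from the derivation above
grad_x = np.diag(B @ A)        # d(phi)/dx = diag(BA)
grad_X = np.eye(n) * (B @ A)   # d(phi)/dX = I o BA  (Hadamard with the identity)

# finite-difference check of d(phi)/dx
eps = 1e-6
fd = np.array([(phi(x + eps * e) - phi(x - eps * e)) / (2 * eps) for e in np.eye(n)])
print(np.allclose(fd, grad_x, atol=1e-6))
```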
For the second function, let $\,Y=AX\;$ and use Jacobi's formula
$$\eqalign{
\psi &= \log(\det(Y)) \\
d\psi &= Y^{-T}:dY \\
&= (AX)^{-T}:A\,dX \\
&= A^T(A^{-T}X^{-T}):dX \\
&= X^{-T}:dX \\
&= X^{-1}:\D(dx) \\
&= \d(X^{-1}):dx \\
\p{\psi}{x} &= \d(X^{-1}) \\
\p{\psi}{X} &= \D\big(\d(X^{-1})\big) = I\odot X^{-1} = X^{-1} \\
}$$
since $X\,\left({\rm and}\;X^{-1}\right)$ is already a diagonal matrix, the $\D\big(\d(\cdot)\big)$ operation has no effect.
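A similar finite-difference sketch for the second function, again assuming it is $\psi=\log\det(AX)$ with $X=\D(x)$; the code uses $\log|\det|$ so the sign of the determinant is not an issue (the derivative is the same wherever $\det\neq 0$):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n))
x = rng.standard_normal(n) + 3.0    # keep X = Diag(x) well away from singular

# log|det(AX)| has the same derivative as log(det(AX)) wherever det != 0
psi = lambda x: np.linalg.slogdet(A @ np.diag(x))[1]

grad_x = 1.0 / x                    # diag(X^{-1}) for X = Diag(x)

eps = 1e-6
fd = np.array([(psi(x + eps * e) - psi(x - eps * e)) / (2 * eps) for e in np.eye(n)])
print(np.allclose(fd, grad_x, atol=1e-5))
```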
This is actually a generic issue with partial derivatives with respect to a constrained object, when the parameterization of that object would allow violating the constraints. It also crops up with, e.g., unit vectors or orthogonal matrices.
A symmetric matrix only has $n(n+1)/2$ degrees of freedom -- you cannot vary them separately, and it's not clear what it would mean to hold $g_{i,j}$ fixed while varying $g_{j,i}$ -- the usual explanation of partial derivatives. If you have a function defined along a curve in 3-space, there really isn't a partial derivative with respect to $x$, especially since at some places along the curve $x$ may not change locally at all.
What people generally want in cases like this is a way to use the normal formulas and have them work, at least when the "infinitesimal changes" keep the new object obeying the constraint. For example, the change in a symmetric matrix is a symmetric matrix, and the change in a unit vector must be orthogonal to the unit vector. These treatments usually involve some hideous and non-obvious abuses of notation, and are "really" moving to a parameterized, but constrained total derivative. Doing this for a symmetric matrix has $\mathrm{d}G = S\,\mathrm{d}t$, with $S$ any symmetric matrix.
The derivative of a unit vector with respect to itself is an example of this, though going through method 2 below. The end result of the "partial" derivative with respect to itself is a projection onto the portion of a change that maintains the constraint.
This generalizes. For a symmetric matrix it's particularly simple: unlike the unit-vector case, the projection is always the same, not depending on the state being varied. We now want to fully describe this for the symmetric matrix case. We'll take the output matrix to be $H$ to clarify the notation slightly. Obviously, the output change in $h_{i,j}$ will have to be some linear combination of the input changes to both $g_{i,j}$ and $g_{j,i}$. In particular, notice the case of an actual symmetric change: there both entries have changed by the same amount, and that is the amount we want. For a symmetric change, $\mathrm{d} h_{i,j} = \alpha\,\mathrm{d}g_{i,j} + (1 - \alpha)\,\mathrm{d}g_{j,i}$ works for any value of $\alpha$. The only symmetric way to assign the contributions is $1/2$ from each.
I.e. we want $$\frac {\partial h_{i,j}}{\partial g_{k,l}} = \frac{1}{2} (\delta_{i,k}\delta_{j,l} + \delta_{i,l} \delta_{j,k} )$$
But now it looks like we have a problem, because the nonzero off-diagonal derivatives equal $1/2$ rather than the $1$ one might naively expect.
Why is this not actually a problem? Because for an actual symmetric change the formula recovers the right output, since both inputs have changed by the same amount.
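For instance, in the $2\times2$ case a symmetric change with $\mathrm{d}g_{1,2}=\mathrm{d}g_{2,1}=\epsilon$ gives
$$\mathrm{d}h_{1,2} \;=\; \tfrac12\,\mathrm{d}g_{1,2} + \tfrac12\,\mathrm{d}g_{2,1} \;=\; \epsilon,$$
which is exactly the change a direct symmetric perturbation would produce.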
Alternative approaches that truly keep things in the realm of partial derivatives include:
If the symmetric matrix is just being considered on its own, it's perfectly possible to take $g_{i,j}$ with $i \le j$ as "basis" elements. You no longer have the structure of a matrix for the derivative, but that may be okay -- if you're just trying to solve for a stationary point, it works fine. This will end up with less symmetric expressions, though.
Sometimes you can treat a constrained system as a transform of an unconstrained system. Here, in the specific case of a symmetric matrix, we can treat derivatives of the symmetrized form of a matrix: $H = \operatorname{Sym}(G) = (G + G^T) / 2$. Then $\partial h_{i,j} /\partial g_{k,l} = (\delta_{i,k}\delta_{j,l} + \delta_{i,l} \delta_{j,k} ) / 2$, with nothing confusing at all, recovering what we did above. The point of view is slightly different, as above we worked directly with the changes, while here we just powered through the algebra of symmetrization.
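A quick numerical illustration of this second approach, assuming nothing beyond $H=\operatorname{Sym}(G)=(G+G^T)/2$: build the Jacobian $\partial h_{i,j}/\partial g_{k,l}$ by finite differences and compare it with the $\tfrac12(\delta_{i,k}\delta_{j,l}+\delta_{i,l}\delta_{j,k})$ pattern derived above.

```python
import numpy as np

n = 3
sym = lambda G: 0.5 * (G + G.T)

# Jacobian of H = Sym(G), perturbing one entry g_{kl} at a time
eps = 1e-6
G = np.random.default_rng(3).standard_normal((n, n))
J = np.zeros((n, n, n, n))                 # J[i,j,k,l] = d h_{ij} / d g_{kl}
for k in range(n):
    for l in range(n):
        E = np.zeros((n, n)); E[k, l] = 1.0
        J[:, :, k, l] = (sym(G + eps * E) - sym(G - eps * E)) / (2 * eps)

# closed-form pattern: (delta_ik delta_jl + delta_il delta_jk) / 2
I = np.eye(n)
J_exact = 0.5 * (np.einsum('ik,jl->ijkl', I, I) + np.einsum('il,jk->ijkl', I, I))
print(np.allclose(J, J_exact, atol=1e-6))  # off-diagonal derivatives are 1/2, diagonal are 1
```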
Best Answer
$ \def\bbR#1{{\mathbb R}^{#1}} \def\l{\Lambda}\def\o{{\tt1}}\def\p{\partial} \def\L{\left}\def\R{\right} \def\LR#1{\L(#1\R)} \def\BR#1{\Big(#1\Big)} \def\bR#1{\big(#1\big)} \def\skew#1{\operatorname{skew}\LR{#1}} \def\cayley#1{\operatorname{cayley}\LR{#1}} \def\trace#1{\operatorname{Tr}\LR{#1}} \def\qiq{\quad\implies\quad} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} $Use an unconstrained matrix variable $U$ to construct the orthogonal matrix $P$ as follows
$$\eqalign{
S &= \skew{U} \;\;\;\doteq\; \tfrac 12\LR{U-U^T} &\qiq S^T=-S \\
P &= \cayley{S} \;\doteq\; \LR{I+S}^{-1}\LR{I-S} &\qiq P^TP=I \\
dP &= -\LR{I+S}^{-1}dS\LR{I+P} &\qiq dS=\skew{dU} \\
}$$
Then use Golden_Ratio's gradient (with respect to $P$) to write the differential of the function, perform a change of variables from $dP\to dU,\,$ and recover the unconstrained gradient
$$\eqalign{
df &= 2\LR{yy^TP\l}:dP \\
&= -2\LR{yy^TP\l}:\LR{\LR{I+S}^{-1}dS\LR{I+P}} \\
&= -2\LR{\LR{I-S}^{-1}\LR{yy^TP\l}\LR{I+P^T}}:{dS} \\
&= -2\LR{\LR{I-S}^{-1}\LR{yy^TP\l}\LR{I+P^T}}:\skew{dU} \\
&= -2\skew{\LR{I-S}^{-1}\LR{yy^TP\l}\LR{I+P^T}}:dU \\
\grad{f}{U} &= -2\skew{\LR{I-S}^{-1}\LR{yy^TP\l}\LR{I+P^T}} \;\;\doteq\;\; G \\
}$$
Now $G$ can be used in your favorite unconstrained gradient-based algorithm to find the optimal $U_*$ after which it is trivial to calculate the corresponding optimal matrices
$$\eqalign{
S_* &= \skew{U_*} \qiq P_* = \LR{I+S_*}^{-1}\LR{I-S_*} \\
}$$
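To double-check the algebra numerically, here is a rough NumPy sketch. The objective from the original question is not reproduced in this excerpt, so the sketch assumes $f(P)=y^TP\Lambda P^Ty$ with a symmetric $\Lambda$, which is one function whose differential is the $df = 2\left(yy^TP\Lambda\right):dP$ used above; `cayley`, `skew`, and `grad_f` are just local helper names.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4
y = rng.standard_normal(n)
Lam = np.diag(rng.standard_normal(n))      # assumed symmetric (here diagonal) Lambda

skew = lambda U: 0.5 * (U - U.T)

def cayley(S):
    I = np.eye(len(S))
    return np.linalg.solve(I + S, I - S)   # (I+S)^{-1}(I-S), orthogonal when S is skew

def f(U):
    # assumed objective: f = y^T P Lam P^T y  with  P = cayley(skew(U))
    P = cayley(skew(U))
    return y @ P @ Lam @ P.T @ y

def grad_f(U):
    # unconstrained gradient G = -2 skew( (I-S)^{-1} (y y^T P Lam) (I+P^T) )
    I = np.eye(len(U))
    S = skew(U)
    P = cayley(S)
    M = np.linalg.solve(I - S, np.outer(y, y) @ P @ Lam) @ (I + P.T)
    return -2.0 * skew(M)

# finite-difference check of dF/dU
U = rng.standard_normal((n, n))
eps = 1e-6
fd = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n)); E[i, j] = 1.0
        fd[i, j] = (f(U + eps * E) - f(U - eps * E)) / (2 * eps)
print(np.allclose(fd, grad_f(U), atol=1e-5))
```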
In the steps above, a colon is used to denote the Frobenius product, which is a concise notation for the trace
$$\eqalign{
A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \trace{A^TB} \\
A:A &= \big\|A\big\|^2_F \\
}$$
The properties of the underlying trace function allow the terms in such a product to be rearranged in many different but equivalent ways, e.g.
$$\eqalign{
A:B &= B:A \\
A:B &= A^T:B^T \\
C:\LR{AB} &= \LR{CB^T}:A = \LR{A^TC}:B \\
A:\skew{B} &= \skew{A}:B \\
}$$
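The last identity, which carried out the change of variables from $dS$ to $dU$ above, follows directly from the first three:
$$\eqalign{
A:\skew{B} &= \tfrac12\,A:\LR{B-B^T} \\
&= \tfrac12\,\LR{A:B - A:B^T} \\
&= \tfrac12\,\LR{A:B - A^T:B} \\
&= \skew{A}:B \\
}$$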