It's not the derivative with respect to a matrix really. It's the derivative of $f$ with respect to each element of a matrix and the result is a matrix.
Although the calculations are different, it is the same idea as a Jacobian matrix. Each entry is a derivative with respect to a different variable.
Same goes for $\frac{\partial f}{\partial \mu}$: it is a vector made of derivatives with respect to each element of $\mu$.
You could think of them as $$\bigg[\frac{\partial f}{\partial \Sigma}\bigg]_{i,j} = \frac{\partial f}{\partial \sigma^2_{i,j}} \qquad \text{and}\qquad \bigg[\frac{\partial f}{\partial \mu}\bigg]_i = \frac{\partial f}{\partial \mu_i}$$
where $\sigma^2_{i,j}$ is the $(i,j)$th covariance in $\Sigma$ and $\mu_i$ is the $i$th element of the mean vector $\mu$.
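The entrywise definition above can be checked numerically. Here is a minimal sketch using a hypothetical scalar function of my own choosing (not from the question), $f(S, m) = m^\intercal S m$, whose elementwise derivatives have simple closed forms:

```python
import numpy as np

# Hypothetical example function (not from the question): f(S, m) = m^T S m.
def f(S, m):
    return m @ S @ m

rng = np.random.default_rng(0)
S = rng.standard_normal((3, 3))   # no symmetry constraint in this example
m = rng.standard_normal(3)
eps = 1e-6

# "Derivative with respect to a matrix": one finite difference per entry.
dfdS = np.zeros_like(S)
for i in range(3):
    for j in range(3):
        E = np.zeros_like(S)
        E[i, j] = eps
        dfdS[i, j] = (f(S + E, m) - f(S - E, m)) / (2 * eps)

# Analytically, [df/dS]_{i,j} = m_i m_j, the outer product of m with itself.
assert np.allclose(dfdS, np.outer(m, m), atol=1e-5)

# Same idea for the vector argument: [df/dm]_i = [(S + S^T) m]_i.
dfdm = np.zeros_like(m)
for i in range(3):
    e = np.zeros_like(m)
    e[i] = eps
    dfdm[i] = (f(S, m + e) - f(S, m - e)) / (2 * eps)
assert np.allclose(dfdm, (S + S.T) @ m, atol=1e-5)
```

Each entry of `dfdS` really is just an ordinary partial derivative with respect to one matrix element, which is all the notation means.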
This is actually a generic issue with partial derivatives with respect to a constrained object, and parameterizations of that object that would allow violating the constraints. This also crops up with e.g. unit vectors, or orthogonal matrices.
A symmetric matrix only has $n(n+1)/2$ degrees of freedom -- you cannot vary its entries separately, and it's not clear what it would mean to hold $g_{j,i}$ fixed while varying $g_{i,j}$ -- the usual interpretation of a partial derivative. If you have a function defined along a curve in 3-space, there really isn't a partial derivative with respect to $x$, especially at places along the curve where $x$ doesn't locally change.
What people generally want in cases like this is a way to use the normal formulas and have them work, at least when the "infinitesimal changes" keep the new object obeying the constraint. For example, the change in a symmetric matrix is a symmetric matrix, and the change in a unit vector must be orthogonal to the unit vector. These treatments usually involve some hideous and non-obvious abuses of notation, and are "really" moving to a parameterized, but constrained total derivative. Doing this for a symmetric matrix has $\mathrm{d}G = S\,\mathrm{d}t$, with $S$ any symmetric matrix.
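This can be made concrete with a small numerical sketch. The example function $f(G) = \operatorname{tr}(GB)$ is my own choice, not from the question; along a symmetric change $\mathrm{d}G = S\,\mathrm{d}t$, the unconstrained gradient and its symmetrization give the same answer:

```python
import numpy as np

# Sketch of the constrained total derivative dG = S dt for symmetric G.
# Example function (my choice, not from the post): f(G) = trace(G @ B).
rng = np.random.default_rng(1)
n = 4
B = rng.standard_normal((n, n))
G0 = rng.standard_normal((n, n)); G0 = (G0 + G0.T) / 2  # symmetric start
S  = rng.standard_normal((n, n)); S  = (S + S.T) / 2    # symmetric direction

f = lambda G: np.trace(G @ B)
eps = 1e-6
# Numerical d/dt f(G0 + t S) at t = 0: the change stays symmetric.
num = (f(G0 + eps * S) - f(G0 - eps * S)) / (2 * eps)

naive_grad = B.T              # unconstrained gradient of trace(G B)
sym_grad = (B + B.T) / 2      # its symmetrization
# Along a symmetric direction, both agree with the numerical derivative:
assert np.isclose(num, np.sum(naive_grad * S), atol=1e-5)
assert np.isclose(num, np.sum(sym_grad * S), atol=1e-5)
```

This is the sense in which the "normal formulas" still work: as long as the infinitesimal change respects the constraint, the unsymmetrized and symmetrized gradients are indistinguishable.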
The derivative of a unit vector with respect to itself is an example of this, though going through method 2 below. The end result of the "partial" derivative with respect to itself is a projection onto the portion of a change that maintains the constraint.
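For the unit-vector case, that projection is $P = I - uu^\intercal$. A quick check (my own sketch, assuming this standard form of the tangent projection) that it sends an arbitrary change to one that preserves the constraint to first order:

```python
import numpy as np

# Unit-vector analogue: the projection P = I - u u^T keeps changes tangent
# to the unit sphere, i.e. |u + t dv| = 1 up to O(t^2).
rng = np.random.default_rng(2)
u = rng.standard_normal(3); u /= np.linalg.norm(u)
P = np.eye(3) - np.outer(u, u)

v = rng.standard_normal(3)        # arbitrary proposed change
dv = P @ v                        # projected change
assert np.isclose(u @ dv, 0.0)    # tangent: orthogonal to u
assert np.allclose(P @ P, P)      # P is a projection (idempotent)

# First order: the norm deviates from 1 only at O(t^2).
t = 1e-4
assert abs(np.linalg.norm(u + t * dv) - 1.0) < 1e-6
```

Note that $P$ depends on $u$; the symmetric-matrix case discussed next is simpler precisely because its projection does not depend on the current state.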
This generalizes. For a symmetric matrix it's particularly simple: unlike the unit vector case, the projection is always the same, independent of the state being varied. We now want to describe this fully for the symmetric matrix case. We'll write the output matrix as $H$ to clarify notation slightly. The output change $\mathrm{d}h_{i,j}$ must be some linear combination of the input changes to both $g_{i,j}$ and $g_{j,i}$. In particular, consider an actual symmetric change: both entries have changed by the same amount, and that's the amount we want, so $\mathrm{d} h_{i,j} = \alpha\,\mathrm{d}g_{i,j} + (1 - \alpha)\,\mathrm{d}g_{j,i}$ works for any value of $\alpha$. The only symmetric choice is a contribution of $1/2$ from each.
I.e. we want $$\frac {\partial h_{i,j}}{\partial g_{k,l}} = \frac{1}{2} (\delta_{i,k}\delta_{j,l} + \delta_{i,l} \delta_{j,k} )$$
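This formula can be verified to act as the symmetrizing projection. A minimal sketch assembling the fourth-order tensor and checking both properties:

```python
import numpy as np

# Assemble P[i,j,k,l] = (δ_ik δ_jl + δ_il δ_jk) / 2 and confirm it both
# symmetrizes an arbitrary change and is idempotent (a projection).
n = 3
I = np.eye(n)
P = (np.einsum('ik,jl->ijkl', I, I) + np.einsum('il,jk->ijkl', I, I)) / 2

M = np.random.default_rng(3).standard_normal((n, n))
# Acting on an arbitrary change dG = M gives the symmetric part of M:
assert np.allclose(np.einsum('ijkl,kl->ij', P, M), (M + M.T) / 2)
# Applying it twice changes nothing more: P is a projection.
P2 = np.einsum('ijkl,klmn->ijmn', P, P)
assert np.allclose(P2, P)
```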
At first glance the nonzero off-diagonal derivatives equaling $1/2$ looks like a problem.
Why isn't it one? For an actual symmetric change it recovers the right output, because both inputs have changed by the same amount.
Alternative approaches that truly keep things in the realm of partial derivatives include:
If the symmetric matrix is just being considered on its own, it's perfectly possible to take the $g_{i,j}$ with $i \le j$ as "basis" elements. You no longer have the structure of a matrix for the derivative, but that may be okay -- if you're just trying to solve for a stationary point, it works fine. This will end up with less symmetric expressions, though.
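A sketch of this first method, again using the example function $f(G) = \operatorname{tr}(GB)$ (my choice, not from the question). Perturbing an off-diagonal free variable moves both $g_{i,j}$ and $g_{j,i}$, so its derivative picks up $b_{j,i} + b_{i,j}$, while a diagonal variable only picks up $b_{i,i}$ -- correct, but visibly less symmetric:

```python
import numpy as np

# Method 1: free variables are g_{i,j} with i <= j. Example f(G) = tr(G B).
rng = np.random.default_rng(4)
n = 3
B = rng.standard_normal((n, n))
G = rng.standard_normal((n, n)); G = (G + G.T) / 2

f = lambda G: np.trace(G @ B)
eps = 1e-6
for i in range(n):
    for j in range(i, n):
        E = np.zeros((n, n))
        E[i, j] = eps
        E[j, i] = eps            # one free variable moves both entries
        d = (f(G + E) - f(G - E)) / (2 * eps)
        # Diagonal and off-diagonal variables get different-looking formulas:
        expected = B[i, i] if i == j else B[j, i] + B[i, j]
        assert np.isclose(d, expected, atol=1e-5)
```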
Sometimes you can treat a constrained system as a transform of an unconstrained system. Here, in the specific case of a symmetric matrix, we can treat derivatives of the symmetrized form of a matrix: $H = \operatorname{Sym}(G) = (G + G^T) / 2$. Then $\partial h_{i,j} /\partial g_{k,l} = (\delta_{i,k}\delta_{j,l} + \delta_{i,l} \delta_{j,k} ) / 2$, with nothing confusing at all, recovering what we did above. The point of view is slightly different, as above we worked directly with the changes, while here we just powered through the algebra of symmetrization.
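A sketch of this second method: differentiate the symmetrization map entry by entry and compare with the closed-form expression just stated. Since the map is linear, finite differences are exact:

```python
import numpy as np

# Method 2: H = Sym(G) = (G + G^T)/2 with G unconstrained. Check that
# dH/dg_{k,l} matches (δ_ik δ_jl + δ_il δ_jk)/2 for every (k, l).
n = 3
Sym = lambda G: (G + G.T) / 2
eps = 1.0  # the map is linear, so any step size is exact
G = np.zeros((n, n))
I = np.eye(n)
for k in range(n):
    for l in range(n):
        E = np.zeros((n, n)); E[k, l] = eps
        dH = (Sym(G + E) - Sym(G)) / eps   # matrix of dh_{i,j}/dg_{k,l}
        expected = (np.outer(I[:, k], I[:, l]) + np.outer(I[:, l], I[:, k])) / 2
        assert np.allclose(dH, expected)
```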
Best Answer
You made a symmetry assumption when you expanded your expression for $f$, but then didn’t account for it in your derivative.
Recall that the derivative with respect to a symmetric matrix should be symmetric (prove this if you don't see it), and therefore your expression satisfies
$$\frac{\partial f}{\partial A} = \frac{1}{2}\left(\frac{\partial f}{\partial A} + \left( \frac{\partial f}{\partial A}\right)^\intercal\right).$$
Applying this to your second expression directly yields the first.
Edit: Another way to see this is that both expressions are in fact your derivative. This is because both are the same linear mapping on symmetric matrices. This makes sense because your mapping is the derivative with respect to a symmetric matrix, which means the only relevant “direction vectors” it can act on are already symmetric.
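This last point can be sketched numerically: any matrix and its symmetrization act as the same linear functional on symmetric "direction" matrices, so either form is a valid way to write the derivative with respect to a symmetric $A$ (here `M` is a generic stand-in, not the specific expression from the question):

```python
import numpy as np

# A matrix M and its symmetrization define the same linear map
# S -> <M, S> on symmetric matrices S (Frobenius inner product).
rng = np.random.default_rng(5)
n = 4
M = rng.standard_normal((n, n))      # stands in for one expression
M_sym = (M + M.T) / 2                # the symmetrized expression
for _ in range(5):
    S = rng.standard_normal((n, n)); S = (S + S.T) / 2
    assert np.isclose(np.sum(M * S), np.sum(M_sym * S))
```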