Define some variables for convenience:
$$\eqalign{
P &= {\rm Diag}(\beta) \cr
B &= cP^{-1} \cr
b &= {\rm diag}(B) \cr
S &= A+B \cr
M &= AS^{-1}A \cr
}$$
all of which are symmetric matrices, except for $b$ which is a vector.
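As a quick sanity check on these definitions, here is a short NumPy sketch. The dimension $n$, the constant $c$, and the choice of a symmetric positive definite $A$ (so that $S$ is safely invertible) are assumptions made purely for illustration; it verifies the identity $f = {\rm tr}(M) = A^2 : S^{-1}$ used below.

```python
import numpy as np

rng = np.random.default_rng(0)
n, c = 4, 2.5   # hypothetical size and constant for illustration

# Make A symmetric (positive definite, so S = A + B is safely invertible)
R = rng.standard_normal((n, n))
A = R @ R.T + n * np.eye(n)
beta = rng.uniform(1.0, 2.0, n)   # positive, so P is invertible

P = np.diag(beta)             # P = Diag(beta)
B = c * np.linalg.inv(P)      # B = c P^{-1}
b = np.diag(B)                # b = diag(B), a vector
S = A + B                     # S = A + B
M = A @ np.linalg.inv(S) @ A  # M = A S^{-1} A

# f = tr(M) = A^2 : S^{-1}, where X : Y = sum(X * Y) is the Frobenius product
f_trace = np.trace(M)
f_frob = np.sum((A @ A) * np.linalg.inv(S))
assert np.isclose(f_trace, f_frob)
```

The trace identity follows from ${\rm tr}(AS^{-1}A) = {\rm tr}(A^2S^{-1})$ (cyclic property) and the symmetry of $A^2$.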
Then the function and its differential can be expressed in terms of the Frobenius (:) product as
$$\eqalign{
f &= {\rm tr}(M) \cr
&= A^2 : S^{-1} \cr\cr
df &= A^2 : dS^{-1} \cr
&= -A^2 : S^{-1}\,dS\,S^{-1} \cr
&= -S^{-1}A^2S^{-1} : dS \cr
&= -S^{-1}A^2S^{-1} : dB \cr
&= -S^{-1}A^2S^{-1} : c\,dP^{-1} \cr
&= c\,S^{-1}A^2S^{-1} : P^{-1}\,dP\,P^{-1} \cr
&= c\,P^{-1}S^{-1}A^2S^{-1}P^{-1} : dP \cr
&= c\,P^{-1}S^{-1}A^2S^{-1}P^{-1} : {\rm Diag}(d\beta) \cr
&= {\rm diag}\big(c\,P^{-1}S^{-1}A^2S^{-1}P^{-1}\big)^T d\beta \cr
}$$
So the derivative is
$$\eqalign{
\frac{\partial f}{\partial\beta} &= {\rm diag}\big(c\,P^{-1}S^{-1}A^2S^{-1}P^{-1}\big) \cr
&= \frac{1}{c}\,{\rm diag}\big(BS^{-1}A^2S^{-1}B\big) \cr
&= \Big(\frac{b\circ b}{c}\Big)\circ{\rm diag}\big(S^{-1}A^2S^{-1}\big) \cr\cr
}$$
which uses Hadamard ($\circ$) products in the final expression. This is the same as joriki's result, but in more detail.
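The final Hadamard-product formula can be checked numerically against central finite differences. In this sketch the dimension, the constant $c$, and the symmetric positive definite choice of $A$ are again assumptions for illustration, not part of the derivation:

```python
import numpy as np

rng = np.random.default_rng(1)
n, c = 4, 2.5   # hypothetical size and constant
R = rng.standard_normal((n, n))
A = R @ R.T + n * np.eye(n)   # symmetric positive definite for a well-posed test

def f(beta):
    S = A + c * np.diag(1.0 / beta)             # S = A + c P^{-1}
    return np.trace(A @ np.linalg.solve(S, A))  # f = tr(A S^{-1} A)

beta = rng.uniform(1.0, 2.0, n)
S_inv = np.linalg.inv(A + c * np.diag(1.0 / beta))
b = c / beta                                    # b = diag(c P^{-1})
grad = (b * b / c) * np.diag(S_inv @ A @ A @ S_inv)

# Central finite differences as an independent check of the formula
eps = 1e-6
fd = np.array([(f(beta + eps * e) - f(beta - eps * e)) / (2 * eps)
               for e in np.eye(n)])
assert np.allclose(grad, fd, rtol=1e-4, atol=1e-6)
```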
In general, the derivative of a function $f : \mathbb{R}^n \to \mathbb{R}^m$ at a point $p \in \mathbb{R}^n$, if it exists, is the unique linear transformation $Df(p) \in L(\mathbb{R}^n,\mathbb{R}^m)$ such that
$$
\lim_{h \to 0} \frac{\|f(p+h)-f(p)-Df(p)h\|}{\|h\|} = 0;
$$
the matrix of $Df(p)$ with respect to the standard orthonormal bases of $\mathbb{R}^n$ and $\mathbb{R}^m$, called the Jacobian matrix of $f$ at $p$, therefore lies in $M_{m \times n}(\mathbb{R})$.
Now, suppose that $m=1$, so that $f : \mathbb{R}^n \to \mathbb{R}$. Then if $f$ is differentiable at $p$, $Df(p) \in L(\mathbb{R}^n,\mathbb{R}) = (\mathbb{R}^n)^\ast$ is a functional, and hence the Jacobian matrix, as you point out, lies in $M_{1 \times n}(\mathbb{R})$, i.e., is a row vector. However, by the Riesz representation theorem, $\mathbb{R}^n \cong (\mathbb{R}^n)^\ast$ via the map that sends a vector $x \in \mathbb{R}^n$ to the functional $y \mapsto \left\langle y,x \right\rangle$. Hence, if $f$ is differentiable at $p$, then the gradient of $f$ at $p$ is the unique (column!) vector $\nabla f(p) \in \mathbb{R}^n$ such that
$$
\forall h \in \mathbb{R}^n, \quad Df(p)h = \left\langle \nabla f(p),h\right\rangle;
$$
in particular, if you unpack definitions, you'll find that the Jacobian matrix of $f$ at $p$ is precisely $\nabla f(p)^T$.
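To make the row-vector/column-vector distinction concrete, here is a small sketch with a hypothetical scalar field $f:\mathbb{R}^3\to\mathbb{R}$ (the particular $f$ and point $p$ are chosen only for illustration). The $1\times n$ Jacobian matrix, estimated by finite differences, matches $\nabla f(p)^T$, and $Df(p)h = \langle\nabla f(p),h\rangle$:

```python
import numpy as np

# Hypothetical scalar field f : R^3 -> R, chosen for illustration
f = lambda x: x[0]**2 * x[1] + np.sin(x[2])

p = np.array([1.0, 2.0, 0.5])

# Hand-computed gradient of f at p (a column vector in R^3)
grad = np.array([2 * p[0] * p[1], p[0]**2, np.cos(p[2])])

# 1 x n Jacobian matrix via central finite differences
eps = 1e-6
jac = np.array([[(f(p + eps * e) - f(p - eps * e)) / (2 * eps)
                 for e in np.eye(3)]])

assert jac.shape == (1, 3)                      # a row vector
assert np.allclose(jac, grad[None, :], atol=1e-6)  # Jacobian = gradient transposed
h = np.array([0.3, -0.7, 1.1])
assert np.isclose(jac @ h, grad @ h)            # Df(p)h = <grad f(p), h>
```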
Since $f$ is differentiable at $a$,
$$\lim_{w\to0}\frac{\bigl\lVert f(a+w)-f(a)-Df(a)(w)\bigr\rVert}{\lVert w\rVert}=0.\tag1$$
Now consider vectors $w$ of the form $tv$, with $t\in\mathbb R$. You deduce from $(1)$ that
$$\lim_{t\to0}\frac{\bigl\lVert f(a+tv)-f(a)-Df(a)(tv)\bigr\rVert}{\lVert tv\rVert}=0.\tag2$$
Since $\lVert tv\rVert=|t|\,\lVert v\rVert$ with $\lVert v\rVert$ a fixed nonzero constant, and a vector tends to $0$ exactly when its norm does, $(2)$ is equivalent to
$$\lim_{t\to0}\frac{f(a+tv)-f(a)-tDf(a)(v)}t=0.$$
In other words,
$$\lim_{t\to0}\frac{f(a+tv)-f(a)}t=Df(a)(v).\tag3$$
But the LHS of $(3)$ is the directional derivative of $f$ in the direction of $v$. And, since
$$\begin{bmatrix}{\frac{\partial f_{1}}{\partial x_{1}}(a)} & {\cdots} & {\frac{\partial f_{1}}{\partial x_{n}}(a)} \\ \vdots & \ddots & \vdots \\ {\frac{\partial f_{m}}{\partial x_{1}}(a)} & \cdots & {\frac{\partial f_{m}}{\partial x_{n}}(a)}\end{bmatrix}$$
is the matrix of $Df(a)$ with respect to the canonical basis, the RHS of $(3)$ is
$$\begin{bmatrix}{\frac{\partial f_{1}}{\partial x_{1}}(a)} & {\cdots} & {\frac{\partial f_{1}}{\partial x_{n}}(a)} \\ \vdots & \ddots & \vdots \\ {\frac{\partial f_{m}}{\partial x_{1}}(a)} & \cdots & {\frac{\partial f_{m}}{\partial x_{n}}(a)}\end{bmatrix}v.$$
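The conclusion can be illustrated numerically. In this sketch a hypothetical $f:\mathbb{R}^2\to\mathbb{R}^3$, point $a$, and direction $v$ are chosen only for illustration; the difference quotient from $(3)$ agrees with the Jacobian matrix applied to $v$:

```python
import numpy as np

# Hypothetical f : R^2 -> R^3, chosen for illustration
f = lambda x: np.array([x[0] * x[1], np.sin(x[0]), x[1]**2])

a = np.array([1.0, 2.0])
v = np.array([0.5, -1.0])

# Jacobian matrix of f at a, from the partial derivatives computed by hand
J = np.array([[a[1],         a[0]],
              [np.cos(a[0]), 0.0],
              [0.0,          2 * a[1]]])

# Approximate the directional derivative lim_{t->0} (f(a+tv)-f(a))/t
t = 1e-6
dir_deriv = (f(a + t * v) - f(a - t * v)) / (2 * t)

assert np.allclose(dir_deriv, J @ v, atol=1e-6)
```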