For a smooth $f:\mathbb{R}^n\to\mathbb{R}^m$, you have $df:\mathbb{R}^n\to\mathcal{L}(\mathbb{R}^n,\mathbb{R}^m)$
Being differentiable is equivalent to:
$$
f(x+h)=f(x)+df(x)\cdot h+o(\|h\|)
$$
In your case, $f(x)=\langle x,x \rangle_G$ and $m=1$, so the differential at $x$, $df(x)$, lies in $\mathcal{L}(\mathbb{R}^n,\mathbb{R})$: it is a linear form.
Let's be more explicit:
\begin{align*}
f(x+h) &= \langle x+h,x+h \rangle_G \\
&= \underbrace{\langle x,x \rangle_G}_{f(x)} + \underbrace{2\langle x,h \rangle_G }_{df(x)\cdot h}+ \underbrace{\langle h,h \rangle_G}_{\in\, o(\|h\|)}
\end{align*}
using the symmetry of $G$ to combine the cross terms, $\langle x,h\rangle_G+\langle h,x\rangle_G=2\langle x,h\rangle_G$; the last term is $O(\|h\|^2)=o(\|h\|)$.
Hence your differential is defined by
$$
df(x)\cdot h = 2\langle x,h \rangle_G = (2x^tG)h
$$
where $2x^tG=\left(\partial_{x_1} f,\dots,\partial_{x_n} f\right)$ is your "row" vector.
Note that, because $m=1$, you can also use a vector $\nabla f(x)$ to represent $df(x)$ using the canonical scalar product. This vector is by definition the gradient of $f$:
$$
df(x)\cdot h = \langle \nabla f(x),h \rangle = \langle 2Gx,h \rangle
$$
where $\nabla f(x)=2Gx=\left(\begin{array}{c}\partial_{x_1} f \\ \vdots \\ \partial_{x_n} f\end{array}\right)$. This is your "column" vector.
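As a quick numerical sanity check of $df(x)\cdot h = 2\langle x,h\rangle_G$ and $\nabla f(x)=2Gx$, here is a NumPy sketch (the matrix $G$ below is a randomly generated symmetric positive-definite matrix, matching the symmetry assumption used in the derivation):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
# Random symmetric positive-definite G defining <x, y>_G = x^T G y
M = rng.standard_normal((n, n))
G = M @ M.T + n * np.eye(n)

f = lambda x: x @ G @ x          # f(x) = <x, x>_G
grad = lambda x: 2 * G @ x       # claimed gradient 2Gx

x = rng.standard_normal(n)
h = rng.standard_normal(n)
eps = 1e-6
# directional derivative by central differences vs. <grad f(x), h>
num = (f(x + eps * h) - f(x - eps * h)) / (2 * eps)
ana = grad(x) @ h
assert abs(num - ana) < 1e-5 * max(1.0, abs(ana))
```

Since $f$ is quadratic, the central difference is exact up to floating-point roundoff, so the agreement is essentially to machine precision.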
Yes, the inequality holds, because
$$
\begin{aligned}
ABA
&=A^{1/2}\left(A^{1/2}BA^{1/2}\right)A^{1/2}\\
&\preceq A^{1/2}\bigl(\rho\left(A^{1/2}BA^{1/2}\right)I\bigr)A^{1/2}\\
&=\rho\left(A^{1/2}BA^{1/2}\right)A\\
&=\rho(BA)A\\
&\preceq\|BA\|_F A,
\end{aligned}
$$
where the second line uses $M\preceq\rho(M)I$ for the symmetric matrix $M=A^{1/2}BA^{1/2}$, and the last line uses $\rho(BA)\le\|BA\|_F$.
As the derivation shows, your inequality can in fact be sharpened to $\langle Ax,BAx\rangle\le\rho(BA)\langle Ax,x\rangle$.
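A short NumPy check of the chain above (with $A$ and $B$ randomly generated symmetric PSD matrices, as required for $A^{1/2}$ to exist): the ordering $ABA\preceq\rho(BA)A$ is verified by confirming $\rho(BA)A-ABA$ has no negative eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5

def rand_psd(n):
    # random symmetric positive semidefinite matrix
    M = rng.standard_normal((n, n))
    return M @ M.T

A, B = rand_psd(n), rand_psd(n)
rho = max(abs(np.linalg.eigvals(B @ A)))      # spectral radius of BA

# ABA <= rho(BA) A  in the Loewner order  <=>  rho(BA)*A - ABA is PSD
gap = rho * A - A @ B @ A
eigs = np.linalg.eigvalsh((gap + gap.T) / 2)  # symmetrize for eigvalsh
assert eigs.min() > -1e-8                     # PSD up to roundoff

# the looser bound follows from rho(BA) <= ||BA||_F
assert rho <= np.linalg.norm(B @ A, 'fro') + 1e-12
```

Note that $BA$ is similar to the symmetric PSD matrix $A^{1/2}BA^{1/2}$ (when $A$ is invertible), so its eigenvalues are real and nonnegative even though $BA$ itself need not be symmetric.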
Let $\mathbf{C}= \mathbf{X} \mathbf{A}^T (\mathbf{A}\mathbf{X}\mathbf{A}^T)^{-1} - \mathbf{A}^T (\mathbf{A}\mathbf{A}^T)^{-1}$ and $\mathbf{D} = \mathbf{A}\mathbf{X}\mathbf{A}^T$.
With these notations (and $:$ denoting the Frobenius inner product), the objective is $\phi = \| \mathbf{C} \|_F^2 = \mathbf{C}:\mathbf{C}$.
It follows that \begin{eqnarray} d\phi &=& 2 \mathbf{C}:d\mathbf{C} \\ &=& 2 \mathbf{C}:(d\mathbf{X}) \mathbf{A}^T \mathbf{D}^{-1} - 2 \mathbf{C}:\mathbf{X} \mathbf{A}^T \mathbf{D}^{-1}(d\mathbf{D})\mathbf{D}^{-1}\\ &=& 2 \mathbf{C}\mathbf{D}^{-T} \mathbf{A}:d\mathbf{X} - 2 \mathbf{D}^{-T}\mathbf{A}\mathbf{X}^T\mathbf{C} \mathbf{D}^{-T}: \mathbf{A}(d\mathbf{X})\mathbf{A}^T, \end{eqnarray} using $d\mathbf{D} = \mathbf{A}(d\mathbf{X})\mathbf{A}^T$. Collecting both terms against $d\mathbf{X}$, the gradient simplifies to $$ \frac{\partial\phi}{\partial\mathbf{X}} = 2 (\mathbf{I} - \mathbf{A}^T \mathbf{D}^{-T}\mathbf{A}\mathbf{X}^T)\mathbf{C} \mathbf{D}^{-T} \mathbf{A}. $$
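The closed-form gradient can be sanity-checked against finite differences. In this NumPy sketch, the shapes and the specific choice of a symmetric positive-definite $\mathbf{X}$ (to keep $\mathbf{A}\mathbf{X}\mathbf{A}^T$ invertible) are my assumptions; the formulas themselves are from the derivation above.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 3, 5                       # A is m x n with full row rank, X is n x n
A = rng.standard_normal((m, n))
M = rng.standard_normal((n, n))
X = M @ M.T + n * np.eye(n)       # SPD X keeps A X A^T invertible

def phi(X):
    D = A @ X @ A.T
    C = X @ A.T @ np.linalg.inv(D) - A.T @ np.linalg.inv(A @ A.T)
    return np.sum(C * C)          # ||C||_F^2

def grad(X):
    # closed form: 2 (I - A^T D^{-T} A X^T) C D^{-T} A
    D = A @ X @ A.T
    Dinv = np.linalg.inv(D)
    C = X @ A.T @ Dinv - A.T @ np.linalg.inv(A @ A.T)
    return 2 * (np.eye(n) - A.T @ Dinv.T @ A @ X.T) @ C @ Dinv.T @ A

# directional finite-difference check in a random direction H
H = rng.standard_normal((n, n))
eps = 1e-6
num = (phi(X + eps * H) - phi(X - eps * H)) / (2 * eps)
ana = np.sum(grad(X) * H)         # <grad phi, H>_F
assert abs(num - ana) < 1e-4 * max(1.0, abs(ana))
```

The differential was derived for an unstructured $\mathbf{X}$, so the check uses a general (non-symmetric) perturbation $H$.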