In single-variable calculus, the chain rule is a little simpler:
$\frac{dy}{dx} = \frac{dy}{du}\frac{du}{dx}$
While these differential operators are not fractions, this notation lets you manipulate them as if they were. In the multivariable case the idea is the same, but it is a tad more complicated:
$\frac{\partial}{\partial r} = \frac{\partial}{\partial x}\frac{\partial x}{\partial r} + \frac{\partial}{\partial y}\frac{\partial y}{\partial r}$
The new coordinates r and z are related to x and y, so x and y each depend on r; when we apply the chain rule, both the x term and the y term have to appear.
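A quick numerical sanity check of the formula above may help. This is only a sketch under assumptions that are not in the original: it takes polar coordinates $x=r\cos\theta$, $y=r\sin\theta$ as the change of variables and uses a made-up test function $f(x,y)=x^2y$. It compares the two-term chain rule sum against a finite-difference derivative in $r$.

```python
import math

def f(x, y):
    # Hypothetical test function, chosen only for illustration.
    return x**2 * y

def df_dr_chain(r, theta):
    """df/dr via the chain rule, assuming x = r cos(theta), y = r sin(theta)."""
    x, y = r * math.cos(theta), r * math.sin(theta)
    df_dx = 2 * x * y          # partial of f with respect to x
    df_dy = x**2               # partial of f with respect to y
    dx_dr = math.cos(theta)    # partial of x with respect to r
    dy_dr = math.sin(theta)    # partial of y with respect to r
    return df_dx * dx_dr + df_dy * dy_dr

def df_dr_numeric(r, theta, eps=1e-6):
    """Central-difference approximation of df/dr at fixed theta."""
    def along_r(rr):
        return f(rr * math.cos(theta), rr * math.sin(theta))
    return (along_r(r + eps) - along_r(r - eps)) / (2 * eps)

r0, theta0 = 1.5, 0.7
print(df_dr_chain(r0, theta0), df_dr_numeric(r0, theta0))
```

The two printed numbers should agree to several decimal places. Dropping either term of the sum breaks the agreement, which is the sense in which both x and y "need to express themselves".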
Let's recap some things: For general $f=(f_1,\ldots,f_m):U\subseteq\mathbb{R}^n\to\mathbb{R}^m$ (where $U$, the domain of $f$, is open in $\mathbb{R}^n$) and $x\in U$, $Df(x)$ is the Jacobian (or rather the linear transformation associated to it), given by
$$Df(x)=\left[\frac{\partial f_i}{\partial x_j}(x)\right]_{\substack{i=1,\ldots,m\\j=1,\ldots,n}}$$
(wherever this makes sense).
From this definition, the following are obvious:
When $m=1$, $Df(x)$ is simply a row, and in fact it is equal to $\nabla f(x)$, the gradient of $f$ at $x$.
When $n=1$, $Df(x)$ is a column, where each entry is simply the ordinary derivative of $f_i$ at $x$.
When $m=n=1$, $Df(x)$ is a number, equal to $f'(x)$.
Now recall the Chain Rule:
For $f:U\subseteq \mathbb{R}^n\to V\subseteq\mathbb{R}^m$ and $g:V\subseteq\mathbb{R}^m\to\mathbb{R}^p$,
$$D(g\circ f)(x)=Dg(f(x))Df(x),$$
where the RHS is simply a product of matrices (a $p\times m$ matrix times an $m\times n$ matrix).
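To see the matrix form of the Chain Rule in action, here is a minimal sketch that approximates all three Jacobians by central differences and compares $D(g\circ f)(x)$ with the product $Dg(f(x))\,Df(x)$. The functions `f` and `g`, the helpers `jacobian` and `matmul`, and the point `x0` are all illustrative choices, not part of the original discussion.

```python
import math

def jacobian(func, x, eps=1e-6):
    """Central-difference Jacobian of func: R^n -> R^m, as a list of m rows."""
    m, n = len(func(x)), len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        xp = list(x); xp[j] += eps
        xm = list(x); xm[j] -= eps
        fp, fm = func(xp), func(xm)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * eps)
    return J

def matmul(A, B):
    """Plain matrix product of two lists-of-rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def f(x):
    # Hypothetical map R^2 -> R^3 (n = 2, m = 3).
    return [x[0] * x[1], math.sin(x[0]), x[1] ** 2]

def g(y):
    # Hypothetical map R^3 -> R^2 (m = 3, p = 2).
    return [y[0] + y[1] * y[2], math.exp(y[0])]

def g_after_f(x):
    return g(f(x))

x0 = [0.5, -1.2]
lhs = jacobian(g_after_f, x0)                        # D(g o f)(x0), 2 x 2
rhs = matmul(jacobian(g, f(x0)), jacobian(f, x0))    # Dg(f(x0)) Df(x0)
max_gap = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print(max_gap)
```

The printed gap should be tiny, limited only by finite-difference error. Note that $Dg$ is evaluated at $f(x_0)$, not at $x_0$, exactly as in the formula.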
Now to your problem: to use the chain rule, you have to see $f$ as a composition of two simpler functions, each differentiable everywhere on its domain. If we define $h:\mathbb{R}^n\to(-1,\infty)$ by $h(x)=\Vert x\Vert^2$, and $k:(-1,\infty)\to\mathbb{R}$ by $k(t)=\frac{t^2}{1+t}$, we have $f=k\circ h$, and it is easier to calculate $Dh$ and $Dk=k'$. (The open interval $(-1,\infty)$ is used because $h(0)=0$, so the values of $h$ fill $[0,\infty)$, while $k$ is differentiable wherever $t\neq -1$.)
I'll leave the details to you, but here's what you should get: For $x=(x_1,\ldots,x_n)$, use the definition of $Dh(x)$ to find $Dh(x)=2[x_1\ x_2\cdots x_n]$.
For $k$, one-variable calculus gives $k'(t)=\frac{t^2+2t}{(1+t)^2}$.
By the chain rule,
$$Df(x_1,\ldots,x_n)=2\frac{(\Vert x\Vert^4+2\Vert x\Vert^2)}{(1+\Vert x\Vert^2)^2}[x_1\cdots x_n]$$
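If you want to check your own working against this result, the following sketch compares the closed-form expression above with a finite-difference gradient of $f(x)=\frac{\Vert x\Vert^4}{1+\Vert x\Vert^2}$. The test point `x0` and the step size `eps` are arbitrary choices made for illustration.

```python
def f(x):
    s = sum(xi * xi for xi in x)      # s = ||x||^2, i.e. h(x)
    return s * s / (1 + s)            # k(s) = s^2 / (1 + s)

def grad_formula(x):
    """Df(x) from the chain rule: k'(||x||^2) * 2 * [x_1 ... x_n]."""
    s = sum(xi * xi for xi in x)
    coeff = 2 * (s * s + 2 * s) / (1 + s) ** 2
    return [coeff * xi for xi in x]

def grad_numeric(x, eps=1e-6):
    """Central-difference approximation of each partial derivative."""
    grad = []
    for j in range(len(x)):
        xp = list(x); xp[j] += eps
        xm = list(x); xm[j] -= eps
        grad.append((f(xp) - f(xm)) / (2 * eps))
    return grad

x0 = [0.3, -1.1, 2.0]
gap = max(abs(a - b) for a, b in zip(grad_formula(x0), grad_numeric(x0)))
print(gap)
```

A small gap is evidence (not proof) that the formula is right; a large one usually points to a dropped factor of 2 or a slip in $k'$.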
Best Answer
Define the vector
$$z=A^Tx-b$$
and write the function in terms of this new vector. Then find its differential and gradient, using $dz=A^Tdx$ since $b$ is constant:
$$\eqalign{ y &= z^Tz \cr dy &= 2z^Tdz = 2z^T(A^Tdx) = (2Az)^Tdx \cr \frac{\partial y}{\partial x} &= 2Az = 2A(A^Tx-b) }$$
The symbol ${\mathbb R}^{m\times n}$ denotes the set of real matrices with $m$ rows and $n$ columns.
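The gradient $2A(A^Tx-b)$ can be checked numerically as well. The sketch below uses plain Python lists; the particular matrix `A` (taken here to be $3\times 2$), the vector `b`, and the point `x0` are made-up values for illustration only, since the original does not fix any dimensions or data.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matvec(A, v):
    return [sum(a * vi for a, vi in zip(row, v)) for row in A]

# Hypothetical data: A is 3 x 2, so x lives in R^3 and b in R^2.
A = [[1.0, 2.0], [0.0, -1.0], [3.0, 0.5]]
b = [0.5, -2.0]

def residual(x):
    """z = A^T x - b."""
    return [zi - bi for zi, bi in zip(matvec(transpose(A), x), b)]

def y(x):
    """y = z^T z."""
    return sum(zi * zi for zi in residual(x))

def grad_formula(x):
    """Gradient from the differential: 2 A (A^T x - b)."""
    return [2 * gi for gi in matvec(A, residual(x))]

def grad_numeric(x, eps=1e-6):
    """Central-difference approximation of the gradient of y."""
    grad = []
    for j in range(len(x)):
        xp = list(x); xp[j] += eps
        xm = list(x); xm[j] -= eps
        grad.append((y(xp) - y(xm)) / (2 * eps))
    return grad

x0 = [0.2, -0.7, 1.3]
gap = max(abs(a - b_) for a, b_ in zip(grad_formula(x0), grad_numeric(x0)))
print(gap)
```

Because $y$ is quadratic in $x$, the central difference is exact up to rounding, so the printed gap should be very close to zero.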