It is as simple as a coordinate transformation, but you have to be a bit careful about which one you use.
Let’s look at what the chain rule says for each of the partial derivatives: $${\partial f\over\partial p} = {\partial f\over\partial x}{\partial x\over\partial p}+{\partial f\over\partial y}{\partial y\over\partial p}+{\partial f\over\partial z}{\partial z\over\partial p}$$ and similarly for the other two. The three identities can be combined into one:$$\pmatrix{{\partial f\over\partial p}\\{\partial f\over\partial q}\\{\partial f\over\partial r}} = \pmatrix{{\partial x\over\partial p}&{\partial y\over\partial p}&{\partial z\over\partial p}\\{\partial x\over\partial q}&{\partial y\over\partial q}&{\partial z\over\partial q}\\{\partial x\over\partial r}&{\partial y\over\partial r}&{\partial z\over\partial r}}\pmatrix{{\partial f\over\partial x}\\{\partial f\over\partial y}\\{\partial f\over\partial z}} = {\partial(x,y,z)\over\partial(p,q,r)}^T \nabla f.$$ (I’ve suppressed the points at which these partials are evaluated to reduce clutter.)
In your case, the coordinate transformation is linear, so its Jacobian matrix is the transformation matrix itself. We want the matrix for the $(p,q,r)\mapsto(x,y,z)$ direction, which has the basis vectors $\vec p$, $\vec q$ and $\vec r$ as its columns, giving $$\pmatrix{\vec p^T\\\vec q^T\\\vec r^T} \pmatrix{{\partial f\over\partial x}\\{\partial f\over\partial y}\\{\partial f\over\partial z}}.$$
To put it another way, under point transformation given by $\mathbf p'=M\mathbf p$ for some invertible matrix $M$, $\nabla f$ transforms as $M^{-T}(\nabla f)$—the gradient at a point is a covariant vector. This makes sense since the gradient is a normal to level surfaces of $f$.
A similar rule applies to the Hessian $H_f$. This matrix represents a quadratic form, so under the point transformation $\mathbf p'=M\mathbf p$ it transforms as $M^{-T}H_fM^{-1}$.
The total potential energy is a function $U: \left(\Bbb{R}^3\right)^N \to \Bbb{R}$. Where the physical idea is that the $i^{th}$ copy of $\Bbb{R}^3$ tells us the position $(x_i, y_i, z_i)$ of the $i^{th}$ particle. So, writing something like $\mathbf{F}_i = - \nabla_{\mathbf{r}_i}U$ means: $\mathbf{F}_i : \left(\Bbb{R}^3\right)^N \to \Bbb{R}^3$ is the vector valued function defined as
\begin{align}
\mathbf{F}_i &= - \left( \dfrac{\partial U}{\partial x_i}\, \mathbf{e}_1 + \dfrac{\partial U}{\partial y_i}\, \mathbf{e}_2 + \dfrac{\partial U}{\partial z_i}\, \mathbf{e}_3\right) \\
&\equiv - \left(\dfrac{\partial U}{\partial x_i}, \dfrac{\partial U}{\partial y_i}, \dfrac{\partial U}{\partial z_i} \right),
\end{align}
where I use $\equiv$ to mean they're the same thing, expressed in different notation.
In other words, the force $\mathbf{F}_i$ on the $i^{th}$ particle is obtained by differentiating the total potential energy with respect to the $3$ cartesian coordinates of that particle.
Edit:
The short answer to the question in the comments is that "no, $\mathbf{F}_a = - \nabla_{\mathbf{r}_a}(U_{\text{total}})$" is a mathematical statement which needs to be proven and is not a "law". Here, we crucially make use of the fact that the potentials $U_{ij}$ depend only on $|\mathbf{r}_i - \mathbf{r}_j|$.
By definition, the total force on $a^{th}$ particle, due to all other particles is
\begin{align}
\mathbf{F}_a = \sum_{j \neq a} \mathbf{F}_{\text{$j$ on $a$}}
\end{align}
What is $\mathbf{F}_{\text{$j$ on $a$}}$? Well, you simply take the potential energy $U_{aj}$ and differentiate with respect to $\mathbf{r}_a$, i.e $\mathbf{F}_{\text{$j$ on $a$ }} = - \nabla_{a}(U_{aj})$ (for ease of typing, I use $\nabla_a$ to mean $\nabla_{\mathbf{r_a}}$, which I defined above to mean differentiation with respect to $x_a, y_a, z_a$). Hence,
\begin{align}
\mathbf{F}_a = \sum_{j \neq a} \mathbf{F}_{\text{$j$ on $a$}} = - \sum_{j \neq a} \nabla_{a}(U_{aj}) \tag{$*$}
\end{align}
I now claim that this is also equal to $-\nabla_a U_{\text{total}}$. To prove this, note first of all that $U_{ij} = U_{ji}$ and that $\nabla_{a}(U_{ij}) = 0$ if $a \notin \{i,j\}$ (because the potential energy depends only on the distance between the $i$ and $j$ particles, so clearly it doesn't depend on some other $\mathbf{r}_a$).
So, we have
\begin{align}
\nabla_a(U_{\text{total}}) &= \nabla_a\left( \sum_{i=1}^n \sum_{j>i} U_{ij}\right)
\end{align}
Now, we shall split the sum over $i$ into 3 pieces: $i=a$, $i<a$ and $i>a$. Then, we get:
\begin{align}
\nabla_a(U_{\text{total}}) &= \sum_{j>a} \nabla_a(U_{aj}) + \sum_{i<a} \sum_{j>i} \nabla_a(U_{ij}) + \sum_{i>a} \sum_{j>i} \nabla_a(U_{ij})
\end{align}
Now, recall the above mentioned property that $\nabla_a(U_{ij}) = 0$ if $a \notin \{i,j\}$. So, the only non-zero terms in the above summation are:
\begin{align}
\nabla_a(U_{\text{total}}) &= \sum_{j>a} \nabla_a(U_{aj}) + \sum_{i<a} \nabla_a(U_{ia}) + 0 \\
&= \sum_{j \neq a} \nabla_a(U_{aj}), \tag{$**$}
\end{align}
where in the last line I used the fact that $U_{ia} = U_{ai}$, and I renamed summation indices and combined everything. So, if you combine $(*)$ and $(**)$ then you immediately see that
\begin{align}
\mathbf{F}_a &= - \nabla_a(U_{\text{total}}).
\end{align}
Best Answer
Your formula for gradient works for a function that depends on position $(x,y,z)$. But, position of what? In this situation, there are two particles, each with its own $(x,y,z)$, so your formula doesn't make sense.
The potential $U$ depends on the positions of both particles. $\nabla_1$ just means he's keeping the second particle fixed, to take the derivative with respect to the first particle's position. It's a form of "partial derivative". (I haven't read the book, though.)
A directional derivative is a scalar, but this gradient is a vector (as any force must be).
The force equation is saying that the first particle accelerates in the direction that would decrease the potential energy of the system.