Both formulas are equivalent, if you are in the electrostatic approximation and your dipole vector does not depend on the position $\mathbf{r}$.
Let's consider the expression $\mathbf{F}=\nabla_{\mathbf{r}}(\mathbf{p} \cdot \mathbf{E})$ which can be easily obtained from the potential energy function
$U=-\mathbf{p} \cdot \mathbf{E}$
and its relation with the force $\mathbf{F}=\nabla_\mathbf{r} U$. Now, recall the vector identity
$\nabla_\mathbf{r}(\mathbf{a}\cdot \mathbf{b})= (\mathbf{a} \cdot \nabla_\mathbf{r}) \mathbf{b}+(\mathbf{b} \cdot \nabla_\mathbf{r}) \mathbf{a} + \mathbf{a} \times (\nabla_\mathbf{r} \times \mathbf{b})+ \mathbf{b} \times (\nabla_\mathbf{r} \times \mathbf{a})$
for $\mathbf{a}=\mathbf{a}(\mathbf{r})$ and $\mathbf{b}=\mathbf{b}(\mathbf{r})$ two arbitrary vectors. For $\mathbf{p}=\mathbf{a} \neq \mathbf{p}(\mathbf{r})$ [independent of the position] and $\mathbf{b}=\mathbf{E}(\mathbf{r}$) we have
$\nabla_\mathbf{r}(\mathbf{p}\cdot \mathbf{E})= (\mathbf{p} \cdot \nabla_\mathbf{r}) \mathbf{E}+(\mathbf{E} \cdot \nabla_\mathbf{r}) \mathbf{p} + \mathbf{p} \times (\nabla_\mathbf{r} \times \mathbf{E})+ \mathbf{E} \times (\nabla_\mathbf{r} \times \mathbf{p})$
As the dipole vector does not depend on the position we can drop the second and the fourth terms. In the electrostatic approximation, Faraday's law reads $\partial_t \mathbf{B}=\mathbf{0}\Leftrightarrow \nabla_\mathbf{r} \times \mathbf{E}(\mathbf{r})=\mathbf{0} $ [this is known as ''Carn's law''] so that the electric field is irrotational and the curl vanishes. Then we can drop the third term and
$\nabla_\mathbf{r}(\mathbf{p}\cdot \mathbf{E})= (\mathbf{p} \cdot \nabla_\mathbf{r}) \mathbf{E}$
so that your definitions agree.
Imagine a electric dipole placed in a uniform field with an angle of π/2 to said field. The electric field will put a torque on the dipole, reducing the angle to 0.
Almost. There will be a torque which will begin to rotate the dipole toward a zero angle. This results in a decrease in potential energy and an increase in kinetic energy and angular momentum. When the dipole reaches zero, however, it doesn't stop. It has angular momentum, so it will continue past zero. If the external electric field remains on, there will be a torque opposite the angular momentum which will cause the dipole to slow down and stop, then swing back toward zero again.
If you stop the electric field at any point, the dipole will continue to move. The system has whatever kinetic energy belongs to the rotation of the dipole, which of course will radiate and eventually stop.
When you turned on the field, you changed the potential energy of the system by +pE. The electrical potential energy formally was 0, and the kinetic energy was 0 for a total mechanical energy of 0. But the system was not constrained and could change position to have a lower potential energy. When the dipole reached the zero angle, the kinetic energy was equal to pE and the system potential energy, - pE for a total mechanical energy of 0.
Best Answer
The most general approach would be to model the dipole as a compactly supported charge distribution $\rho(x)$ that integrates to zero (i.e. no net charge). So if the dipole is "located at" point $y$, then the charge density at the point $x + y$ is $\rho(x)$. (Note that $x$ and $y$ stand for vectors. I omit the arrows for brevity.)
We'll assume that the electric field is also compactly supported. This gives an obvious way to define the zero point of the potential: $U = 0$ when the dipole is far away enough that it's disjoint from the electric field. Let's say the initial position is $y_1$, so $U(y_1) = 0$.
We can now move the dipole toward another point $y_2$ while keeping its orientation fixed. In moving it from $y$ to $y + dy$, the infinitesimal increment of work performed is
$$ dU = - dy \cdot \int \rho(x) E(y + x) \, dx $$
The potential energy at the final position of the origin is therefore
$$ U(y_2) = - \int_{y_1}^{y_2} \int \rho(x) E(y + x) \, dx \cdot dy $$
Note that the integral over $y$ is a line integral (so $dy$ is a vector) while the integral over $x$ is a space integral.
At this point we can approximate $E(y + x) \approx E(y) + x \cdot \nabla E(y)$ as mentioned by DaniH, obtaining
$$ U(y_2) \approx -\int_{y_1}^{y_2} \int \rho(x) E(y) \, dx \cdot dy - \int_{y_1}^{y_2} \int \rho(x) [x \cdot \nabla E(y)] \, dx \cdot dy$$
Having eliminated $x + y$, we can now factor both terms:
$$ U(y_2) \approx -\left[ \int \rho(x) \, dx\right] \left[\int_{y_1}^{y_2} E(y) \, d\ell\right] - \left[ \int x \rho(x) \, dx \right] \cdot \left[\int_{y_1}^{y_2} \nabla E(y) \cdot d\ell\right]$$
We assumed the total charge is zero (i.e., no monopole moment), so the first term vanishes. In the second term, the first factor is just the total dipole moment $p$, and the second factor just integrates to $E(y_2) - E(y_1)$. But we assumed $E(y_1) = 0$, so the final result is just $U(y_2) \approx - p \cdot E(y_2)$.
If we add back in the higher order terms from the Taylor expansion, then in general this formula will not be exact. These higher order terms contribute to the final result terms of the form $-\frac{1}{k!} \int x^k \rho(x) \, dx \cdot \int \nabla^k E(y) \, dy$. In order to justify treating these terms as negligible, you need to know something about the variation in $E$. In a typical scenario, each term will diminish by a factor of about $k L/M$, where $L$ is the length scale over which $E$ varies appreciably, and $M$ is the size of the dipole. You can make this precise but physicists typically leave these details to the mathematicians.