The most general approach would be to model the dipole as a compactly supported charge distribution $\rho(x)$ that integrates to zero (i.e. no net charge). So if the dipole is "located at" point $y$, then the charge density at the point $x + y$ is $\rho(x)$. (Note that $x$ and $y$ stand for vectors. I omit the arrows for brevity.)
We'll assume that the electric field is also compactly supported. This gives an obvious way to define the zero point of the potential: $U = 0$ when the dipole is far away enough that it's disjoint from the electric field. Let's say the initial position is $y_1$, so $U(y_1) = 0$.
We can now move the dipole toward another point $y_2$ while keeping its orientation fixed. In moving it from $y$ to $y + dy$, the infinitesimal increment of work performed is
$$ dU = - dy \cdot \int \rho(x) E(y + x) \, dx $$
The potential energy at the final position of the origin is therefore
$$ U(y_2) = - \int_{y_1}^{y_2} \int \rho(x) E(y + x) \, dx \cdot dy $$
Note that the integral over $y$ is a line integral (so $dy$ is a vector) while the integral over $x$ is a space integral.
At this point we can approximate $E(y + x) \approx E(y) + x \cdot \nabla E(y)$ as mentioned by DaniH, obtaining
$$ U(y_2) \approx -\int_{y_1}^{y_2} \int \rho(x) E(y) \, dx \cdot dy - \int_{y_1}^{y_2} \int \rho(x) [x \cdot \nabla E(y)] \, dx \cdot dy$$
Having eliminated $x + y$, we can now factor both terms:
$$ U(y_2) \approx -\left[ \int \rho(x) \, dx\right] \left[\int_{y_1}^{y_2} E(y) \, d\ell\right] - \left[ \int x \rho(x) \, dx \right] \cdot \left[\int_{y_1}^{y_2} \nabla E(y) \cdot d\ell\right]$$
We assumed the total charge is zero (i.e., no monopole moment), so the first term vanishes. In the second term, the first factor is just the total dipole moment $p$, and the second factor just integrates to $E(y_2) - E(y_1)$. But we assumed $E(y_1) = 0$, so the final result is just $U(y_2) \approx - p \cdot E(y_2)$.
If we add back in the higher order terms from the Taylor expansion, then in general this formula will not be exact. These higher order terms contribute to the final result terms of the form $-\frac{1}{k!} \int x^k \rho(x) \, dx \cdot \int \nabla^k E(y) \, dy$. In order to justify treating these terms as negligible, you need to know something about the variation in $E$. In a typical scenario, each term will diminish by a factor of about $k L/M$, where $L$ is the length scale over which $E$ varies appreciably, and $M$ is the size of the dipole. You can make this precise but physicists typically leave these details to the mathematicians.
Your magnetic field $\vec{B} = z \hat{z}$ is not a valid magnetic field, since it has $\vec{\nabla} \cdot \vec{B} = 1 \neq 0$. If you want to have the field have this $z$-dependence, you must also have a non-trivial field in the $xy$-plane. And it's this latter component of the field that causes the force.
For example, if the field is radially symmetric about the center of the loop, you could have
$$
\vec{B} = z \hat{z} - \frac{1}{2} r \hat{r}
$$
where $r$ is the cylindrical coordinate (distance from the $z$-axis). This field does obey $\vec{\nabla} \cdot \vec{B} = 0$; and the force on the loop due to this new field is
$$
\vec{F} = I \oint d\vec{l} \times \vec{B} = I \int_0^{2\pi} (r d \theta \, \hat{\theta}) \times \left(- \frac{1}{2} r \, \hat{r} \right) = \frac{1}{2} I r^2 \hat{z} \int_0^{2\pi} d \theta = I \pi r^2 \hat{z} = I A \hat{z}
$$
as expected.
More generally, the Lorentz force law says that the force on any current configuration is
$$
\vec{F} = \int \vec{J} \times \vec{B} \, d \tau.
$$
For a "small" current configuration it is possible to show that this is equal to the magnetic dipole force formula $\vec{\nabla} (\vec{m}\cdot\vec{B})$; but one step in the derivation relies on the fact that $\vec{\nabla} \cdot \vec{B} = 0$ (as well as $\vec{\nabla} \cdot \vec{J} = 0$.) I believe this derivation is in Zangwill's Modern Electrodynamics if you want to see it, but my recollection is that it's pretty hairy algebraically.
Best Answer
If you bring in the dipole from infinity, you can first rotate it while it's at infinity (where, by assumption, the field vanishes, and thus its potential energy is still zero), and then translate it to its final location. In doing so against a force $\nabla (m \cdot B)$, you give it potential energy in the amount $-m \cdot B$.
Alternatively, you can assume that at all times, the dipole is kept perpendicular to the field as it is being brought in from infinity. If that's the case, the force is zero this whole time. Lastly, you can rotate it into its final orientation while keeping its position constant. Now you only have to do work against the torque.
It's the same energy, which can be computed easily using one of the two approaches above. You could also consider a more complicated path where work is being done against both force and torque simultaneously, but that just makes your life harder. But if the dipole does take such a path, the final result must still be the same.