The force applied to a point dipole with dipole momentum $\vec{p}$ is
$$
\vec{F} = (\vec{p} \cdot \vec\nabla) \vec{E}
$$
In Cartesian coordinates that is
$$
F_i = \sum_j p_j \frac{\partial}{\partial x_j} E_i
$$
But in spherical coordinates it is not the same.
There is no field components along $\vec{\theta}$, but there is a gradient of field components along this direction since the direction of the vector changes.
In order to convert this expression to spherical coordinates one should to use tensor analysis.
In all following expressions the summation over repeating indices is assumed.
$$
T^{\;ji}_t = p^j \frac{\partial}{\partial x^t} E^i
$$
$$
F^{\;i} = T^{\;ji}_t \delta^t_j
$$
Let Cartesian coordinates be $x^1, x^2, x^3$ and spherical coordinates be $y^1, y^2, y^3$, then
$$
T{\,}'^{j'i'}_{t'}(y) =
\frac{\partial y^{j'}}{\partial x^j}
\frac{\partial y^{i'}}{\partial x^i}
\frac{\partial x^t}{\partial y^{t'}}
T^{\;ji}_{t}\bigl(x(y)\bigr)
$$
One should calculate the force in spherical coordinates as
$$
F^{\,i} = T{\,}'^{ji}_{t}(y) \delta^t_j \quad \text{(correct)}
$$
while you have used the tensor without prime, i.e.
$$
F^{\,i} = T{\,}^{ji}_{t}(y) \delta^t_j \quad \text{(wrong)}
$$
Remember that a basis of a vector space only needs to (1) span the vector space, and (2) be linearly independent. In particular, a basis does not have to be orthogonal, and it certainly doesn't have to be normalized. And one of the most common types of basis (a coordinate basis) is usually not normalized.
You're confused because you usually see the metric tensor in spherical coordinates given as
\begin{equation}
\mathbf{g} =
\begin{pmatrix}
1 & 0 & 0 \\
0 & r^2 & 0 \\
0 & 0 & r^2 \sin^2\theta
\end{pmatrix}.
\end{equation}
This is the metric with respect to the coordinate basis, whereas you've (correctly) written the metric with respect to the orthonormalized vector basis — and it's very important to remember the distinction between those types of bases. I'll explain.
Let's write the coordinate basis vectors as
\begin{equation}
\mathbf{r}, \boldsymbol{\theta}, \boldsymbol{\phi}.
\end{equation}
(Note that I'm using a bold font to indicate that these are vectors, but I'm not putting hats on them, for reasons that will become clear soon.) These vectors represent the amount you would move through the space if you changed the corresponding coordinate by a certain amount. For example, if $\mathbf{p}(r, \theta, \phi)$ is the position vector to the point with spherical coordinates $r, \theta, \phi$, then those coordinate basis vectors are defined as
\begin{align}
\mathbf{r} &= \frac{\partial \mathbf{p}} {\partial r} \\
\boldsymbol{\theta} &= \frac{\partial \mathbf{p}} {\partial \theta} \\
\boldsymbol{\phi} &= \frac{\partial \mathbf{p}} {\partial \phi}.
\end{align}
To relate that back to your basis given in Cartesian components, remember that
\begin{equation}
\mathbf{p} = r\sin\theta\cos\phi\, \hat{\mathbf{x}} + r\sin\theta\sin\phi\, \hat{\mathbf{y}} + r\cos\theta\, \hat{\mathbf{z}},
\end{equation}
which we can differentiate to find
\begin{align}
\mathbf{r} &= \sin\theta\cos\phi\, \hat{\mathbf{x}} + \sin\theta\sin\phi\, \hat{\mathbf{y}} + \cos\theta\, \hat{\mathbf{z}} \\
\boldsymbol{\theta} &= r\cos\theta\cos\phi\, \hat{\mathbf{x}} + r\cos\theta\sin\phi\, \hat{\mathbf{y}} - r\sin\theta\, \hat{\mathbf{z}} \\
\boldsymbol{\phi} &= -r\sin\theta\sin\phi\, \hat{\mathbf{x}} + r\sin\theta\cos\phi\, \hat{\mathbf{y}}.
\end{align}
Using these expressions, it's a simple exercise to see that we have
\begin{align}
\mathbf{r} \cdot \mathbf{r} &= 1 \\
\boldsymbol{\theta} \cdot \boldsymbol{\theta} &= r^2 \\
\boldsymbol{\phi} \cdot \boldsymbol{\phi} &= r^2 \sin^2 \theta.
\end{align}
So this basis is not orthonormal — and that's where the "usual" metric components come from, which is why the metric isn't just the identity as you expected. In fact, usually the only type of coordinates that lead to orthonormal basis vectors is a Cartesian coordinate systems (though even Cartesian coordinates are not orthonormal in nontrivial geometries).
On the other hand, a nearly identical simple exercise shows that your basis $(\mathbf{e}_r, \mathbf{e}_\theta, \mathbf{e}_\phi)$ is orthonormal. In fact, comparing our expressions in the Cartesian basis, we see that
\begin{align}
\mathbf{r} &= \mathbf{e}_r \\
\boldsymbol{\theta} &= r\, \mathbf{e}_\theta \\
\boldsymbol{\phi} &= r\sin\theta\, \mathbf{e}_\phi.
\end{align}
In an orthonormal basis, the metric is — essentially by definition — just the identity matrix, which is what you found.
Best Answer
The easiest derivation is probably to start with the potential, and then calculate the electric field as the gradient of that potential.
Consider a dipole at the origin aligned along the z-axis, with two point charges of charge $ \pm q $ positioned at $z = \pm\frac{a}{2}$. By symmetry, the potential will be independent of the azimuthal angle, so we only need to consider the potential at the coordinates $\left( r, \theta \right)$. Since we are trying to find the field around a point dipole, we can assume $r >> a$.
The distance of this point to the two charges is given by the cosine rule as $r_\pm^2 = \frac{a^2}{4} + r^2 \mp ar \cos \theta$.
The potential at this point is then given by summing the potential from the two point charges, which is $V\left(r, \theta\right) = \frac{q}{4\pi\varepsilon_0}\left(\left(\frac{a^2}{4} + r^2-ar\cos\theta\right)^{-\frac{1}{2}}-\left(\frac{a^2}{4} + r^2+ar\cos\theta\right)^{-\frac{1}{2}}\right)$. The approximation $r >> a$ suggests that we can Taylor expand this in $\frac{a}{r}$, and that is what we will do.
Taylor expanding gives $V\left(r, \theta\right) = \frac{qa \cos \theta}{4\pi\varepsilon_0 r^2}$. Any higher-order terms can always be ignored, because this is an ideal point dipole.
Now all that's left is finding the electric field from this potental. We know that $ \vec E = - \nabla V$. Taking the gradient of the dipole potential gives $\vec E = \frac{qa}{4 \pi \varepsilon_0 r^3} \left( 2 \cos \theta \hat r + \sin \theta \hat \theta \right) $.
This is nice, but we'd like an expression in terms of the dipole moment; there aren't really any point charges separated by a physical distance, that was just scaffolding to get us this far. Fortunately, the way we've defined the coordinate axes gives us a nice expression for the dipole moment: $\vec p = qa \hat z$, or, in spherical coordinates, $\vec p = qa\left( \cos \theta \hat r - \sin \theta \hat \theta\right)$. So $ \vec p . \hat r = p \cos \theta = qa \cos \theta $.
Putting this all into the expression for the electric field gives us $ \frac{1}{4 \pi \varepsilon_0 r^3} \left( 3p \cos \theta \hat r - \vec p \right)$, just as planned.