The components that the formula $g^{ij} \partial_j f$ refers to are taken with respect to the natural tangent space basis induced by the coordinate system; these vectors are often denoted by $(\partial/\partial r, \partial/\partial \theta, \partial/\partial \phi)$, and they differ from the orthonormal frame $(\hat{r}, \hat{\theta}, \hat{\phi})$ by the usual normalization factors $1$, $r$, $r \sin\theta$, respectively.
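As a worked illustration (assuming the usual spherical metric $g = \mathrm{diag}(1, r^2, r^2\sin^2\theta)$ with $\theta$ the polar angle), the formula gives components in the coordinate basis, and substituting $\partial/\partial r = \hat{r}$, $\partial/\partial\theta = r\,\hat{\theta}$, $\partial/\partial\phi = r\sin\theta\,\hat{\phi}$ should recover the familiar vector calculus expression:

$$\begin{align*}\nabla f &= \frac{\partial f}{\partial r}\frac{\partial}{\partial r} + \frac{1}{r^2}\frac{\partial f}{\partial \theta}\frac{\partial}{\partial \theta} + \frac{1}{r^2\sin^2\theta}\frac{\partial f}{\partial \phi}\frac{\partial}{\partial \phi} \\ &= \frac{\partial f}{\partial r}\hat{r} + \frac{1}{r}\frac{\partial f}{\partial \theta}\hat{\theta} + \frac{1}{r\sin\theta}\frac{\partial f}{\partial \phi}\hat{\phi}\end{align*}$$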
EDIT: Let's think in terms of vector calculus. In that case, your manifold could be a surface in $\mathbf{R}^3$, or the whole space $\mathbf{R}^3$ (but described in a curvilinear coordinate system). The position vector of a point in the manifold is written as $\mathbf{r}(s,t)$ (for a parametrized surface) or $\mathbf{r}(r,\theta,\phi)$ (for the whole space in spherical coordinates). The tangent vectors to the surface are $\partial\mathbf{r}/\partial s$ and $\partial\mathbf{r}/\partial t$. For the whole space, you have a frame of vector fields $(\partial\mathbf{r}/\partial r, \partial\mathbf{r}/\partial \theta, \partial\mathbf{r}/\partial \phi)$ which are orthogonal at each point (this is what it means when we say that spherical coordinates are an orthogonal coordinate system), but they are not normalized. It is these un-normalized vectors that in differential geometry are referred to as $(\partial/\partial r, \partial/\partial \theta, \partial/\partial \phi)$. (In the abstract setting, the manifold is not embedded in a Euclidean space, so it doesn't make sense to talk about a position vector, and thus the $\mathbf{r}$ is omitted from the notation. Also, as you've probably seen, vectors are often defined as first order differential operators, and this notation conforms to that way of thinking.) You get $\hat{r}$ etc. by normalizing these vectors.
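If you want to check this numerically, here is a minimal sketch in plain Python. It assumes $\theta$ is the polar angle, and the helper names (`position`, `tangent_vectors`, `dot`) are made up for the illustration. It approximates $\partial\mathbf{r}/\partial r$, $\partial\mathbf{r}/\partial\theta$, $\partial\mathbf{r}/\partial\phi$ by central differences and compares their lengths with $1$, $r$, $r\sin\theta$.

```python
import math

def position(r, theta, phi):
    """Cartesian position vector for spherical coordinates (theta = polar angle)."""
    return (
        r * math.sin(theta) * math.cos(phi),
        r * math.sin(theta) * math.sin(phi),
        r * math.cos(theta),
    )

def tangent_vectors(r, theta, phi, h=1e-6):
    """Approximate the three partial derivatives of the position vector
    with respect to r, theta, phi using central differences."""
    point = (r, theta, phi)
    vectors = []
    for k in range(3):
        plus = list(point)
        minus = list(point)
        plus[k] += h
        minus[k] -= h
        p = position(*plus)
        m = position(*minus)
        vectors.append(tuple((a - b) / (2 * h) for a, b in zip(p, m)))
    return vectors

def dot(u, v):
    """Euclidean dot product of two 3-tuples."""
    return sum(a * b for a, b in zip(u, v))

# Arbitrary sample point, chosen only for the illustration.
r0, theta0, phi0 = 2.0, 0.7, 1.1
e_r, e_theta, e_phi = tangent_vectors(r0, theta0, phi0)

# Lengths of the un-normalized coordinate basis vectors.
norms = [math.sqrt(dot(v, v)) for v in (e_r, e_theta, e_phi)]
print(norms, (1.0, r0, r0 * math.sin(theta0)))

# Pairwise dot products, which should all be close to zero.
print(dot(e_r, e_theta), dot(e_r, e_phi), dot(e_theta, e_phi))
```

Dividing each vector by its length gives $\hat{r}$, $\hat{\theta}$, $\hat{\phi}$.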
You can also determine the expression for $\nabla$ in other coordinate systems just by using the Jacobian.
Let $f(r) = r' = \rho e_1 + \varphi e_2 + z e_3$ be our nonlinear coordinate transformation, with $e_i$ being cartesian basis vectors.
Now, let $\phi'(r') = \phi(r)$ for some scalar field $\phi$. For any vector $a$, the chain rule then tells us that
$$a \cdot \nabla \phi = [(a \cdot \nabla )f] \cdot \nabla' \phi'$$
The quantity $(a \cdot \nabla)f$ is the Jacobian (hint: evaluate it with each basis vector in place of $a$, then extract the components of the resulting vector). We'll call the Jacobian $\underline f(a)$ when it acts on some vector $a$. The law above can be rewritten as
$$a \cdot \nabla \phi = \underline f(a) \cdot \nabla' \phi'$$
But, we can transpose the Jacobian (or more precisely, use the adjoint operator) to have it act on $\nabla'$ instead!
$$a \cdot \nabla \phi = a \cdot \overline f(\nabla') \phi' = a \cdot \overline f(\nabla') \phi$$
Or more succinctly,
$$\nabla = \overline f(\nabla')$$
The expression for $\nabla'$ is easy enough: it's $\nabla' = e^1 \partial_\rho + e^2 \partial_\varphi + e^3 \partial_z$. This is not enough, however. We actually want $\nabla$, just expressed in terms of the cylindrical coordinate partials. To get that, we must find $\overline f(\nabla')$.
Calculating the Jacobian (and its adjoint) is more tedious than complicated. Let's just take as given that the adjoint Jacobian is
$$\begin{align*}\overline f(e^1) &= e^\rho \\ \overline f(e^2) &= e^\varphi \\ \overline f(e^3) &= e^3 = e^z\end{align*}$$
The vectors $e^\rho, e^\varphi, e^z$ form the basis covectors in cylindrical coordinates. They are not all normalized--in particular, $e^\varphi \cdot e^\varphi = 1/\rho^2$. (Actually carrying out the computation of the Jacobian generates these expressions in terms of cartesian basis covectors, which is instructive, but not really necessary. It is, however, one way you can verify the norm of $e^\varphi$.) This makes $\nabla$ equal to
$$\nabla = e^\rho \partial_\rho + e^\varphi \partial_\varphi + e^z \partial_z$$
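A quick sanity check of this formula, as a sketch in plain Python: it assumes the cartesian components $e^\rho = \nabla\rho$, $e^\varphi = \nabla\varphi$, $e^z = \nabla z$, and uses a made-up test field $F = x^2 y + z$ (that is, $\rho^3\cos^2\varphi\sin\varphi + z$ in cylindrical coordinates). All function names are hypothetical.

```python
import math

def cyl_covectors(rho, phi):
    """Cartesian components of e^rho = grad(rho), e^phi = grad(phi), e^z = grad(z)."""
    e_rho = (math.cos(phi), math.sin(phi), 0.0)
    e_phi = (-math.sin(phi) / rho, math.cos(phi) / rho, 0.0)
    e_z = (0.0, 0.0, 1.0)
    return e_rho, e_phi, e_z

def grad_cylindrical(rho, phi, z):
    """Gradient of F = rho^3 cos^2(phi) sin(phi) + z, assembled as
    e^rho d_rho F + e^phi d_phi F + e^z d_z F."""
    d_rho = 3 * rho**2 * math.cos(phi)**2 * math.sin(phi)
    d_phi = rho**3 * (math.cos(phi)**3 - 2 * math.cos(phi) * math.sin(phi)**2)
    d_z = 1.0
    e_rho, e_phi, e_z = cyl_covectors(rho, phi)
    return tuple(d_rho * a + d_phi * b + d_z * c
                 for a, b, c in zip(e_rho, e_phi, e_z))

def grad_cartesian(x, y, z):
    """Gradient of the same field written as F = x^2 y + z."""
    return (2 * x * y, x**2, 1.0)

# Arbitrary sample point, chosen only for the illustration.
rho0, phi0, z0 = 1.5, 0.4, -0.3
x0, y0 = rho0 * math.cos(phi0), rho0 * math.sin(phi0)

from_cylindrical = grad_cylindrical(rho0, phi0, z0)
from_cartesian = grad_cartesian(x0, y0, z0)
print(from_cylindrical)
print(from_cartesian)

# Squared norm of e^phi, to compare with 1/rho^2.
e_phi = cyl_covectors(rho0, phi0)[1]
e_phi_norm_sq = sum(c * c for c in e_phi)
print(e_phi_norm_sq, 1 / rho0**2)
```

The two gradients should agree componentwise, and the non-unit covector $e^\varphi$ is what supplies the $1/\rho$ factor you see in the orthonormal-frame formula.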
In this light, the gradient is actually pretty trivial because, as long as the new coordinate frame is orthogonal, you get a result like this. Maybe one of the basis covectors is non-unit, but that's not really a big deal. The divergence tends to be more interesting because you have to account for the transformation law for the underlying vector field (is it actually a vector field? is it instead a covector field?) and because the dot product requires you to transform $\nabla$ and the field separately.
Long story short: because you expressed $\nabla$ in terms of basis vectors instead of covectors, you got what looks like the wrong result but isn't. $e_\varphi$ is indeed equal to $e^\varphi \rho^2$, and neither has unit magnitude, as Jason points out.
This approach is the basic idea behind tetrads, or frame fields. Note that the metric is $\underline g(a) = \overline f^{-1} \underline f^{-1}(a)$, so everything you naturally do with the metric can be done with the Jacobian (or, in a case where the underlying space isn't flat, with the frame field) instead.
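To make the last two points concrete, here is a small sketch in plain Python (the function names are hypothetical). It assumes the columns of the inverse Jacobian $\partial(x,y,z)/\partial(\rho,\varphi,z)$ are the tangent basis vectors $e_\rho, e_\varphi, e_z$, builds the metric from their dot products, and checks the claim $e_\varphi = \rho^2 e^\varphi$.

```python
import math

def cyl_tangent_vectors(rho, phi):
    """Columns of the inverse Jacobian d(x, y, z)/d(rho, phi, z),
    i.e. the coordinate basis vectors e_rho, e_phi, e_z."""
    return [
        (math.cos(phi), math.sin(phi), 0.0),
        (-rho * math.sin(phi), rho * math.cos(phi), 0.0),
        (0.0, 0.0, 1.0),
    ]

def metric(rho, phi):
    """Metric components g_ij = e_i . e_j built from the inverse Jacobian."""
    basis = cyl_tangent_vectors(rho, phi)
    return [[sum(a * b for a, b in zip(u, v)) for v in basis] for u in basis]

# Arbitrary sample point, chosen only for the illustration.
rho0, phi0 = 1.5, 0.4
g = metric(rho0, phi0)
for row in g:
    print(row)  # compare with diag(1, rho^2, 1)

# Basis covector e^phi = grad(phi) in cartesian components.
e_phi_upper = (-math.sin(phi0) / rho0, math.cos(phi0) / rho0, 0.0)
e_phi_lower = cyl_tangent_vectors(rho0, phi0)[1]
print(e_phi_lower, tuple(rho0**2 * c for c in e_phi_upper))
```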
Best Answer
Differential geometry seldom uses orthonormal bases the way vector calculus does. Your expression for the gradient to start with is in terms of an orthonormal basis, but the metric you used is incompatible with that; it uses the actual coordinate basis. Try writing the gradient in terms of the same basis that you use for the metric and try again.