The definition of the gradient $\nabla f$ of a function $f: M\to \mathbb{R}$ for $(M, g)$ a Riemannian manifold is given by $$g_x(\nabla f, V) = \mathrm{d}_x f(V)$$ at all $x\in M$ and vector fields $V$ on $M$. To determine the gradient on the unit 2-sphere $S$, we take a point $p = (\varphi, \theta)$, which has cartesian representation $(\cos(\varphi)\sin(\theta), \sin(\varphi)\sin(\theta), \cos(\theta))$, for $\varphi\in [0, 2\pi)$, $\theta\in [0, \pi)$. Then, for two vectors $v_1 = (\varphi_1', \theta_1')$ and $v_2 = (\varphi_2', \theta_2')$ in $T_pS$, we have that the inner product between $v_1$ and $v_2$ is $$g_p(v_1, v_2) = \sin^2(\theta)\varphi_1'\varphi_2'+\theta_1'\theta_2'$$ To see this, we can take the inner product of their cartesian representations at $p$: $v_1$ has the representation $$(\theta_1'\cos(\varphi)\cos(\theta)-\varphi_1'\sin(\varphi)\sin(\theta), \theta_1'\sin(\varphi)\cos(\theta)+\varphi_1'\cos(\varphi)\sin(\theta), -\theta_1'\sin(\theta))$$ and $v_2$ has a similar representation, with $\varphi_2'$ and $\theta_2'$ in place of $\varphi_1'$ and $\theta_1'$. Then, let's say that at $x$, $V$ has the direction $v_1$, so $$\mathrm{d}_x f(V) = \varphi_1'\partial_{\varphi} f(\varphi, \theta)+\theta_1'\partial_{\theta} f(\varphi, \theta)$$ This then implies that $$\nabla f(\varphi, \theta) = (\partial_{\varphi} f(\varphi, \theta)/\sin^2(\theta), \partial_{\theta} f(\varphi, \theta))$$ so that $$g_p(\nabla f(p), v_1) = \sin^2(\theta)\varphi_1'\cdot \partial_{\varphi} f(\varphi, \theta)/\sin^2(\theta)+\theta_1'\cdot \partial_{\theta} f(\varphi, \theta) = \mathrm{d}_x f(V)$$ But, it seems like this has an extra factor of $\sin(\theta)$ in the denominator in the $\varphi$ component. Where did I go wrong? Thanks for your help!
Deriving gradient on unit 2-sphere using definition for Riemannian manifolds
differential-geometrymultivariable-calculussolution-verificationvector analysis
Related Solutions
You can also determine the expression for $\nabla$ in other coordinate systems just by using the Jacobian.
Let $f(r) = r' = \rho e_1 + \varphi e_2 + z e_3$ be our nonlinear coordinate transformation, with $e_i$ being cartesian basis vectors.
Now, let $\phi'(r') = \phi(r)$ for some scalar field $\phi$. For any vector $a$, the chain rule then tells us that
$$a \cdot \nabla \phi = [(a \cdot \nabla )f] \cdot \nabla' \phi'$$
The quantity $(a \cdot \nabla)f$ is the Jacobian (hint: evaluate it with respect to basis vectors for $a$ and then to extract components of the resulting vector). We'll call the Jacobian $\underline f(a)$ when it acts on some vector $a$. The law above can be rewritten as
$$a \cdot \nabla \phi = \underline f(a) \cdot \nabla' \phi'$$
But, we can transpose the Jacobian (or more precisely, use the adjoint operator) to have it act on $\nabla'$ instead!
$$a \cdot \nabla \phi = a \cdot \overline f(\nabla') \phi' = a \cdot \overline f(\nabla') \phi$$
Or more succinctly,
$$\nabla = \overline f(\nabla')$$
The expression for $\nabla'$ is easy enough: it's $\nabla' = e^1 \partial_\rho + e^2 \partial_\varphi + e^3 \partial_z$. This is not enough, however. We actually want $\nabla$, just expressed in terms of the cylindrical coordinate partials. To get that, we must find $\overline f(\nabla')$.
Calculating the Jacobian (and its adjoint) is more of a tedious than complicated process. Let's just take as given that the adjoint Jacobian is
$$\begin{align*}\overline f(e^1) &= e^\rho \\ \overline f(e^2) &= e^\varphi \\ \overline f(e^3) &= e^3 = e^z\end{align*}$$ The vectors $e^\rho, e^\varphi, e^z$ form the basis covectors in cylindrical coordinates. They are not all normalized--in particular, $e^\varphi \cdot e^\varphi = 1/\rho^2$. (Actually carrying out the computation of the Jacobian generates these expressions in terms of cartesian basis covectors, which is instructive, but not really necessary. It is, however, one way you verify the norm of $e^\varphi$.) This makes $\nabla$ equal to
$$\nabla = e^\rho \partial_\rho + e^\varphi \partial_\varphi + e^z \partial_z$$
In this light, the gradient is actually pretty trivial because, as long as the new coordinate frame is orthogonal, you get a result like this. Maybe one of the basis covectors is non-unit, but that's not really a big deal. The divergence tends to be more interesting because you have to account for the transformation law for the underlying vector field (is it actually a vector field? is it instead a covector field?) and because the dot product requires you to transform $\nabla$ and the field separately.
Long story short: because you expressed $\nabla$ in terms of basis vectors instead of covectors, you got what looks like the wrong result but isn't. $e_\varphi$ is indeed equal to $e^\varphi \rho^2$, and neither has unit magnitude, as Jason points out.
This approach is the basic idea behind that of tetrads or frame fields. Note that the metric $\underline g(a) = \overline f^{-1} \underline f^{-1}(a)$, so everything you naturally do with the metric can be done with the Jacobian (or, in a case where the underlying space isn't flat, with the frame field) instead.
Rigorous formulation
This involves the definition of the surface gradient operator, which is defined as $$ \nabla_{\Gamma} = \nabla - {\mathbf e}_{r} \left({\mathbf e}_{r} \cdot \nabla\right). $$
The projection of the gradient along the unit normal ${\mathbf e}_{r}$ is evaluated by $$ {\mathbf e}_{r} \left({\mathbf e}_{r} \cdot \nabla f\right) = {\mathbf e}_{r} \left(\frac{\partial f}{\partial r} {\mathbf e}_{r}^2\right) = {\mathbf e}_{r} \frac{\partial f}{\partial r} $$ which when subtracted from $\nabla f$ gives (at $r=R$) $$ {\nabla}_{\Gamma} f = \mathbf{e}_{\varphi} \frac{1}{R\sin\theta}\frac{\partial f}{\partial\varphi} + \mathbf{e}_{\theta} \frac{1}{R} \frac{\partial f}{\partial\theta}. $$
Angular gradient field
Because of $\mathbf{e}_{\theta}\cdot\mathbf{e}_{\varphi} = 0$ and that the longitudinal rotation and the latitudinal rotations are independent of each other, the component of ${\nabla_{\Gamma}}f$ in the direction of $\mathbf{e}_{\theta}$ and $\mathbf{e}_{\varphi}$ can be individually resolved into angular components.
Polar regions
The original extent for asking the case of $\sin\theta=0$ is for numerical purposes. However, it should be noted that the gradient operator cannot be defined on $\sin\theta=0$ due to the choice of coordinate system and boundary conditions.
Best Answer
I think I realized myself that there actually isn't a mistake at all. If we want to choose an orthonormal basis aligned with our $(\varphi, \theta)$ coordinates at $T_pS$, then we need to pick $\hat{\varphi} = (1/\sin(\theta), 0)$ and $\hat{\theta} = (0, 1)$. Then, $$g_p(\nabla f(p), v_1) = \partial_{\varphi} f(\varphi, \theta)/\sin(\theta)\hat{\varphi}+\partial_{\theta} f(\varphi, \theta)\hat{\theta}$$ which is the formula given on Wikipedia.