For the gradient, your mistake is that the components of the gradient vary contravariantly. On top of that, there is a issue with normalisation that I discuss below. I don't know if you are familiar with differential geometry and how it works, but basically, when we write a vector as $v^\mu$ we really are writing its components with respect to a basis.
In differential geometry, vectors are entities which act on functions $f : M \rightarrow \mathbb{R}$ defined on the manifold. Tell me if you want me to elaborate, but this implies that the basis vectors given by some set of coordinates are $\partial_\mu = \frac{\partial}{\partial x^\mu}$ and vary covariantly. Let's name those basis vectors $e_\mu$ to go back to the "familiar" linear algebra notation.
Knowing that, any vector is an invariant which can be written as $\vec{V} = V^\mu \partial_\mu$. The key here is that it is invariant, so it will be the same no matter which coordinate basis you choose.
Now, the gradient is defined in Euclidean space simply as the vector with coordinates $\partial_i f = \partial^if$ where $i = \{x,y,z\}$. Note that in cartesian coordinates covariant and contravariant components are the same. So, the invariant quantity is $\vec{\nabla}f = \partial^ife_i$. Note that, from what we did before, the components of a vector are to be treated as contravariant.
Now, since this expression is invariant we get, in general coordinates $\vec{\nabla}f = \partial^\mu fe_\mu$. So what you are looking for when computing the components is $\partial^\mu f = g^{\mu\nu}\frac{\partial f}{\partial x^\nu}$. This gives $\vec{\nabla}f = \frac{\partial f}{\partial r} e_r + \frac{1}{r^2}\frac{\partial f}{\partial \theta} e_\theta +\frac{1}{r^2 \sin^2\theta}\frac{\partial f}{\partial \phi} e_\phi$. This is still not what we're looking for.
This is due to the fact that the basis vectors are not normalised. Indeed, take a specific vector $e_I$. His components are $\delta^\mu_I$ by definition (it's a basis vector). Then, $|e_I|^2 = e_I^\mu e_I^\nu g_{\mu\nu} = g_{II}$. Great then ! Using $e_i' = e_i/\sqrt{g_{ii}}$ as normalised base vectors, we get the right answer : $\vec{\nabla}f = \frac{\partial f}{\partial r} e_r' + \frac{1}{r}\frac{\partial f}{\partial \theta} e_\theta' +\frac{1}{r \sin\theta}\frac{\partial f}{\partial \phi} e_\phi'$
Let's move onto the divergence. It seems easier since it's a scalar, there's no basis vector to mess around with. Well, actually there are still some problem attached to that. Your formula is right, again, except that when you write the invariant formula $\nabla_\mu V^{\mu}$ you implicitly use the basis that we defined earlier. This means that you are not working on a normalised basis. We know that the $\vec{V}$ you used is $\vec{V} = V^\mu e_\mu = V^\mu \sqrt{g_{\mu\mu}}e_\mu'$. So to compare formula you have to introduce the vector with respect to the normalised coordinates, $A^\mu= V^\mu\sqrt{g_{\mu\mu}}$. I'll let you plug it in your (correct) formula to see that it works.
To conclude, your formula for the curl should be right. Just be careful to use the right normalisations for the vectors and you should be fine (also be careful of the tensorial form of the levi-civita tensor, which involves the determinant of the metric). I have not the faith to do the calculations for you, but you definitely should try it to ensure yourself you understood it well.
P.S: Just for completeness, for the divergence there is a quite useful formula which is also used in Sean Caroll book : $\vec{\nabla}\cdot\vec{V} = \frac{1}{\sqrt{g}}\partial_\mu(\sqrt{g}V^\mu)$, useful when you're too lazy to compute Christoffels.
First of all let's define dot product and cross product between two 3-vectors $$\mathbf{a} = \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix} \qquad \text{and} \qquad \mathbf{b} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix} $$
dot product:
$$ \mathbf{a}\cdot\mathbf{b} = \sum_i a_i b_i = a_1 b_1 + a_2 b_2+ a_3b_3 $$
cross product: $$ \mathbf{a}\times\mathbf{b} = \begin{pmatrix} a_2 b_3 - a_3 b_2 \\a_3 b_1 - a_1 b_3 \\a_1 b_2 - a_2 b_1 \end{pmatrix} $$
Note that these definitions do not involve geometric quantities like the angle between the two vectors; indeed, it is the angle that is defined in terms of the dot product (for the records, $\cos \theta := \mathbf{a\cdot b}/ \sqrt{\mathbf{(a\cdot a)(b\cdot b)}}$).
Then you have the definition of divergence and curl acting on a function $\mathbf{f}(\mathbf{x}) \equiv \begin{pmatrix}f_1(\mathbf{x}), f_2(\mathbf{x}), f_3(\mathbf{x})\end{pmatrix}$ ($\mathbf{x} = (x_1,x_2,x_3)$; you can call $x_1=x$, $x_2=y$ and $x_3=z$ but my choice allow a compact notation):
divergence:
$$
\mathrm{div}\, \mathbf{f} := \frac{\partial }{\partial x_1} f_1+\frac{\partial }{\partial x_2} f_2+\frac{\partial }{\partial x_3} f_3 = \sum_i \frac{\partial }{\partial x_i}f_i \equiv \sum_i {\partial_i}f_i
$$
where $\partial_i \equiv \partial / \partial x_i$.
curl:
$$
\mathrm{curl} \,\mathbf{f} := \begin{pmatrix}
\partial_2 f_3 - {\partial_3 f_2} \\
\partial_3 f_1 - \partial_1 f_3 \\
\partial_1 f_2 - \partial_2 F_1\end{pmatrix}
$$
Now you can see that if you introduce the quantity $$ \nabla = \begin{pmatrix} \partial_1 \\ \partial_2 \\ \partial_3 \end{pmatrix} $$
you can write the operations of divergence and curl as if $\nabla$ was a vector! Indeed if you apply the definition of dot and cross product you can easily find out that
$$
\nabla \cdot \mathbf{f} = \mathrm{div}\, \mathbf{f} \qquad \text{and} \qquad
\nabla \times \mathbf{f} = \mathrm{curl}\, \mathbf{f}
$$
You can find out that many identities holding for 3-vectors still hold id one of them is $\nabla$.
But note that this "trick" of thinking to $\nabla$ as a 3-vector is formal and not all identities holding for usual 3-vectors keep working.
Best Answer
Since your question is how to gain an intuitive understanding of what the two equations $\nabla \cdot \nabla \mathbf{F} = 0$ and $\nabla \times \nabla f=\mathbf{0}$ mean, I will discuss the equations in a simple setting. The simpler, more intuitive setting I am thinking of is a discrete setting: the honeycomb lattice, shown below.
$\hskip 2 in$
Let me explain how vector calculus works on this lattice. A scalar function takes on values at the discrete vertices of the lattice.
A vector field takes on values on the edges of the lattice. This makes sense because vectors should represent movement from one point to another.
The gradient of a function $f$ has, on each edge, a magnitude equal to the difference between $f$ on the two ends of the edge, and the direction of the gradient, of course, is from the lower value to the higher value.
Now when you take the curl you get a function defined on each face (that is, hexagon) of the lattice. This makes sense because the curl is supposed to tell you how the vector field is going around in a "circle". In this case, the "circle" is really a hexagon. Let's see an example of taking the curl of a vector field.
In the above figure, the values of the vector field $\mathbf{F}$ are shown in red. Here I have taken the counter-clockwise direction to be positive, so a negative value means the vector actually points in the clockwise direction. The curl can be found by adding the values as you move counter-clockwise along the hexagon. So the value of the curl at the hexagon shown in the figure is $4$.
Now lets see why the curl of the gradient has to be zero.
Above, I have shown the values of the function in black at each vertex. The gradient is shown in red at each edge as in the previous picture. We find in this case that the curl is zero. The reason is simple. When you are taking the curl in this case, you are adding up the differences in the function values as you move around the hexagon. But since you end up where you started, the sum of the changes must be zero.
Now let's think about why the divergence of the curl must be zero. To take the divergence, we must move to three dimensions. So let's think about the soccer ball shape shown below.
As before, functions are defined on vertices, vectors are defined on edges, the curl is defined on faces. The new thing is that the divergence is defined on the volume enclosed by the soccer ball.
To see why the divergence of the curl is zero, lets just look at what is going on in the two hexagons closest to us.
In this case, we have a vector field $\mathbf{F}$, which can take on arbitrary values on the edges. I have picked random values for the sake of illustration. Again, the curl defined on each hexagon is the sum of $\mathbf{F}$ as you go around the hexagon counterclockwise. The counter-clockwise direction is important here. Notice the edge where $\mathbf{F}=4$ (the one shown in a box) is included in the calculation of the curl at both hexagons. In the lower hexagon, it has a positive sign, since this $\mathbf{F}$ does point in the counterclockwise direction as you go around the lower hexagon. However, it has a negative sign in the upper hexagon, since there this $\mathbf{F}$ points in the clockwise direction. Now since the divergence is the sum the values over all the faces, and the edge's contribution to the lower face is the opposite of the contribution to the upper face, we see that this edge has zero contribution to the divergence. But every single edge on the soccer ball lies on two faces, so each edge gives zero contribution to the divergence, and thus the divergence must be zero.
Now I should probably say a few words about how this relates to the situation of normal euclidean space. The situation is really very analogous. You can measure the curl of a vector field by taking its line integral around small circles. However, in the case of a gradient, the line integral tells you the total change in the function as you go around the circle. Since you end where you begin, the total change must be zero. Since these integrals must all be zero for the gradient, the curl of a gradient must be zero.
The divergence can be measured by integrating the field that goes through a small sphere. If you have a non-zero vector on the surface, then it will tend to create an outward pointing curl on its left, but an inward pointing curl on its right. These two contributions to the divergence cancel out, so you wind up with zero divergence.