Here's what's really going on. In classical field theory, a basic set of objects that we often consider are scalar fields $\phi:M\to \mathbb R$ where $M$ is a manifold. Now we can ask ourselves the following question:
Is there some natural notion of how a scalar field defined on a given manifold "transforms" under a coordinate transformation?
I claim that the answer is yes, and I'll attempt to justify my claim both mathematically and physically. The bottom line is that we ultimately have to define the way in which fields transform under certain types of transformations, but any old definition will not necessarily be useful in math or physics, so we must make well-motivated definitions and then show that they are useful for modeling physical systems.
Mathematical perspective. (manifolds and coordinate charts)
Recall that a coordinate system (aka coordinate chart) on an $n$-dimensional manifold $M$ is a (sufficiently smooth) mapping $\psi:U\to \mathbb R^n$ where $U$ is some open subset of $M$. We can use such a coordinate system to define a coordinate representation $\phi_\psi$ of the scalar field $\phi$ as
\begin{align}
\phi_\psi = \phi\circ\psi^{-1}:V\to\mathbb R
\end{align}
where $V$ is the image of $U$ under $\psi$. Now let two coordinate systems $\psi_1:U_1\to \mathbb R^n$ and $\psi_2:U_2\to\mathbb R^n$ be given such that $U_1\cap U_2\neq \emptyset$. The coordinate representations of $\phi$ in these two coordinate systems are $\phi_1 = \phi\circ \psi_1^{-1}$ and $\phi_2 = \phi\circ \psi_2^{-1}$.
Now consider a point $x\in U_1\cap U_2$; it is mapped to some point $x_1\in \mathbb R^n$ under $\psi_1$ and to some point $x_2\in \mathbb R^n$ under $\psi_2$. We can therefore write
\begin{align}
\phi(x) &= \phi \circ \psi_1^{-1} \circ \psi_1(x) = \phi_1(x_1) \\
\phi(x) &= \phi \circ \psi_2^{-1} \circ \psi_2(x) = \phi_2(x_2)
\end{align}
so that
\begin{align}
\phi_1(x_1) = \phi_2(x_2)
\end{align}
In other words, the value of the coordinate representation $\phi_1$ evaluated at the coordinate representation $x_1 = \psi_1(x)$ of the point $x$ agrees with the value of the coordinate representation $\phi_2$ evaluated at the coordinate representation $x_2 = \psi_2(x)$ of the same point $x$. This is one way of understanding what it means for a scalar field to be "invariant" under a change of coordinates.
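Here is a small numeric sanity check of $\phi_1(x_1)=\phi_2(x_2)$ (a toy example of my own construction, not from the discussion above): take $M=\mathbb R^2$, let $\psi_1$ be the identity chart and $\psi_2$ a rotation, and compare the two coordinate representations at a point.

```python
import numpy as np

# Toy example (my own construction): M = R^2 with a scalar field phi,
# chart psi_1 = identity and chart psi_2 = rotation by an angle t.
def phi(p):
    return p[0]**2 + 3*p[1]           # an arbitrary, non-symmetric field

t = 0.7
R = np.array([[np.cos(t), -np.sin(t)],
              [np.sin(t),  np.cos(t)]])

psi1     = lambda p: p
psi1_inv = lambda x: x
psi2     = lambda p: R @ p
psi2_inv = lambda x: R.T @ x          # R is orthogonal, so R^{-1} = R^T

# Coordinate representations phi_i = phi o psi_i^{-1}
phi1 = lambda x: phi(psi1_inv(x))
phi2 = lambda x: phi(psi2_inv(x))

p = np.array([1.3, -0.4])             # a point of M (labelled via the identity chart)
x1, x2 = psi1(p), psi2(p)             # its two coordinate representations

assert np.isclose(phi1(x1), phi2(x2))  # phi_1(x_1) = phi_2(x_2)
```

The point is that $\phi$ itself was chosen with no symmetry at all; the equality holds purely because both representations describe the same underlying field.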
If, in particular, the manifold $M$ we are considering is $\mathbb R^{3,1} = (\mathbb R^4, \eta)$, namely four-dimensional Minkowski space, then we could consider the following two coordinate systems:
\begin{align}
\psi_1(x) &= x \\
\psi_2(x) &= \Lambda x+a
\end{align}
where $\Lambda$ is a Lorentz transformation and $a\in \mathbb R^4$. The coordinate representations $\phi_1$ and $\phi_2$ of $\phi$ are then, as noted above, related by
\begin{align}
\phi_1(x) = \phi_2(\Lambda x + a)
\end{align}
If we switch notation a bit and write $\phi_1 = \phi$ and $\phi_2 = \phi'$, then this reads
\begin{align}
\phi'(\Lambda x +a) = \phi(x)
\end{align}
which is the standard expression you'll see in field theory texts.
Physical perspective.
Here's a lower-dimensional analogy. Imagine a temperature field $T:\mathbb R^2 \to \mathbb R$ on the plane that assigns a real number that we interpret as the temperature at each point on some two-dimensional surface. Suppose that this temperature field is generated by some apparatus under the surface, and suppose that we translate the apparatus by a vector $\vec a$. We could now ask ourselves:
What will the temperature field produced by the translated apparatus look like? Well, each point in the temperature distribution will be translated by the amount $\vec a$. So, for example, if the point $\vec x_0$ has temperature $T(\vec x_0) = 113\,\mathrm{K}$, then after the apparatus is translated, the point $\vec x_0 + \vec a$ will have the same temperature $113\,\mathrm{K}$ that the point $\vec x_0$ had before the apparatus was translated. The mathematical way of writing this is that if $T'$ denotes the translated temperature field, then $T'$ is related to $T$ by
\begin{align}
T'(\vec x+\vec a) = T(\vec x)
\end{align}
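This picture is easy to check numerically (the Gaussian hot spot below is my own illustrative choice, not part of the original argument):

```python
import numpy as np

# Sketch of the temperature analogy: the apparatus produces a Gaussian hot
# spot at the origin; translating the apparatus by a moves the hot spot to a,
# i.e. the translated field is T'(y) = T(y - a).
a = np.array([2.0, -1.0])

def T(x):
    return 113.0 * np.exp(-np.dot(x, x))   # original temperature field

def T_prime(x):
    return T(x - a)                        # field of the translated apparatus

x = np.array([0.3, 0.8])
assert np.isclose(T_prime(x + a), T(x))    # T'(x + a) = T(x)
```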
A similar argument can be made for a scalar field on Minkowski space: instead of simply translating a temperature apparatus, we could imagine boosting or translating an apparatus producing some Lorentz scalar field, and we would be motivated to define the transformation law of a scalar field under a Poincaré transformation as
\begin{align}
\phi'(\Lambda x+a) = \phi(x)
\end{align}
For the gradient, your mistake is that the components of the gradient vary contravariantly. On top of that, there is an issue with normalisation that I discuss below. I don't know how familiar you are with differential geometry, but basically, when we write a vector as $v^\mu$ we are really writing its components with respect to a basis.
In differential geometry, vectors are entities which act on functions $f : M \rightarrow \mathbb{R}$ defined on the manifold. Tell me if you want me to elaborate, but this implies that the basis vectors given by some set of coordinates are $\partial_\mu = \frac{\partial}{\partial x^\mu}$ and vary covariantly. Let's name those basis vectors $e_\mu$ to go back to the "familiar" linear algebra notation.
Knowing that, any vector is an invariant which can be written as $\vec{V} = V^\mu \partial_\mu$. The key here is that it is invariant, so it will be the same no matter which coordinate basis you choose.
Now, the gradient is defined in Euclidean space simply as the vector with components $\partial_i f = \partial^if$ where $i \in \{x,y,z\}$. Note that in Cartesian coordinates covariant and contravariant components are the same. So, the invariant quantity is $\vec{\nabla}f = \partial^if\,e_i$. Note that, from what we did before, the components of a vector are to be treated as contravariant.
Now, since this expression is invariant we get, in general coordinates $\vec{\nabla}f = \partial^\mu fe_\mu$. So what you are looking for when computing the components is $\partial^\mu f = g^{\mu\nu}\frac{\partial f}{\partial x^\nu}$. This gives $\vec{\nabla}f = \frac{\partial f}{\partial r} e_r + \frac{1}{r^2}\frac{\partial f}{\partial \theta} e_\theta +\frac{1}{r^2 \sin^2\theta}\frac{\partial f}{\partial \phi} e_\phi$. This is still not what we're looking for.
This is due to the fact that the basis vectors are not normalised. Indeed, take a specific basis vector $e_I$. Its components are $\delta^\mu_I$ by definition (it's a basis vector). Then $|e_I|^2 = e_I^\mu e_I^\nu g_{\mu\nu} = g_{II}$. Great, then! Using $e_i' = e_i/\sqrt{g_{ii}}$ as normalised basis vectors, we get the right answer: $\vec{\nabla}f = \frac{\partial f}{\partial r} e_r' + \frac{1}{r}\frac{\partial f}{\partial \theta} e_\theta' +\frac{1}{r \sin\theta}\frac{\partial f}{\partial \phi} e_\phi'$.
Let's move on to the divergence. It seems easier since the result is a scalar: there's no basis vector to mess around with. Well, actually there are still some problems attached to that. Your formula is right, again, except that when you write the invariant formula $\nabla_\mu V^{\mu}$ you implicitly use the basis that we defined earlier. This means that you are not working in a normalised basis. We know that the $\vec{V}$ you used is $\vec{V} = V^\mu e_\mu = V^\mu \sqrt{g_{\mu\mu}}e_\mu'$. So to compare formulas you have to introduce the components with respect to the normalised basis, $A^\mu= V^\mu\sqrt{g_{\mu\mu}}$. I'll let you plug them into your (correct) formula to see that it works.
To conclude, your formula for the curl should be right. Just be careful to use the right normalisations for the vectors and you should be fine (also be careful with the tensorial form of the Levi-Civita tensor, which involves the determinant of the metric). I don't have the patience to do the calculations for you, but you should definitely try them to convince yourself that you've understood.
P.S.: Just for completeness, for the divergence there is a quite useful formula, which is also used in Sean Carroll's book: $\vec{\nabla}\cdot\vec{V} = \frac{1}{\sqrt{g}}\partial_\mu(\sqrt{g}V^\mu)$, where $g$ is the determinant of the metric. It's useful when you're too lazy to compute Christoffels.
This is actually simple. Since, as you point out, $X = X^\lambda \partial_\lambda$ and $Y=Y^\lambda\partial_\lambda$ as vector fields (not their components) are themselves invariant, we can evaluate the transformation of the commutator as follows:
$$[X,Y]^{\mu'}= X(Y^{\mu'})-Y(X^{\mu'})=X^\lambda \partial_\lambda Y^{\mu'}-Y^\lambda \partial_\lambda X^{\mu'}.$$
Then we just write $X^{\mu'}$ and $Y^{\mu'}$ in terms of $X^\mu$ and $Y^\mu$ in the original unprimed coordinates
$$[X,Y]^{\mu'}=X^\lambda \partial_\lambda \left(\dfrac{\partial x'^\mu}{\partial x^\nu}Y^\nu\right)-Y^\lambda \partial_\lambda \left(\dfrac{\partial x'^\mu}{\partial x^\nu}X^\nu\right).$$
Now use the product rule to get
\begin{eqnarray} [X,Y]^{\mu'}=\dfrac{\partial x'^\mu}{\partial x^\nu}[X,Y]^\nu+(X^\lambda Y^\nu-X^\nu Y^\lambda) \dfrac{\partial^2 x'^\mu}{\partial x^\nu \partial x^\lambda} .\end{eqnarray}
The second term is the potentially non-tensorial part. Observe, though, that it vanishes because $\frac{\partial^2 x'^\mu}{\partial x^\nu \partial x^\lambda}$ is symmetric while $X^{\lambda}Y^\nu - X^\nu Y^\lambda$ is anti-symmetric under the exchange $\lambda \leftrightarrow \nu$. As a result
\begin{eqnarray} [X,Y]^{\mu'}=\dfrac{\partial x'^\mu}{\partial x^\nu}[X,Y]^{\nu}.\end{eqnarray}
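If you want to see this transformation law in action, here is a sympy sketch (the two vector fields and the Cartesian-to-polar change of coordinates are my own illustrative choices): it computes the bracket directly in polar coordinates and checks that it agrees with the Cartesian bracket pushed through the Jacobian $\partial x'^\mu/\partial x^\nu$.

```python
import sympy as sp

x, y = sp.symbols('x y')
r, th = sp.symbols('r theta', positive=True)
to_polar = {x: r*sp.cos(th), y: r*sp.sin(th)}

# Two concrete vector fields, given by Cartesian components (my own choice)
X = sp.Matrix([y, x**2])      # X = y d_x + x^2 d_y
Y = sp.Matrix([1, x*y])       # Y = d_x + x*y d_y

def bracket(A, B, coords):
    """Components [A,B]^mu = A^lam d_lam B^mu - B^lam d_lam A^mu."""
    n = len(coords)
    return sp.Matrix([sum(A[l]*sp.diff(B[m], coords[l])
                          - B[l]*sp.diff(A[m], coords[l]) for l in range(n))
                      for m in range(n)])

# Jacobian dx'^mu/dx^nu for (x,y) -> (r,theta): inverse of d(x,y)/d(r,theta)
J = sp.Matrix([r*sp.cos(th), r*sp.sin(th)]).jacobian([r, th]).inv()

# Bracket computed directly in polar coordinates...
Xp = (J * X.subs(to_polar)).applyfunc(sp.simplify)
Yp = (J * Y.subs(to_polar)).applyfunc(sp.simplify)
lhs = bracket(Xp, Yp, (r, th))

# ...versus the Cartesian bracket pushed through the Jacobian
rhs = J * bracket(X, Y, (x, y)).subs(to_polar)

assert (lhs - rhs).applyfunc(sp.simplify) == sp.zeros(2, 1)
```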
One can also prove that $[X,Y]$ defines a vector field without resorting to coordinates, and this is quite instructive. The key thing about being a vector field is that $[X,Y]$ should obey the Leibniz rule when acting on functions, i.e., we need
$$[X,Y](fg)=g[X,Y]f+f[X,Y]g.$$
This is a more abstract definition of a vector field: a vector field is a derivation acting on smooth functions. Using this definition it is straightforward to prove $[X,Y]$ is a vector field:
\begin{eqnarray}[X,Y](fg)&=&X(Y(fg))-Y(X(fg))\\&=&X(Y(f)g+fY(g))-Y(X(f)g+fX(g))\\ &=& [X(Y(f))g+Y(f)X(g)+X(f)Y(g)+fX(Y(g))]\\ &&-[Y(X(f))g+X(f)Y(g)+Y(f)X(g)+fY(X(g))]\\ &=& X(Y(f))g+ fX(Y(g))- Y(X(f))g-f Y(X(g))\\ &=& g[X,Y]f+f[X,Y]g.\end{eqnarray}
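The same cancellation can be checked symbolically (the particular derivations $X$, $Y$ and the functions $f$, $g$ below are illustrative choices of mine):

```python
import sympy as sp

x, y = sp.symbols('x y')

# Concrete derivations on smooth functions of (x, y) (illustrative choices)
def Xop(h): return y*sp.diff(h, x) + x**2*sp.diff(h, y)
def Yop(h): return sp.diff(h, x) + x*y*sp.diff(h, y)

def brk(h):                     # the commutator [X, Y] acting on a function h
    return sp.expand(Xop(Yop(h)) - Yop(Xop(h)))

f = sp.sin(x)*y                 # two arbitrary smooth functions
g = sp.exp(x + y**2)

# Leibniz rule: [X,Y](fg) = g [X,Y]f + f [X,Y]g
assert sp.simplify(brk(f*g) - (g*brk(f) + f*brk(g))) == 0
```

Note that $X\circ Y$ alone is *not* a derivation (the cross terms $X(f)Y(g)$ spoil Leibniz); it is precisely the antisymmetrised combination that cancels them, exactly as in the computation above.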