$
\newcommand\PD[2]{\frac{\partial#1}{\partial#2}}
\newcommand\tPD[2]{\partial#1/\partial#2}
\newcommand\dd{\mathrm d}
\newcommand\R{\mathbb R}
\newcommand\diff\underline
\newcommand\adj\overline
\DeclareMathOperator\Tr{Tr}
\newcommand\Hom{\mathrm{Hom}}
$
Everything Must be Differentiated
It is applicable to any coordinate system. What you're missing is that $\hat r, \hat\theta, \hat\phi$ are not constant, and vary throughout space. This is also the reason I would not choose to write
$$
\nabla = \PD{}r\hat r + \frac1r\PD{}\theta\hat\theta + \frac1{r\sin\theta}\PD{}\phi\hat\phi
$$
and would instead write
$$
\nabla = \hat r\PD{}r + \frac1r\hat\theta\PD{}\theta + \frac1{r\sin\theta}\hat\phi\PD{}\phi.
$$
What we should write when evaluating $\nabla\cdot\vec F$ is
$$\begin{aligned}
\nabla\cdot\vec F
&= \left(\hat r\PD{}r + \frac1r\hat\theta\PD{}\theta + \frac1{r\sin\theta}\hat\phi\PD{}\phi\right)\cdot(F_r\hat r + F_\theta\hat\theta + F_\phi\hat\phi)
\\
&=\begin{aligned}[t]
&\hat r\cdot\PD{}r(F_r\hat r) + \hat r\cdot\PD{}r(F_\theta\hat\theta) + \hat r\cdot\PD{}r(F_\phi\hat\phi)
\\
&+ \frac1r\hat\theta\cdot\PD{}\theta(F_r\hat r) + \frac1r\hat\theta\cdot\PD{}\theta(F_\theta\hat\theta) + \frac1r\hat\theta\cdot\PD{}\theta(F_\phi\hat\phi)
\\
&+ \frac1{r\sin\theta}\hat\phi\cdot\PD{}\phi(F_r\hat r) + \frac1{r\sin\theta}\hat\phi\cdot\PD{}\phi(F_\theta\hat\theta) + \frac1{r\sin\theta}\hat\phi\cdot\PD{}\phi(F_\phi\hat\phi).
\end{aligned}
\end{aligned}$$
Then we apply the product rule, and use the following facts:
- $\hat r, \hat\theta, \hat\phi$ are independent of $r$.
- $\partial/\partial\theta$ and $\partial/\partial\phi$ applied to $\hat v = \hat r, \hat\theta, \hat\phi$ produce something orthogonal to $\hat v$.
- Of course, $\hat r, \hat\theta, \hat\phi$ form an orthonormal basis.
From this we get
$$\begin{aligned}
&\PD{F_r}r
\\
&+ \frac{F_r}r\hat\theta\cdot\PD{\hat r}\theta + \frac1r\PD{F_\theta}\theta + \frac{F_\phi}r\hat\theta\cdot\PD{\hat\phi}\theta
\\
&+ \frac{F_r}{r\sin\theta}\hat\phi\cdot\PD{\hat r}\phi + \frac{F_\theta}{r\sin\theta}\hat\phi\cdot\PD{\hat\theta}\phi + \frac1{r\sin\theta}\PD{F_\phi}\phi.
\end{aligned}$$
Now recall that
$$\begin{aligned}
\hat r &= (\sin\theta)(\cos\phi)\hat x + (\sin\theta)(\sin\phi)\hat y + (\cos\theta)\hat z,
\\
\PD{\hat r}\theta = \hat\theta &= (\cos\theta)(\cos\phi)\hat x + (\cos\theta)(\sin\phi)\hat y - (\sin\theta)\hat z,
\\
\PD{\hat r}\phi = (\sin\theta)\hat\phi &= -(\sin\theta)(\sin\phi)\hat x + (\sin\theta)(\cos\phi)\hat y.
\end{aligned}$$
Differentiating $\hat\theta$ and $\hat\phi$ in the same way gives $\tPD{\hat\theta}\phi = (\cos\theta)\hat\phi$ and $\tPD{\hat\phi}\theta = 0$. This lets us evaluate the remaining dot products to get
$$\begin{aligned}
&\PD{F_r}r
\\
&+ \frac{F_r}r + \frac1r\PD{F_\theta}\theta
\\
&+ \frac{F_r}r + \frac{F_\theta\cot\theta}r + \frac1{r\sin\theta}\PD{F_\phi}\phi,
\end{aligned}$$
or written in one line
$$
\nabla\cdot\vec F = \PD{F_r}r + 2\frac{F_r}r + \frac1r\PD{F_\theta}\theta + \frac{F_\theta\cot\theta}r + \frac1{r\sin\theta}\PD{F_\phi}\phi.
$$
If you compare to the standard expression
$$
\nabla\cdot\vec F = \frac1{r^2}\PD{}r(r^2F_r) + \frac1{r\sin\theta}\PD{}\theta(F_\theta\sin\theta) + \frac1{r\sin\theta}\PD{F_\phi}\phi,
$$
you will find that they are equal.
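As a sanity check, a short sympy sketch can confirm this equality symbolically: write the unit vectors in the Cartesian basis, apply $\nabla$ in the form used above (letting the derivatives act on the unit vectors as well as the components), and compare with the standard expression.
```python
import sympy as sp

r, th, ph = sp.symbols('r theta phi', positive=True)
Fr, Fth, Fph = (sp.Function(n)(r, th, ph) for n in ('F_r', 'F_theta', 'F_phi'))

# Unit vectors written out in the Cartesian basis.
rhat = sp.Matrix([sp.sin(th)*sp.cos(ph), sp.sin(th)*sp.sin(ph), sp.cos(th)])
that = sp.Matrix([sp.cos(th)*sp.cos(ph), sp.cos(th)*sp.sin(ph), -sp.sin(th)])
phat = sp.Matrix([-sp.sin(ph), sp.cos(ph), 0])

F = Fr*rhat + Fth*that + Fph*phat

# nabla . F with the derivatives acting on the unit vectors as well as the components.
div = (rhat.dot(F.diff(r))
       + that.dot(F.diff(th))/r
       + phat.dot(F.diff(ph))/(r*sp.sin(th)))

standard = (sp.diff(r**2*Fr, r)/r**2
            + sp.diff(Fth*sp.sin(th), th)/(r*sp.sin(th))
            + sp.diff(Fph, ph)/(r*sp.sin(th)))

print(sp.simplify(sp.expand(div - standard)))  # should print 0
```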
The Integral Formulation of the Divergence
Edit, to hopefully answer 5Pack's question in the comments:
Here is another perspective. Without trying to be too rigorous, we can define the divergence in a geometric, coordinate-free way as
$$
\nabla\cdot\vec F(\vec p) = \lim_{R_{\vec p}\to0}\frac1{|R_{\vec p}|}\oint_{\partial R_{\vec p}}\hat n(\vec q)\cdot\vec F(\vec q)\,\dd S.
\tag{$*$}
$$
The limit is taken such that the region of space $R_{\vec p}$ which contains the point $\vec p$ is shrunk around $\vec p$ to zero volume; $|R_{\vec p}|$ is the volume of the region. The integral is taken over the boundary $\partial R_{\vec p}$ of this region; $\vec q$ is the variable of integration, $\hat n(\vec q)$ is the outward-pointing unit normal to $\partial R_{\vec p}$ at $\vec q$, and $\dd S$ is the surface area element. To be clear, the volume $|R_{\vec p}|$ is computed with the volume element $\dd V$; in Cartesian coordinates
$$
\dd V = \dd x\,\dd y\,\dd z,\quad
\vec q = x\hat x + y\hat y + z\hat z,
$$
and in spherical coordinates
$$
\dd V = r^2(\sin\theta)\,\dd r\,\dd\theta\,\dd\phi,
$$$$
\vec q = r\hat r(r, \theta, \phi) = r(\sin\theta)(\cos\phi)\hat x + r(\sin\theta)(\sin\phi)\hat y + r(\cos\theta)\hat z.
$$
To interpret ($*$), think of the case where $R_{\vec p}$ is an infinitesimal sphere centered at $\vec p$. We may write
$$
|R_{\vec p}| = (\epsilon/3)(4\pi\epsilon^2) = \frac\epsilon3\sigma
$$
where $\epsilon$ is infinitesimal and $\sigma$ is the surface area of $R_{\vec p}$. We write
$$
\nabla\cdot\vec F \approx \frac3{\epsilon}\frac1\sigma\oint_{\partial R_{\vec p}}\hat n(\vec q)\cdot\vec F(\vec q)\,\dd S.
\tag{$**$}
$$
$\hat n$ is a unit vector pointing away from $\vec p$; so $\hat n\cdot\vec F$ is how much $\vec F$ is pointing away from $\vec p$.
The operation $\frac1\sigma\oint_{\partial R_{\vec p}}(\cdots)\,\dd S$ is an average over $\partial R_{\vec p}$; hence the whole integral in ($**$) is the average amount that $\vec F$ is pointing away from the point $\vec p$. This average is taken over a sphere of radius $\epsilon$, and we divide by $\epsilon$ in analogy with 1D derivatives. (We can actually apply an analogue of ($*$) in 1D space, and recover the usual derivative. Two points bound a line, and form a 0D sphere!) The factor of $3$ is a result of doing all this in 3D space.
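As a quick numerical illustration of ($**$), the sketch below approximates the surface average of $\hat n\cdot\vec F$ over a tiny sphere by six symmetric sample directions and rescales by $3/\epsilon$; the test field is an arbitrary choice for illustration.
```python
import numpy as np

# Arbitrary test field F(x, y, z) = (x y, y z, z x); its exact divergence is x + y + z.
def F(q):
    x, y, z = q
    return np.array([x*y, y*z, z*x])

p = np.array([1.0, 2.0, 3.0])
eps = 1e-4

# Approximate the surface average of n.F over a tiny sphere around p using six
# symmetric sample directions, then rescale by 3/eps as in (**).
normals = np.vstack([np.eye(3), -np.eye(3)])
flux_avg = np.mean([n @ F(p + eps*n) for n in normals])

print(3/eps * flux_avg)  # approximately 6.0
print(p.sum())           # exact divergence x + y + z at p
```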
Hopefully this convinces you that ($*$) is a reasonable representation of the divergence. But now consider what it would mean to use the method you suggest, where
$$
\nabla\cdot\vec F
\overset ?= \PD{F_r}r + \frac1r\PD{F_\theta}\theta + \frac1{r\sin\theta}\PD{F_\phi}\phi.
$$
This would be the same as saying that
$$\begin{aligned}
\nabla\cdot\vec F(\vec p)
&= \lim_{R_{\vec p}\to0}\frac1{|R_{\vec p}|}\oint_{\partial R_{\vec p}}\hat n(\vec q)\cdot\vec F(\vec q)\,\dd S
\\
&= \lim_{R_{\vec p}\to0}\frac1{|R_{\vec p}|}\oint_{\partial R_{\vec p}}
\hat n(\vec q)\cdot\left(
F_r(\vec q)\hat r(\vec q) + F_\theta(\vec q)\hat\theta(\vec q) + F_\phi(\vec q)\hat\phi(\vec q)
\right)\dd S
\\
&\overset?= \lim_{R_{\vec p}\to0}\frac1{|R_{\vec p}|}
\begin{aligned}[t]\Biggl(
&\hat r(\vec p)\cdot\oint_{\partial R_{\vec p}}\hat n(\vec q)F_r(\vec q)\,\dd S
\\
&+ \hat\theta(\vec p)\cdot\oint_{\partial R_{\vec p}}\hat n(\vec q)F_\theta(\vec q)\,\dd S
\\
&+ \hat\phi(\vec p)\cdot\oint_{\partial R_{\vec p}}\hat n(\vec q)F_\phi(\vec q)\,\dd S
\Biggr).\end{aligned}
\end{aligned}$$
Hopefully, though, it's obvious that this is nonsensical; why were we allowed to magically pull $\hat r(\vec q), \hat\theta(\vec q), \hat\phi(\vec q)$ out of the integral, and treat them like they are constant?
That is the moral of the story: $\hat r, \hat\theta, \hat\phi$ are not constant, so we cannot treat them as such. $\nabla\cdot\vec F$ is computing the variation of $\vec F$; it doesn't care how $\vec F$ is expressed. When we write $\vec F = F_r\hat r + F_\theta\hat\theta + F_\phi\hat\phi$, then we're putting some of that variation into $\hat r, \hat\theta, \hat\phi$, and we can't just ignore it.
What Is $\nabla$ Anyway?
Edit 2: This is my response to 5Pack's question about why an expression like $\nabla\cdot\vec F$ even makes sense.
I will try to present the interesting points without being too long winded.
There are other questions where a full in-detail answer is warranted,
and once I answer one of those I will link to it here.
For simplicity I will only consider real vector spaces and the standard inner product,
but these constraints are not necessary.
There are two directions we could take this.
One way, we can extend the integral definition above.
This has a geometrically nice interpretation as the average variation of an expression around a point,
but it is hard to prove things and the integration machinery is unnecessary.
What we would like is a purely algebraic (but equivalent!) way of defining such expressions.
Coordinate Expression
We could fiddle around with coordinates.
If $\{x^i\}_{i=1}^m$ are coordinates of $\R^m$,
then there is a function $p(x^1, x^2, \dotsc, x^m)$ taking coordinates to points.
This gives us a basis $\{e_i\}_{i=1}^m$ of $\R^m$ via $e_i = \tPD p{x^i}$,
and then its reciprocal $\{e^i\}_{i=1}^m$ is the unique basis such that $e^i\cdot e_j = \delta^i_j$.
Now let $V$ be a vector space.
Then for any $L_x : \R^m \to V$ which is linear for each $x \in \R^m$, i.e. $L_x(v)$ is linear in $v$, we define
$$
L_x(\nabla) = \sum_{i=1}^m \PD{L_{p(x^1,\dotsc,x^m)}(e^i)}{x^i},
$$
where each reciprocal basis vector $e^i$ is evaluated at the point in question and held constant under the derivative;
only the dependence of $L$ on the point is differentiated.
We would then go through the tedious task of confirming that this definition is independent of the coordinates chosen.
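As a sanity check of this definition, specialize to Cartesian coordinates, where $e_i = e^i = \hat x_i$ are constant, and take $L_x(v) = v\cdot F(x)$:
$$
L_x(\nabla) = \sum_{i=1}^m \PD{\hat x_i\cdot F(x)}{x^i} = \sum_{i=1}^m \PD{F_i}{x^i},
$$
the familiar Cartesian divergence.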
But I don't like coordinates.
Without Coordinates
First, for any function $f : \R^m \to V$ into some normed vector space $V$,
the total differential $\diff f(x; {-}) : \R^m \to V$ of $f$ at $x \in \R^m$
is the linear function which best approximates $f$ at $x$.
This captures all notions of the variation of $f$,
and for any finite-dimensional $V$ is uniquely defined.
Consider again linear functions $L_x : \R^m \to V$;
if $\Hom(\R^m, V)$ is the set of linear functions $\R^m \to V$,
then we can think of $L$ as a function $\R^m \to \Hom(\R^m, V)$
and so $\diff L(x; v) \in \Hom(\R^m, V)$ for any $v \in \R^m$.
If $V$ is finite dimensional, then so is $\Hom(\R^m, V)$ and the differential is uniquely defined;
we will be assuming this is the case.
We then define the derivative of $L$ as
$$
L_x(\nabla) = \Tr_{v\cdot w}\diff L(x; v)(w)
\tag{Deriv}
$$
where $\Tr_{v\cdot w}$ is indicating a metric contraction
of the tensor $\diff L(x; {-})({-}) : \R^m\times\R^m \to V$.
Note that $L_x(\nabla) \in V$,
so in this sense $L_x$ takes $\nabla$ to $L_x(\nabla)$
just like it takes $v \in \R^m$ to $L_x(v)$.
It is easy to verify that
$$
(aL_x + bM_x)(\nabla) = aL_x(\nabla) + bM_x(\nabla),\quad a, b \in \R.
$$
The way this works then is by defining an appropriate $L$:
- If $f : \R^m \to \R$, then the gradient of $f$ is the derivative of $L_x(v) = vf(x)$.
- If $F : \R^m \to \R^m$, then the divergence of $F$ is the derivative of $L_x(v) = v\cdot F(x)$.
- If $F : \R^3 \to \R^3$, then the curl of $F$ is the derivative of $L_x(v) = v\times F(x)$.
- We can also have some things which are not expressible in standard notation.
If $L_x(v) = F(x)\cdot(v\times G(x))$,
then $L_x(\nabla)$ is not $F(x)\cdot(\nabla\times G(x))$;
$L_x(\nabla)$ differentiates both $F(x)$ and $G(x)$ (see the worked example after this list).
- It is easy to show that an expression like $L_x(\nabla, \nabla)$ for $L_x$ bilinear
is independent of the order in which each $\nabla$ is applied
(so long as partial derivatives commute,
or equivalently $\diff{\diff L}(x; {-}, {-})$ is symmetric).
This lets us define e.g. the Laplacian of $f(x)$ via $L_x(v, w) = (v\cdot w)f(x)$;
notice that $f$ can be valued in any vector space.
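To see concretely how the example $L_x(v) = F(x)\cdot(v\times G(x))$ differentiates both fields, work in Cartesian coordinates (constant basis vectors) and use the scalar triple product:
$$\begin{aligned}
L_x(\nabla)
&= \sum_{i=1}^3 \PD{}{x^i}\Bigl(F(x)\cdot\bigl(\hat x_i\times G(x)\bigr)\Bigr)
= \sum_{i=1}^3 \PD{}{x^i}\Bigl(\hat x_i\cdot\bigl(G(x)\times F(x)\bigr)\Bigr)
\\
&= \nabla\cdot\bigl(G(x)\times F(x)\bigr)
= F(x)\cdot(\nabla\times G(x)) - G(x)\cdot(\nabla\times F(x)),
\end{aligned}$$
so an extra term $-G(x)\cdot(\nabla\times F(x))$ appears alongside $F(x)\cdot(\nabla\times G(x))$.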
Motivation
The definition (Deriv) should be unsatisfying;
we basically just wrote the coordinate definition in coordinate-free language.
How can we justify it?
We make one assumption for $F : \R^m \to \R^m$:
$$
\nabla\cdot F(x) = \Tr_v\diff F(x; v),
$$
i.e. that $L_x(\nabla)$ for $L_x(v) = v\cdot F(x)$ is the trace of the differential of $F$.
Now, the inner product induces an isomorphism $\flat : \R^m \to (\R^m)^*$
via $v^\flat(w) = v\cdot w$, where $(\R^m)^* = \Hom(\R^m, \R)$ is the dual space of $\R^m$.
We denote $\sharp = \flat^{-1}$.
If $L_x : \R^m \to \R$ then we argue
$$
L_x(\nabla) = L_x^\sharp\cdot\nabla = \Tr_u\diff L(x; u)^\sharp = \Tr_{u\cdot v}\diff L(x; u)(v).
$$
Thus when $L_x : \R^m \to V$ and $\alpha \in V^*$ we argue
$$
\alpha(L_x(\nabla))
= (\alpha\circ L_x)(\nabla)
= \Tr_{u\cdot v}\diff{(\alpha\circ L)}(x; u)(v)
= \Tr_{u\cdot v}\alpha\bigl(\diff L(x; u)(v)\bigr).
$$
Contracting the tensor $(w, \alpha) \mapsto w\alpha(L_x(\nabla))$ for $w \in V$ finally gives
$$\begin{aligned}
L_x(\nabla)
&= \Tr_{\alpha(w)}w\Tr_{u\cdot v}\alpha(\diff L(x; u)(v))
\\
&= \Tr_{u\cdot v}\Tr_{\alpha(w)}w\alpha(\diff L(x; u)(v))
\\
&= \Tr_{u\cdot v}\diff L(x; u)(v)
\end{aligned}$$
as desired.
The Chain Rule
Let $F : \R^m \to \R^n$.
The chain rule takes the following form:
$$
L_{F(x)}(\nabla)
= L_{F(x)}\Bigl(\adj F(x; \nabla_F)\Bigr).
$$
$\adj F(x; {-})$ is the adjoint of $\diff F(x; {-})$ under the inner products on $\R^m$ and $\R^n$.
To be clear, what we mean by the RHS
is the derivative of $y \mapsto L_y(\adj F(x; {-}))$ evaluated at $y = F(x)$.
This chain rule can be expressed by the mnemonic
$$
\nabla_x = \adj F(x; \nabla_F).
$$
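For a concrete instance, take the gradient case $L_y(v) = v\,g(y)$ with $g(y) = |y|^2$, so that $\nabla_F g = 2F(x)$; the chain rule gives
$$
\nabla_x|F(x)|^2 = \adj F\bigl(x; \nabla_F|F|^2\bigr) = \adj F(x; 2F(x)) = 2\,\adj F(x; F(x)),
$$
which agrees with the direct computation: the differential of $x\mapsto|F(x)|^2$ in the direction $v$ is $2F(x)\cdot\diff F(x; v) = 2\,\adj F(x; F(x))\cdot v$.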
The Subexpression Rule
Consider $\R^{m+n} \cong \R^m\oplus\R^n$ and $L_{x\oplus y} : \R^m\oplus\R^n \to V$.
Then
$$
L_{x\oplus x}(\nabla) = L_{\dot x\oplus x}(\dot\nabla) + L_{x\oplus\dot x}(\dot \nabla),
$$
where by e.g. $L_{\dot x\oplus x}(\dot\nabla)$
we mean the derivative of $x \mapsto L_{x\oplus y}$ evaluated at $y = x$;
in other words, the undotted $x$ should be held constant when differentiating.
We can summarize this "subexpression rule" as follows:
the derivative of an expression is the sum of the derivatives of its subexpressions.
We will demonstrate this rule in the next section.
Vectorial Manipulation
It is a simple fact that if $L_x(v) = M_x(v)$ then $L_x(\nabla) = M_x(\nabla)$;
moreover, if $L_x$ and $M_x$ are multilinear
then $L_x(\nabla,\dotsc,\nabla) = M_x(\nabla,\dotsc,\nabla)$.
This means that so long as an expression involving $\nabla$ can be uniquely linearized,
then the use of $\nabla$ is valid and it can be manipulated as a vector,
so long as every instance of $x$ is differentiated.
(In this sense, the Laplacian $\nabla^2$ is not well-defined over fields of characteristic 2;
in characteristic not 2, there is a (unique) symmetric bilinear form $v\cdot w$ such that $v\cdot v = |v|^2$,
but in characteristic 2 this is not true.)
Some examples would be prudent.
Consider now a vector identity like (in $\R^3$)
$$
a\times(b\times c) = (a\cdot c)b - (a\cdot b)c.
$$
If $L(a, b, c)$ is the LHS and $M(a, b, c)$ the RHS,
then it is automatically true that e.g.
$L(\nabla, F(x), c) = M(\nabla, F(x), c)$ since $L = M$.
(To be clear, here we mean e.g. the derivative of $x \mapsto L({-}, F(x), c)$.)
Written in standard notation, this looks like
$$
\nabla\times(F(x)\times c) = (c\cdot\nabla)F(x) - (\nabla\cdot F(x))c.
$$
This shows how many people go wrong with manipulating $\nabla$ expressions;
because of the usual convention of "differentiate directly to the right",
we had to flip the first term around.
But this convention becomes unusable in a situation like $L(\nabla, F(x), G(x)) = M(\nabla, F(x), G(x))$;
this is still true since $L = M$.
But the following is false in conventional notation:
$$
\nabla\times(F(x)\times G(x)) \not= (\nabla\cdot G(x))F(x) - (\nabla\cdot F(x))G(x)
$$
since we need to differentiate both $F$ and $G$ in each expression,
not just one or the other as the conventional notation suggests.
What we might write to make this clear is
$$
\nabla\times(F(x)\times G(x)) = (\dot\nabla\cdot G(\dot x))F(\dot x) - (\dot\nabla\cdot F(\dot x))G(\dot x).
$$
To write this in conventional notation,
we have to use the subexpression rule to differentiate one function at a time:
$$\begin{aligned}
&\nabla\times(F(x)\times G(x))
\\
&\quad= \bigl(\nabla\cdot G(x)\bigr)F(x) + \bigl(G(x)\cdot\nabla\bigr)F(x)
- \bigl(\nabla\cdot F(x)\bigr)G(x) - \bigl(F(x)\cdot\nabla\bigr)G(x).
\end{aligned}$$
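For the skeptical, a small sympy sketch (with arbitrary concrete fields chosen for illustration) should confirm the four-term expansion componentwise in Cartesian coordinates:
```python
import sympy as sp

x, y, z = sp.symbols('x y z')
X = (x, y, z)

def div(A):
    return sum(A[i].diff(X[i]) for i in range(3))

def curl(A):
    return sp.Matrix([A[2].diff(y) - A[1].diff(z),
                      A[0].diff(z) - A[2].diff(x),
                      A[1].diff(x) - A[0].diff(y)])

def a_dot_grad(A, B):
    # (A . nabla) B: differentiate each component of B along A.
    return sp.Matrix([sum(A[i]*B[j].diff(X[i]) for i in range(3)) for j in range(3)])

# Arbitrary concrete fields chosen for illustration.
F = sp.Matrix([x*y, y*z, z*x])
G = sp.Matrix([sp.sin(y)*z, x*sp.cos(z), x*y*z])

lhs = curl(F.cross(G))
rhs = div(G)*F + a_dot_grad(G, F) - div(F)*G - a_dot_grad(F, G)
print(sp.simplify(lhs - rhs))  # should print the zero vector
```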
Now consider the identity
$$
|v|^2|w|^2 = (v\cdot w)^2 + |v\times w|^2.
$$
Can we say that
$$
\nabla^2|F(x)|^2 = (\nabla\cdot F(x))^2 + |\nabla\times F(x)|^2?
$$
No: we can linearize the above as
$$
(u\cdot v)|w|^2 = (u\cdot w)(v\cdot w) + (u\times w)\cdot(v\times w)
$$
and take its second derivative, but each $\nabla$ must differentiate each copy of $F(x)$.
That is to say that
$$
\nabla^2|F(x)|^2
= (\dot\nabla\cdot F(\hat{\dot x}))(\hat\nabla\cdot F(\hat{\dot x}))
+ (\dot\nabla\times F(\hat{\dot x}))\cdot(\hat\nabla\times F(\hat{\dot x})).
$$
Both of these terms then need to be expanded with the subexpression rule into a total of 8 terms,
only two of which will be $(\nabla\cdot F(x))^2$ and $|\nabla\times F(x)|^2$.
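A quick symbolic check with an arbitrary test field confirms that the naive identity fails:
```python
import sympy as sp

x, y, z = sp.symbols('x y z')

# Arbitrary test field.
F = sp.Matrix([x*y, y*z, z*x])

lap   = lambda f: f.diff(x, 2) + f.diff(y, 2) + f.diff(z, 2)
divF  = F[0].diff(x) + F[1].diff(y) + F[2].diff(z)
curlF = sp.Matrix([F[2].diff(y) - F[1].diff(z),
                   F[0].diff(z) - F[2].diff(x),
                   F[1].diff(x) - F[0].diff(y)])

naive = divF**2 + curlF.dot(curlF)
print(sp.simplify(lap(F.dot(F)) - naive))  # nonzero, so the naive identity fails
```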
Integral Representation
Again we use the divergence as a starting point: since
$$
\nabla\cdot F(x) = \lim_{R_x\to 0}\frac1{|R_x|}\oint_{\partial R_x}\hat n(y)\cdot F(y)\,\dd S,
$$
following the Motivation section we get
$$\begin{aligned}
\alpha(L_x(\nabla))
&= (\alpha\circ L_x)(\nabla)
= \nabla\cdot(\alpha\circ L_x)^\sharp
\\
&= \lim_{R_x\to 0}\frac1{|R_x|}\oint_{\partial R_x}\hat n(y)\cdot(\alpha\circ L_y)^\sharp\,\dd S
\\
&= \lim_{R_x\to 0}\frac1{|R_x|}\oint_{\partial R_x}\alpha(L_y(\hat n(y)))\,\dd S
\\
&= \alpha\left(
\lim_{R_x\to 0}\frac1{|R_x|}\oint_{\partial R_x}L_y(\hat n(y))\,\dd S
\right)
\end{aligned}$$
for $L_x : \R^m \to V$ and any $\alpha \in V^*$, hence
$$
L_x(\nabla) = \lim_{R_x\to 0}\frac1{|R_x|}\oint_{\partial R_x}L_y(\hat n(y))\,\dd S
$$
where $y$ is the variable of integration.
As stated before in the case of the divergence,
this shows that $L_x(\nabla)$ can be interpreted as an average
over an infinitesimal neighborhood of $x$.
Combining this with the algebraic expression for the derivative
gives a geometric interpretation of the metric contraction of a tensor.
For any $M(v, w)$ multilinear that has an "antiderivative" $L_x(v)$,
i.e. $\diff L(x; v)(w) = M(v, w)$ for some $x$, we may write
$$
\Tr_{v\cdot w} M(v, w) = \lim_{R_x\to0}\frac1{|R_x|}\oint_{\partial R_x}L_y(\hat n(y))\,\dd S.
$$
Such an antiderivative is easy to construct in flat space: define $L_x(v) = M(x, v)$.
Following this train of thought to its conclusion will give
$$
\Tr_{v\cdot w} M(v, w)
= \frac1{|V_m|}\oint_{\partial V_m}M(y, y)\,\dd S
= \frac m{|\partial V_m|}\oint_{\partial V_m}M(y, y)\,\dd S
$$
where $V_m \subset \R^m$ is the unit ball centered at the origin,
$|V_m|$ its volume, and $|\partial V_m|$ its surface area.
This last expression is exactly $m$ times the average of $y \mapsto M(y, y)$ over the unit sphere.
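As a numerical illustration of this last statement, take $M(v, w) = v\cdot Aw$ for an arbitrary matrix $A$; a Monte Carlo average over the unit sphere should recover the trace:
```python
import numpy as np

m = 4
rng = np.random.default_rng(1)
A = rng.normal(size=(m, m))  # arbitrary matrix; M(v, w) = v . (A w)

# Monte Carlo average of M(y, y) over the unit sphere in R^m.
Y = rng.normal(size=(200_000, m))
Y /= np.linalg.norm(Y, axis=1, keepdims=True)
avg = np.mean(np.einsum('ij,jk,ik->i', Y, A, Y))

print(m * avg)      # m times the sphere average of M(y, y)
print(np.trace(A))  # the metric contraction Tr M; the two should agree closely
```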
$\vec u$ is a 'vector', not a scalar.
Therefore, we should use the identity $\nabla^2 \vec u = \nabla(\nabla \cdot \vec u) - \nabla \times (\nabla \times \vec u)$.
Let's investigate the $r$ component.
The first term of the RHS:
$\nabla\cdot\vec u=\frac{1}{r}\frac{\partial(r u_r)}{\partial r}+\frac{1}{r}\frac{\partial u_\theta}{\partial\theta}+\frac{\partial u_z}{\partial z}$
$\left(\nabla(\nabla\cdot\vec u)\right)_r = \frac{\partial}{\partial r}\left(\left(\frac{\partial u_r}{\partial r}+\frac{u_r}{r}\right)+\frac{1}{r}\frac{\partial u_\theta}{\partial\theta}+\frac{\partial u_z}{\partial z}\right)$
$=\frac{\partial^2 u_r}{\partial r^2} + \frac{1}{r}\frac{\partial u_r}{\partial r}-\frac{u_r}{r^2} + \frac{1}{r} \frac{\partial ^2 u_\theta}{\partial r \partial \theta} - \frac{1}{r^2} \frac{\partial u_\theta}{\partial \theta} + \frac{\partial^2 u_z}{\partial r \partial z}$
The second term of the RHS:
$\nabla \times \vec u = \left(\frac{1}{r}\frac{\partial u_z}{\partial\theta}-\frac{\partial u_\theta}{\partial z}\right)\hat r+\left(\frac{\partial u_r}{\partial z}-\frac{\partial u_z}{\partial r}\right)\hat\theta+\frac{1}{r}\left(\frac{\partial(r u_\theta)}{\partial r}-\frac{\partial u_r}{\partial\theta}\right)\hat z$
$\left(\nabla \times (\nabla \times \vec u)\right)_r = \frac{1}{r} \frac{\partial}{\partial \theta}\left(\frac{1}{r}\left(\frac{\partial (r u_\theta)}{\partial r}- \frac{\partial u_r}{\partial \theta}\right)\right) - \frac{\partial}{\partial z} \left(\frac{\partial u_r}{\partial z} - \frac{\partial u_z}{\partial r}\right)$
$ =\frac{1}{r}\left(\frac{\partial ^2 u_\theta}{\partial r \partial \theta} \right)+ \frac{1}{r^2}\frac{\partial u_\theta}{\partial \theta} - \frac{1}{r^2} \frac{\partial ^2 u_r}{\partial \theta^2} -\frac{\partial^2 u_r}{\partial z^2} + \frac{\partial^2 u_z}{\partial z \partial r}$
Then, $(\nabla^2 \vec u)_r = \left(\nabla(\nabla \cdot \vec u)\right)_r - \left(\nabla \times (\nabla \times \vec u)\right)_r = \frac{\partial^2 u_r}{\partial r^2} + \frac{1}{r} \frac{\partial u_r}{\partial r} + \frac{1}{r^2} \frac{\partial^2 u_r}{\partial \theta^2} + \frac{\partial^2 u_r}{\partial z^2} - \frac{u_r}{r^2} - \frac{2}{r^2} \frac{\partial u_\theta}{\partial \theta}$
$ = \nabla^2 u_r - \frac{u_r}{r^2} - \frac{2}{r^2} \frac{\partial u_\theta}{\partial \theta}$
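A small sympy sketch, using the standard cylindrical formulas quoted above, should confirm that the $r$ component of $\nabla(\nabla\cdot\vec u) - \nabla\times(\nabla\times\vec u)$ matches this expression:
```python
import sympy as sp

r, th, z = sp.symbols('r theta z', positive=True)
ur, ut, uz = (sp.Function(n)(r, th, z) for n in ('u_r', 'u_theta', 'u_z'))

# Standard cylindrical divergence, curl, and scalar Laplacian.
div = sp.diff(r*ur, r)/r + sp.diff(ut, th)/r + sp.diff(uz, z)
curl = [sp.diff(uz, th)/r - sp.diff(ut, z),        # r component
        sp.diff(ur, z) - sp.diff(uz, r),           # theta component
        (sp.diff(r*ut, r) - sp.diff(ur, th))/r]    # z component

def lap(f):
    return sp.diff(r*sp.diff(f, r), r)/r + sp.diff(f, th, 2)/r**2 + sp.diff(f, z, 2)

# r components of grad(div u) and curl(curl u).
grad_div_r  = sp.diff(div, r)
curl_curl_r = sp.diff(curl[2], th)/r - sp.diff(curl[1], z)

claimed = lap(ur) - ur/r**2 - 2*sp.diff(ut, th)/r**2
print(sp.simplify(grad_div_r - curl_curl_r - claimed))  # should print 0
```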
Best Answer
Your conversion of the vector field is not correct.
We have $\mathbf{u}= u_1(z) \ \hat x + u_2(z) \ \hat y$ in Cartesian coordinates. $u_1$ and $u_2$ are functions of $z$ only, not of $x, y$. So in cylindrical coordinates they will again be functions of $z$ only.
Please note the unit vector conversion from Cartesian to cylindrical:
$\hat x = \cos\theta \ \hat r - \sin\theta \ \hat\theta, \hat y = \sin\theta \ \hat r + \cos\theta \ \hat\theta$
So, $\mathbf{u} = (\cos\theta \ \hat r - \sin\theta \ \hat\theta) u_1(z) + (\sin\theta \ \hat r + \cos\theta \ \hat\theta) u_2(z) $
$u_r = u_1(z) \cos\theta+ u_2(z) \sin\theta$
$u_{\theta} = u_2(z) \cos\theta - u_1(z) \sin\theta$
$\displaystyle \small \frac{1}{r} \frac{\partial (r u_r)}{\partial r} + \frac{1}{r} \frac{\partial u_{\theta}}{\partial \theta} = \frac{1}{r}(u_1(z) \cos\theta + u_2(z) \sin \theta ) + \frac{1}{r} (- u_2(z) \sin\theta - u_1(z) \cos\theta)$
$ = 0,$
and since $u_z = 0$, the remaining term $\partial u_z/\partial z$ vanishes as well, so $\nabla\cdot\mathbf{u} = 0$.
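The same computation as a one-line sympy check:
```python
import sympy as sp

r, th, z = sp.symbols('r theta z', positive=True)
u1, u2 = sp.Function('u_1')(z), sp.Function('u_2')(z)

u_r  = u1*sp.cos(th) + u2*sp.sin(th)
u_th = u2*sp.cos(th) - u1*sp.sin(th)

# Cylindrical divergence; the u_z component of this field is zero.
div = sp.diff(r*u_r, r)/r + sp.diff(u_th, th)/r
print(sp.simplify(div))  # 0
```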