Here's a way of calculating the divergence.
First, some preliminaries. The first thing I'll do is calculate the partial derivative operators $\partial_x,\partial_y,\partial_z$ in terms of $\partial_r, \partial_\theta, \partial_\varphi$. To do this I'll use the chain rule. Take a function $v:\Bbb R^3\to\Bbb R$ and compose it with the function $g:\Bbb R^3\to\Bbb R^3$ that changes to spherical coordinates: $$g(r,\theta,\varphi) = (r\cos\theta\sin\varphi,r\sin\theta\sin\varphi,r\cos\varphi)$$
The result is $\tilde v(r,\theta,\varphi)=(v\circ g)(r,\theta,\varphi)$ i.e. "$v$ written in spherical coordinates". An abuse of notation is usually/almost-always commited here and we write $v(r,\theta,\varphi)$ to denote what is actually the new function $\tilde v$. I will use that notation myself now. Anyways, the chain rule states that $$\begin{pmatrix}\partial_x v & \partial_y v & \partial_z v\end{pmatrix} \begin{pmatrix} \cos\theta\sin\varphi &-r\sin\theta\sin\varphi & r\cos\theta\cos\varphi \\ \sin\theta\sin\varphi & r\cos\theta\sin\varphi &r\sin\theta\cos\varphi \\\cos\varphi & 0 & -r\sin\varphi\end{pmatrix} = \begin{pmatrix}\partial_r v & \partial_\theta v & \partial_\varphi v\end{pmatrix}$$
From this we get, for example (by inverting the matrix) that $$\partial_x = \cos\theta\sin\varphi\partial_r - \frac{\sin\theta}{r\sin\varphi}\partial_\theta + \frac{\cos\theta\cos\varphi}{r}\partial_\varphi$$
The rest will have similar expressions. Now that we know how to take partial derivatives of a real valued function whose argument is in spherical coords., we need to find out how to rewrite the value of a vector valued function in spherical coordinates. To be precise, the new basis vectors (which vary from point to point now) of $\Bbb R^3$ are found by differentiating the spherical parametrization w.r.t. its arguments (and normalizing). Thus (one example), $$\mathbf e_r = \frac{\partial_r g}{\|\partial_r g\|} = \frac{\begin{pmatrix} \cos\theta\sin\varphi & \sin\theta\sin\varphi & \cos\theta\end{pmatrix}}{\|\begin{pmatrix} \cos\theta\sin\varphi & \sin\theta\sin\varphi & \cos\theta\end{pmatrix}\|} = \begin{pmatrix} \cos\theta\sin\varphi & \sin\theta\sin\varphi & \cos\theta\end{pmatrix} \\[4ex] = \cos\theta\sin\varphi \mathbf i + \sin\theta\sin\varphi \mathbf j + \cos\theta\mathbf k$$
I don't know how to justify this without speaking of tangent spaces, but I'm sure you can ask your teacher for an explanation. After calculating the new unit vectors, you'll again have to invert the relation to obtain $\mathbf i,\mathbf j,\mathbf k$ in terms of $\mathbf e_r,\mathbf e_\theta,\mathbf e_\varphi$. But that part is just linear algebra!
Now that everything is set up, we can calculate the divergence. But what is the divergence? What I mean is, how do we write it as an abstract object that acts on functions? Here is one possibility, in terms as familiar as possible to a calculus student (there are other definitions too): $$\mathrm{div}(\cdot) = \partial_x\left(\langle \mathbf{i},\cdot\rangle\right) + \partial_y\left(\langle \mathbf{j},\cdot\rangle\right) + \partial_z\left(\langle \mathbf{k},\cdot\rangle\right)$$
Where the symbol $\langle\cdot,\cdot\rangle$ is used for the dot product. Try to convince yourself why the above formula is so.
Now just substitue all of the expressions we just derived for the basis vectors, and the differential operators. Finally, place an arbitrary vector field $$ \mathbf E = E_r(r,\theta,\varphi)\,\mathbf e_r + E_\theta(r,\theta,\varphi)\,\mathbf e_\theta + E_\varphi(r,\theta,\varphi)\,\mathbf e_\varphi$$
in place of the "$\cdot$" in the (new) expression for $\mathrm{div}$, and expand.
I've figured out my mistake, thanks to @Event Horizon. My impression from the Help Center page is that I shouldn't delete my question, so I'll outline what went wrong:
I should've applied the divergence theorem to $\nabla \times \mathbf{F}$ instead, so that the statement of the theorem becomes
$$ \iiint_\Omega \nabla \cdot (\nabla \times \mathbf{F}) = \iint_S \nabla \times \mathbf{F} \cdot d\mathbf{S} + \iint_\Sigma \nabla \times \mathbf{F}\cdot d\mathbf{S}. $$
The left is $0$ by since the divergence of a curl is $0$. From $\nabla \times \mathbf{F} = (z^2+x,0,-z-3)$, the correct computation is
$$ \iint_\Sigma \nabla \times \mathbf{F}\cdot d\mathbf{S} = \iint_{x^2+y^2\leq 4 \\ z=2} (-z-3 )dA = \iint-5dA = -20\pi$$ which matches what I got using Stokes' theorem (up to sign - but this just depends on the orientation of $S$).
Best Answer
The region described is not a cylinder. It is a sphere of radius 4 cut by the planes $z=0,3$. Spherical coordinates is most appropriate for this problem.
When you apply the divergence theorem, you establish an equality between a volume integral and a surface integral with a very particular relationship in what they integrate over. Any volume must have some boundary surface(s). The volume integral is over some region, and the corresponding surface integral in the divergence theorem is only over the boundary surfaces of that region.
If the region were a cube, the volume integral would be over the whole cube. The surface integral would be over the 6 faces of the cube.
In this case, the region is (part of) a sphere. The volume integral is over that region, and the surface integral is over two planar surfaces and part of a spherical surface.