I'm going to give a rough outline that you can use to remember this, rather than a derivation. In an orthogonal coordinate system, the scale factors determine all the formulae for differential operators: these come from the definition
$$ \frac{\partial \mathbf{r}}{\partial q_i} = h_i \mathbf{e}_i, $$
where $\mathbf{e}_i$ is a unit vector, and $h_i$ is a function taken as positive. We set it up so that the $\mathbf{e}_i$ at a point are orthogonal (this is actually what it means to say the coordinate system is orthogonal),
$$\mathbf{e}_i \cdot \mathbf{e}_i = \delta_{ij}.$$
Then the line element looks like
$$ d\mathbf{r} = \sum_i \frac{\partial \mathbf{r}}{\partial q_i} \, dq_i = \sum_i \mathbf{e}_i h_i \, dq_i. $$
Then we can define the gradient by using the definition $df = \operatorname{grad}{f} \cdot d\mathbf{r}$ (think in Cartesians: this makes sense by the chain rule formula):
$$ df = \sum_i \frac{\partial f}{\partial q_i} \, dq_i = \sum_i \frac{1}{h_i}\frac{\partial f}{\partial q_i} h_i \, dq_i = \sum_{i,j} \frac{1}{h_i}\frac{\partial f}{\partial q_i} \mathbf{e}_i \cdot \mathbf{e}_j h_j \, dq_j = \left( \sum_i \frac{\mathbf{e}_i}{h_i} \frac{\partial f}{\partial q_i} \right) \cdot d\mathbf{r}, $$
using the orthogonality of the $\mathbf{e}_i$, so
$$ \operatorname{grad}f = \sum_i \frac{\mathbf{e}_i}{h_i} \frac{\partial f}{\partial q_i}. $$
Right, now the divergence. Let's think about what we want. My way of remembering what the divergence does is to think of it as a sort of formal adjoint to the gradient operator, because
$$ \operatorname{div}(f\mathbf{u}) = \operatorname{grad}{f} \cdot \mathbf{u} + f \operatorname{div}{\mathbf{u}}, $$
so the divergence theorem gives
$$ \int_V \operatorname{grad}{f} \cdot \mathbf{u} \, dV = - \int_V f \operatorname{div}{\mathbf{u}} \, dV + \text{(boundary terms)}. $$
The volume element is a cuboid spanned by three line elements, so $dV = h_1 h_2 h_3 \, dq_1 \, dq_2 \, dq_3 $.
Let's just look at one term: the others will work in the same way by the symmetry in the operators and linearity: suppose $u = u_1 \mathbf{e}_i$, and then
$$ \int_V \operatorname{grad}{f} \cdot \mathbf{u} \, dV = \iiint \frac{1}{h_1}\frac{\partial f}{\partial q_i} \mathbf{e}_i \cdot \mathbf{e}_1 u_1 \, h_1 h_2 h_3 \, dq_1 \, dq_2 \, dq_3 \\
= \iiint \frac{\partial f}{\partial q_1} u_1 \, h_2 h_3 \, dq_1 \, dq_2 \, dq_3
$$
Now, apply integration by parts to the $q_1$ integral:
$$ \iiint \frac{\partial f}{\partial q_1} u_1 \, h_2 h_3 \, dq_1 \, dq_2 \, dq_3 = \text{(boundary terms)} - \iiint f \frac{\partial}{\partial q_1} (h_2 h_3 u_1) \, dq_1 \, dq_2 \, dq_3 $$
Okay, but now I want to get back to a $dV$, so multiply and divide by $h_1 h_2 h_3$ to obtain
$$ \iiint \frac{\partial f}{\partial q_1} u_1 \, h_2 h_3 \, dq_1 \, dq_2 \, dq_3 = \text{(boundary terms)} - \iiint f \left( \frac{1}{h_1 h_2 h_3} \frac{\partial}{\partial q_1} (h_2 h_3 u_1) \right) \, dV \\
= \text{(boundary terms)} - \iiint f \operatorname{div} \mathbf{u} \, dV, $$
so we conclude that the divergence is given by the expression
$$ \operatorname{div} \mathbf{u} = \frac{1}{h_1 h_2 h_3} \left( \frac{\partial}{\partial q_1} (h_2 h_3 u_1) + \frac{\partial}{\partial q_2} (h_3 h_1 u_2) + \frac{\partial}{\partial q_3} (h_1 h_2 u_3) \right) $$
Right, fine. So what is the divergence in cylindrical coordinates, like you actually asked? Well, we just have to find the scale factors. $\mathbf{r} = (r\cos{\theta},r\sin{\theta},z)$, so
$$ h_r \mathbf{e}_r = 1(\cos{\theta},\sin{\theta},0)\\
h_{\theta} \mathbf{e}_{\theta} = r(-\sin{\theta},\cos{\theta},0)\\
h_z \mathbf{e}_z = 1(0,0,1), $$
so the only nontrivial $h$ is $h_{\theta}=r$. Thus
$$ \operatorname{div} \mathbf{u} = \frac{1}{r} \left( \frac{\partial}{\partial r} (r u_r) + \frac{\partial}{\partial \theta} (u_\theta) + \frac{\partial}{\partial z} (r u_z) \right) = \frac{1}{r} \frac{\partial}{\partial r} (r u_r) + \frac{1}{r}\frac{\partial}{\partial \theta} (u_\theta) + \frac{\partial}{\partial z} (u_z). $$
Best Answer
The area vector of a flat surface has apparently in your text been defined to be perpendicular to the surface. If we want to measure flow through a surface, we want to know the amount of velocity vector is parallel to the area vector (i.e., perpendicular to the surface).
The diagram is perhaps a little confusing. The area $A$ is a parallelogram drawn on the floor. Its area vector points up, perpendicular to the floor. The velocity vector points somewhere in the halfspace above the floor. The empty parallelogram is parallel transported along $v$ in one unit of time (whatever time units you are using). The volume of flow extruded through the parallelogram on the floor is the area of that parallelogram times the height (measured perpendicularly to the floor) of the parallelepiped standing on it (bounded by the two parallelograms and the four dotted lines).
If $v$ is parallel to the area vector, then the height is as large as it can be and also, the flow through the parallelogram is maximized. If $v$ is parallel to the floor, there is no flow through the parallelogram (all flow is along the parallelogram), so the volume of the parallelepiped is zero. The dot product is maximized when parallel and zero when perpendicular, which reproduces the desired behaviour. The cross product is zero when parallel and maximized when perpendicular, which is exactly the opposite of the desired behaviour.