What you've encountered is that "the direction changes" is not a complete intuition about what curl means -- indeed, there are many "curved" vector fields with zero curl.
A better way to think of the curl is to think of a test particle, moving with the flow, and surrounded by a bunch of other test particles arranged in a circle. As the particles move with the flow, the direction between each particle and the center may change -- and the curl measures the speed of that change averaged over the entire circle. If some parts of the circle drift clockwise and other parts drift counterclockwise, the curl may still add up to zero!
In particular: if the flow line curves to the right, then the parts of our test circle that are just in front of (or just behind) the center will move clockwise with respect to the center. However, if the strength of the neighboring flow varies in the right way (namely, stronger on the inside of the bend), the parts of the circle that lie perpendicular to the flow can drift counterclockwise, and we still get zero curl out of it.
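A concrete example of this, offered as an illustration rather than something taken from the question, is the standard irrotational vortex in the plane, considered away from the origin:
$$ F(x,y) = \frac{(-y,\; x)}{x^2+y^2}, \qquad |F| = \frac{1}{r}, \quad r=\sqrt{x^2+y^2}. $$
Its flow lines are circles around the origin, so the direction changes everywhere, and the flow is stronger on the inside of the bend. Yet the two contributions to the curl cancel exactly:
$$ \frac{\partial F_y}{\partial x} - \frac{\partial F_x}{\partial y} = \frac{y^2-x^2}{(x^2+y^2)^2} - \frac{y^2-x^2}{(x^2+y^2)^2} = 0 . $$
By contrast, the rigid rotation $(-y,\; x)$ has the same circular flow lines but is stronger on the outside of the bend, and its curl is $1-(-1)=2$ everywhere.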
As for the demonstration you link to, remember that gradient and curl are both linear. So assume we have some scalar field $f$ such that $\nabla\times\nabla f(x_0)$ is nonzero for some $x_0$. We can then find a $g$ such that $\nabla g(x) = \nabla f(x_0)$ for every $x$ (that is simple linear algebra). Then obviously $\nabla g$ at least has zero curl, so by linearity $\nabla\times \nabla(f-g)$ is the same as $\nabla \times \nabla f$, which we assumed to be nonzero at $x_0$. But we also have $\nabla(f-g)(x_0)=0$, so the gradient of $f-g$ does look like the picture on top of page 2 in your link -- at least close to $x_0$ -- and the argument shows that this shape is impossible for a gradient.
(Strictly speaking, that last claim is not quite true: you may need to subtract a more complicated gradient to get a nice rotation like that out of an arbitrary vector field.)
Best Answer
First I would like to note the difference between the following two statements:
If a vector field is the gradient of a scalar function then the curl of that vector field is zero.
If the curl of some vector field is zero then that vector field is the gradient of some scalar field.
I have seen some people trying to prove the first, whereas I think you are asking for the second.
I apologize for not giving full details on math here because I'm doing this on my tablet.
Anyway, if the curl of a vector field $F$ is zero $(\nabla\times F=0)$, then the surface integral of that curl over any surface $S$ is also zero.
By Stokes' theorem (read the Wikipedia article on the Kelvin-Stokes theorem), the surface integral of the curl of any vector field over $S$ is equal to the line integral of the field around the closed boundary curve of $S$.
So, since $\nabla\times F=0$ implies that the surface integral of the curl is zero, it follows (by Stokes' theorem) that the closed line integral of $F$ around the boundary curve of that (arbitrarily) selected surface is also zero.
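In symbols, this step can be sketched as follows. Here $\partial S$ denotes the boundary curve of $S$ and $\mathrm{d}\mathbf{S}$ its oriented area element; this notation is introduced only for this line and is not used elsewhere in the answer.
$$ \oint\limits_{\partial S} F\cdot \mathrm{d}\ell = \iint\limits_S (\nabla\times F)\cdot \mathrm{d}\mathbf{S} = \iint\limits_S 0\cdot \mathrm{d}\mathbf{S} = 0 $$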
Since the selection of the surface is arbitrary, we can say that the closed line integral of $F$ over any closed curve is zero (provided every closed curve bounds some surface lying inside the region where $F$ is defined and curl-free, which holds for instance when $F$ is defined on all of space).
This implies that the line integral of the vector field $F$ is path independent, which means that the line integral over any curve (not necessarily a closed one) depends only on the initial and final positions.
To prove this, just divide your closed path $C$ into two paths: $A$ from point $P_{1}$ to point $P_{2}$, and $B$ from $P_{2}$ back to $P_{1}$. The line integral over the closed path $C$ is equal to the sum of the line integrals over paths $A$ and $B$, so:
$$ \oint\limits_C F \cdot \mathrm{d}\ell = \int\limits_{A_{P_1 \to P_2}} F \cdot \mathrm{d}\ell + \int\limits_{B_{P_2 \to P_1}} F \cdot \mathrm{d}\ell =0 $$
Then
$$ \begin{split} \int\limits_{A_{P_1 \to P_2}}\! F \cdot \mathrm{d}\ell &=- \int\limits_{B_{P_2 \to P_1}} F \cdot \mathrm{d}\ell\\ &\Updownarrow\\ \int\limits_{A_{P_1 \to P_2}}\! F \cdot \mathrm{d}\ell &= \int\limits_{B_{P_1 \to P_2}} F \cdot \mathrm{d}\ell \end{split} $$
(Please note that these are line integrals, so $F \cdot \mathrm{d}\ell$ is a dot product of $F$ with the line element. The last step uses the fact that traversing $B$ in the opposite direction flips the sign of its integral.)
This latter equality implies that your choice of path, $A$ or $B$ or any other, doesn't matter: the result will be the same, and it will depend only on the vector field $F$ and the two end points.
And since it depends only on the two points $P_{1}$ and $P_{2}$, we can DEFINE a scalar field $\Phi(P)$ (note that the points $P_{1}$ and $P_2$ are position vectors) such that $$\Phi(P_{2}) - \Phi(P_{1}) = \int\limits_{P_{1}}^{P_{2}}F \cdot \mathrm{d}\ell$$
(Note that the integral doesn't depend on the path and that is the only reason we can write it this way).
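If you want to see path independence numerically, here is a minimal sketch using only the Python standard library. Everything in it is an assumption chosen for illustration: the curl-free field $F=(2xy,\,x^2)$ with potential $\Phi = x^2 y$, the end points, and the two paths `path_A` and `path_B` are not from the question.

```python
import math

def F(x, y):
    """Illustrative curl-free field: F = grad(Phi) with Phi(x, y) = x**2 * y."""
    return (2.0 * x * y, x * x)

def Phi(x, y):
    """Potential of the illustrative field above."""
    return x * x * y

def line_integral(path, n=20000):
    """Approximate the integral of F . dl along path(t), t in [0, 1].

    Each small chord of the path is weighted by F evaluated at the chord midpoint.
    """
    total = 0.0
    for i in range(n):
        x0, y0 = path(i / n)
        x1, y1 = path((i + 1) / n)
        fx, fy = F(0.5 * (x0 + x1), 0.5 * (y0 + y1))
        total += fx * (x1 - x0) + fy * (y1 - y0)  # dot product F . dl
    return total

P1 = (0.0, 0.0)
P2 = (1.0, 2.0)

def path_A(t):
    # Straight line from P1 to P2.
    return (t, 2.0 * t)

def path_B(t):
    # A curved detour with the same end points P1 and P2.
    return (t ** 3, 2.0 * math.sin(0.5 * math.pi * t))

int_A = line_integral(path_A)
int_B = line_integral(path_B)
potential_difference = Phi(*P2) - Phi(*P1)

print(int_A, int_B, potential_difference)
```

The three printed numbers should agree up to the discretization error of the sum. Replacing `F` with a field whose curl is nonzero, such as $(-y,\,x)$, makes the two path integrals differ.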
Now, from the gradient theorem (look for the Wikipedia article on the gradient theorem, in particular its converse), $F=\nabla \Phi$.
Also look for the Wikipedia article on conservative fields.