The problem is the misuse of things like $dx$ and $dy$. People once treated them as "infinitesimals", and the trouble with that is how quickly it leads to confusion. Rigorously, $dx$ and $dy$ are differential forms: functions that assign to each point of space an object called an alternating tensor. For simplicity, one can think of a tensor as a multilinear function of vectors, i.e. a function that takes several vectors as arguments, returns a number, and is linear in each argument when the others are held fixed.
The alternating character also governs the product of such objects, called the wedge product, which is anticommutative on 1-forms: for example, $dx\wedge dy = -dy\wedge dx$. In your case this is sufficient to establish the result.
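To make "alternating multilinear function" concrete, here is a minimal sketch on $\mathbb{R}^2$. It assumes one common convention, $(\alpha\wedge\beta)(u,v) = \alpha(u)\beta(v) - \alpha(v)\beta(u)$ (other texts include a factor of $\tfrac12$); the function names are illustrative only. It shows the sign flip $dx\wedge dy = -dy\wedge dx$ and that a 1-form wedged with itself vanishes.

```python
# Sketch: 1-forms on R^2 as linear functionals on vectors, and the wedge of
# two 1-forms as the alternating bilinear function
#   (alpha ^ beta)(u, v) = alpha(u) * beta(v) - alpha(v) * beta(u).
# Convention assumed here: no 1/2 normalization factor.
from typing import Callable, Tuple

Vec = Tuple[float, float]
OneForm = Callable[[Vec], float]
TwoForm = Callable[[Vec, Vec], float]


def dx(v: Vec) -> float:
    """dx picks out the first component of a vector."""
    return v[0]


def dy(v: Vec) -> float:
    """dy picks out the second component of a vector."""
    return v[1]


def wedge(alpha: OneForm, beta: OneForm) -> TwoForm:
    """Return the 2-form alpha ^ beta as a function of two vectors."""
    def form(u: Vec, v: Vec) -> float:
        return alpha(u) * beta(v) - alpha(v) * beta(u)
    return form


u, v = (1.0, 2.0), (3.0, -1.0)
print(wedge(dx, dy)(u, v))  # -7.0, the determinant with columns u and v
print(wedge(dy, dx)(u, v))  # 7.0, the sign flips when the factors are swapped
print(wedge(dx, dx)(u, v))  # 0.0, omega ^ omega vanishes
```

With this convention $(dx\wedge dy)(u,v)$ is exactly the signed area spanned by $u$ and $v$, which is why the wedge product is the natural carrier of area elements.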
Indeed, the first part of your computation is correct:
$$dx = \cos \theta dr-r\sin\theta d\theta,$$
$$dy=\sin\theta dr+r\cos\theta d\theta,$$
now we have
$$dx\wedge dy=(\cos\theta dr-r\sin\theta d\theta)\wedge(\sin\theta dr+r\cos\theta d\theta),$$
and since this product is distributive, we have
$$dx\wedge dy=(\cos\theta dr)\wedge(\sin\theta dr)+(\cos\theta dr)\wedge(r\cos\theta d\theta)+(-r\sin\theta d\theta)\wedge(\sin\theta dr)+(-r\sin\theta d\theta)\wedge(r\cos\theta d\theta),$$
and scalar factors (here, functions of $r$ and $\theta$) can be pulled out of each wedge product, so that
$$dx\wedge dy = (\cos\theta\sin\theta)dr\wedge dr+(r\cos^2\theta)(dr\wedge d\theta)-(r\sin^2\theta)d\theta\wedge dr-(r^2\sin\theta\cos\theta)d\theta\wedge d\theta$$
Now, any 1-form $\omega$ satisfies $\omega\wedge \omega = 0$: the alternating property gives $\omega\wedge\omega=-\omega\wedge\omega$, hence $2\,\omega\wedge\omega = 0$. In particular, $dr\wedge dr = 0$ and $d\theta\wedge d\theta = 0$, which leaves
$$dx\wedge dy =r\cos^2\theta\, dr\wedge d\theta - r\sin^2\theta\, d\theta\wedge dr.$$
Finally, using the alternating property once more, $-d\theta\wedge dr = dr\wedge d\theta$, and so
$$dx\wedge dy = r\cos^2\theta dr\wedge d\theta + r\sin^2\theta dr\wedge d\theta = r dr\wedge d\theta.$$
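The expansion above can be checked numerically. A minimal sketch, assuming each 1-form at a point is stored as its pair of coefficients on $(dr, d\theta)$ (a representation chosen here for illustration): distributivity together with $dr\wedge dr = d\theta\wedge d\theta = 0$ and $d\theta\wedge dr = -dr\wedge d\theta$ reduces $(a_1\,dr + b_1\,d\theta)\wedge(a_2\,dr + b_2\,d\theta)$ to $(a_1 b_2 - b_1 a_2)\,dr\wedge d\theta$, i.e. the determinant of the matrix of partial derivatives of $(x, y)$ with respect to $(r, \theta)$.

```python
import math


def wedge_coefficient(alpha, beta):
    """Coefficient of dr^dtheta in alpha ^ beta.

    alpha and beta are 1-forms given as (dr-coefficient, dtheta-coefficient).
    Expanding distributively, the dr^dr and dtheta^dtheta terms vanish and
    dtheta^dr = -dr^dtheta, leaving a1*b2 - b1*a2.
    """
    a1, b1 = alpha
    a2, b2 = beta
    return a1 * b2 - b1 * a2


def dx_dy_coefficient(r, theta):
    """Coefficient of dr^dtheta in dx ^ dy at the point (r, theta)."""
    dx = (math.cos(theta), -r * math.sin(theta))  # dx = cos(theta) dr - r sin(theta) dtheta
    dy = (math.sin(theta), r * math.cos(theta))   # dy = sin(theta) dr + r cos(theta) dtheta
    return wedge_coefficient(dx, dy)


for r, theta in [(1.0, 0.3), (2.5, 1.2), (0.7, -2.0)]:
    # The coefficient matches r up to floating-point rounding.
    print(r, theta, dx_dy_coefficient(r, theta))
```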
Of course, it's not possible to explain everything about differential forms in a single answer; this is only meant to show a little of how they fit into your problem. To learn more, see Spivak's "Calculus on Manifolds" (a demanding book), or O'Neill's "Elementary Differential Geometry", which has a good introduction to differential forms.
The term "Jacobian" traditionally refers to the determinant of the derivative matrix. The derivative matrix can be thought of as a local transformation matrix.
If you want the change $(dx,dy,dz)$ due to a change $(dr,d\theta,dz)$, multiply the derivative matrix by the latter as a column vector. It's just the chain rule.
Think it through, geometrically.
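Here is a small numerical illustration of that chain-rule statement. It assumes cylindrical coordinates $x = r\cos\theta$, $y = r\sin\theta$, $z = z$ (the setting the variable names suggest); the helper names are illustrative. It multiplies the derivative matrix by a small $(dr, d\theta, dz)$, compares the result with the actual change in $(x, y, z)$, and evaluates the determinant, which comes out to $r$.

```python
import math


def cylindrical_to_cartesian(r, theta, z):
    return (r * math.cos(theta), r * math.sin(theta), z)


def derivative_matrix(r, theta, z):
    """Rows are the gradients of x, y, z with respect to (r, theta, z)."""
    c, s = math.cos(theta), math.sin(theta)
    return [
        [c, -r * s, 0.0],
        [s, r * c, 0.0],
        [0.0, 0.0, 1.0],
    ]


def mat_vec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return tuple(sum(m[i][j] * v[j] for j in range(3)) for i in range(3))


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


point = (2.0, 0.5, 1.0)        # (r, theta, z)
step = (1e-6, -2e-6, 3e-6)     # a small (dr, dtheta, dz)

predicted = mat_vec(derivative_matrix(*point), step)
moved = cylindrical_to_cartesian(*(p + d for p, d in zip(point, step)))
actual = tuple(b - a for a, b in zip(cylindrical_to_cartesian(*point), moved))

print(predicted)
print(actual)                            # agrees with predicted up to second-order terms
print(det3(derivative_matrix(*point)))   # the Jacobian determinant, approximately r = 2.0
```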
The Jacobian map is for transforming vectors expressed in terms of one set of coordinate basis vectors into another coordinate system's basis vectors. Positions like $(x,y)$ and $(r,\theta)$ are not expressed in terms of coordinate basis vectors, so it's inappropriate to use the Jacobian to try to convert between them.
Let $e_1, e_2$ be a pair of basis vectors. We can express positions on the 2D plane as $p = x e_1 + y e_2$.
Now, let $f(p) = p' = r e_1 + \theta e_2$. This looks like a change of coordinates, but it's really not: it's an active deformation of the plane into one where $r, \theta$ are "Cartesian" coordinates. An active transformation like this is nonetheless fully equivalent to the passive change of coordinates that you're used to.
$f$ is appropriate to move positions to new positions, but it is not appropriate to move, for example, the tangent vector to a curve from one space to another (that is, to express such a tangent vector in terms of the polar coordinate basis vectors). For this, we need the Jacobian map $J_f$.
Example: let $\ell(t) = e_1 \cos t + e_2 \sin t$ be a curve that draws out the unit circle. It's clear that its derivative is the tangent vector $\dot \ell(t) = -e_1 \sin t + e_2 \cos t$. We can't transform this tangent vector using $f$; we must use $J_f$ instead.
(You'll note that here I'm moving from Cartesian coordinates to polar, backwards from what you wanted, but the math is basically the same.)
Here's the Jacobian map:
$$\begin{align*} J_f(e_1) &= \frac{x\, e_1}{\sqrt{x^2 + y^2}} - \frac{y\, e_2}{x^2 + y^2} \\ J_f (e_2) &= \frac{y\, e_1}{\sqrt{x^2 + y^2}} + \frac{x\, e_2}{x^2 + y^2}\end{align*}$$
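The entries of $J_f$ are the partial derivatives of $r = \sqrt{x^2+y^2}$ and $\theta$ with respect to $x$ and $y$. The sketch below assumes $\theta = \operatorname{atan2}(y, x)$ (any smooth branch of the angle has the same derivatives away from its branch cut) and compares the analytic partials with central finite differences of $f$ at an arbitrary point off the unit circle; the function names are illustrative. The $\theta$ components carry a factor $1/(x^2+y^2)$, which equals 1 on the unit circle used in the example.

```python
import math


def f(x, y):
    """Position map f: Cartesian (x, y) -> (r, theta), with theta = atan2(y, x)."""
    return (math.hypot(x, y), math.atan2(y, x))


def jacobian_f(x, y):
    """Analytic J_f at (x, y). Row 0: (dr/dx, dr/dy); row 1: (dtheta/dx, dtheta/dy).

    Column j holds the (e_1, e_2) components of J_f applied to the j-th basis vector.
    """
    rr = x * x + y * y
    r = math.sqrt(rr)
    return [[x / r, y / r], [-y / rr, x / rr]]


def jacobian_fd(x, y, h=1e-6):
    """Central finite-difference estimate of J_f, built column by column."""
    columns = []
    for sx, sy in ((h, 0.0), (0.0, h)):
        plus, minus = f(x + sx, y + sy), f(x - sx, y - sy)
        columns.append([(p - m) / (2 * h) for p, m in zip(plus, minus)])
    # Transpose so the result is indexed [row][column] like jacobian_f.
    return [[columns[j][i] for j in range(2)] for i in range(2)]


point = (2.0, 1.0)  # off the unit circle, away from atan2's branch cut
print(jacobian_f(*point))   # approx [[0.894, 0.447], [-0.2, 0.4]]
print(jacobian_fd(*point))  # agrees closely with the line above
```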
Along the curve, $x = \cos t$ and $y = \sin t$, so $x^2 + y^2 = 1$ and we get
$$\begin{align*} J_f (\dot \ell(t)) &= -(\sin t )(e_1 \cos t - e_2 \sin t) + (\cos t)(e_1 \sin t + e_2 \cos t) \\ &= e_2\end{align*}$$
Remember that $e_2$ is associated with $\theta$: this says that, unsurprisingly, the velocity along this curve is entirely in the $\theta$ direction. We conclude that $\dot \ell(t) = J_f^{-1}(e_2) = e_\theta$.
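The computation along the curve can be repeated numerically. This sketch (function names are illustrative) evaluates $J_f$ at several points of the unit circle, applies it to $\dot\ell(t) = (-\sin t, \cos t)$, and gets components close to $(0, 1)$ in the $(e_1, e_2)$ basis, i.e. $e_2$, at every $t$.

```python
import math


def jacobian_f(x, y):
    """J_f for f(x, y) = (r, theta); rows are the gradients of r and theta."""
    rr = x * x + y * y
    r = math.sqrt(rr)
    return [[x / r, y / r], [-y / rr, x / rr]]


def push_forward(x, y, v):
    """Apply J_f at the point (x, y) to the tangent vector v = (v1, v2)."""
    J = jacobian_f(x, y)
    return (J[0][0] * v[0] + J[0][1] * v[1], J[1][0] * v[0] + J[1][1] * v[1])


for t in (0.0, 0.4, 1.5, 2.8):
    x, y = math.cos(t), math.sin(t)          # the point ell(t) on the unit circle
    velocity = (-math.sin(t), math.cos(t))   # the tangent vector ell'(t)
    print(t, push_forward(x, y, velocity))   # approximately (0.0, 1.0), i.e. e_2
```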
In conclusion, we started with a tangent vector $\dot \ell(t)$ in our Cartesian coordinate system and moved it, using the Jacobian $J_f$, into a deformed plane where $(r,\theta)$ are "Cartesian" coordinates instead. The Jacobian is what moves tangent vectors from one space to another (or between coordinate systems); positions are different and are always handled by the full, nonlinear transformation.
One way you can remember this is that the Jacobian is like the derivative of the transformation, and so it's appropriate for moving things involving derivatives, like $\dot \ell(t)$, which is a velocity.
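The "derivative of the transformation" remark is the chain rule: $\frac{d}{dt} f(\ell(t)) = J_f(\dot\ell(t))$. The sketch below checks it by finite differences on a circle of radius 2 (a hypothetical variant of the unit-circle example, chosen so that the $1/(x^2+y^2)$ factor in the $\theta$ components of $J_f$ actually matters), and contrasts it with applying $f$ itself to the velocity, which produces a position-like pair rather than the velocity of the image curve. Names are illustrative.

```python
import math


def f(x, y):
    """Position map: Cartesian (x, y) -> (r, theta), with theta = atan2(y, x)."""
    return (math.hypot(x, y), math.atan2(y, x))


def jacobian_f(x, y):
    """Derivative of f at (x, y); rows are the gradients of r and theta."""
    rr = x * x + y * y
    r = math.sqrt(rr)
    return [[x / r, y / r], [-y / rr, x / rr]]


def mat_vec(J, v):
    """Multiply a 2x2 matrix by a 2-vector."""
    return (J[0][0] * v[0] + J[0][1] * v[1], J[1][0] * v[0] + J[1][1] * v[1])


RADIUS = 2.0  # hypothetical: a circle of radius 2 instead of the unit circle


def ell(t):
    return (RADIUS * math.cos(t), RADIUS * math.sin(t))


def ell_dot(t):
    return (-RADIUS * math.sin(t), RADIUS * math.cos(t))


t, h = 0.7, 1e-6

# Velocity of the image curve f(ell(t)), by central differences.
a, b = f(*ell(t + h)), f(*ell(t - h))
image_velocity = ((a[0] - b[0]) / (2 * h), (a[1] - b[1]) / (2 * h))

via_jacobian = mat_vec(jacobian_f(*ell(t)), ell_dot(t))  # J_f(ell'(t))
via_f = f(*ell_dot(t))                                   # f(ell'(t)): the wrong tool

print(image_velocity)  # close to (0.0, 1.0)
print(via_jacobian)    # close to (0.0, 1.0)
print(via_f)           # close to (2.0, t + pi/2); not the image curve's velocity
```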