The Jacobian map is for transforming vectors expressed in terms of one set of coordinate basis vectors into another coordinate system's basis vectors. Positions like $(x,y)$ and $(r,\theta)$ are not expressed in terms of coordinate basis vectors, so it's inappropriate to use the Jacobian to try to convert between them.
Let $e_1, e_2$ be a pair of basis vectors. We can express positions on the 2d plane as $p = x e_1 + y e_2$.
Now, let $f(p) = p' = r e_1 + \theta e_2$. This looks like a change of coordinates, but it's really not--it's an active deformation of the plane into something where $r, \theta$ are "Cartesian" coordinates. This is just an active transformation, however, and fully equivalent to the passive change of coordinates that you're used to.
$f$ is appropriate to move positions to new positions, but it is not appropriate to move, for example, the tangent vector to a curve from one space to another (that is, to express such a tangent vector in terms of the polar coordinate basis vectors). For this, we need the Jacobian map $J_f$.
Example: let $\ell(t) = e_1 \cos t + e_2 \sin t$ be a curve that draws out the unit circle. It's clear that its derivative is the tangent vector $\dot \ell(t) = -e_1 \sin t + e_2 \cos t$. We can't transform this tangent vector using $f$; we must use $J_f$ instead.
(You'll note here I'm moving from Cartesian coordinates to polar, backwards from what you wanted* but the math is basically the same.)
Here's the Jacobian map:
$$\begin{align*}
J_f(e_1) &= \frac{x e_1}{\sqrt{x^2 + y^2}} - y e_2 \\ J_f (e_2) &= \frac{y e_1}{\sqrt{x^2 + y^2}} + x e_2\end{align*}$$
Along the curve, $x = \cos t$ and $y = \sin t$, so we get
$$\begin{align*} J_f (\dot \ell(t)) &= -(\sin t )(e_1 \cos t - e_2 \sin t) + (\cos t)(e_1 \sin t + e_2 \cos t) \\ &= e_2\end{align*}$$
Remember that $e_2$ is associated with $\theta$--this says that, unsurprisingly, the velocity is entirely in the $\theta$ direction along this curve. We conclude that $\dot \ell(t) = J_f^{-1}(e_2) = e_\theta$.
In conclusion, we started with a tangent vector $\dot \ell(t)$ in our Cartesian coordinate system, and we moved it--using the Jacobian $J_f$--into a deformed plane where $(r,\theta)$ are "Cartesian" coordinates instead. The Jacobian is what moves tangent vectors from one space to another (or between coordinate systems), but positions are different and will always be handled by the full, nonlinear transformation.
One way you can remember this is that the Jacobian is like the derivative of the transformation, and so it's appropriate for moving things involving derivatives, like $\dot \ell(t)$, which is a velocity.
I like to think of it in the coordinate-free notation to remember how it works. If $\gamma(t)$ is your path, going from $t=a$ to $t=b$ (with velocity/tangent vector $\dot{\gamma}(t)$), and if $g$ is the metric then the length of the curve is
$$ \int_a^b \sqrt{g(\dot{\gamma}(t), \dot{\gamma}(t))} \; dt $$
If you want to compute it in different coordinates, just use the pull-back. That is, if $x = \varphi(y)$ changes from the $y$-coordinates to the $x$-coordinates, then the metric expressed in the $y$-coordinates is given by $\varphi^*g$. Then if you want to do this integral in the new coordinates, just make the appropriate substitutions.
In your polar coordinate example, you'd have $\varphi$ given by
$$ x = r\cos(\theta), ~~~ y = r\sin(\theta) $$
The usual Euclidean metric on the plane (in the "$x$"-coordinates) is
$$ g = dx \otimes dx + dy \otimes dy $$
Doing the pull-back $\varphi^*(g)$ gives
$$ \varphi^*g = dr \otimes dr + r^2 d\theta \otimes d\theta $$
So now if you write $\gamma(t)$ in polar coordinates as $\gamma(t) = (r(t), \theta(t))$, with $\dot{\gamma}(t) = (\dot{r}(t), \dot{\theta}(t))$, you get
$$ \int_a^b \sqrt{\dot{r}(t)^2 + r(t)^2 \dot{\theta}(t)^2} \; dt $$
As for your matrix formula: this is correct, and here is how it relates to the above. The metric is a "symmetric bilinear form" on the tangent spaces, and on a vector space, a symmetric bilinear form $\beta(\cdot, \cdot)$ is given by a symmetric $n$-by-$n$ matrix $B$ such that for any vectors $v$ and $w$, you have $\beta(v,w) = \left< v, Bw \right>$, where $\left< \cdot,\cdot \right>$ is the standard Euclidean dot product. In our case, the metric $g$ is represented in coordinates $(x_1,\dots,x_n)$ at every point by a matrix $G = \left(g_{ij}\right)_{i,j=1}^n$. Now suppose at the same point you look at a different system of coordinates $(y_1,\dots,y_n)$, as above, with transition map $\varphi$ going from the $y$-coordinates to the $x$-coordinates, with Jacobian matrix ("push-forward") $\varphi_*$, which is given by the matrix $S$ as in your post. Also, as in your notation, let $M = \left(M_{ij}\right)$ be matrix corresponding to the metric $g$ in the $y$-coordinates. Then your claim is that $M = S^T G S$ is how the metric changes when switching coordinates. This is because of the definition of how "pull-back" works, along with a property of the Euclidean dot product:
On the one hand, in the $y$-coordinates, if we have two vectors $v$ and $w$, then $g(v,w) = \left< v, M w \right>$. On the other hand, using the definition of the pull-back (remember the metric in $y$-coords is $\varphi^* g$, where $g$ is in the $x$-coords):
$$
\begin{eqnarray}
(\varphi^*g)(v,w) &=& g(\varphi_* v, \varphi_* w) \\
&=& \left< \varphi_* v, G \varphi_* w \right> \\
&=& \left< S v, GS w \right> \\
&=& \left< v, S^TGS w \right>
\end{eqnarray}
$$
Best Answer
Maybe a bit of a preamble will be useful here. If $\xi$ are some coordinates defining the local metric $\eta$ then, under the transformation $x^\alpha = x^{\alpha}(\xi)$ the metric becomes
$$ g_{\mu\nu} = \frac{\partial \xi^{\alpha}}{\partial x^\mu}\frac{\partial \xi^{\beta}}{\partial x^\nu} \eta_{\mu\nu} \tag{1} $$
So for example, if you take $\xi^1 = x$ and $\xi^2 = y$ the cartesian coordinates, then the local matrix is the identity $\eta_{\mu\nu} = \delta_{\mu\nu}$ and Eq. (1) becomes
$$ g_{\mu\nu} = J^{\alpha}_{\;\mu}J^{\beta}_{\;\nu}\delta_{\alpha\beta} = (J^T J)_{\mu\nu} \tag{2} $$
where the entries of the matrix $J$ are defined as
$$ J^{\alpha}_{\;\beta} = \frac{\partial \xi^\alpha}{\partial x^\beta} \tag{3} $$
As an example take $x^1 = r$ and $x^2 = \theta$ (your case), so that $\xi^1 = r\cos\theta = x^1 \cos x^2$ and $\xi^2 = x^1\sin x^2$, it is easy to calculate
$$ \frac{\partial \xi^1}{\partial x^1} = \frac{\partial }{\partial x^1}(x^1 \cos x^2) = \cos x^2 ~~~(\cdots) $$
so when you evaluate that in (1) you get
$$ g_{11} = 1, ~ g_{12} = g_{21} = 0 ~\mbox{and}~ g_{22} = (x^1)^2 $$
It is important to keep track of the order of things here. In this representation we are labeling the rows with the super-index $\alpha$ and the columns with the sub-index $\beta$.
Now, the contravariant version of Eq. (2) is
$$ g^{\mu\nu} = \eta^{\alpha \beta}\frac{\partial x^\mu}{\partial \xi^\alpha} \frac{\partial x^\nu}{\partial \xi^\beta} \tag{4} $$
and again, under the same assumptions as before, this transforms to
$$ g^{\mu\nu} = \delta^{\alpha\beta}\mathcal{J}^\mu_{\;\alpha}\mathcal{J}^\nu_{\;\beta} = (\mathcal{J}\mathcal{J}^T)^{\mu\nu} \tag{5} $$
where
$$ \mathcal{J}^{\alpha}_{\;\beta} = \frac{\partial x^\alpha}{\partial \xi^\beta} \tag{6} $$
Taking the same changes of coordinates you will get $x^1 = [(\xi^1)^2 + (\xi^2)^2]^{1/2}$ and $x^2 = \arctan(\xi^2/\xi^1)$ so that
$$ \frac{\partial x^1}{\partial \xi^1} = \frac{\xi^1}{[(\xi^1)^2 + (\xi^2)^2]^{1/2}} ~(\cdots) $$
which results in
$$ g^{11} = 1, ~ g^{12} = g^{21} = 0, ~\mbox{and}~ g^{22} = \frac{1}{(\xi^1)^2 + (\xi^2)^2} = \frac{1}{(x^1)^2} $$
as expected