I like to think of it in the coordinate-free notation to remember how it works. If $\gamma(t)$ is your path, going from $t=a$ to $t=b$ (with velocity/tangent vector $\dot{\gamma}(t)$), and if $g$ is the metric then the length of the curve is
$$ \int_a^b \sqrt{g(\dot{\gamma}(t), \dot{\gamma}(t))} \; dt $$
If you want to compute it in different coordinates, just use the pull-back. That is, if $x = \varphi(y)$ changes from the $y$-coordinates to the $x$-coordinates, then the metric expressed in the $y$-coordinates is given by $\varphi^*g$. Then if you want to do this integral in the new coordinates, just make the appropriate substitutions.
In your polar coordinate example, you'd have $\varphi$ given by
$$ x = r\cos(\theta), ~~~ y = r\sin(\theta) $$
The usual Euclidean metric on the plane (in the "$x$"-coordinates) is
$$ g = dx \otimes dx + dy \otimes dy $$
Doing the pull-back $\varphi^*(g)$ gives
$$ \varphi^*g = dr \otimes dr + r^2 d\theta \otimes d\theta $$
So now if you write $\gamma(t)$ in polar coordinates as $\gamma(t) = (r(t), \theta(t))$, with $\dot{\gamma}(t) = (\dot{r}(t), \dot{\theta}(t))$, you get
$$ \int_a^b \sqrt{\dot{r}(t)^2 + r(t)^2 \dot{\theta}(t)^2} \; dt $$
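This integral is easy to approximate numerically. Below is a minimal sketch in plain Python (the function name `polar_arc_length` is mine, not standard); it differentiates $r(t)$ and $\theta(t)$ by central differences and sums with the trapezoidal rule, checked on the unit circle $r(t)=1$, $\theta(t)=t$, whose length is $2\pi$:

```python
import math

def polar_arc_length(r, theta, a, b, n=10_000):
    """Approximate the length of t -> (r(t), theta(t)) under
    ds^2 = dr^2 + r^2 dtheta^2, via central differences and a
    trapezoidal sum."""
    dt = (b - a) / n

    def speed(t):
        # numerical derivatives of r and theta at t
        dr = (r(t + dt) - r(t - dt)) / (2 * dt)
        dth = (theta(t + dt) - theta(t - dt)) / (2 * dt)
        return math.sqrt(dr**2 + r(t)**2 * dth**2)

    total = 0.0
    for i in range(n):
        t0, t1 = a + i * dt, a + (i + 1) * dt
        total += 0.5 * (speed(t0) + speed(t1)) * dt
    return total

# Unit circle: r(t) = 1, theta(t) = t on [0, 2*pi]; the length is 2*pi.
L = polar_arc_length(lambda t: 1.0, lambda t: t, 0.0, 2 * math.pi)
```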
As for your matrix formula: it is correct, and here is how it relates to the above. The metric is a symmetric bilinear form on each tangent space, and on a vector space a symmetric bilinear form $\beta(\cdot, \cdot)$ is given by a symmetric $n$-by-$n$ matrix $B$ such that for any vectors $v$ and $w$ you have $\beta(v,w) = \left< v, Bw \right>$, where $\left< \cdot,\cdot \right>$ is the standard Euclidean dot product. In our case, the metric $g$ is represented in coordinates $(x_1,\dots,x_n)$ at every point by a matrix $G = \left(g_{ij}\right)_{i,j=1}^n$. Now suppose that at the same point you look at a different system of coordinates $(y_1,\dots,y_n)$, as above, with transition map $\varphi$ going from the $y$-coordinates to the $x$-coordinates, whose Jacobian matrix ("push-forward") $\varphi_*$ is given by the matrix $S$ from your post. Also, as in your notation, let $M = \left(M_{ij}\right)$ be the matrix corresponding to the metric $g$ in the $y$-coordinates. Your claim is then that the metric changes as $M = S^T G S$ when switching coordinates. This follows from the definition of the pull-back, together with a property of the Euclidean dot product:
On the one hand, in the $y$-coordinates, if we have two vectors $v$ and $w$, then $g(v,w) = \left< v, M w \right>$. On the other hand, using the definition of the pull-back (remember the metric in $y$-coords is $\varphi^* g$, where $g$ is in the $x$-coords):
$$
\begin{eqnarray}
(\varphi^*g)(v,w) &=& g(\varphi_* v, \varphi_* w) \\
&=& \left< \varphi_* v, G \varphi_* w \right> \\
&=& \left< S v, GS w \right> \\
&=& \left< v, S^TGS w \right>
\end{eqnarray}
$$
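This transformation rule can be sanity-checked numerically. Here is a small sketch in plain Python (the helper name `pullback_metric` is mine) that evaluates $S^T G S$ for the polar map at a sample point and recovers $\operatorname{diag}(1, r^2)$:

```python
import math

def pullback_metric(r, theta):
    """Compute M = S^T G S for the polar map (r, th) -> (r cos th, r sin th),
    where G is the identity (the Euclidean metric in x-coordinates)."""
    # Jacobian S of the map; columns are the partials w.r.t. r and theta
    S = [[math.cos(theta), -r * math.sin(theta)],
         [math.sin(theta),  r * math.cos(theta)]]
    # With G = I, M[i][j] = sum_k S[k][i] * S[k][j], i.e. S^T S
    return [[sum(S[k][i] * S[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

M = pullback_metric(2.0, 0.7)  # expect [[1, 0], [0, r^2]] = [[1, 0], [0, 4]]
```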
There seems to be quite a bit of confusion here, so let me address the main points.
Coordinate bases vs. orthonormal frames:
First of all, I am not sure what you mean by $\hat{r}$ and $\hat{\theta}$, but in standard notation these denote unit vectors.
Generally speaking, coordinate bases and general frames behave differently.
If you take the plane $\mathbb{R}^2$ with standard cartesian coordinates $(x,y)$, and you make the transforms $$ x(r,\theta)=r\cos\theta \\ y(r,\theta)=r\sin\theta, $$ then the coordinate bases are given by $$ \mathbf{g}_1=\frac{\partial\mathbf{r}}{\partial r}\ \text{and}\ \mathbf{g}_2=\frac{\partial\mathbf{r}}{\partial\theta}. $$
Here $\mathbf{r}=(x,y)=x\hat{e}_1+y\hat{e}_2$, where $\hat{e}_i$ are the cartesian basis vectors.
In differential geometry notation you might see them as $\mathbf{g}_1=\partial/\partial r$ and $\mathbf{g}_2=\partial/\partial\theta$, because on a differentiable manifold, there are no "position vectors" like $\mathbf{r}$, so vectors have to be identified with differential operators.
Now, in vector calculus it is common to use an orthonormal frame instead of the coordinate basis vector fields. The coordinate basis vectors for polar coordinates are already orthogonal, and the $r$-vector has unit length everywhere, but the $\theta$-vector grows longer the farther you are from the origin, so we usually normalize them. Setting $\hat{r}=\partial\mathbf{r}/\partial r$ and $\hat{\theta}=(1/r)\,\partial\mathbf{r}/\partial\theta$, these vectors form an orthonormal basis everywhere they are defined.
The components of the metric tensor are defined by the inner products of the basis vector fields (abstractly, the metric tensor is the same thing as the inner product, but in terms of components this is how you define it). So the metric tensor is $g_{ij}=\delta_{ij}$ in the orthonormal frame, but in the polar coordinate basis it is given by $$ (g_{ij})=\left(\begin{matrix}1 & 0 \\ 0 & r^2\end{matrix}\right). $$ So if you want to calculate inner products via this form of the metric, you need to use the coordinate basis, not the orthonormal frame.
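As a concrete check of that last point, here is a small sketch in plain Python (function names are mine): computing an inner product with $g = \operatorname{diag}(1, r^2)$ on coordinate-basis components agrees with the ordinary Euclidean dot product of the corresponding Cartesian components.

```python
import math

def polar_inner(r, v, w):
    """g(v, w) for components (v_r, v_theta) in the polar coordinate basis."""
    return v[0] * w[0] + r**2 * v[1] * w[1]

def to_cartesian(r, theta, v):
    """Cartesian components of v = v_r d/dr + v_theta d/dtheta."""
    return (math.cos(theta) * v[0] - r * math.sin(theta) * v[1],
            math.sin(theta) * v[0] + r * math.cos(theta) * v[1])

r, th = 3.0, 0.4
v, w = (1.0, 0.5), (-2.0, 0.25)
a = polar_inner(r, v, w)                    # via the metric matrix
vx, wx = to_cartesian(r, th, v), to_cartesian(r, th, w)
b = vx[0] * wx[0] + vx[1] * wx[1]           # Euclidean dot product
# a and b agree
```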
Position vectors?!
What neither the physics literature nor the vector calculus literature will usually tell you is that in vector calculus/multivariable calculus/physics there are typically three kinds of vectors. These are:
Position/separation vectors: Arrows that point between two points. Position vectors point from the origin to another point; separation vectors point between two arbitrary points.
"Tangent vectors": These vectors are firmly attached to a single point. They have finite length (norm), but they don't point between two points at all. If you think about it, this makes sense. Consider an electric field $\mathbf{E}(\mathbf{r})$. This associates to a point (represented by the position vector $\mathbf{r}$) a vector $\mathbf{E}(\mathbf{r})$, representing the strength of the electric field there. It is a vector because the electric field has a direction too. But ask, "to which point does that vector point, from $\mathbf{r}$?" The question doesn't make sense. The vector simply sits at the point $\mathbf{r}$, and it pertains to that point and that point only.
"Moment" vectors: These vectors are not located anywhere. Think of the angular momentum of a rigid body, or the total momentum of a system of multiple particles: each is a quantity with direction and magnitude, but the "location" of this quantity is ill-defined.
Now comes the important point. In differential geometry, only tangent vectors make sense, because in differential geometry, you consider spaces more general than vector/affine spaces, and the first and third vector concepts are only meaningful in a vector/affine space.
But even if you stay in Euclidean space and merely use curved coordinates to describe things, you essentially have to use differential geometry as well.
Why is that? A coordinate system on, say, $\mathbb{R}^2$ is a map $$ \psi:\mathcal{U}\rightarrow \mathbb{R}^2, $$ where $\mathcal{U}$ is some open set of $\mathbb{R}^2$. For example, the polar coordinate map is $$ \psi(x,y)=\left(\sqrt{x^2+y^2},\arctan\left(\frac{y}{x}\right)\right), $$ although one usually writes down the inverse of this map.
Now, if $\mathbf{r}_1=(x_1,y_1)$ is a point, and $\mathbf{r}_2=(x_2,y_2)$ is another one, then $\mathbf{r}=\alpha\mathbf{r}_1+\beta\mathbf{r}_2$ is also a point, given by the vector sum. What happens when we plug this into the coordinate change map?
$$ \psi(\mathbf{r})=\left(\sqrt{(\alpha x_1+\beta x_2)^2+(\alpha y_1+\beta y_2)^2},\arctan\frac{\alpha y_1+\beta y_2}{\alpha x_1+\beta x_2}\right). $$
This is certainly not equal to $\alpha\psi(\mathbf{r}_1)+\beta\psi(\mathbf{r}_2)$, which is not surprising, since $\psi$ is not linear. But because the coordinate change map is not linear, vector addition and scalar multiplication get messed up in the new coordinates.
This is why, if you use curvy coordinates, you cannot use "position vectors", and the distinction between points and coordinates must be kept clear.
In the standard coordinates of $\mathbb{R}^2$, or in any other coordinate system related to the standard coordinates by a linear transformation, taking sums and scalar multiples of points as vectors makes sense and is consistent (precisely because linear maps respect the vector space structure), so you can use them there. But not in curvy coordinates, sorry.
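The failure of linearity is easy to see numerically. A small sketch in plain Python, with $\psi$ as above (using `atan2` to stand in for $\arctan(y/x)$, so it works in every quadrant):

```python
import math

def psi(x, y):
    """Polar coordinate map (x, y) -> (r, theta)."""
    return (math.hypot(x, y), math.atan2(y, x))

p1, p2 = (1.0, 0.0), (0.0, 1.0)
s = (p1[0] + p2[0], p1[1] + p2[1])                      # vector sum of points
lhs = psi(*s)                                           # psi of the sum
rhs = tuple(u + v for u, v in zip(psi(*p1), psi(*p2)))  # sum of the images
# lhs = (sqrt(2), pi/4) while rhs = (2, pi/2): psi is not linear
```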
Writing a $2$-tensor as a matrix does depend on "reference". Formally, it goes like this:
We already have a smooth manifold $M$ with a $2$-tensor field $g$ on $M$. For no particular reason, $g$ happens to be a metric tensor.
We have an open set $U\subseteq M$ and a parameterisation $\varphi:V\to U$ for some open set $V\subseteq\Bbb R^n$.
There is a unique metric tensor $\varphi^*g$ on $V$ that makes $\varphi$ an isometry, i.e. a map that preserves distances.
As a $2$-tensor field in $\Bbb R^n$ space, $\varphi^*g$ can be written as an $n$ by $n$ matrix (more properly, a matrix-valued function on $V$).
In the case you mentioned, $g_1$ and $g_2$ are two examples of $\varphi^*g$ (with different $\varphi$'s). The matrices (like $g_1$ and $g_2$) simply tell you how to describe the same metric in different coordinates. This is the logical flow of how they are obtained:
We already have $\Bbb R^3$ as a smooth manifold with the usual metric $g$.
We have two parameterisations $\varphi_1,\varphi_2$ of (a part of) $\Bbb R^3$, where $\varphi_1$ is the identity map on $\Bbb R^3$ and $\varphi_2$ is the spherical coordinate parameterisation of (a part of) $\Bbb R^3$.
$g_1$ is defined as $\varphi_1^*g$, while $g_2$ is defined as $\varphi_2^*g$.
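One can check the $\varphi_2$ case numerically. Here is a sketch in plain Python, assuming the physics convention $\varphi_2(r,\theta,\phi) = (r\sin\theta\cos\phi,\, r\sin\theta\sin\phi,\, r\cos\theta)$ (the convention in the original question may differ), for which the pullback $S^T S$ should be $\operatorname{diag}(1,\, r^2,\, r^2\sin^2\theta)$:

```python
import math

def spherical_pullback(r, th, ph):
    """S^T S for phi_2(r, th, ph) = (r sin th cos ph, r sin th sin ph, r cos th),
    i.e. the pullback of the Euclidean metric along spherical coordinates."""
    st, ct = math.sin(th), math.cos(th)
    sp, cp = math.sin(ph), math.cos(ph)
    # Jacobian S; columns are the partials w.r.t. r, theta, phi
    S = [[st * cp, r * ct * cp, -r * st * sp],
         [st * sp, r * ct * sp,  r * st * cp],
         [ct,     -r * st,       0.0]]
    return [[sum(S[k][i] * S[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

g2 = spherical_pullback(2.0, 0.6, 1.1)
# expect diag(1, r^2, r^2 sin^2 th) = diag(1, 4, 4 sin^2 0.6)
```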
If you want to, you can consider another metric $g_0$ on (a part of) $\Bbb R^3$ such that $$\varphi_2^*g_0=\begin{bmatrix}1&0&0\\0&1&0\\0&0&1\end{bmatrix},$$ but then you would face the problem that you are no longer describing the same metric $g$; instead you are describing another metric $g_0$.
Notes:
Physicists quite often refrain from teaching students about smooth manifolds even when there is one being investigated. When you see the word "generalized coordinates", they are almost always talking about coordinates on a smooth manifold.
A parameterisation $\varphi:V\to U$ is what physicists call generalized coordinates; mathematicians call its inverse a smooth coordinate chart.
$\varphi^*g$ is called the pullback metric on $V$.