1. Void vs Vacuum
The first thing to do is distinguish between the void and space (i.e. the vacuum).
Space is not nothing, because you can move things in it; think of it as the medium in which particles can move.
For if space were exactly nothing, where could you put a particle? There would be no place to put it.
2. Crumpling space
The second thing is to imagine how you can warp this space; this is difficult using the space we actually live in.
So let's imagine that a page torn out of Ryder's QFT is our space. This is easy enough to warp: you can roll it into a cylinder, or crumple it in some other way.
3. Geodesics or straight lines
But how to do physics on this surface? Well, let's just take Newton's first law: a particle without any forces acting on it moves in a straight line.
When the page is rolled out flat on the surface of a table, the path this particle takes is easy enough to imagine - it's just the straight line we can draw by eye.
But what about when it's crumpled? To make things easy for us, let's imagine that the page has been crumpled and glued into a sphere. So how can we draw a straight line on this sphere? We can't do the obvious thing and just 'drill' through the sphere along the straight chord, because the interior of the sphere is not space but void, so a particle can't go there.
And we can't do the second most obvious thing either, which is to draw the straight line by eye on the surface of the sphere, because between any two points it's not clear which of the lines connecting them is straight: to our eye they all look curved.
We turn this around by asking whether there is some unique property that characterises a straight line on the surface. On the page spread out flat on the table, our original set-up, we see straight away that the straight line between two points is the shortest path connecting them.
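As a quick numerical check of this idea (a sketch using numpy; the two points chosen are arbitrary), compare the length of the shortest path on a unit sphere, the great-circle arc, with the length of a seemingly natural alternative, the path along a circle of constant latitude:

```python
import numpy as np

def latlon(lat, lon):
    """Unit-sphere point from latitude/longitude in radians."""
    return np.array([np.cos(lat) * np.cos(lon),
                     np.cos(lat) * np.sin(lon),
                     np.sin(lat)])

p = latlon(np.pi / 4, 0.0)        # 45 deg N, 0 deg E
q = latlon(np.pi / 4, np.pi / 2)  # 45 deg N, 90 deg E

# Great-circle (geodesic) length: the angle between the two unit vectors.
geodesic = np.arccos(np.clip(p @ q, -1.0, 1.0))

# Length of the path along the 45 deg parallel: a circle of radius
# cos(45 deg), traversed through 90 deg of longitude.
parallel = (np.pi / 2) * np.cos(np.pi / 4)

print(geodesic, parallel)  # the great-circle arc is shorter
```

The great-circle arc comes out shorter (about 1.047 vs 1.111), which is exactly the 'shortest path' property we use to define straight lines on the sphere.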
4. Newton's first law again
So we use this property on the sphere, and now it's easy to see how a particle follows Newton's first law on this 'curved' or 'warped' space: it still moves in a straight line, but here we call such lines geodesics.
In fact, this sphere is 'warped' in a way that the crumpled page is not: if you draw triangles on it (which we can now do, since we know what straight lines look like), we discover that their interior angles always add up to more than 180 degrees.
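This is easy to verify numerically (a sketch using numpy; the triangle chosen is one octant of the sphere, a standard example with three right angles):

```python
import numpy as np

def vertex_angle(a, b, c):
    """Interior angle at vertex a of the spherical triangle abc: the angle
    between the great-circle arcs a->b and a->c, measured between their
    tangent vectors at a."""
    tb = b - (b @ a) * a  # project b onto the tangent plane at a
    tc = c - (c @ a) * a
    tb /= np.linalg.norm(tb)
    tc /= np.linalg.norm(tc)
    return np.arccos(np.clip(tb @ tc, -1.0, 1.0))

# One octant of the unit sphere: vertices on the three coordinate axes.
A, B, C = np.eye(3)
total = (vertex_angle(A, B, C)
         + vertex_angle(B, C, A)
         + vertex_angle(C, A, B))
print(np.degrees(total))  # 270 degrees, well over 180
```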
5. And GR, briefly
And this, in brief, is how GR works: a mass distribution in space determines how the space is curved, i.e. what its geodesics are, and particles then move along these geodesics.
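For completeness, 'particles move along geodesics' has a standard differential form, the geodesic equation, where $\tau$ is proper time and the Christoffel symbols $\Gamma^\mu_{\alpha\beta}$ are built from derivatives of the metric:

$$\frac{\mathrm d^2 x^\mu}{\mathrm d\tau^2} + \Gamma^\mu_{\alpha\beta}\,\frac{\mathrm d x^\alpha}{\mathrm d\tau}\frac{\mathrm d x^\beta}{\mathrm d\tau} = 0.$$

In flat space the $\Gamma$'s vanish in Cartesian coordinates and this reduces to Newton's first law, $\mathrm d^2 x^\mu / \mathrm d\tau^2 = 0$.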
As you may know, the metric tensor is a symmetric bilinear form. It accepts two vectors from a vector space $V$ and gives back a real number in $\mathbb R$, and it is linear in both arguments, hence 'bilinear'. The metric tensor can be interpreted as a linear operator in the sense that it maps one of its arguments (either one; it doesn't matter, because it's symmetric) to a dual vector in $V^*$. This dual vector is a functional on $V$ (the traditional definition of the dual space of $V$), which acts on the second vector to give a scalar value. So $g$ is a linear map from $V$ to $V^*$. When you write $g$ as a matrix and apply it to a column vector $v$, transpose the resulting vector to make it a row vector and you have the dual vector $v^*$.
From a general point of view, the metric tensor is a rank 2 tensor, specifically a rank $(0,2)$ tensor. In general, a rank $(n,m)$ tensor is a multilinear functional which acts on an ordered collection of vectors in $V$ and dual vectors in the dual space $V^*$. For a vector space $V$ over a field $\mathbb F$ (usually $\mathbb R$ or $\mathbb C$), a tensor $T$ is a multilinear map of the form
$$ T : V^m \times V^{*n} \rightarrow \mathbb F .$$
Rank $(0,2)$ tensors over the real numbers, like $g_{\mu \nu}$,
$$ g : V \times V \rightarrow \mathbb R$$
are particularly interesting as they often appear in mathematics and physics. This is because they define inner products. The inner product between two vectors $\begin{pmatrix}a_1\\a_2\end{pmatrix}$ and $\begin{pmatrix}b_1\\b_2\end{pmatrix}$ in an inner product space $V$ is
$$\begin{pmatrix}a_1\\a_2\end{pmatrix} \cdot \begin{pmatrix}b_1\\b_2\end{pmatrix} = \begin{pmatrix}a_1&a_2\end{pmatrix} \begin{pmatrix}A_{11}&A_{12}\\A_{21}&A_{22}\end{pmatrix} \begin{pmatrix}b_1\\b_2\end{pmatrix}$$
where $\mathbf A$ is a symmetric positive-definite matrix (symmetric, with all eigenvalues strictly positive). By convention we normally write vectors in $V$ in an orthonormal basis, i.e. a basis that diagonalises $\mathbf A$ to the identity matrix, and because of this choice of basis we usually omit $\mathbf A$ entirely when taking inner products:
$$\begin{pmatrix}a_1\\a_2\end{pmatrix} \cdot \begin{pmatrix}b_1\\b_2\end{pmatrix} = \begin{pmatrix}a_1&a_2\end{pmatrix} \begin{pmatrix}b_1\\b_2\end{pmatrix}$$
when the vectors are written in an orthonormal basis.
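As a small numerical illustration (a sketch using numpy; the matrix $\mathbf A$ and the vectors are made up for the example), the same inner product can be computed with $\mathbf A$ written out explicitly, or with a plain dot product after changing to a basis in which $\mathbf A$ becomes the identity:

```python
import numpy as np

# A symmetric positive-definite "metric" matrix in some non-orthonormal basis.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
a = np.array([1.0, 2.0])
b = np.array([3.0, -1.0])

# Inner product with A written out explicitly: a^T A b.
with_A = a @ A @ b

# Change to a basis that diagonalises A to the identity. A Cholesky
# factorisation A = L L^T gives new components a' = L^T a, b' = L^T b,
# since a^T A b = (L^T a)^T (L^T b).
L = np.linalg.cholesky(A)
a2, b2 = L.T @ a, L.T @ b
without_A = a2 @ b2  # plain dot product, no A in sight

print(with_A, without_A)  # the same number
```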
These inner products $\mathbf A$ are basically the same thing as metric tensors $g$: two terms for one concept. Of course, in pseudo-Riemannian geometry $\mathbf A$/$g$ need not be positive-definite. It is also clear how $\mathbf A$ can be interpreted as a linear operator: it maps the vector $\mathbf b$ to its dual vector $\mathbf b^*$ like so:
$$\mathbf b^* (\mathbf a) = \mathbf a \cdot \mathbf b \tag{definition of dual vector space $V^*$}$$
$$\begin{align}\mathbf a \cdot \mathbf b &= \begin{pmatrix}a_1&a_2\end{pmatrix} \begin{pmatrix}A_{11}&A_{12}\\A_{21}&A_{22}\end{pmatrix} \begin{pmatrix}b_1\\b_2\end{pmatrix} \\ &= \left[ \begin{pmatrix}A_{11}&A_{12}\\A_{21}&A_{22}\end{pmatrix} \begin{pmatrix}b_1\\b_2\end{pmatrix} \right]^{\mathrm T} \begin{pmatrix}a_1\\a_2\end{pmatrix} \\ &\Rightarrow \quad \mathbf b^* = \left[ \begin{pmatrix}A_{11}&A_{12}\\A_{21}&A_{22}\end{pmatrix} \begin{pmatrix}b_1\\b_2\end{pmatrix} \right]^{\mathrm T} \end{align}.$$
Having to take the transpose makes this a little confusing, but it should be clear that $\mathbf A$ defines a dual vector $\mathbf b^*$ for each vector $\mathbf b$.
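To make the transpose step concrete (a sketch using numpy; the matrix and vectors are arbitrary), here is the dual vector $\mathbf b^* = (\mathbf A \mathbf b)^{\mathrm T}$ acting on $\mathbf a$ and reproducing the inner product:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 5.0]])  # symmetric; plays the role of the metric
a = np.array([2.0, 3.0])
b = np.array([1.0, 1.0])

# The dual vector b* as a row vector: (A b)^T.
b_dual = (A @ b).T  # for a 1-D numpy array .T is a no-op, but
                    # conceptually this is now a row vector

# Acting on a, it reproduces the inner product a . b = a^T A b.
print(b_dual @ a, a @ A @ b)  # the same number
```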
The concept of a metric tensor is basically the same thing, but with a different notation. Earlier I could not say that $\mathbf A$ maps a vector to its dual, only that it defines such a map, because I had to use the transpose operation. Matrices are a notation designed to express vectors in $V$ and linear operators $M : V \rightarrow V$, and the notation is not flexible enough to express a linear map $V \rightarrow V^*$. The notation used for expressing metric tensors (upper/lower index notation, often called abstract index notation) is more flexible. The metric is denoted $g$, and by writing it with two lower indices as $g_{\mu \nu}$ we are designating it as a rank $(0,2)$ tensor that maps $V \times V \rightarrow \mathbb R$. By giving $g_{\mu \nu}$ just one argument and leaving the other empty, we are left with a map $V \rightarrow \mathbb R$, which is the same thing as a dual vector in $V^*$. We write vectors by their components, $x^\mu$, and then $g$ defines a linear map $g : V \rightarrow V^*$ like so:
$$g : \mathbf x \mapsto \mathbf x^*, \quad x_\mu = \sum_{\nu} g_{\mu \nu} x^{\nu}.$$
The notation $x^\mu$ expresses the components of the vector $\mathbf x$ in the chosen basis of $V$, and $x_\mu$ expresses the components of the dual vector $\mathbf x^*$ in the dual basis of $V$, i.e. the corresponding basis in $V^*$. The notation is frequently heavily abused for brevity, so you may see expressions like
$$g : x^\mu \mapsto g_{\mu \nu} x^\nu \tag{implied summation over $\nu$}$$
to mean the same thing as I said above.
You may notice the similarity with matrix multiplication:
$$(b^*)_\mu = \sum_{\nu} A_{\mu \nu} b_\nu.$$
When $g$ is expressed as a matrix as in your question, it simply maps the components $x^\mu$ to the components of its dual vector, $x_\mu$. It very much is a linear map, $g : V \rightarrow V^*$ and all the associated tools of linear analysis may be applied.
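This component formula is easy to demonstrate numerically (a sketch using numpy; the Minkowski metric is written here in the $(-,+,+,+)$ convention, one common choice, and the vector components are arbitrary):

```python
import numpy as np

# Minkowski metric g_{mu nu} in the (-,+,+,+) convention.
g = np.diag([-1.0, 1.0, 1.0, 1.0])

x_up = np.array([2.0, 1.0, 0.0, 3.0])  # components x^mu

# x_mu = g_{mu nu} x^nu -- the implied summation, written out with einsum.
x_down = np.einsum('mn,n->m', g, x_up)

print(x_down)  # [-2.  1.  0.  3.]
```

Only the time component flips sign, which is exactly the effect of lowering an index with this metric.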
Best Answer
Let's go step by step as it seems you're missing some fundamentals.
We know from (linear) algebra that a symmetric bilinear form can be transformed to a diagonal matrix with entries $e \in \{0, 1, -1\}$ on the main diagonal. The triple counting how many times each value appears is called the signature. If you didn't know that, it's worth looking up.
Now, a metric tensor is a symmetric bilinear form, so it can be diagonalised in this way, which gives us its signature. Incidentally, since the matrix is real and symmetric, the spectral theorem lets us choose the diagonalising transform to be orthogonal, and Sylvester's law of inertia guarantees that the signature does not depend on which invertible transform we pick.
I hope this answers the first question. I didn't completely understand what your second question was... Diagonalisation is always the same procedure.
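In practice the signature can be read off from the eigenvalue signs. A minimal sketch (using numpy; the helper `signature` and the tolerance are my own illustrative choices):

```python
import numpy as np

def signature(g, tol=1e-12):
    """Count (zero, positive, negative) eigenvalues of a symmetric matrix.
    By Sylvester's law of inertia these counts are basis-independent."""
    ev = np.linalg.eigvalsh(g)
    zero = int(np.sum(np.abs(ev) < tol))
    pos = int(np.sum(ev > tol))
    neg = int(np.sum(ev < -tol))
    return zero, pos, neg

# Minkowski metric in the (-,+,+,+) convention: signature (0, 3, 1).
minkowski = np.diag([-1.0, 1.0, 1.0, 1.0])
print(signature(minkowski))  # (0, 3, 1)
```

Applying `signature` to $S^{\mathrm T} g S$ for any invertible $S$ returns the same triple, which is Sylvester's law of inertia in action.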