This is a process that can feel very arbitrary, but using geometric principles, you should be able to develop an intuition about these problems.
Imagine the coordinate functions $v, u, w$ as scalar fields on 3D space, each assigning its coordinate value to a given position. Each of these coordinates has an associated gradient: $\nabla v$ for $v$, and so on. These tell us the direction of greatest increase for each coordinate.
What we do then is use these gradient vectors as a basis for our space: a set of vectors $g^v, g^u, g^w$ such that $g^v = \nabla v$ and so on. The contravariant metric tensor just collects the dot products of these vectors, which tells us how to measure lengths and angles in terms of them.
For instance, take $u = y - a(x)$, as you gave us. Taking the gradient of $u$, we get
$$g^u = \nabla u = (g^x \partial_x + g^y \partial_y + g^z \partial_z) u(x,y,z) = g^y -a'(x) g^x$$
where $g^x, g^y, g^z$ are a Cartesian basis (thus, they are orthonormal), so it's easy to take the dot product:
$$g^{uu} = g^u \cdot g^u = [g^y - a'(x) g^x] \cdot [g^y - a'(x) g^x] = 1 + [a'(x)]^2$$
as you found. So if we have two vectors expressed using this basis of gradients, we can find the overall dot product using the contravariant metric, rather than having to go back and figure out the relationships between those gradients all over again.
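If you want to check the computation above numerically, here is a minimal sketch. It assumes a concrete choice $a(x) = x^2$, which is purely illustrative and not from your question, and approximates $\nabla u$ by central differences; the helper names `gradient` and `dot` are made up for this example.

```python
def a(x):
    # Hypothetical choice of a(x), for illustration only
    return x ** 2

def a_prime(x):
    # Exact derivative of the hypothetical a(x)
    return 2 * x

def u(x, y, z):
    # The coordinate function u = y - a(x)
    return y - a(x)

def gradient(f, p, h=1e-6):
    """Central-difference gradient of f at the point p = (x, y, z)."""
    grad = []
    for i in range(3):
        fwd = list(p)
        bwd = list(p)
        fwd[i] += h
        bwd[i] -= h
        grad.append((f(*fwd) - f(*bwd)) / (2 * h))
    return grad

def dot(v, w):
    # Ordinary dot product in the orthonormal Cartesian basis
    return sum(vi * wi for vi, wi in zip(v, w))

point = (1.5, 0.3, -2.0)
g_u = gradient(u, point)               # Cartesian components of g^u
g_uu = dot(g_u, g_u)                   # g^{uu} = g^u . g^u
expected = 1 + a_prime(point[0]) ** 2  # 1 + [a'(x)]^2

print(g_uu, expected)  # both approximately 10 at x = 1.5
```

The two printed numbers should agree up to finite-difference error, which is just the formula $g^{uu} = 1 + [a'(x)]^2$ evaluated at one point.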
1), 2): yes. On manifolds you can introduce a metric, which makes the manifold a Riemannian manifold (assuming the metric is positive definite). A metric, by definition, is a (0,2) tensor field which defines a scalar product on the tangent bundle, i.e. the metric at a point $p$ is a scalar product on the tangent space at $p$. In local coordinates, it has a matrix representation which is usually denoted $(g_{ij})$ and corresponds to what you called the metric tensor.
If the metric is positive definite, this matrix representation is invertible and, as you wrote, the inverse is usually denoted $(g^{ij})$. The raising and lowering of indices (making contravariant tensors covariant and vice versa) works the same way you wrote it down. This is nothing but the fact that on a Euclidean vector space $E$ there is a natural isomorphism between the vector space and its dual, induced by the metric (i.e. if $v$ is a vector, $w\mapsto \langle v, w \rangle$ is a linear functional on $E$, and every linear functional on $E$ arises that way).
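To make the raising and lowering concrete, here is a small sketch in two dimensions. The metric components and the vector are made-up numbers at a single point, and the helpers `mat_vec` and `inverse_2x2` are hypothetical names for this example. It lowers an index with $(g_{ij})$ and raises it again with the inverse $(g^{ij})$, recovering the original components.

```python
def mat_vec(m, v):
    # Matrix-vector product: result_i = sum_j m[i][j] * v[j]
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def inverse_2x2(m):
    # Explicit inverse of a 2x2 matrix; fails if the metric is degenerate
    (p, q), (r, s) = m
    det = p * s - q * r
    if det == 0:
        raise ValueError("metric is degenerate, no inverse exists")
    return [[s / det, -q / det], [-r / det, p / det]]

# Hypothetical positive definite metric components g_ij at one point
g_lower = [[2.0, 1.0], [1.0, 3.0]]
g_upper = inverse_2x2(g_lower)        # components g^ij

v_upper = [1.0, -2.0]                 # contravariant components v^i
v_lower = mat_vec(g_lower, v_upper)   # lowering: v_i = g_ij v^j
v_back = mat_vec(g_upper, v_lower)    # raising:  v^i = g^ij v_j

print(v_lower)  # covariant components
print(v_back)   # matches v_upper up to rounding
```

Going down with $g_{ij}$ and back up with $g^{ij}$ is the isomorphism $E \to E^*$ and its inverse written in components.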
As for 3), most books on Riemannian geometry should do the job. Which one suits you best depends on you. A very comprehensive description of these things can be found in Spivak's treatise 'A Comprehensive Introduction to Differential Geometry' ;-)
Best Answer
A metric tensor takes two tangent vectors and returns a number, their inner product. Under a coordinate transformation or a map between manifolds, tangent vectors $u$ are transformed (pushed forward) by the differential of the map, represented by the Jacobian matrix: $u\mapsto Ju$. The Euclidean inner product then becomes $u^Tu\mapsto(Ju)^T(Ju)=u^TGu$, where $G=J^TJ$ is the matrix of the new metric tensor. From a relevant question about volume forms, you may sometimes see the factor in the new volume element written either as $|\det J|$ or as $\sqrt{|\det G|}$; the two are equal because $\det G = (\det J)^2$. Answering your 2nd question, neither matrix has to be diagonal.
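As a quick sanity check of $G = J^TJ$ and the volume factor, here is a sketch using a made-up map $(s, t) \mapsto (x, y) = (st,\, s + t)$. This map is an assumption chosen only because it gives a non-diagonal $J$ and $G$; the helper names are hypothetical too.

```python
import math

def transpose(m):
    return [list(row) for row in zip(*m)]

def mat_mul(a, b):
    # Plain matrix product of two nested-list matrices
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def mat_vec(m, v):
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def dot(v, w):
    return sum(vi * wi for vi, wi in zip(v, w))

def det_2x2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def jacobian(s, t):
    # Jacobian of the hypothetical map (s, t) -> (x, y) = (s * t, s + t)
    return [[t, s], [1.0, 1.0]]

J = jacobian(2.0, 5.0)
G = mat_mul(transpose(J), J)   # G = J^T J, non-diagonal here

u_vec = [1.0, -1.0]
Ju = mat_vec(J, u_vec)
pushed = dot(Ju, Ju)                      # (Ju)^T (Ju)
via_metric = dot(u_vec, mat_vec(G, u_vec))  # u^T G u

vol_J = abs(det_2x2(J))                # |det J|
vol_G = math.sqrt(abs(det_2x2(G)))     # sqrt(|det G|)

print(G)
print(pushed, via_metric)
print(vol_J, vol_G)
```

Both inner products should agree, and so should the two volume factors, while neither `J` nor `G` is diagonal at this point.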