1), 2): yes. On manifolds you can introduce a metric, which makes the manifold a Riemannian manifold (assuming the metric is positive definite). A metric, by definition, is a (0,2) tensor field which defines a scalar product on the tangent bundle, i.e. the metric at a point $p$ is a scalar product on the tangent space at $p$. In local coordinates, this has a representation which is usually denoted $(g_{ij})$ and corresponds to what you called the metric tensor.
If the metric is positive definite, this matrix representation is invertible and, as you wrote, the inverse is usually denoted $(g^{ij})$. The raising and lowering of indices (making contravariant tensors covariant and vice versa) works the same way you wrote it down. This is nothing but the fact that on a Euclidean vector space $E$ there is a natural isomorphism between the vector space and its dual, induced by the metric (i.e. if $v$ is a vector, $w\mapsto \langle v, w \rangle$ is a linear map on $E$, and each linear map arises that way).
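As a concrete sketch of this index-raising and -lowering machinery, here is a small numerical example. The particular metric below is an invented illustration (a diagonal, positive definite $(g_{ij})$), not tied to any specific manifold:

```python
import numpy as np

# A hypothetical positive definite metric (g_ij) in some local coordinates
g = np.array([[1.0, 0.0],
              [0.0, 4.0]])

g_inv = np.linalg.inv(g)         # the inverse metric (g^ij)

# Lowering an index: v_i = g_ij v^j  (vector -> covector)
v_up = np.array([3.0, 0.5])      # contravariant components v^j
v_down = g @ v_up                # covariant components v_i

# Raising it back: v^i = g^ij v_j recovers the original components
assert np.allclose(g_inv @ v_down, v_up)

# The scalar product g_ij v^i w^j equals the natural pairing v_i w^i
w_up = np.array([1.0, 2.0])
assert np.isclose(v_up @ g @ w_up, v_down @ w_up)
```

The last assertion is exactly the isomorphism mentioned above: pairing the covector $v_i$ with $w$ gives the same number as the scalar product $\langle v, w \rangle$.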
As for 3), most books on Riemannian geometry should do the job. Which one suits you best depends on you. A very comprehensive description of these things is to be found in Spivak's treatise *A Comprehensive Introduction to Differential Geometry* ;-)
Representing tensors using matrix notation is often confusing, but let's assume that
$Y = \begin{pmatrix}y^1_1&y^1_2\\y^2_1&y^2_2\end{pmatrix}$
and similarly for $Z$. If $W = Y \otimes Z$ is the tensor product, then the components of $W$ are
$w^{ik}_{jl} = y^i_j z^k_l$
You have represented $W$ as a $4 \times 4$ matrix, but it would be more accurate to represent it as a $2 \times 2$ matrix, each of whose entries is another $2 \times 2$ matrix.
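This block-matrix picture is easy to check numerically. The following sketch (with made-up entries for $Y$ and $Z$) builds $W$ with `numpy.einsum` and verifies that block $(i,j)$ is the matrix $y^i_j \, Z$:

```python
import numpy as np

# Illustrative matrices (arbitrary entries)
Y = np.array([[1.0, 2.0],
              [3.0, 4.0]])
Z = np.array([[5.0, 6.0],
              [7.0, 8.0]])

# Tensor product: w^{ik}_{jl} = y^i_j z^k_l, stored with axis order (i, k, j, l)
W = np.einsum('ij,kl->ikjl', Y, Z)

# "2x2 matrix of 2x2 matrices": block (i, j) is y^i_j times Z
for i in range(2):
    for j in range(2):
        assert np.allclose(W[i, :, j, :], Y[i, j] * Z)
```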
Anyway, the four possible contractions of W are:
(1): $w^{ik}_{il} = y^i_i z^k_l$
(2): $w^{ik}_{jk} = y^i_j z^k_k$
(3): $w^{ik}_{ji} = y^i_j z^k_i$
(4): $w^{ij}_{jl} = y^i_j z^j_l$
In terms of matrix operations:
(1) is the component representation of $Tr(Y)Z$
(2) is the component representation of $Tr(Z)Y$
(3) is the component representation of $ZY$
(4) is the component representation of $YZ$
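All four identities can be checked numerically. A small sketch using `numpy.einsum`, with random matrices as an illustrative assumption (note that $W$ is stored with axis order $(i, k, j, l)$):

```python
import numpy as np

# Arbitrary test matrices
Y = np.random.default_rng(0).normal(size=(2, 2))
Z = np.random.default_rng(1).normal(size=(2, 2))

# w^{ik}_{jl} = y^i_j z^k_l, axes ordered (i, k, j, l)
W = np.einsum('ij,kl->ikjl', Y, Z)

# (1) w^{ik}_{il} = Tr(Y) z^k_l
assert np.allclose(np.einsum('ikil->kl', W), np.trace(Y) * Z)
# (2) w^{ik}_{jk} = Tr(Z) y^i_j
assert np.allclose(np.einsum('ikjk->ij', W), np.trace(Z) * Y)
# (3) w^{ik}_{ji} = z^k_i y^i_j = (ZY)^k_j
assert np.allclose(np.einsum('ikji->kj', W), Z @ Y)
# (4) w^{ij}_{jl} = y^i_j z^j_l = (YZ)^i_l
assert np.allclose(np.einsum('ijjl->il', W), Y @ Z)
```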
This is a process that can feel very arbitrary, but using geometric principles, you should be able to develop an intuition about these problems.
Imagine the coordinate functions $v, u, w$ as scalar fields on the 3d space, assigning their respective coordinates to a given position. For all these coordinates, there are associated gradients: $\nabla v$ for $v$, and so on. These tell us the direction of greatest increase for each coordinate.
What we do then is use these gradient vectors as a basis for our space: a set of vectors $g^v, g^u, g^w$ such that $g^v = \nabla v$ and so on. The contravariant metric tensor just measures the dot products of these vectors, so we can have an idea of how to measure lengths with them.
For instance, take $u = y - a(x)$ as you gave us. Taking the gradient of $u$, we get
$$g^u = \nabla u = (g^x \partial_x + g^y \partial_y + g^z \partial_z) u(x,y,z) = g^y -a'(x) g^x$$
where $g^x, g^y, g^z$ are a Cartesian basis (thus, they are orthonormal), so it's easy to take the dot product:
$$g^{uu} = g^u \cdot g^u = [g^y - a'(x) g^x] \cdot [g^y - a'(x) g^x] = 1 + [a'(x)]^2$$
as you found. So if we have two vectors expressed using this basis of gradients, we can find the overall dot product using the contravariant metric, rather than having to go back and figure out the relationships between those gradients all over again.
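As a sketch, the same computation can be reproduced symbolically with sympy, leaving the function $a$ abstract just as in the answer:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
a = sp.Function('a')

# Coordinate function from the example: u = y - a(x)
u = y - a(x)

# Gradient components in the Cartesian (orthonormal) basis g^x, g^y, g^z
grad_u = sp.Matrix([sp.diff(u, c) for c in (x, y, z)])   # (-a'(x), 1, 0)

# Contravariant metric component: g^{uu} = grad(u) . grad(u)
g_uu = sp.simplify(grad_u.dot(grad_u))

# g_uu equals 1 + a'(x)^2, matching the result above
assert sp.simplify(g_uu - (sp.diff(a(x), x)**2 + 1)) == 0
```

Because the Cartesian basis is orthonormal, the dot product reduces to summing the squared components, which is why the cross terms vanish.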