Let's start at the beginning:
The setting for relativity - be it special or general - is that spacetime is a manifold $\mathcal{M}$, i.e. something that is locally homeomorphic to Cartesian space $\mathbb{R}^n$ ($n = 4$ in the case of relativity), but not necessarily globally.
Such manifolds possess a tangent space $T_p\mathcal{M}$ at every point, which is where the vectors one usually talks about live. If you choose coordinates $x^i$ on the manifold, then the space of tangent vectors is
$$T_p\mathcal{M} := \left\{\sum_{i=0}^3 c^i \frac{\partial}{\partial x^i} \,\middle|\, c^i \in \mathbb{R} \right\}$$
When we say that a tuple $(c^0,c^1,c^2,c^3)$ is a vector, we mean that it corresponds to the object $c^i\partial_i \in T_p\mathcal{M}$ at some point $p \in \mathcal{M}$.
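For concreteness, here is a minimal sympy sketch of this correspondence (the coordinate names and the function $f$ are my own illustrative choices): a tangent vector $c^i\partial_i$ acts on functions as a directional derivative.

```python
import sympy as sp

# Coordinates x^0 ... x^3 (hypothetical names for this sketch)
x0, x1, x2, x3 = sp.symbols('x0 x1 x2 x3')
coords = [x0, x1, x2, x3]

c = [1, 2, 0, 0]             # the tuple (c^0, c^1, c^2, c^3)
f = x0**2 + x1*x2            # an arbitrary function on the manifold

# The vector c^i d/dx^i applied to f: a directional derivative
v_of_f = sum(ci * sp.diff(f, xi) for ci, xi in zip(c, coords))
print(v_of_f)                # 2*x0 + 2*x2; evaluating at a point p gives a number
```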
A metric on $\mathcal{M}$ can be given by specifying a symmetric, non-degenerate bilinear form at each point
$$g_p : T_p\mathcal{M} \times T_p\mathcal{M} \rightarrow \mathbb{R}$$
What you learned "in general" is that the components of the metric are, for chosen basis vectors $\partial_i$ of $T_p\mathcal{M}$, defined by $g_{ij} = g(\partial_i,\partial_j)$. You can now indeed see the metric as a kind of scalar product, setting $X \cdot Y := g(X,Y)$ for two vectors $X,Y$. (This contains the answer to your second problem.) But for non-Riemannian manifolds, i.e. manifolds whose metric is not positive-definite, this is not a scalar product in the sense you may be used to. In particular, it can be zero for a nonzero vector. Vectors for which it is zero are usually called lightlike or null.
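To see a null vector concretely, here is a minimal numerical sketch, assuming the Minkowski metric in the mostly-plus convention $\eta = \mathrm{diag}(-1,+1,+1,+1)$ (the opposite sign convention is equally common):

```python
import numpy as np

# Minkowski metric, mostly-plus convention (an assumed choice)
eta = np.diag([-1.0, 1.0, 1.0, 1.0])

def dot(X, Y, g=eta):
    """The 'scalar product' X . Y := g(X, Y) = g_ij X^i Y^j."""
    return X @ g @ Y

timelike  = np.array([1.0, 0.0, 0.0, 0.0])
lightlike = np.array([1.0, 1.0, 0.0, 0.0])  # moves at the speed of light

print(dot(timelike, timelike))    # -1.0 < 0: timelike
print(dot(lightlike, lightlike))  #  0.0    : null, despite being nonzero
```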
The important thing to take away is that manifolds do not always behave like Cartesian space.
Now, for your third problem, we need the concept of the cotangent space $T_p^*\mathcal{M}$. It is the dual vector space to the tangent space, spanned by the differentials $\mathrm{d}x^i : T_p\mathcal{M} \rightarrow \mathbb{R}$ for a chosen coordinate system, and defined by
$$\mathrm{d}x^i(\partial_j) = \delta^i_j$$
Now, recall that the metric is a map from two copies of the tangent space to $\mathbb{R}$. As such, we can see it as an element of the tensor product $T_p^*\mathcal{M} \otimes T_p^*\mathcal{M}$, which is the space spanned by elements of the form $\mathrm{d}x^i \otimes \mathrm{d}x^j$. As the metric is an element of this space, it is expandable in this basis:
$$ g = g_{ij}\mathrm{d}x^i\mathrm{d}x^j$$
where the physicist just drops the bothersome $\otimes$ sign. Now, what has this to do with infinitesimal distance? We simply define the length of a path $\gamma : [a,b] \rightarrow \mathcal{M}$ to be (with $\gamma'(t)$ denoting the tangent vector to the path)$[1]$
$$ L[\gamma] := \int_a^b \sqrt{\lvert g(\gamma'(t),\gamma'(t))\rvert}\mathrm{d}t$$
And, by using physicists' sloppy notation, $g(\gamma'(t),\gamma'(t)) = g_{ij} \frac{\mathrm{d}x^i}{\mathrm{d}t}\frac{\mathrm{d}x^j}{\mathrm{d}t}$, if we understand $x^i(t)$ as the $i$-th coordinate of the point $\gamma(t)$, and so:
$$ L[\gamma] = \int_a^b \sqrt{\left\lvert g_{ij} \frac{\mathrm{d}x^i}{\mathrm{d}t}\frac{\mathrm{d}x^j}{\mathrm{d}t}\right\rvert}\,\mathrm{d}t = \int_a^b \sqrt{\lvert g_{ij}\mathrm{d}x^i\mathrm{d}x^j\rvert}\,\frac{\mathrm{d}t}{\mathrm{d}t} = \int_a^b \sqrt{\lvert g_{ij}\mathrm{d}x^i\mathrm{d}x^j\rvert}$$
Since we call $\mathrm{d}s$ the infinitesimal line element that fulfills $L = \int \mathrm{d}s$, this is suggestive of the notation
$$ \mathrm{d}s^2 = g_{ij}\mathrm{d}x^i\mathrm{d}x^j$$
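As a sanity check of the length formula, here is a short sympy sketch for an example of my own choosing: a circle of radius $R$ in the plane with polar coordinates, where $\mathrm{d}s^2 = \mathrm{d}r^2 + r^2\,\mathrm{d}\theta^2$.

```python
import sympy as sp

t, R = sp.symbols('t R', positive=True)

# The curve gamma(t) in polar coordinates: constant radius R, angle theta = t
r_of_t, theta_of_t = R, t

# Metric components g_ij in polar coordinates: ds^2 = dr^2 + r^2 dtheta^2
g = sp.Matrix([[1, 0], [0, r_of_t**2]])

# Components dx^i/dt of the tangent vector gamma'(t)
xdot = sp.Matrix([sp.diff(r_of_t, t), sp.diff(theta_of_t, t)])

integrand = sp.sqrt(sp.Abs((xdot.T * g * xdot)[0, 0]))
L = sp.integrate(integrand, (t, 0, 2*sp.pi))
print(L)   # 2*pi*R: the circumference, as expected
```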
If we notice that, by the definition of tangent and cotangent vectors through derivatives and differentials as above, things with upper indices transform exactly in the opposite way from things with lower indices (see also my answer here), it is seen that this is indeed invariant under arbitrary coordinate transformations.
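Here is a quick symbolic check of that invariance, using the Cartesian-to-polar transformation as an illustrative example and treating the differentials as formal symbols:

```python
import sympy as sp

r, theta = sp.symbols('r theta', positive=True)
dr, dtheta = sp.symbols('dr dtheta')  # differentials as formal symbols

# Cartesian coordinates expressed in polar ones
x = r * sp.cos(theta)
y = r * sp.sin(theta)

# Transform the differentials: dx = (dx/dr) dr + (dx/dtheta) dtheta, etc.
dx = sp.diff(x, r)*dr + sp.diff(x, theta)*dtheta
dy = sp.diff(y, r)*dr + sp.diff(y, theta)*dtheta

ds2 = sp.simplify(dx**2 + dy**2)
print(ds2)   # dr**2 + dtheta**2*r**2: the same line element, new coordinates
```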
$[1]$ $\gamma'(t)$ is really a tangent vector in the following sense:
Let $x : \mathcal{M} \rightarrow \mathbb{R}^n$ be a coordinate chart. Consider then $x \circ \gamma : [a,b] \rightarrow \mathbb{R}^n$. Since it is an ordinary function between (subsets of) Cartesian spaces, it has a derivative
$$(x \circ \gamma)' : [a,b] \rightarrow \mathbb{R}^n$$
Now, $(x \circ \gamma)'^i(t)$ can be thought of as the components of the tangent vector $\gamma'(t) := (x \circ \gamma)'^i(t)\partial_i \in T_{\gamma(t)}\mathcal{M}$. It is a somewhat tedious, but worthwhile exercise to show that this definition of $\gamma'(t)$ is independent of the choice of coordinates $x$.
Your exam question with the surfaces is asking about something different. You are given an embedding of a lower-dimensional submanifold $\mathcal{N}$ into Cartesian space
$$ \sigma: \mathcal{N} \hookrightarrow \mathbb{R}^n $$
and asked to calculate the induced metric on the submanifold from the Cartesian metric
$$\mathrm{d}s^2 = \sum_{i = 1}^n (\mathrm{d}x^i)^2$$
(whose components with respect to any orthonormal coordinate basis on $\mathbb{R}^n$ form the identity matrix; this is just the dot product)
Now, how is a metric induced? Let $y : \mathbb{R}^m \rightarrow \mathcal{N}$ be a parametrization of the submanifold, so that the $y^i$ serve as coordinates (you are actually given $\sigma \circ y$ in the question), and let $x$ be the coordinates of the Cartesian space. Observe that any morphism of manifolds $\sigma$ induces a morphism of tangent spaces
$$ \mathrm{d}\sigma_p : T_p\mathcal{N} \rightarrow T_{\sigma(p)}\mathbb{R}^n, \frac{\partial}{\partial y^i} \mapsto \sum_j \frac{\partial(\sigma \circ y)^j}{\partial y^i}\frac{\partial}{\partial x^j} $$
called the differential of $\sigma$. As a morphism of vector spaces, it is a linear map given, as a matrix, by the Jacobian $\mathrm{d}\sigma^{ij} := \frac{\partial(\sigma \circ y)^j}{\partial y^i}$ of the morphism of manifolds. Now, inducing a metric means setting
$$ g_\mathcal{N}(\frac{\partial}{\partial y^i},\frac{\partial}{\partial y^j}) := g_\mathrm{Euclidean}(\mathrm{d}\sigma(\frac{\partial}{\partial y^i}),\mathrm{d}\sigma(\frac{\partial}{\partial y^j}))$$
On the right-hand side is now the dot product of two ordinary vectors in $\mathbb{R}^n$, and what your exam calls $\vec e_{y^i}$ is my $\mathrm{d}\sigma(\frac{\partial}{\partial y^i})$. If you note that you are given $\sigma \circ y$, then all you need to do is calculate the metric components $g_\mathcal{N}$ as above for every possible combination of $y^i,y^j$ (in 2D, fortunately, there are only four, and by symmetry only three are independent).
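To illustrate, here is how the computation might look in sympy for the unit sphere with the standard embedding (my example; your exam surface will differ). The induced metric components come out all at once as $J^{\mathrm T} J$, i.e. the table of dot products $\vec e_{y^i} \cdot \vec e_{y^j}$:

```python
import sympy as sp

theta, phi = sp.symbols('theta phi')

# The embedding sigma o y : (theta, phi) -> R^3 for the unit sphere
sigma = sp.Matrix([sp.sin(theta)*sp.cos(phi),
                   sp.sin(theta)*sp.sin(phi),
                   sp.cos(theta)])

# Jacobian: column i holds e_{y^i} = d(sigma o y)/dy^i
J = sigma.jacobian([theta, phi])

# Induced metric g_ij = e_{y^i} . e_{y^j}, i.e. J^T J for the Euclidean metric
g = sp.simplify(J.T * J)
print(g)   # Matrix([[1, 0], [0, sin(theta)**2]])
           # i.e. ds^2 = dtheta^2 + sin^2(theta) dphi^2
```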
Best Answer
As you may know, the metric tensor is a symmetric bilinear form. It accepts two vectors from a vector space $V$ and gives back a real number in $\mathbb R$. It is linear in both arguments, hence 'bilinear'. The metric tensor can be interpreted as a linear operator in the sense that fixing one of its arguments (either one; it doesn't matter because $g$ is symmetric) produces a dual vector in $V^*$. This dual vector is a functional on $V$ (the traditional definition of the dual space of $V$), which acts on the second vector to give a scalar value. So $g$ is a linear map from $V$ to $V^*$. When you write $g$ as a matrix and operate on a column vector $v$, transpose the resulting vector to make it a row vector and you have the dual vector $v^*$.
From a general point of view, the metric tensor is a rank 2 tensor, specifically a rank $(0,2)$ tensor. In general, a rank $(n,m)$ tensor is a multilinear functional which acts on an ordered collection of vectors in $V$ and dual vectors in the dual space $V^*$. For a vector space $V$ over a field $\mathbb F$ (usually $\mathbb R$ or $\mathbb C$), a tensor $T$ is a multilinear map of the form
$$ T : V^m \times V^{*n} \rightarrow \mathbb F .$$
Rank $(0,2)$ tensors over the real numbers, like $g_{\mu \nu}$,
$$ g : V \times V \rightarrow \mathbb R$$
are particularly interesting as they often appear in mathematics and physics. This is because they define inner products. The inner product between two vectors $\begin{pmatrix}a_1\\a_2\end{pmatrix}$ and $\begin{pmatrix}b_1\\b_2\end{pmatrix}$ in an inner product space $V$ is
$$\begin{pmatrix}a_1\\a_2\end{pmatrix} \cdot \begin{pmatrix}b_1\\b_2\end{pmatrix} = \begin{pmatrix}a_1&a_2\end{pmatrix} \begin{pmatrix}A_{11}&A_{12}\\A_{21}&A_{22}\end{pmatrix} \begin{pmatrix}b_1\\b_2\end{pmatrix}$$
where $\mathbf A$ is a symmetric positive-definite matrix (symmetric, with positive real eigenvalues). By convention we normally write vectors in $V$ in an orthonormal basis, i.e. a basis that diagonalises $\mathbf A$ to the identity matrix; because of this choice of basis we usually omit $\mathbf A$ entirely when taking inner products: $$\begin{pmatrix}a_1\\a_2\end{pmatrix} \cdot \begin{pmatrix}b_1\\b_2\end{pmatrix} = \begin{pmatrix}a_1&a_2\end{pmatrix} \begin{pmatrix}b_1\\b_2\end{pmatrix}$$ when the vectors are written in an orthonormal basis.
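A tiny numerical illustration (the matrix $\mathbf A$ is an arbitrary symmetric positive-definite choice for the sketch):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])   # symmetric, eigenvalues 1 and 3 > 0

a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])

print(a @ A @ b)   # 1.0: the inner product a . b with respect to A
print(a @ b)       # 0.0: what the same formula gives in an orthonormal basis
```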
These inner products $\mathbf A$ are basically the same thing as metric tensors $g$. Two terms for one concept. Of course, in pseudo-Riemannian geometry, $\mathbf A$/$g$ need not be positive-definite. It is clear how $\mathbf A$ should be interpreted as a linear operator though, right? It maps the vector $\mathbf b$ to its dual vector $\mathbf b^*$ like so: $$\mathbf b^* (\mathbf a) = \mathbf a \cdot \mathbf b \tag{definition of $\mathbf b^*$ as an element of $V^*$}$$ $$\begin{align}\mathbf a \cdot \mathbf b &= \begin{pmatrix}a_1&a_2\end{pmatrix} \begin{pmatrix}A_{11}&A_{12}\\A_{21}&A_{22}\end{pmatrix} \begin{pmatrix}b_1\\b_2\end{pmatrix} \\ &= \left[ \begin{pmatrix}A_{11}&A_{12}\\A_{21}&A_{22}\end{pmatrix} \begin{pmatrix}b_1\\b_2\end{pmatrix} \right]^{\mathrm T} \begin{pmatrix}a_1\\a_2\end{pmatrix} \\ &\Rightarrow \quad \mathbf b^* = \left[ \begin{pmatrix}A_{11}&A_{12}\\A_{21}&A_{22}\end{pmatrix} \begin{pmatrix}b_1\\b_2\end{pmatrix} \right]^{\mathrm T} \end{align}$$ (the second equality uses the symmetry of $\mathbf A$). Having to take the transpose makes this a little confusing, but it should be clear that $\mathbf A$ defines a dual vector $\mathbf b^*$ for each vector $\mathbf b$.
The concept of a metric tensor is basically the same thing, but with a different notation. Earlier I could not say that $\mathbf A$ maps a vector to its dual, only that it defines such a map. This is because I had to use the transpose operation. Matrices are a notation designed to express vectors in $V$ and linear operators $M : V \rightarrow V$, and the notation is not flexible enough to express a linear map $V \rightarrow V^*$. The notation used for expressing metric tensors (upper/lower index notation; tensor notation; not sure if it has a better name) is more flexible. The metric is denoted $g$, and by writing it with two lower indices as $g_{\mu \nu}$ we are designating it as a rank $(0,2)$ tensor that maps $V \times V \rightarrow \mathbb R$. By giving $g_{\mu \nu}$ just one argument and leaving the other empty, we are left with a map $V \rightarrow \mathbb R$, which is the same thing as a dual vector in $V^*$. We write vectors by their components, $x^\mu$, and then $g$ defines a linear map $g : V \rightarrow V^*$ like so: $$g : \mathbf x \mapsto \mathbf x^*, \quad x_\mu = \sum_{\nu} g_{\mu \nu} x^{\nu}.$$ The notation $x^\mu$ expresses the components of the vector $\mathbf x$ in the chosen basis of $V$, and $x_\mu$ expresses the components of the dual vector $\mathbf x^*$ in the dual basis, i.e. the corresponding basis in $V^*$. The notation is frequently abused for brevity, so you may see expressions like $$g : x^\mu \mapsto g_{\mu \nu} x^\nu \tag{implied summation over $\nu$}$$ meaning the same thing as above.
You may notice the similarity with matrix multiplication: $$(b^*)_\mu = \sum_{\nu} A_{\mu \nu} b_\nu.$$ When $g$ is expressed as a matrix as in your question, it simply maps the components $x^\mu$ to the components of its dual vector, $x_\mu$. It very much is a linear map $g : V \rightarrow V^*$, and all the associated tools of linear algebra may be applied.
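To close, here is the map $g : V \rightarrow V^*$ in a few lines of numpy (the Minkowski metric is an assumed choice, purely for illustration): lowering the index is a single matrix-vector product, and the resulting components act on vectors exactly as $g(\mathbf x, \cdot)$ does.

```python
import numpy as np

g = np.diag([-1.0, 1.0, 1.0, 1.0])   # an assumed metric: Minkowski, mostly-plus

x = np.array([2.0, 1.0, 0.0, 0.0])   # components x^mu of a vector
x_dual = g @ x                        # components x_mu = g_{mu nu} x^nu

y = np.array([1.0, 1.0, 0.0, 0.0])
print(x_dual @ y)   # the dual vector x* acting on y ...
print(x @ g @ y)    # ... agrees with g(x, y): -1.0 both times
```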