This is a terrific question. I will try to answer all of your concerns. In general, you should be aware of a few things. First, when we write something like $g_{ij}$, the letters $i$ and $j$ are just placeholders for integers, so it doesn't matter which letters appear in the subscript: $g_{ij}$ and $g_{kl}$ denote the same object. Second, not all tensors can be represented as matrices: only tensors with two indices have a matrix representation.
The metric tensor does define a geodesic distance, but this is not its only purpose, or even its primary purpose. The main idea of the metric tensor is to encode how the coordinates on a manifold relate to each other at each point of the manifold. In some manifolds, like Euclidean space or a cylinder, every point is essentially the same, so the metric tensor can be taken to be constant (in suitable coordinates), but in most manifolds the metric tensor varies from point to point; it is then a function (called a tensor field) over the manifold, whose components depend on the coordinates.
The example you gave of the metric tensor in Euclidean space doesn't seem quite right. Specifically, I notice that the indices $k$ and $l$ appear only once in the equation (in a tensor equation, every summed index should appear exactly twice). Your equation does look like the formula for converting the metric tensor from one coordinate system to another, which is $$g'_{ij}=\frac{\partial x^k}{\partial x'^i}\frac{\partial x^l}{\partial x'^j}g_{kl}.$$ Here we have a coordinate system with coordinates $x^k$ and corresponding metric tensor $g_{kl}$, and the formula shows how to obtain the metric tensor $g'_{ij}$ in a second coordinate system with coordinates $x'^i$. This is like changing from Cartesian to spherical or cylindrical coordinates in $\mathbb R^3$.
Your real question in your second paragraph is how to represent a metric tensor as a matrix. This is quite easy. If your manifold has $n$ dimensions (and thus $n$ coordinates), the metric tensor can be represented by an $n\times n$ matrix whose element in the $i^{\text{th}}$ row and $j^{\text{th}}$ column is $g_{ij}$. In $\mathbb R^2$, the matrix representation is $$g_{ij}\doteq\left(\begin{array}{cc} 1 & 0\\ 0 & 1\\ \end{array}\right).$$ You may wonder where this came from. The simplest way to explicitly determine the elements of the metric tensor, and a favorite method in general relativity, is to think about the line element. The line element is the infinitesimal distance along a path, expressed in terms of the coordinates. In Euclidean space this is easy because we have the Pythagorean Theorem: $$ds^2=dx^2+dy^2.$$ The line element (called $ds^2$; think of the square as part of the symbol) is the change in $x$ squared plus the change in $y$ squared. In general, a line element for a 2-manifold looks like this: $$ds^2=g_{11}dx^2+2g_{12}dx\,dy+g_{22}dy^2.$$ (The metric tensor is always symmetric, so $g_{12}=g_{21}$; that is where the factor of $2$ comes from.) The terms that involve change along more than one coordinate are called off-diagonal terms because they correspond to off-diagonal elements in the matrix representation. Notice that in Euclidean space there are no off-diagonal terms, so the corresponding matrix is diagonal. Since $dx^2$ and $dy^2$ both have coefficient $1$ in the Euclidean line element, there are ones on the diagonal.
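As a quick worked example (polar coordinates on the plane, my choice of illustration): substituting $x=r\cos\theta$, $y=r\sin\theta$, so that $dx=\cos\theta\,dr-r\sin\theta\,d\theta$ and $dy=\sin\theta\,dr+r\cos\theta\,d\theta$, into $ds^2=dx^2+dy^2$ gives $$ds^2=dr^2+r^2\,d\theta^2,\qquad g_{ij}\doteq\left(\begin{array}{cc} 1 & 0\\ 0 & r^2\\ \end{array}\right).$$ The metric is still diagonal (the cross terms cancel), but it is no longer constant: it depends on the coordinate $r$.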
First of all, if a tensor has more than two indices, then it cannot be represented by a matrix. Of course, we could represent it as a "higher dimensional box of numbers," but then writing things down on a two-dimensional piece of paper gets tricky, which is why we have things like Einstein notation. Nevertheless, we shall press on. To do so, we must think of matrices as linear operators. A matrix equation like $Ax=b$ should be read as "$A$ acts on $x$ to give $b$." In this way, the matrix acts on one vector and returns another vector. But where do these vectors come from? They come from the tangent space at a point of the manifold. Whenever we have a tangent space, there is a cotangent space (or dual space) to go with it. A tensor like $A^{ij}$ has a matrix representation which acts on a covector to give a vector ($A^{ij}\omega_j=v^i$), while a tensor like $g_{ij}$ acts on a vector to give a covector. A mixed-variance tensor like $R^i_j$ acts on a vector to give a vector, or acts on a covector to give a covector. As for the numbers of rows and columns, they should always be $n$ (and $n=4$ in general relativity).
The inverse metric is represented quite literally by the inverse matrix of the metric's matrix representation. As for the canceling, this is the same kind of reduction in the number of indices that happens when a row vector is multiplied by a matrix. The elements $a_{ijkl}$ are multiplied by elements of the inverse metric and added together to get something called $a_{ik}$ (explicitly, $a_{ik}=g^{jl}a_{ijkl}$). Again, $R_{ijkl}$ is not represented by a matrix.
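To see the canceling concretely, here is a minimal numerical sketch in NumPy (the Minkowski metric and the random array standing in for $a_{ijkl}$ are placeholder choices, not anything from your question):

```python
import numpy as np

n = 4
g = np.diag([-1.0, 1.0, 1.0, 1.0])   # example metric (Minkowski); any invertible metric works
g_inv = np.linalg.inv(g)             # the inverse metric: literally the matrix inverse
a = np.random.rand(n, n, n, n)       # placeholder components a_{ijkl}

# Contract the 2nd and 4th indices against the inverse metric:
# a_{ik} = g^{jl} a_{ijkl} -- four indices in, two out.
a_ik = np.einsum('jl,ijkl->ik', g_inv, a)
print(a_ik.shape)                    # (4, 4): the result does have a matrix representation
```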
The Christoffel symbol is not a tensor (notice it is not called the Christoffel tensor), but it can still be represented by a "3D box of numbers." The matrices $g_{jl}$, $g_{il}$, and $g_{ij}$ are all the same matrix, but when we assign specific values to $i$, $j$, and $l$, these terms reference different elements of it. Each element $g_{jl}$ is a function of the coordinates $x^i$. The derivative of $g_{jl}$ with respect to $x^i$ is the ordinary partial derivative of the $jl$ element of the metric tensor with respect to the $i^\text{th}$ coordinate.
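For reference, the formula you are presumably looking at is the standard one, $$\Gamma^k_{ij}=\frac{1}{2}g^{kl}\left(\partial_i g_{jl}+\partial_j g_{il}-\partial_l g_{ij}\right),$$ and here is a one-line example of the derivatives involved: in polar coordinates $g_{\theta\theta}=r^2$, so $\partial_r g_{\theta\theta}=2r$, while every derivative of the Euclidean metric in Cartesian coordinates is $0$, so all of its Christoffel symbols vanish.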
Don't get too hung up on numbers and matrices. Relativists realized a long time ago that this way of thinking is not helpful. Things like Einstein notation are helpful for simplifying calculations. You ask about how elements are affected when indices are changed or cancelled. Just write down the tensor equation in Einstein notation with the appropriate sums and see what happens. It may concern you now, but soon you will not worry so much. Einstein notation is very reliable and won't lead you astray.
I hesitate to give examples in terms of numbers because such calculations are usually very long. In my first relativity class, we were given a metric tensor, and asked to calculate the Ricci tensor, so we first had to calculate Christoffels, then Riemann, then Ricci. I think I used six sheets of paper front and back. But I just did the calculations the way the Einstein notation directs, and it turned out alright. My professor said that he gave us that assignment so we would appreciate the hard work of early relativists. After that, he allowed us to use a computer algebra system.
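If you want to experiment yourself, here is a minimal SymPy sketch of that kind of calculation, computing Christoffel symbols from a metric (the round 2-sphere metric $ds^2=d\theta^2+\sin^2\theta\,d\phi^2$ is my choice of example, not the metric from that assignment):

```python
import sympy as sp

# Coordinates and an example metric: the unit 2-sphere,
# ds^2 = dtheta^2 + sin(theta)^2 dphi^2
theta, phi = sp.symbols('theta phi')
x = [theta, phi]
g = sp.Matrix([[1, 0], [0, sp.sin(theta)**2]])
g_inv = g.inv()
n = 2

# Gamma^k_{ij} = (1/2) g^{kl} (d_i g_{jl} + d_j g_{il} - d_l g_{ij})
Gamma = [[[sum(sp.Rational(1, 2) * g_inv[k, l]
               * (sp.diff(g[j, l], x[i]) + sp.diff(g[i, l], x[j]) - sp.diff(g[i, j], x[l]))
               for l in range(n))
           for j in range(n)]
          for i in range(n)]
         for k in range(n)]

print(sp.simplify(Gamma[0][1][1]))   # Gamma^theta_{phi phi} = -sin(theta)*cos(theta)
print(sp.simplify(Gamma[1][0][1]))   # Gamma^phi_{theta phi}  = cos(theta)/sin(theta)
```

From here the Riemann and Ricci tensors are just more loops of the same kind.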
Edit:
One more thing. If you tried to do the calculations using matrices and "higher dimensional boxes," you would essentially be doing Einstein notation anyway.
A $(p, q)$-tensor on a real vector space $V$ is a multilinear map $T : (V^*)^p\times V^q \to \mathbb{R}$.
Let $\{e_1, \dots, e_n\}$ be a basis for $V$ and $\{e^1,\dots, e^n\}$ the dual basis of $V^*$, then the tensor $T$ is determined by the collection of real numbers $T^{i_1, \dots, i_p}_{j_1, \dots, j_q} := T(e^{i_1},\dots, e^{i_p}, e_{j_1}, \dots, e_{j_q})$. If $\{\hat{e}_1, \dots, \hat{e}_n\}$ is another basis for $V$ and $\{\hat{e}^1, \dots, \hat{e}^n\}$ is the corresponding dual basis, then we get another collection of real numbers $\hat{T}^{i_1',\dots, i_p'}_{j_1', \dots, j_q'} := T(\hat{e}^{i_1'}, \dots, \hat{e}^{i_p'}, \hat{e}_{j_1'},\dots, \hat{e}_{j_q'})$.
If $A$ denotes the change of basis matrix from $\{e_1, \dots, e_n\}$ to $\{\hat{e}_1, \dots, \hat{e}_n\}$ then, using the Einstein summation convention, we have $\hat{e}_i = A^k_ie_k$. The change of basis matrix from $\{e^1, \dots, e^n\}$ to $\{\hat{e}^1, \dots, \hat{e}^n\}$ is $A^{-1}$ so $\hat{e}^j = (A^{-1})^j_k e^k$. It follows that
$$\hat{T}^{i_1',\dots,i_p'}_{j_1',\dots,j_q'} = T^{i_1,\dots,i_p}_{j_1,\dots,j_q}(A^{-1})^{i_1'}_{i_1}\dots(A^{-1})^{i_p'}_{i_p}A^{j_1}_{j_1'}\dots A^{j_q}_{j_q'}.$$
In physics, a $(p, q)$-tensor is often considered as a collection of real numbers $T^{i_1,\dots, i_p}_{j_1,\dots, j_q}$ which transforms under change of basis in the way stated above. As the indices $j_1, \dots, j_q$ change according to the change of basis matrix, we say that they are covariant, while the indices $i_1, \dots, i_p$ change according to the inverse of the change of basis matrix, so we say that they are contravariant. Hence a $(p, q)$-tensor has $p$ contravariant indices and $q$ covariant indices.
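To make this concrete: for a $(1, 1)$-tensor the transformation law above is just matrix conjugation, $\hat{T} = A^{-1}TA$. A small numerical sketch (the matrices $A$ and $T$ below are arbitrary choices):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 1.0]])   # change of basis matrix (any invertible matrix)
T = np.array([[1.0, 2.0],
              [3.0, 4.0]])   # components T^i_j in the old basis
A_inv = np.linalg.inv(A)

# T'^{i'}_{j'} = (A^{-1})^{i'}_i T^i_j A^j_{j'}
T_hat = np.einsum('ai,ij,jb->ab', A_inv, T, A)
assert np.allclose(T_hat, A_inv @ T @ A)          # same as matrix conjugation

# Basis-independent quantities survive, e.g. the trace (a contraction):
assert np.isclose(np.trace(T_hat), np.trace(T))
```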
Examples:
- A $(0, 1)$-tensor is nothing but a linear map $V \to \mathbb{R}$.
- Given a vector $v \in V$, one obtains a $(1, 0)$-tensor $T_v$ defined by $T_v(\alpha) = \alpha(v)$.
- An inner product on $V$ is an example of a $(0, 2)$-tensor.
- A linear map $L : V \to V$ can be viewed as a $(1, 1)$-tensor $T_L$ defined by $T_L(\alpha, v) = \alpha(L(v))$.
A (not necessarily positive-definite) inner product $g$ defines an isomorphism $\Phi_g : V \to V^*$ given by $\Phi_g(v) = g(v, \cdot)$. This isomorphism can be used to transform a $(p, q)$-tensor $T$ into a $(p - 1, q + 1)$-tensor $T'$ by defining $T'(\alpha^1, \dots, \alpha^{p-1}, v_1, \dots, v_{q+1}) := T(\alpha^1, \dots, \alpha^{p-1}, \Phi_g(v_1), v_2, \dots, v_{q+1})$. Likewise, the inverse isomorphism $\Phi_g^{-1}$ can be used to transform a $(p, q)$-tensor into a $(p + 1, q - 1)$-tensor. Doing this repeatedly, we can view a $(p, q)$-tensor as an $(r, s)$-tensor for any $r$ and $s$ with $r, s \geq 0$ and $r + s = p + q$. Note however that the $(r, s)$-tensor we produce depends on the inner product $g$; for a different inner product, the corresponding $(r, s)$-tensor will not be the same.
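In components, $\Phi_g$ is exactly "lowering an index", $v_i = g_{ij}v^j$, and $\Phi_g^{-1}$ raises it with the inverse matrix. A small numerical sketch (the particular $g$ and $v$ are arbitrary choices):

```python
import numpy as np

g = np.array([[2.0, 1.0],
              [1.0, 3.0]])          # an inner product: symmetric, non-degenerate
v = np.array([1.0, 2.0])            # a vector v^j

v_flat = np.einsum('ij,j->i', g, v)                       # the covector Phi_g(v)
v_back = np.einsum('ij,j->i', np.linalg.inv(g), v_flat)   # raise the index again
assert np.allclose(v_back, v)       # Phi_g^{-1} undoes Phi_g
```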
A $(p, q)$-tensor field on a smooth manifold $M$ is a $C^{\infty}(M)$-multilinear map $T : \Gamma(T^*M)^p\times\Gamma(TM)^q \to C^{\infty}(M)$. That is, a $(p, q)$-tensor on $T_xM$ for every $x \in M$ which varies smoothly as $x$ varies.
Given local coordinates $(x^1, \dots, x^n)$ on $U \subseteq M$, there is a basis of sections for $TM|_U$ given by $\{\partial_1, \dots, \partial_n\}$ where $\partial_i = \frac{\partial}{\partial x^i}$, and a dual basis of sections for $T^*M|_U$ given by $\{dx^1, \dots, dx^n\}$. We then obtain a collection of smooth functions $T^{i_1,\dots,i_p}_{j_1,\dots,j_q} := T(dx^{i_1},\dots, dx^{i_p}, \partial_{j_1}, \dots, \partial_{j_q})$ on $U$. If $\{\hat{x}^1, \dots, \hat{x}^n\}$ is another set of local coordinates on $U$, then $\{\hat{\partial}_1, \dots, \hat{\partial}_n\}$ is a basis of sections for $TM|_U$ where $\hat{\partial}_i = \frac{\partial}{\partial\hat{x}^i}$, and $\{d\hat{x}^1,\dots, d\hat{x}^n\}$ is the dual basis of sections for $T^*M|_U$, so we get another collection of smooth functions $\hat{T}^{i_1',\dots,i_p'}_{j_1',\dots,j_q'} := T(d\hat{x}^{i_1'},\dots, d\hat{x}^{i_p'}, \hat{\partial}_{j_1'},\dots, \hat{\partial}_{j_q'})$ on $U$.
Note that $\hat{\partial}_i = \dfrac{\partial x^k}{\partial \hat{x}^i}\partial_k$ and $d\hat{x}^j = \dfrac{\partial \hat{x}^j}{\partial x^k}dx^k$ so
$$\hat{T}^{i_1',\dots,i_p'}_{j_1',\dots,j_q'} = T^{i_1,\dots,i_p}_{j_1,\dots,j_q}\dfrac{\partial \hat{x}^{i_1'}}{\partial x^{i_1}}\dots \dfrac{\partial \hat{x}^{i_p'}}{\partial x^{i_p}}\dfrac{\partial x^{j_1}}{\partial \hat{x}^{j_1'}}\dots \dfrac{\partial x^{j_q}}{\partial \hat{x}^{j_q'}}$$
Recall that $\left(\dfrac{\partial\hat{x}}{\partial x}\right)^{-1} = \dfrac{\partial x}{\partial\hat{x}}$, so the above is completely analogous to the previous formula for tensors.
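As a sanity check, one can verify this for the Euclidean metric under the change from Cartesian to polar coordinates (a small SymPy sketch; the coordinates are my choice of example):

```python
import sympy as sp

r, t = sp.symbols('r t', positive=True)
old = [r * sp.cos(t), r * sp.sin(t)]   # x(r, t), y(r, t)
new = [r, t]
g_old = sp.eye(2)                      # g_{kl} = delta_{kl} in Cartesian coordinates

# Jacobian J[k, i] = dx^k / dxhat^i
J = sp.Matrix(2, 2, lambda k, i: sp.diff(old[k], new[i]))
g_new = sp.simplify(J.T * g_old * J)   # ghat_{ij} = (dx^k/dxhat^i)(dx^l/dxhat^j) g_{kl}
print(g_new)                           # Matrix([[1, 0], [0, r**2]])
```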
Examples:
- A $(0, 1)$-tensor field is nothing but a one-form.
- Given a vector field $V \in \Gamma(TM)$, one obtains a $(1, 0)$-tensor field $T_V$ defined by $T_V(\alpha) = \alpha(V)$.
- A Riemannian or Lorentzian metric on $M$ is an example of a $(0, 2)$-tensor field.
- A bundle map $L : TM \to TM$ can be viewed as a $(1, 1)$-tensor field $T_L$ defined by $T_L(\alpha, V) = \alpha(L(V))$.
As in the case of tensors on a vector space, given a Riemannian or Lorentzian metric (or a non-degenerate metric of any signature), one can transform a $(p, q)$-tensor field into an $(r, s)$-tensor field for any $r, s \geq 0$ with $r + s = p + q$.
The terms "contravariant" and "covariant" refer to how vectors change when you move from one coordinate system to another. So just declaring a vector like $V$ or $[2,2]$ does not make it contra or covariant. It's what you do with it that matters.
For example, let's say there is a coordinate system representing the width and length of furniture in meters. A vector $[2,2]$ in this coordinate system could mean two different things:
Firstly, it could be the dimensions of a piece of furniture: 2 meters by 2 meters. That would be a contravariant vector. The reason is that if you transform into a feet coordinate system, the numbers $[2,2]$ get bigger, about $[6.56,6.56]$, while the units of measurement (a.k.a. the basis vectors) get smaller (a foot is shorter than a meter). Since the numbers vary contrary to the bases, it's a contravariant vector.
But $[2,2]$ could also represent a function for computing the perimeter of the furniture. If you had a desk that was 1.5 m × 3 m, you could find the perimeter by multiplying by $[2,2]$: the perimeter is $[2,2]\cdot[1.5,3] = 2\cdot 1.5 + 2\cdot 3 = 9$. To transform that function into feet (so you get the same answer), the vector $[2,2]$ would have to get smaller, about $[0.61,0.61]$. Since the numbers co-vary with the bases, it's a covariant vector.
So you have to know what the vector does and how it transforms in order to determine whether it's contravariant or covariant.
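Here is the same furniture example as a small numerical sketch (the conversion factor $1\,\text{ft}=0.3048\,\text{m}$ is the only input):

```python
import numpy as np

M_TO_FT = 1 / 0.3048                   # ~3.28 feet per meter

# Contravariant: the dimensions. Components scale *against* the basis:
# switching to the smaller unit (feet) makes the numbers bigger.
dims_m = np.array([2.0, 2.0])
dims_ft = dims_m * M_TO_FT             # ~[6.56, 6.56]

# Covariant: the perimeter functional. Components scale *with* the basis
# so the pairing (the perimeter itself) stays the same.
perim_m = np.array([2.0, 2.0])
perim_ft = perim_m / M_TO_FT           # ~[0.61, 0.61]

desk_m = np.array([1.5, 3.0])
desk_ft = desk_m * M_TO_FT
assert np.isclose(perim_m @ desk_m, perim_ft @ desk_ft)   # 9 either way
```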
As for the summing: $\frac{\partial x^d}{\partial x'^b}$ is a $4\times4$ matrix (the Jacobian of the coordinate change). Multiplying any matrix by its inverse gives the identity matrix, whose entries are exactly the Kronecker delta $\delta$. Each entry of the product is a sum with 4 terms, but they combine to leave either 1 or 0.
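A quick numerical check of that cancellation (with an arbitrary invertible Jacobian standing in for $\frac{\partial x'^a}{\partial x^d}$):

```python
import numpy as np

J = np.random.rand(4, 4) + 4 * np.eye(4)   # dx'^a/dx^d, made safely invertible
J_inv = np.linalg.inv(J)                   # dx^d/dx'^b

# (dx'^a/dx^d)(dx^d/dx'^b) = delta^a_b: each entry is a 4-term sum
delta = np.einsum('ad,db->ab', J, J_inv)
assert np.allclose(delta, np.eye(4))       # the Kronecker delta
```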