Since you asked for an intuitive way to understand covariance and contravariance, I think this will do.
First of all, remember that the reason for having covariant or contravariant tensors is that you want to represent the same thing in different coordinate systems. The new representation is obtained by a transformation built from a set of partial derivatives. In tensor analysis, a good transformation is one that leaves invariant the quantity you are interested in.
For example, consider the transformation from one coordinate system $x^1,\ldots,x^{n}$ to another $x^{'1},\ldots,x^{'n}$:
$x^{i}=f^{i}(x^{'1},x^{'2},...,x^{'n})$ where $f^{i}$ are certain functions.
Take a look at a couple of specific quantities. How do the coordinate differentials transform? The answer is:
$dx^{i}=\displaystyle \frac{\partial x^{i}}{\partial x^{'k}}dx^{'k}$
Every quantity that, under a transformation of coordinates, transforms like the coordinate differentials is called a contravariant tensor.
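As a concrete sketch (not part of the original answer), here is the contravariant rule $dx^{i}=\frac{\partial x^{i}}{\partial x^{'k}}dx^{'k}$ applied numerically, taking the primed system to be polar coordinates $(r,\theta)$ and the unprimed one to be Cartesian $(x,y)$; the point $(r,\theta)=(2,\pi/6)$ and the displacement $(dr,d\theta)=(0.1,0.05)$ are arbitrary choices for illustration.

```python
import numpy as np

# Transform the differentials (dr, dtheta) in polar coordinates (the
# primed system) to (dx, dy) in Cartesian coordinates, using
# dx^i = (∂x^i/∂x'^k) dx'^k, i.e. multiplication by the Jacobian matrix.
r, theta = 2.0, np.pi / 6  # arbitrary example point

# Jacobian J[i, k] = ∂x^i/∂x'^k for x = r cos(theta), y = r sin(theta)
J = np.array([
    [np.cos(theta), -r * np.sin(theta)],
    [np.sin(theta),  r * np.cos(theta)],
])

d_prime = np.array([0.1, 0.05])   # small displacement (dr, dtheta)
d = J @ d_prime                   # contravariant transformation: (dx, dy)
print(d)
```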
How do we transform some scalar $\Phi$?
$\displaystyle \frac{\partial \Phi}{\partial x^{i}}=\frac{\partial \Phi}{\partial x^{'k}}\frac{\partial x^{'k}}{\partial x^{i}}$
Every quantity that, under a coordinate transformation, transforms like the derivatives of a scalar is called a covariant tensor.
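The point of the two opposite transformation rules is that contracting a covariant gradient with a contravariant differential leaves the scalar $d\Phi=\frac{\partial \Phi}{\partial x^{i}}dx^{i}$ invariant. The following numerical check (my own illustration, with a made-up scalar $\Phi(x,y)=x^2+3y$ and the same polar/Cartesian pair as above) shows this:

```python
import numpy as np

# Check numerically that (∂Φ/∂x^i) dx^i = (∂Φ/∂x'^k) dx'^k: the gradient
# (covariant) and the differentials (contravariant) transform oppositely,
# so their contraction is the same scalar in both coordinate systems.
r, theta = 2.0, np.pi / 6

# Jacobian J[i, k] = ∂x^i/∂x'^k for x = r cos(theta), y = r sin(theta)
J = np.array([
    [np.cos(theta), -r * np.sin(theta)],
    [np.sin(theta),  r * np.cos(theta)],
])

x, y = r * np.cos(theta), r * np.sin(theta)
grad_xy = np.array([2 * x, 3.0])    # (∂Φ/∂x, ∂Φ/∂y) for Φ = x² + 3y
d_prime = np.array([0.1, 0.05])     # (dr, dθ), contravariant components
d_xy = J @ d_prime                  # (dx, dy)

# Covariant components pick up the Jacobian the other way round:
# ∂Φ/∂x'^k = (∂x^i/∂x'^k) ∂Φ/∂x^i, i.e. grad' = Jᵀ · grad
grad_prime = J.T @ grad_xy

print(grad_xy @ d_xy, grad_prime @ d_prime)  # the same number twice
```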
Accordingly, a reasonable generalization is a quantity that transforms like the product of the components of two contravariant tensors, that is
$A^{ik}=\displaystyle \frac{\partial x^{i}}{\partial x^{'l}}\frac{\partial x^{k}}{\partial x^{'m}}A^{'lm}$
which is called a contravariant tensor of rank two. The same idea gives covariant tensors of rank $n$ and mixed tensors of rank $n$.
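For what it's worth, the rank-two rule $A^{ik}=\frac{\partial x^{i}}{\partial x^{'l}}\frac{\partial x^{k}}{\partial x^{'m}}A^{'lm}$ is just a double contraction, which `np.einsum` expresses almost verbatim; the Jacobian and tensor below are random placeholders, not tied to any particular coordinate change:

```python
import numpy as np

# Sketch of the rank-two contravariant transformation
# A^{ik} = (∂x^i/∂x'^l)(∂x^k/∂x'^m) A'^{lm}, written with np.einsum.
rng = np.random.default_rng(0)
J = rng.standard_normal((3, 3))        # J[i, l] = ∂x^i/∂x'^l (placeholder)
A_prime = rng.standard_normal((3, 3))  # components A'^{lm} (placeholder)

A = np.einsum('il,km,lm->ik', J, J, A_prime)

# The same double contraction as a matrix product: A = J · A' · Jᵀ
print(np.allclose(A, J @ A_prime @ J.T))  # True
```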
Keeping in mind the analogy with the coordinate differentials and the derivatives of a scalar, take a look at this picture, which I think makes it clearer:
From Wikipedia:
The contravariant components of a vector are obtained by projecting onto the coordinate axes. The covariant components are obtained by projecting onto the normal lines to the coordinate hyperplanes.
Finally, you may want to read: Basis vectors
By the way, I don't recommend relying blindly on the picture given by matrices, especially when you are doing calculations.
Well, I do not know to what extent you know about elementary row operations and what they really are. Doing an elementary row operation on a matrix amounts to multiplying that matrix by an invertible square matrix from the set $\text{GL}_n(F)$, where "GL" stands for the general linear group, $n$ for the size of the matrices in the set, and $F$ for the field their entries live in. Every matrix in this set is invertible, and every invertible matrix (of size $n$ over the field $F$) is in this set.
The problem is that multiplication by these matrices gives you all the elementary row operations, e.g. adding a multiple of one row to another row, and we do not want that. So, what you are looking for is multiplication by the matrix $I^{i,j}=I_n-I_{i,i}-I_{j,j}+I_{i,j}+I_{j,i}$, where $I_n$ stands for the identity matrix of size $n$ and $I_{k,l}$ for the matrix (of size $n$) whose entries are all $0$ except for the entry in row $k$ and column $l$. So, the matrix $I^{i,j}$ is the identity matrix, except that the diagonal entry $(i,i)$ in the $i$th row is $0$ and the same row has a $1$ in the $j$th column, while the diagonal entry $(j,j)$ in the $j$th row is $0$ and that row has a $1$ in column $i$. Here is an example of $I^{2,4}$ for size $5$:
$$
\begin{pmatrix}
1 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 1 & 0\\
0 & 0 & 1 & 0 & 0\\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}
$$
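A small helper (my own illustration, not from the question) that builds $I^{i,j}=I_n-I_{i,i}-I_{j,j}+I_{i,j}+I_{j,i}$ exactly as defined above, with 1-based indices to match the text:

```python
import numpy as np

def swap_matrix(n, i, j):
    """Return I^{i,j}: the identity of size n with a 0 at (i,i) and (j,j)
    and a 1 at (i,j) and (j,i).  i and j are 1-based as in the text."""
    E = np.eye(n)
    E[i - 1, i - 1] = E[j - 1, j - 1] = 0
    E[i - 1, j - 1] = E[j - 1, i - 1] = 1
    return E

print(swap_matrix(5, 2, 4))  # the matrix displayed above
```

Note that $I^{i,j}$ is a permutation matrix and its own inverse: applying the same swap twice gives back the identity.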
If you multiply any matrix by $I^{i,j}$ on the right, it will swap the columns $i$ and $j$ in the result. For example:
$$
\begin{pmatrix}
1 & 2 & 3 & 4 & 5\\
6 & 7 & 8 & 9 & 10\\
11 & 12 & 13 & 14 & 15\\
16 & 17 & 18 & 19 & 20
\end{pmatrix} \cdot
\begin{pmatrix}
1 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 1 & 0\\
0 & 0 & 1 & 0 & 0\\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix} =
\begin{pmatrix}
1 & 4 & 3 & 2 & 5\\
6 & 9 & 8 & 7 & 10\\
11 & 14 & 13 & 12 & 15\\
16 & 19 & 18 & 17 & 20
\end{pmatrix}
$$
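The product above is easy to verify numerically; this reproduces the same $4 \times 5$ matrix and the same $I^{2,4}$:

```python
import numpy as np

# Verify that right-multiplying by I^{2,4} swaps columns 2 and 4.
A = np.arange(1, 21).reshape(4, 5)   # the 4x5 matrix from the example

I_24 = np.eye(5)
I_24[[1, 3]] = I_24[[3, 1]]          # I^{2,4}: identity with rows 2 and 4 swapped

print(A @ I_24)                      # columns 2 and 4 of A are swapped
```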
However, you would also like to have the row swaps. For that, look at the transposed matrix, which turns your columns into rows and vice versa; it is like mirroring the matrix at the diagonal. Now you can again multiply by $I^{i,j}$ to swap columns of the transposed matrix, and then transpose back to obtain swapped rows. I am not a hundred percent sure how to give a proper notation, but it might look something like this: let $M$ be the set of all matrices obtained from $A \in F^{n \times m}$ by swapping two rows or two columns. Then:
$$M=\{ A \cdot I^{i,j} \mid i,j \in \{1,\ldots,m\} \text{ and } I^{i,j} \in F^{m \times m} \text{ defined as explained above} \} \cup \{ (A^T \cdot I^{k,l})^T \mid k,l \in \{1,\ldots,n\} \text{ and } I^{k,l} \in F^{n \times n} \text{ defined as explained above} \} $$
I hope I answered your question.
EDIT:
$(A^T \cdot I^{k,l})^T=(I^{k,l})^T \cdot A =I^{k,l} \cdot A$, since $I^{k,l}$ is symmetric. That is a much better and simpler notation. Also, the set above only contains single row or column swaps; for every combination of repeated swaps you would have the following set:
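The identity $(A^T \cdot I^{k,l})^T = I^{k,l} \cdot A$ and the resulting row swap can both be checked numerically (again with the example matrix from above):

```python
import numpy as np

# Check (Aᵀ · I^{2,4})ᵀ = I^{2,4} · A, and that left-multiplication by
# I^{2,4} swaps rows 2 and 4 of A.
A = np.arange(1, 21).reshape(4, 5)

I_24 = np.eye(4)
I_24[[1, 3]] = I_24[[3, 1]]          # I^{2,4} of size 4 (symmetric)

left = I_24 @ A
print(np.array_equal(left, (A.T @ I_24.T).T))    # True: same matrix
print(np.array_equal(left, A[[0, 3, 2, 1], :]))  # True: rows 2 and 4 swapped
```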
$$M=\{\prod_{k=1}^n (\prod_{l=1}^n I^{k,l}) \cdot A \cdot \prod_{i=1}^m (\prod_{j=1}^m I^{i,j}) | I^{k,l} \in F^{n \times n} \text{ and } I^{i,j} \in F^{m \times m}\} $$
Although, again, I am not quite sure this set captures every possible combination.
Best Answer
You should ask yourself: what does it mean to write $ds^2=dx^2+dy^2+dz^2$ in the first place? The symbol $dx^2$ denotes $dx\otimes dx$, namely the tensor product of the form $dx$ with itself. However, no one can stop you from taking the tensor product $dx\otimes dy$, for instance. The notations $dx,dy,dz$ are customarily reserved for the case of forms on a $3$-dimensional space with coordinates $x,y,z$. If we pass to an $n$-dimensional space with coordinates $x^1,\ldots, x^n$, then our associated forms are $dx^1,\ldots, dx^n$.
$(1)$ As for the first question, you should view $dx^idx^j(=dx^i\otimes dx^j)$ as being a bilinear operator $\mathbb{R}^n\times \mathbb{R}^n\to \mathbb{R}$ operating by $(v,w)\mapsto dx^i(v)\cdot dx^j(w).$ The pair of upper indices should signal to you that this is a "$2-$form" in that it eats two vectors and spits out a scalar number.
$(2)$ We write $\delta_{ij}$ because when we write $ds^2=\delta_{ij}dx^idx^j$ we secretly mean $$ ds^2=\delta_{ij}dx^idx^j=\sum_{i,j} \delta_{ij}dx^idx^j$$ and the convention is that in a sum like this the coefficients of the forms get lower indices and the forms get upper indices. (This is called the Einstein summation convention.)
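The suppressed double sum is exactly what `np.einsum` spells out; here is the convention made explicit for an arbitrary example displacement in $\mathbb{R}^3$:

```python
import numpy as np

# The Einstein summation convention ds² = δ_ij dx^i dx^j, with the
# hidden sum over i and j written out via np.einsum.
delta = np.eye(3)                  # δ_ij, the flat metric
dx = np.array([1.0, 2.0, 2.0])     # example components dx^1, dx^2, dx^3

ds2 = np.einsum('ij,i,j->', delta, dx, dx)
print(ds2)                         # 1 + 4 + 4 = 9.0
```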