Since you asked for an intuitive way to understand covariance and contravariance, I think this will do.
First of all, remember that the reason for having covariant and contravariant tensors is that you want to represent the same thing in different coordinate systems. Such a new representation is achieved by a transformation using a set of partial derivatives. In tensor analysis, a good transformation is one that leaves invariant the quantity you are interested in.
For example, we consider the transformation from one coordinate system $x^1,...,x^{n}$ to another $x^{'1},...,x^{'n}$:
$x^{i}=f^{i}(x^{'1},x^{'2},...,x^{'n})$ where $f^{i}$ are certain functions.
Take a look at a couple of specific quantities. How do the coordinate differentials transform? The answer is:
$dx^{i}=\displaystyle \frac{\partial x^{i}}{\partial x^{'k}}dx^{'k}$
Every quantity which, under a transformation of coordinates, transforms like the coordinate differentials is called a contravariant tensor (of rank one).
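To make this concrete, here is a small example of my own (it is not in the original answer). Take the primed system to be plane polar coordinates, $x^{'1}=r$, $x^{'2}=\theta$, and the unprimed system to be Cartesian, $x^{1}=x=r\cos\theta$, $x^{2}=y=r\sin\theta$. Then the rule above reads
$$dx=\cos\theta\,dr-r\sin\theta\,d\theta,\qquad dy=\sin\theta\,dr+r\cos\theta\,d\theta,$$
which is just the chain rule applied to $x=f^{1}(r,\theta)$ and $y=f^{2}(r,\theta)$.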
How do the derivatives of a scalar $\Phi$ transform?
$\displaystyle \frac{\partial \Phi}{\partial x^{i}}=\frac{\partial \Phi}{\partial x^{'k}}\frac{\partial x^{'k}}{\partial x^{i}}$
Every quantity which, under a coordinate transformation, transforms like the derivatives of a scalar is called a covariant tensor (of rank one).
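Continuing the same polar-coordinate example (again mine, not the original answer's): using $\partial r/\partial x=\cos\theta$ and $\partial\theta/\partial x=-\sin\theta/r$,
$$\frac{\partial \Phi}{\partial x}=\cos\theta\,\frac{\partial \Phi}{\partial r}-\frac{\sin\theta}{r}\,\frac{\partial \Phi}{\partial \theta}.$$
Notice that the factor of $r$ now sits in the denominator rather than the numerator: the derivatives of a scalar transform with the inverse set of partial derivatives, and that is exactly the difference between contravariant and covariant behaviour.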
Accordingly, a reasonable generalization is having a quantity which transforms like the product of the components of two contravariant tensors, that is
$A^{ik}=\displaystyle \frac{\partial x^{i}}{\partial x^{'l}}\frac{\partial x^{k}}{\partial x^{'m}}A^{'lm}$
which is called a contravariant tensor of rank two. The same applies to covariant tensors of rank $n$ or mixed tensors of rank $n$.
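To spell out what "mixed" means here (this formula is not written out in the original answer, but it is the standard convention and follows from combining the two rules above): a mixed tensor carries one factor of $\partial x/\partial x^{'}$ for each contravariant (upper) index and one factor of $\partial x^{'}/\partial x$ for each covariant (lower) index, for example
$$A^{i}{}_{k}=\frac{\partial x^{i}}{\partial x^{'l}}\,\frac{\partial x^{'m}}{\partial x^{k}}\,A^{'l}{}_{m}.$$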
With the analogy to coordinate differentials and the derivatives of a scalar in mind, take a look at this picture, which I think will help make it clearer:
From Wikipedia:
The contravariant components of a vector are obtained by projecting onto the coordinate axes. The covariant components are obtained by projecting onto the normal lines to the coordinate hyperplanes.
Finally, you may want to read: Basis vectors
By the way, I don't recommend relying blindly on the picture given by matrices, especially when you are doing calculations.
I think putting tensors in the right context will clear much of this up. I'll stick with $\mathbb R^3$ since that's the example you use. Rank $2$ tensors are elements of the so-called tensor product of $\mathbb R^3$ with itself, which is denoted by $\mathbb R^3 \otimes \mathbb R^3$. This space consists of all linear combinations of expressions of the form $u \otimes v$ under the stipulations that:
$$u\otimes(v + w) = u\otimes v + u\otimes w,$$
$$(u+v)\otimes w = u\otimes w + v\otimes w, \text{and}$$
$$u\otimes (cv) = c(u \otimes v) = (cu) \otimes v$$
where $u,v,w$ are vectors and $c$ is a scalar.
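As a quick illustration of how these rules are used (my own small example):
$$(2e_1)\otimes(e_2+3e_3)=2(e_1\otimes e_2)+6(e_1\otimes e_3).$$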
Taking the standard basis $e_1,e_2,e_3$ of $\mathbb R^3$, any rank $2$ tensor can then be written as a linear combination of the $9$ "pure" tensors $e_i \otimes e_j$ for $i,j = 1,2,3$. The $9$ scalars you take as coefficients in such a linear combination make up the $3 \times 3$ matrix which "represents" that rank $2$ tensor. A rank $3$ tensor, an element of the tensor product $\mathbb R^3 \otimes \mathbb R^3 \otimes \mathbb R^3$, would then consist of linear combinations of the $27$ pure tensors:
$$e_i \otimes e_j \otimes e_k$$
where $i,j,k=1,2,3$. The $27$ coefficients in such a linear combination make up the $3 \times 3 \times 3$ array you mention.
Tensor multiplication is then just given by the good ol' distributive property. For instance, the product of the rank $1$ tensor $2e_1+ 3e_2$ and the rank $2$ tensor $-2(e_1 \otimes e_2) + 2(e_2 \otimes e_3)$ is:
$$[2e_1 + 3e_2] \otimes [-2(e_1 \otimes e_2) + 2(e_2 \otimes e_3)]$$
$$= -4(e_1 \otimes e_1 \otimes e_2)+4(e_1\otimes e_2 \otimes e_3)-6(e_2 \otimes e_1\otimes e_2)+6(e_2 \otimes e_2\otimes e_3).$$
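If it helps to see this numerically, here is a short sketch using NumPy; it is my own illustration, not part of the original answer. The point is that `np.tensordot(..., axes=0)` builds exactly this kind of product of coefficient arrays: the coefficient of $e_i \otimes e_j \otimes e_k$ ends up in entry `[i-1, j-1, k-1]` of the resulting array.

```python
import numpy as np

# Coefficients of the rank-1 tensor 2*e1 + 3*e2 in the basis e1, e2, e3.
u = np.array([2.0, 3.0, 0.0])

# Coefficients of the rank-2 tensor -2*(e1 ⊗ e2) + 2*(e2 ⊗ e3):
# entry [i, j] is the coefficient of e_{i+1} ⊗ e_{j+1}.
A = np.zeros((3, 3))
A[0, 1] = -2.0   # e1 ⊗ e2
A[1, 2] = 2.0    # e2 ⊗ e3

# Tensor (outer) product: entry [i, j, k] is the coefficient of
# e_{i+1} ⊗ e_{j+1} ⊗ e_{k+1} in u ⊗ A.
T = np.tensordot(u, A, axes=0)   # shape (3, 3, 3)

print(T[0, 0, 1])  # -4.0, coefficient of e1 ⊗ e1 ⊗ e2
print(T[0, 1, 2])  #  4.0, coefficient of e1 ⊗ e2 ⊗ e3
print(T[1, 0, 1])  # -6.0, coefficient of e2 ⊗ e1 ⊗ e2
print(T[1, 1, 2])  #  6.0, coefficient of e2 ⊗ e2 ⊗ e3
```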
For two rank $1$ tensors
$$ae_1+be_2+ce_3 \text{ and } xe_1+ye_2+ze_3,$$
tensor multiplication gives a rank $2$ tensor whose coefficient matrix (i.e. the matrix whose entries are the coefficients of the $e_i \otimes e_j$ terms) is the product of the matrices
$$\begin{pmatrix}a\\b\\c\end{pmatrix} \text{ and } \begin{pmatrix}x&y&z\end{pmatrix},$$
as you alluded to in your question. However, in general there is no simple relation between tensor multiplication and matrix multiplication.
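For the rank $1$ case specifically, here is the column-times-row observation in NumPy (again my own sketch, not from the answer): the coefficient matrix of the tensor product of two vectors is their outer product, which coincides with multiplying a $3\times 1$ column by a $1\times 3$ row.

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])   # a*e1 + b*e2 + c*e3 with a, b, c = 1, 2, 3
w = np.array([4.0, 5.0, 6.0])   # x*e1 + y*e2 + z*e3 with x, y, z = 4, 5, 6

# Coefficient matrix of v ⊗ w: entry [i, j] is the coefficient of e_{i+1} ⊗ e_{j+1}.
outer = np.outer(v, w)

# The same matrix as the product of a 3x1 column with a 1x3 row.
column_times_row = v.reshape(3, 1) @ w.reshape(1, 3)

print(np.allclose(outer, column_times_row))  # True
```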
So I figured it out!
The confusing part is that the dimension (a.k.a. the rank, a.k.a. the number of indices) of a tensor is not the same as the dimension of the space it might be used to describe.
A 2D vector can be described using a 1D tensor: (x, y).
A 3D vector can be described using a 1D tensor: (x, y, z).
But a 2D tensor, a.k.a. a matrix, is more like ((x, y, z), (x, y, z), (x, y, z)), or in proper math notation:
$\begin{bmatrix}x & y & z\\x & y & z\\ x & y & z\end{bmatrix}$
The matrix has 2 dimensions (it's a rectangle), but that's not the same thing as the dimension of the space it may or may not describe.
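If you think of tensors as NumPy arrays (my own illustration of the same distinction), the rank is the number of axes (`ndim`), while the spatial dimension shows up as the length of each axis (`shape`):

```python
import numpy as np

v2 = np.array([1.0, 2.0])        # 2D vector, still a rank-1 tensor
v3 = np.array([1.0, 2.0, 3.0])   # 3D vector, still a rank-1 tensor
m  = np.zeros((3, 3))            # rank-2 tensor (a matrix)
t  = np.zeros((3, 3, 3))         # rank-3 tensor

# ndim is the rank (number of indices); shape shows the spatial dimension.
print(v2.ndim, v2.shape)   # 1 (2,)
print(v3.ndim, v3.shape)   # 1 (3,)
print(m.ndim, m.shape)     # 2 (3, 3)
print(t.ndim, t.shape)     # 3 (3, 3, 3)
```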
Whew this was good to figure out finally!
Mathematics needs to improve its terminology so that fewer words are used in multiple ways in different contexts. Maybe that's why the term tensor rank was adopted, I guess.