Usually, a matrix is thought of as representing a linear operator: a map that takes a vector and spits out another vector. If $A$ is a linear operator and $v$ is a vector, then $A(v)$ is the output vector.
An equivalent way of looking at it, however, is to say that there is a map $B$ that takes two vectors $v, w$ and spits out a scalar, given by, say, $B(v,w) = A(v) \cdot w$. Such a map is what the literature usually means when it talks about tensors.
Where do contravariance and covariance come in? Well, the above idea of a tensor is actually a bit of a cheat: there might not be an inner product, so we might not be able to freely convert between vectors and covectors. So instead of saying that $B$ takes two vectors as arguments, let $B$ take one vector $v$ and one covector $\alpha$, so that $B(v, \alpha) = \alpha(A(v))$.
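As a concrete sketch of these two views of the same object, here is a minimal numpy example (the particular $A$, $v$, and $\alpha$ are arbitrary illustrative values):

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])  # an arbitrary linear operator on R^2
v = np.array([1.0, -1.0])               # an arbitrary vector
alpha = np.array([2.0, 5.0])            # an arbitrary covector, stored as its components

Av = A @ v              # operator view: vector in, vector out
scalar = alpha @ Av     # tensor view: B(v, alpha) = alpha(A(v)) is a single number
print(Av, scalar)       # [-1. -1.] -7.0
```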
(You'll note that, if there is a way to convert vectors to covectors, then any tensor taking $p$ vectors and $q$ covectors can be converted to one that takes $p+q$ vectors, for instance.)
A general tensor can take any number of vector and covector arguments, in any combination.
In physics, it's common to look at the components of a tensor with respect to some basis: rather than supply whatever vectors or covectors might be relevant to a problem, we supply a set of basis vectors and covectors, so we need only remember the coefficients. If $e_i$ is the $i$th basis vector and $e^j$ is the $j$th basis covector, then $B(e_i, e^j) = {B_i}^j$ takes us from the more math-inclined definition of a tensor to the form more familiar to a physicist.
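As a sketch of this bookkeeping, here the components ${B_i}^j$ are recovered by feeding basis vectors and covectors into the illustrative $B$ from above:

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])  # same illustrative operator as before

def B(v, alpha):
    # the mixed tensor built from A: one vector argument, one covector argument
    return alpha @ (A @ v)

n = A.shape[0]
e = np.eye(n)  # e[i] serves as both the i-th basis vector and basis covector here

# supplying basis vectors/covectors reads off the component array B_i^j:
components = np.array([[B(e[i], e[j]) for j in range(n)] for i in range(n)])
print(np.allclose(components, A.T))  # True: B(e_i, e^j) = (A e_i)_j = A[j, i]
```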
I think putting tensors in the right context will clear much of this up. I'll stick with $\mathbb R^3$ since that's the example you use. Rank $2$ tensors are elements of the so-called tensor product of $\mathbb R^3$ with itself, which is denoted by $\mathbb R^3 \otimes \mathbb R^3$. This space consists of all linear combinations of expressions of the form $u \otimes v$ under the stipulations that:
$$u\otimes(v + w) = u\otimes v + u\otimes w,$$
$$(u+v)\otimes w = u\otimes w + v\otimes w, \text{and}$$
$$u\otimes (cv) = c(u \otimes v) = (cu) \otimes v$$
where $u,v,w$ are vectors and $c$ is a scalar.
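These stipulations just say that $\otimes$ is bilinear. In components, $u \otimes v$ is the outer product, $(u \otimes v)_{ij} = u_i v_j$, and a short numpy check (with arbitrary illustrative vectors) confirms the three rules:

```python
import numpy as np

u = np.array([1.0, 0.0, 2.0])
v = np.array([0.0, 1.0, 1.0])
w = np.array([3.0, -1.0, 0.0])
c = 2.5

# u ⊗ v as the outer product: (u ⊗ v)[i, j] = u[i] * v[j]
print(np.allclose(np.outer(u, v + w), np.outer(u, v) + np.outer(u, w)))  # rule 1
print(np.allclose(np.outer(u + v, w), np.outer(u, w) + np.outer(v, w)))  # rule 2
print(np.allclose(np.outer(u, c * v), c * np.outer(u, v)))               # rule 3
```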
Taking the standard basis $e_1,e_2,e_3$ of $\mathbb R^3$, any rank $2$ tensor can then be written as a linear combination of the $9$ "pure" tensors $e_i \otimes e_j$ for $i,j = 1,2,3$. The $9$ scalars you take as coefficients in such a linear combination make up the $3 \times 3$ matrix which "represents" that rank $2$ tensor. A rank $3$ tensor, an element of the tensor product $\mathbb R^3 \otimes \mathbb R^3 \otimes \mathbb R^3$, is then a linear combination of the $27$ pure tensors:
$$e_i \otimes e_j \otimes e_k$$
where $i,j,k=1,2,3$. The $27$ coefficients in such a linear combination make up the $3 \times 3 \times 3$ array you mention.
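As a concrete (illustrative) numpy sketch: the pure tensor $e_1 \otimes e_2$ is the matrix with a single $1$ in entry $(1,2)$, and a pure rank $3$ tensor sits inside a $3 \times 3 \times 3$ array the same way:

```python
import numpy as np

e = np.eye(3)  # e[i] is the standard basis vector e_{i+1}

# the pure rank-2 tensor e_1 ⊗ e_2: a matrix with a single 1 in position (1, 2)
print(np.outer(e[0], e[1]))

# the pure rank-3 tensor e_1 ⊗ e_2 ⊗ e_3 lives in a 3x3x3 array:
pure = np.einsum('i,j,k->ijk', e[0], e[1], e[2])
print(pure.shape)     # (3, 3, 3)
print(pure[0, 1, 2])  # 1.0 -- the single nonzero coefficient
```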
Tensor multiplication is then just given by the good ol' distributive property. For instance, the product of the rank $1$ tensor $2e_1+ 3e_2$ and the rank $2$ tensor $-2(e_1 \otimes e_2) + 2(e_2 \otimes e_3)$ is:
$$[2e_1 + 3e_2] \otimes [-2(e_1 \otimes e_2) + 2(e_2 \otimes e_3)]$$
$$=-4(e_1 \otimes e_1 \otimes e_2)+4(e_1\otimes e_2 \otimes e_3)-6(e_2 \otimes e_1\otimes e_2)+6(e_2 \otimes e_2\otimes e_3).$$
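Here's a numpy sketch verifying this expansion (using the standard basis, so that $a \otimes B$ has entries $a_i B_{jk}$):

```python
import numpy as np

e = np.eye(3)
a = 2 * e[0] + 3 * e[1]                                   # 2e_1 + 3e_2
B = -2 * np.outer(e[0], e[1]) + 2 * np.outer(e[1], e[2])  # -2(e_1⊗e_2) + 2(e_2⊗e_3)

product = np.einsum('i,jk->ijk', a, B)                    # a ⊗ B, a 3x3x3 array

expected = (-4 * np.einsum('i,j,k->ijk', e[0], e[0], e[1])
            + 4 * np.einsum('i,j,k->ijk', e[0], e[1], e[2])
            - 6 * np.einsum('i,j,k->ijk', e[1], e[0], e[1])
            + 6 * np.einsum('i,j,k->ijk', e[1], e[1], e[2]))
print(np.allclose(product, expected))  # True
```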
For two rank $1$ tensors
$$ae_1+be_2+ce_3 \text{ and } xe_1+ye_2+ze_3,$$
tensor multiplication gives a rank $2$ tensor whose coefficient matrix (i.e. the matrix whose entries are the coefficients of the $e_i \otimes e_j$ terms) is the product of the matrices
$$\begin{pmatrix}a\\b\\c\end{pmatrix} \text{ and } \begin{pmatrix}x&y&z\end{pmatrix},$$
as you alluded to in your question. However, in general there is no simple relation between tensor multiplication and matrix multiplication.
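Here's a quick numpy check of that rank $1$ case, with arbitrary illustrative coefficients: the coefficient matrix of the tensor product is exactly the $3 \times 1$ column times the $1 \times 3$ row:

```python
import numpy as np

a_vec = np.array([1.0, 2.0, 3.0])   # coefficients a, b, c
x_vec = np.array([4.0, 5.0, 6.0])   # coefficients x, y, z

outer = np.outer(a_vec, x_vec)                      # coefficients of the rank-2 tensor
column_times_row = a_vec[:, None] @ x_vec[None, :]  # (3x1 matrix) @ (1x3 matrix)
print(np.allclose(outer, column_times_row))         # True
```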
This question doesn't have a single good answer, because there isn't a universally agreed upon definition of "tensor" in mathematics. In particular:
- Tensors are sometimes defined as multidimensional arrays, in the same way that a matrix is a two-dimensional array (see the short sketch after this list). From this point of view, a matrix is certainly a special case of a tensor.
- In differential geometry and physics, "tensor" refers to a certain kind of object that can be described at a point on a manifold (though the word "tensor" is often used to refer to a tensor field, in which one tensor is chosen for every point). From this point of view, a matrix can be used to describe a rank-two tensor in local coordinates, but a rank-two tensor is not itself a matrix.
- In linear algebra, "tensor" sometimes refers to an element of a tensor product, and sometimes refers to a certain kind of multilinear map. Again, neither of these is a generalization of "matrix", though you can get a matrix from a rank-two tensor if you choose a basis for your vector space.
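Under the first definition, the only difference between a matrix and a higher tensor is the number of indices. A minimal numpy sketch (purely illustrative):

```python
import numpy as np

matrix = np.zeros((3, 3))         # a two-dimensional array: "rank 2" in this usage
tensor3 = np.zeros((3, 3, 3))     # a three-dimensional array: "rank 3"
print(matrix.ndim, tensor3.ndim)  # 2 3 -- the rank is just the number of indices
```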
You run into the same problem if you ask a question like "Is a vector just a tuple of numbers?" Sometimes a vector is defined as a tuple of numbers, in which case the answer is yes. However, in differential geometry and physics, the word "vector" refers to an element of the tangent space to a manifold, while in linear algebra, a "vector" may be any element of a vector space.
On a basic level, the statement "a vector is a rank 1 tensor, and a matrix is a rank 2 tensor" is roughly correct. This is certainly the simplest way of thinking about tensors, and is reflected in the Einstein notation. However, it is important to appreciate the subtleties of this identification, and to realize that "tensor" often means something slightly different and more abstract than a multidimensional array.