Since you asked for an intuitive way to understand covariance and contravariance, I think this will do.
First of all, remember that the reason for having covariant or contravariant tensors is that you want to represent the same thing in a different coordinate system. Such a new representation is obtained by a transformation involving a set of partial derivatives. In tensor analysis, a good transformation is one that leaves invariant the quantity you are interested in.
For example, we consider the transformation from one coordinate system $x^1,...,x^{n}$ to another $x^{'1},...,x^{'n}$:
$x^{i}=f^{i}(x^{'1},x^{'2},...,x^{'n})$ where $f^{i}$ are certain functions.
Take a look at a couple of specific quantities. How do the coordinate differentials transform? The answer is:
$dx^{i}=\displaystyle \frac{\partial x^{i}}{\partial x^{'k}}dx^{'k}$
Every quantity which, under a transformation of coordinates, transforms like the coordinate differentials is called a contravariant tensor.
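If a numerical check helps, here is a minimal sketch of this rule (NumPy assumed; I take the primed system to be polar coordinates and the unprimed one Cartesian, and the names `f`, `jacobian`, `xp`, `dxp` are just illustrative):

```python
import numpy as np

# A sketch of the contravariant rule dx^i = (∂x^i/∂x'^k) dx'^k, with
# primed coordinates (r, θ) (polar) and unprimed (x, y) (Cartesian):
#   x = r*cos(θ),  y = r*sin(θ).

def f(xp):
    """The functions f^i: primed (polar) to unprimed (Cartesian) coordinates."""
    r, theta = xp
    return np.array([r * np.cos(theta), r * np.sin(theta)])

def jacobian(xp):
    """Matrix of partials ∂x^i/∂x'^k evaluated at the primed point xp."""
    r, theta = xp
    return np.array([[np.cos(theta), -r * np.sin(theta)],
                     [np.sin(theta),  r * np.cos(theta)]])

xp = np.array([2.0, np.pi / 6])     # a point in primed coordinates
dxp = np.array([1e-6, 2e-6])        # a small displacement dx'^k

dx_tensor = jacobian(xp) @ dxp      # the contravariant transformation rule
dx_direct = f(xp + dxp) - f(xp)     # the same displacement computed directly

print(np.allclose(dx_tensor, dx_direct, atol=1e-9))  # True, to first order
```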
How do the derivatives of some scalar $\Phi$ transform?
$\displaystyle \frac{\partial \Phi}{\partial x^{i}}=\frac{\partial \Phi}{\partial x^{'k}}\frac{\partial x^{'k}}{\partial x^{i}}$
Every quantity which, under a coordinate transformation, transforms like the derivatives of a scalar is called a covariant tensor.
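And a matching sketch for the covariant rule, using the same polar-to-Cartesian map and an arbitrary test scalar $\Phi$ (again, all names are illustrative):

```python
import numpy as np

# A sketch of the covariant rule ∂Φ/∂x^i = (∂Φ/∂x'^k)(∂x'^k/∂x^i).

def f(xp):
    r, theta = xp
    return np.array([r * np.cos(theta), r * np.sin(theta)])

def jacobian(xp):
    r, theta = xp
    return np.array([[np.cos(theta), -r * np.sin(theta)],
                     [np.sin(theta),  r * np.cos(theta)]])

def phi(x):
    return x[0]**2 + 3.0 * x[1]        # Φ(x, y) = x² + 3y, an arbitrary scalar

def grad(func, p, h=1e-6):
    """Central-difference gradient of a scalar function at the point p."""
    return np.array([(func(p + h*e) - func(p - h*e)) / (2*h)
                     for e in np.eye(len(p))])

xp = np.array([2.0, np.pi / 6])        # a point in primed (polar) coordinates
grad_primed = grad(lambda q: phi(f(q)), xp)   # components ∂Φ/∂x'^k
J_inv = np.linalg.inv(jacobian(xp))           # entries (J⁻¹)_{ki} = ∂x'^k/∂x^i

grad_tensor = grad_primed @ J_inv      # covariant rule applied
grad_direct = grad(phi, f(xp))         # ∂Φ/∂x^i computed directly

print(np.allclose(grad_tensor, grad_direct, atol=1e-6))  # True
```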
Accordingly, a reasonable generalization is a quantity which transforms like the product of the components of two contravariant tensors, that is
$A^{ik}=\displaystyle \frac{\partial x^{i}}{\partial x^{'l}}\frac{\partial x^{k}}{\partial x^{'m}}A^{'lm}$
which is called a contravariant tensor of rank two. The same applies to covariant tensors of rank $n$ and to mixed tensors.
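For comparison, the analogous standard laws for a covariant tensor of rank two and for a mixed tensor read

$A_{ik}=\displaystyle \frac{\partial x^{'l}}{\partial x^{i}}\frac{\partial x^{'m}}{\partial x^{k}}A'_{lm}\qquad\text{and}\qquad A^{i}_{k}=\displaystyle \frac{\partial x^{i}}{\partial x^{'l}}\frac{\partial x^{'m}}{\partial x^{k}}A'^{l}_{m}$

with one partial-derivative factor, of the appropriate kind, per index.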
Keeping in mind the analogy to coordinate differentials and the derivative of a scalar, take a look at this picture, which I think will help to make it clearer:
From Wikipedia:
The contravariant components of a vector are obtained by projecting onto the coordinate axes. The covariant components are obtained by projecting onto the normal lines to the coordinate hyperplanes.
Finally, you may want to read: Basis vectors
By the way, I don't recommend relying blindly on the picture given by matrices, especially when you are doing calculations.
I don't think that "contravariant transformation" is established terminology in physics.
The problem with "covariant" is that in physics, this has a wide range of meanings, starting with "involving no unatural choices" up to the definition one sees in differential geometry motivated by general relativity, which is:
For a smooth real Riemannian manifold $M$, a tensor $T$ of rank $(n, m)$ is a multilinear function which takes $n$ 1-forms and $m$ tangent vectors as input. When you choose a coordinate chart and, with respect to this chart, the dual bases $d x^{\alpha}$ on the cotangent space and $\partial_{\alpha}$ on the tangent space, then the tensor has coordinate functions of the form
$$
T^{\alpha \beta ...}_{\gamma \delta ...} = T(d x^{\alpha}, d x^{\beta}, ..., \partial_{\gamma}, \partial_{\delta}, ...)
$$
With respect to these bases, a downstairs index is called covariant and an upstairs index is called contravariant. Now, a "covariant equation" or "covariant operation" is one that does not change its form under a coordinate change, which means that if you change coordinates and apply the coordinate change to all covariant and contravariant indices of every tensor in your equation, then you must get the same equation, but with indices with respect to the new coordinates.
A simple example would be:
$$
T^{\alpha}_{\alpha} = 0
$$
with the Einstein summation convention: when the same index appears once as a covariant and once as a contravariant index, it is understood that one sums over that index, pairing the dual bases.
Physicists would say that this equation is "covariant" because it has the same form in every coordinate chart, i.e. when I apply a diffeomorphism I get
$$
T^{\alpha'}_{\alpha'} = 0
$$
with respect to the new coordinates. Note that since we are talking about general relativity, the kinds of transformations are implicitly fixed to be changes of charts on a smooth real manifold. As I said before, when physicists talk about different theories, they may implicitly be talking about other kinds of transformations. (Maybe you ran into some physicists who said "covariant transformation" when they meant "coordinate change", but personally I have not encountered this use of language.)
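To spell out the check in the trace example: each index carries one Jacobian factor under a change of charts, and the two factors cancel upon contraction,
$$
T^{\alpha'}_{\alpha'} = \frac{\partial x^{\alpha'}}{\partial x^{\mu}} \frac{\partial x^{\nu}}{\partial x^{\alpha'}} T^{\mu}_{\nu} = \delta^{\nu}_{\mu} T^{\mu}_{\nu} = T^{\mu}_{\mu},
$$
so the trace is a scalar, and $T^{\alpha}_{\alpha}=0$ holds in every chart if it holds in one.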
Best Answer
Usually, a matrix is thought of as a representation of a linear operator: a map that takes a vector and spits out another vector. Say $A$ is some linear operator and $v$ is some vector; then $A(v)$ is the output vector.
An equivalent way of looking at it, however, is to say that there is a map $B$ that takes two vectors $v, w$ and spits out a scalar, given by $B(v,w) = A(v) \cdot w$, say. Such a map is what is usually described in the literature when talking about tensors.
Where do contravariance and covariance come in? Well, the above idea of a tensor is actually a bit of a cheat: there might not be an inner product, and without one we cannot freely convert between vectors and covectors. So instead of saying that $B$ takes two vectors as arguments, let $B$ be a map taking one vector $v$ and one covector $\alpha$, so that $B(v, \alpha) = \alpha(A(v))$.
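As a concrete sketch of the two viewpoints (hypothetical values; a small matrix stands in for $A$, and NumPy is assumed):

```python
import numpy as np

# Two views of the same object: B(v, w) = A(v)·w uses the standard inner
# product, while B(v, α) = α(A(v)) needs no inner product at all.

A = np.array([[1.0, 2.0],
              [0.0, 3.0]])          # a linear operator on R²
v = np.array([1.0, 1.0])            # a vector
w = np.array([2.0, -1.0])           # another vector
alpha = np.array([2.0, -1.0])       # a covector, as a row of coefficients

B_vw = np.dot(A @ v, w)             # B(v, w) = A(v) · w
B_valpha = alpha @ (A @ v)          # B(v, α) = α(A(v))

print(B_vw, B_valpha)  # 3.0 3.0 -- equal here only because the standard
                       # inner product silently identifies α with w
```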
(You'll note that, if there is a way to convert from vectors to covectors, then any tensor acting on $p$ vectors and $q$ covectors could be converted to one that acts on $p+q$ vectors, for instance.)
A general tensor could take any number of vector or covector arguments, or a mix of the two in any number.
In physics, it's common to look at the components of a tensor with respect to some basis--rather than supply whatever vectors or covectors might be relevant to a problem, we supply a set of basis vectors and covectors instead, so we need only remember the coefficients. If $e_i$ is the $i$th basis vector and $e^j$ is the $j$th basis covector, then $B(e_i, e^j) = {B_i}^j$ takes us from the more math-inclined definition of a tensor to the form more familiar to a physicist.
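If it helps, here is a minimal sketch of that last step (hypothetical names again). With the standard basis, which is its own dual, the components ${B_i}^j$ simply reproduce the matrix of $A$:

```python
import numpy as np

# Extracting components {B_i}^j = B(e_i, e^j) of the (1,1)-tensor
# B(v, α) = α(A(v)) with respect to the standard basis and its dual.

A = np.array([[1.0, 2.0],
              [0.0, 3.0]])

def B(v, alpha):
    """B(v, α) = α(A(v))."""
    return alpha @ (A @ v)

n = A.shape[0]
e = np.eye(n)          # rows: basis vectors e_i and dual covectors e^j,
                       # with e^j(e_i) = δ_i^j for the standard basis

components = np.array([[B(e[i], e[j]) for j in range(n)]
                       for i in range(n)])

print(components)      # {B_i}^j = (A e_i)_j, i.e. components == A.T here
```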