Usually, a matrix is thought of as a representation of a linear operator: a map that takes a vector and spits out another vector. Say $A$ is some linear operator and $v$ is some vector; then $A(v)$ is the output vector.
An equivalent way of looking at it, however, is to say that there is a map $B$ that takes two vectors $v, w$ and spits out a scalar, given by $B(v,w) = A(v) \cdot w$, say. Such a map is what is usually described in the literature when talking about tensors.
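As a minimal NumPy sketch of this idea (the operator $A$ here is an arbitrary matrix chosen for illustration): the bilinear map $B$ is built from the operator by composing it with the dot product.

```python
import numpy as np

# A sample linear operator A on R^3, represented by a matrix (chosen arbitrarily).
A = np.array([[2.0, 0.0, 1.0],
              [0.0, 3.0, 0.0],
              [1.0, 0.0, 1.0]])

def B(v, w):
    """The bilinear map B(v, w) = A(v) . w built from the operator A."""
    return np.dot(A @ v, w)

v = np.array([1.0, 0.0, 2.0])
w = np.array([0.0, 1.0, 1.0])

# B is linear in each argument separately (multilinearity):
assert np.isclose(B(2 * v, w), 2 * B(v, w))
assert np.isclose(B(v + w, w), B(v, w) + B(w, w))
```

The point of the sketch is that $B$ and $A$ carry the same information: each entry of $A$ can be recovered by feeding $B$ suitable basis vectors.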
Where do contravariance and covariance come in? Well, the above idea of a tensor is actually a bit of a cheat. There might not be an inner product, in which case we cannot freely convert between vectors and covectors. So instead of saying that $B$ takes two vectors as arguments, let $B$ be a map taking one vector $v$ and a covector $\alpha$ instead, so that $B(v, \alpha) = \alpha(A(v))$.
(You'll note that, if there is a way to convert from vectors to covectors, then any tensor acting on $p$ vectors and $q$ covectors could be converted to one that acts on $p+q$ vectors, for instance.)
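This conversion can be sketched in NumPy. Assuming an inner product given by a symmetric positive-definite matrix $g$ (both $g$ and the operator below are arbitrary illustrative choices), "lowering an index" with $g$ turns a vector argument into the covector the tensor expects:

```python
import numpy as np

# An assumed inner product on R^2: a symmetric positive-definite matrix g.
g = np.array([[2.0, 0.0],
              [0.0, 1.0]])

# A sample operator A, used to build the tensor B(v, alpha) = alpha(A v).
A = np.array([[1.0, 2.0],
              [0.0, 3.0]])

def lower(w):
    """Use the inner product to convert a vector into the covector g(w, .)."""
    return g @ w

def B(v, alpha):
    """A tensor taking one vector and one covector."""
    return alpha @ (A @ v)

def B_two_vectors(v, w):
    """The same data repackaged as a tensor taking two vectors."""
    return B(v, lower(w))

v = np.array([1.0, 1.0])
w = np.array([1.0, 0.0])
```

With an inner product available, `B` and `B_two_vectors` are interchangeable descriptions; without one, only the vector-plus-covector version makes sense.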
A general tensor could take any number of vector or covector arguments, or a mix of the two in any number.
In physics, it's common to look at the components of a tensor with respect to some basis--rather than supply whatever vectors or covectors might be relevant to a problem, we supply a set of basis vectors and covectors instead, so we need only remember the coefficients. If $e_i$ is the $i$th basis vector and $e^j$ is the $j$th basis covector, then $B(e_i, e^j) = {B_i}^j$ takes us from the more math-inclined definition of a tensor to the form more familiar to a physicist.
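A short NumPy check of this component extraction, with the standard basis of $\mathbb R^3$ and a randomly chosen operator: the components ${B_i}^j = B(e_i, e^j) = e^j(A e_i)$ turn out to be just the entries of the operator's matrix (transposed, with this index convention), and they suffice to evaluate $B$ on any inputs.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))  # a sample operator

# Standard basis: rows of the identity serve as both e_i and e^j.
E = np.eye(3)

# Components B_i^j = B(e_i, e^j) = e^j(A e_i) = A[j, i].
components = np.array([[E[j] @ (A @ E[i]) for j in range(3)]
                       for i in range(3)])
assert np.allclose(components, A.T)

# Knowing the components is enough: B(v, alpha) = B_i^j v^i alpha_j.
v = rng.normal(size=3)
alpha = rng.normal(size=3)
assert np.isclose(np.einsum('ij,i,j->', components, v, alpha),
                  alpha @ (A @ v))
```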
Although it's a bit of a lengthy subject, I'll try to give you the exact mathematical information in as short an introduction as possible. Let's start directly with ....
DEFINITION OF TENSOR --- A $(p,q)$-tensor $T$ (with $p$ and $q$ non-negative integers) is a multilinear transformation
$$T:\underbrace{V^*\times V^*\times\dots\times V^*}_{p\text{ times}}\times\underbrace{V\times V\times\dots\times V}_{q\text{ times}}\to\mathbb R$$ where $V$ is a vector space, $V^*$ is its dual vector space and $\mathbb R$ is the set of real numbers. The integer $p+q$ is the rank of the tensor.
Example: a $(1,1)$-tensor is a multilinear transformation $T:V^*\times V\to \mathbb R$. Using the same information we can construct an object $T:V\to V$, as shown later. We recognise this as a simple linear transformation of vectors, represented by a matrix. Hence a matrix is a $(1,1)$-tensor.
What does that mean? It means that a tensor takes $p$ covectors and $q$ vectors and converts them multilinearly to a real number. The main thing to understand here is the difference between a vector (member of a vector space) and a covector (member of the dual vector space). If you already know about this, you can skip this section. A vector is defined as a member of a vector space, which itself is defined as a set with an addition and a scalar multiplication satisfying certain axioms.* A covector is defined as follows:
Definition (Dual space) The set of all linear transformations
$\boldsymbol\omega:V\to\mathbb R$ is called the dual vector space and denoted by $V^*$. The members of the dual vector space are called covectors.
Theorem (without proof) The dual of the dual space of a finite dimensional vector space $V$ is $V$ itself, i.e., $$(V^*)^*=V$$
We usually denote vectors by $\boldsymbol v$ and covectors by $\boldsymbol \omega$. Also by convention, vectors have indices up and covectors have indices down. (The indices representing coordinates)
$$\boldsymbol{v}=v^i\boldsymbol e_i, \quad\boldsymbol\omega=\omega_i \boldsymbol \epsilon^i$$
Here the $\boldsymbol e_i$ are the basis vectors and the $\boldsymbol\epsilon^i$ are the basis covectors (the dual basis). Whenever you see an index up and the same index down you have to sum over that index, as in the above equations ($\boldsymbol v = \sum_i v^i \boldsymbol e_i$).
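The summation convention is exactly what `np.einsum` implements: a repeated index is contracted away. A minimal sketch, with the standard basis of $\mathbb R^3$ and arbitrary components:

```python
import numpy as np

e = np.eye(3)                        # basis vectors e_i as rows
v_comp = np.array([2.0, -1.0, 3.0])  # components v^i

# v = v^i e_i : the repeated index i is summed over.
v = np.einsum('i,ij->j', v_comp, e)

# Same thing written out as an explicit sum over i:
assert np.allclose(v, sum(v_comp[i] * e[i] for i in range(3)))
```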
Notice that a covector is a $(0,1)$-tensor and a real number is a $(0,0)$-tensor. This can be seen readily from the definition. We can show that a vector is a $(1,0)$-tensor, using the above mentioned theorem, although it is not very obvious.
How to represent tensors in a basis? Let's say we want to represent a (1,2)-tensor in a given basis. We apply it to an arbitrary input:
$$T(\boldsymbol \omega, \boldsymbol v, \boldsymbol w)=T(\omega_a \boldsymbol\epsilon^a,v^b\boldsymbol e_b,w^c\boldsymbol e_c)=\omega_a v^b w^c T(\boldsymbol\epsilon^a,\boldsymbol e_b,\boldsymbol e_c)$$
Here the objects $T(\boldsymbol\epsilon^a,\boldsymbol e_b,\boldsymbol e_c)$ are simply real numbers, and they can be labelled $T^a_{bc}$. Hence a tensor can be represented by a set of $(\dim V)^{p+q}$ numbers. A tensor $T$ of type $(p,q)$ and rank $p+q$ has $p$ indices up and $q$ indices down.
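A NumPy sketch of this bookkeeping for a $(1,2)$-tensor on a 2-dimensional $V$ (components chosen at random for illustration): the tensor is a $2\times2\times2$ array of $(\dim V)^{p+q} = 2^3 = 8$ numbers, and evaluating it is one contraction.

```python
import numpy as np

rng = np.random.default_rng(1)
T = rng.normal(size=(2, 2, 2))  # components T^a_bc of a sample (1,2)-tensor

omega = rng.normal(size=2)  # covector components omega_a
v = rng.normal(size=2)      # vector components v^b
w = rng.normal(size=2)      # vector components w^c

# T(omega, v, w) = omega_a v^b w^c T^a_bc : contract every repeated index.
value = np.einsum('a,b,c,abc->', omega, v, w, T)

# Multilinearity in, say, the second slot:
assert np.isclose(np.einsum('a,b,c,abc->', omega, 2 * v, w, T), 2 * value)
```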
Theorem In the definition mentioned above we can transfer a $V$ or $V^*$ factor to the other side of the arrow by removing or adding a $*$; for example, a map $V^*\times V\to\mathbb R$ carries the same information as a map $V\to V$.
Consider a (1,1) tensor. It is an object $T^a_b$ which takes a vector and a covector and converts it to a real number, like so: $$T:V^*\times V\to\mathbb R$$
$$T^a_b \omega_a v^b = r, \,\, r\in\mathbb R.$$ However the same object can be used like so:
$$T^a_bv^b = w^a,$$ here it has converted a vector to another, $$T:V\to V.$$
A matrix can do the same things: just think row vector = covector, column vector = vector, and ($N\times N$) matrix = tensor. Then covector * matrix * vector = real number, while matrix * vector = vector. The entries of the matrix are precisely the numbers $T^a_b$.
Hence, a matrix is simply a $(1,1)$-tensor. However, the notation of matrices requires us to use it in some particular ways: you can do covector * matrix * vector but not matrix * vector * covector.
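The matrix dictionary above translates directly into NumPy (entries chosen arbitrarily): the same array $T^a_b$ eats a covector and a vector to give a number, or just a vector to give a vector.

```python
import numpy as np

T = np.array([[1.0, 2.0],
              [3.0, 4.0]])      # the numbers T^a_b, an N x N matrix
v = np.array([1.0, 1.0])        # column vector  (vector)
omega = np.array([1.0, 0.0])    # row vector     (covector)

# covector * matrix * vector -> real number:  T^a_b omega_a v^b
r = omega @ T @ v
assert np.ndim(r) == 0          # a plain number

# matrix * vector -> vector:  T^a_b v^b = w^a
w = T @ v
assert w.shape == v.shape
```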
Footnotes:
*the axioms are CANI ADDU - For addition: Commutativity, Associativity, Neutral element (0 vector) exists, Inverse elements exist. For scalar multiplication and addition: Associativity, two Distributivities, Unit element (1*v=v)
Please note this is only a mathematical introduction to this subject. I have not answered all your questions, but this is an attempt to make things precise so you don't learn something wrong, and you will probably now be able to answer those questions yourself.
The word tensor is often abused. Firstly, a tensor is simply an element of the tensor product of some vector spaces or bimodules or something. In this sense, of course there are non-square tensors. For example an element of $V\otimes_k W$ would be called a tensor, for any $k$-vector spaces $V$ and $W$. But the words covariant and contravariant don't have any meaning here.
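A concrete sketch of such a non-square tensor in NumPy (dimensions chosen arbitrarily): for $\dim V = 2$ and $\dim W = 3$, simple tensors $v\otimes w$ in $V\otimes W$ correspond to outer products, and a general element is a sum of such $2\times 3$ arrays.

```python
import numpy as np

# A simple tensor v (x) w in V (x) W, with dim V = 2, dim W = 3.
v = np.array([1.0, 2.0])
w = np.array([1.0, 0.0, -1.0])
simple = np.outer(v, w)          # a 2 x 3 ("non-square") array
assert simple.shape == (2, 3)

# A general element is a sum of simple tensors; this one is NOT itself
# an outer product of any single pair (its matrix rank is 2, not 1).
v2 = np.array([0.0, 1.0])
w2 = np.array([2.0, 1.0, 0.0])
general = simple + np.outer(v2, w2)
assert np.linalg.matrix_rank(general) == 2
```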
Secondly (and this is more closely aligned with the topic of your question), tensor might also mean a tensor (in the first sense above) valued function on a manifold. For example, let $T_p M$ denote the tangent space to a smooth manifold $M$ at the point $p\in M$. A tensor can mean a choice of element $Z_p\in T_pM\otimes \cdots \otimes T_pM \otimes (T_pM)^\ast\otimes \cdots\otimes (T_pM)^\ast$ for each point $p\in M$, which depends differentiably on $p$. For example, vector fields are tensors in this sense. The words covariant and contravariant have their origins here, in how the coordinates of $Z$ behave with respect to coordinate transformations on $M$.
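To illustrate the coordinate behaviour behind the word "contravariant" (the change-of-basis matrix below is an arbitrary invertible example): when the basis vectors change by $P$, the components of a vector must change by $P^{-1}$ so that the vector itself stays the same.

```python
import numpy as np

# Change of basis on R^2: the new basis vectors are the columns of an
# assumed invertible matrix P.
P = np.array([[2.0, 1.0],
              [0.0, 1.0]])

v_old = np.array([3.0, 1.0])       # components in the old basis

# Contravariant transformation rule: v' = P^{-1} v.
v_new = np.linalg.solve(P, v_old)

# Same geometric vector either way: v^i e_i = v'^j e'_j.
assert np.allclose(P @ v_new, v_old)
```

Covariant components (those of a covector) transform the opposite way, with $P$ itself, which is the origin of the up/down index distinction in the other answers.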
For a "non-square" tensor of this type, one especially important example is the second fundamental form. If $M^k$ is a Riemannian manifold isometrically immersed in some Riemannian manifold $N^{k+n}$, then the second fundamental form is roughly this: for a point $p\in M$ and a pair of tangent vectors $v,w\in T_pM\subset T_pN$, there is a normal vector $S_p(v,w)\in (T_pM)^\perp$ which is something like a second derivative (hence measures curvature). Since $S_p$ chews on two tangent vectors and spits out a normal vector, we can think of $S_p$ as an element of $(T_pM)^\perp\otimes (T_pM)^\ast\otimes (T_pM)^\ast$. This $S$ is a very important non-square tensor (dimensions $n\times k\times k$)!
For a more precise response to your questions:
Not really. Tensors don't really act on anything. However $\operatorname{End}(V)\cong V\otimes V^\ast$, so operators can be thought of as tensors, but not usually vice versa. A tensor usually just means an element of a tensor product of vector spaces (mathematician) or a tensor valued function (physicist).
I would say this is right. Without any context there's no reason to call an element of $V\otimes W^\ast$ a tensor of type $(1,1)$ or $(2,0)$, or whatever. These notions are undefined in general.
No.
Yes! See above.