Although this is a bit of a lengthy subject, I'll try to give you the exact mathematical information in as short an introduction as possible. Let's start directly with ....
DEFINITION OF TENSOR --- A $(p,q)$-tensor $T$ (with $p$ and $q$ non-negative integers) is a multilinear transformation
$$T:\underbrace{V^*\times\dots\times V^*}_{p\text{ times}}\times\underbrace{V\times\dots\times V}_{q\text{ times}}\to\mathbb R$$ where $V$ is a vector space, $V^*$ is its dual vector space and $\mathbb R$ is the set of real numbers. The integer $p+q$ is the rank of the tensor.
Example. A (1,1) tensor is a multilinear transformation $T:V^*\times V\to \mathbb R$. Using the same information we can construct an object $T:V\to V$, as shown later. We recognise this as a simple linear transformation of vectors, represented by a matrix. Hence a matrix is a (1,1) tensor.
What does that mean? It means that a tensor takes $p$ covectors and $q$ vectors and converts them multilinearly into a real number. The main thing to understand here is the difference between a vector (a member of a vector space) and a covector (a member of the dual vector space). If you already know about this, you can skip this section. A vector is defined as a member of a vector space, which itself is defined as a set with addition and scalar multiplication satisfying certain axioms.* A covector is defined as follows:
Definition (Dual space) The set of all linear transformations
$\boldsymbol\omega:V\to\mathbb R$ is called the dual vector space and denoted by $V^*$. The members of the dual vector space are called covectors.
Theorem (without proof) The dual of the dual space of a finite dimensional vector space $V$ is $V$ itself, i.e., $$(V^*)^*=V$$
We usually denote vectors by $\boldsymbol v$ and covectors by $\boldsymbol \omega$. Also, by convention, vectors have indices up and covectors have indices down (the indices label components in a chosen basis):
$$\boldsymbol{v}=v^i\boldsymbol e_i, \quad\boldsymbol\omega=\omega_i \boldsymbol \epsilon^i$$
Here the $\boldsymbol e_i$ are the basis vectors and the $\boldsymbol\epsilon^i$ are the dual basis covectors. Whenever you see an index up and the same index down, you have to sum over that index, like in the above equations ($\boldsymbol v = \sum_i v^i \boldsymbol e_i$).
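To see the summation convention concretely, here is a minimal numpy sketch (the basis and component values are arbitrary, chosen only for illustration):

```python
import numpy as np

# Standard basis e_1, e_2, e_3 of R^3 (the rows of the identity matrix).
e = np.eye(3)

# Components v^i of a vector in that basis (arbitrary illustrative values).
v_components = np.array([2.0, -1.0, 3.0])

# Summation convention: v = v^i e_i means summing over the repeated index i.
v = sum(v_components[i] * e[i] for i in range(3))

# The same contraction written with numpy's einsum.
v_einsum = np.einsum('i,ij->j', v_components, e)

print(v)          # [ 2. -1.  3.]
print(v_einsum)   # identical
```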
Notice that a covector is a (0,1) tensor and a real number is a (0,0) tensor. This can be seen readily from the definition. We can show that a vector is a (1,0) tensor, using the above mentioned theorem, although it is not very obvious.
How do we represent tensors in a basis? Let's say we want to represent a (1,2)-tensor in a given basis. We apply it to an arbitrary input:
$$T(\boldsymbol \omega, \boldsymbol v, \boldsymbol w)=T(\omega_a \boldsymbol\epsilon^a,v^b\boldsymbol e_b,w^c\boldsymbol e_c)=\omega_a v^b w^c T(\boldsymbol\epsilon^a,\boldsymbol e_b,\boldsymbol e_c)$$
Here the objects $T(\boldsymbol\epsilon^a,\boldsymbol e_b,\boldsymbol e_c)$ are simply real numbers and they can be labelled as $T^a_{bc}$. Hence a tensor can be represented by a set of $(\dim V)^{p+q}$ numbers. A tensor T of type (p,q) and rank (p+q) has p indices up and q indices down.
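As a concrete sketch, the component array $T^a_{bc}$ can be stored as a 3-dimensional numpy array, and the formula above becomes a single contraction (all numerical values below are arbitrary, not taken from the text):

```python
import numpy as np

n = 3                                  # dim V
rng = np.random.default_rng(0)

T = rng.standard_normal((n, n, n))     # the (dim V)^3 numbers T^a_{bc}
omega = rng.standard_normal(n)         # covector components omega_a
v = rng.standard_normal(n)             # vector components v^b
w = rng.standard_normal(n)             # vector components w^c

# T(omega, v, w) = omega_a v^b w^c T^a_{bc}, summed over a, b and c.
result = np.einsum('a,b,c,abc->', omega, v, w, T)
print(result)                          # a single real number
```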
Theorem In the definition mentioned above we can move a $V$ or a $V^*$ to the other side of the arrow by adding or removing the $*$.
Consider a (1,1) tensor. It is an object $T^a_b$ which takes a covector and a vector and converts them to a real number, like so: $$T:V^*\times V\to\mathbb R$$
$$T^a_b \omega_a v^b = r, \,\, r\in\mathbb R.$$ However the same object can be used like so:
$$T^a_bv^b = w^a,$$ here it has converted a vector into another vector, $$T:V\to V.$$
A matrix can do the same things; just think row vector = covector, column vector = vector, and $N\times N$ matrix = tensor. Then: covector * Matrix * vector = real number, while Matrix * vector = vector. The entries of the matrix are precisely the numbers $T^a_b$.
Hence, a matrix is simply a (1,1) tensor. However, matrix notation requires us to use it in particular ways: you can write covector * Matrix * vector but not Matrix * vector * covector.
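In index notation both uses are contractions of the same array $T^a_b$; here is a minimal numpy sketch with arbitrary values, checked against the matrix-notation products:

```python
import numpy as np

rng = np.random.default_rng(1)
T = rng.standard_normal((3, 3))   # the numbers T^a_b, i.e. an ordinary matrix
omega = rng.standard_normal(3)    # covector components omega_a
v = rng.standard_normal(3)        # vector components v^b

# Both slots filled: T^a_b omega_a v^b is a real number.
r = np.einsum('ab,a,b->', T, omega, v)

# Only the vector slot filled: T^a_b v^b = w^a is again a vector.
w = np.einsum('ab,b->a', T, v)

print(np.isclose(r, omega @ T @ v))   # True: covector * Matrix * vector
print(np.allclose(w, T @ v))          # True: Matrix * vector
```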
Footnotes:
*the axioms are CANI ADDU - For addition: Commutativity, Associativity, Neutral element (0 vector) exists, Inverse elements exist. For scalar multiplication and addition: Associativity, two Distributivities, Unit element (1*v=v)
Please note this is only a mathematical introduction to the subject. I have not answered all your questions, but this is an attempt to make things precise so you don't learn something wrong, and you will probably now be able to answer those questions yourself.
Regarding the Background:
The first two bullets are fine. The "copy paste" metaphor is interesting.
Bullet 3: I'm not quite sure what you're getting at with this excerpt:
Any number of different matrices could represent the same vector, for instance the same 1 by 3 column vector could be represented as a 2 by 2 matrix with one slot being 0 and the elements switched around. However, you usually choose the representation that makes computations appear as they would for basic linear algebra with geometric vectors.
but of course the gist, i.e. that row/column vectors and matrices can be used to flexibly represent vector spaces, is correct.
Bullet 4: Really not sure what you're trying to get at here. I'm not sure how to interpret the sentence "multiplication of b by a can loosely be interpreted as a as a function of b".
Bullet 5: Not sure what this is supposed to say. I think you're just explaining the notation $f:X \to Y$, but your wording and choices of notation are awkward.
Bullet 6: Mostly correct, but subtly wrong. The phrasing of "vector spaces imply the existence of elements of a field with it" bothers me. It's not clear what you mean by "these maps must exist in order for it to have been called a vector space". In the end, it seems like you're trying to say something like "The dual space $V^*$ of $V$ is the set of linear maps from $V$ to its underlying field. The elements of a dual space are called covectors". I think you have the right idea, it's just not very readable right now.
Regarding Tensors:
Bullet 1: I haven't heard "tense" used as a verb in this sense. The sentence "Additionally, since the linear transformations from the tensor on each vector can be encoded a vector, tensors should also be able to be vectors, which means they have to be able to be part of a vector space" is unclear.
Bullet 2: "A multilinear transformation contains multiple sets of linear transformation information, each of which can be considered a vector": Not clear what "contain" means here. I'm really not sure what exactly you're trying to convey in the rest of this paragraph.
Bullet 3: A tensor is defined as an element of the tensor product of any number of vector spaces. Otherwise fine.
Your last two bullets are fine.
What makes all of this really confusing is that in some contexts it is convenient to think of tensors as multilinear maps, while in other contexts it is convenient to think of tensors as elements of the fancy vector space that we call the "tensor product" of the input spaces. It is common, in the exposition of the relevant fields, to completely ignore the alternate point of view.
I have found that in differential geometry, the multilinear map point of view is more common. I think that the "multidimensional array" point of view is most directly connected to this multilinear map definition of a tensor product.
The advantage of the more abstract definition via tensor products of spaces is that all of the maps that we care about are simply linear maps (or in the greater algebraic context, module homomorphisms).
Best Answer
I will try to do things as concretely as possible by working in $\mathbb{R}^n$.
As you've mentioned, matrices represent linear maps. Tensors represent multi-linear maps.
Let's start with vectors. Broadly speaking, there are two "kinds" of vectors. We have column vectors, which are just the usual things we call vectors, and we also have row vectors. You may not see the point of distinguishing the two, but it is actually important. I will call column vectors simply vectors, whereas row vectors will be termed covectors. I will use boldface notation like $\mathbf{v}$ for vectors, with an added tilde $\mathbf{\tilde{v}}$ for covectors. To distinguish the spaces that they live in, I will say that vectors are elements of $\mathbb{R}^n$ whereas covectors are elements of $\mathbb{R}^{n*}$.
Suppose that we have a vector $\mathbf{v}$ and a covector $\mathbf{\tilde{u}}$. Then we may multiply the two to get $$\mathbf{\tilde{u}}\mathbf{v} = \begin{pmatrix}u_1 & \cdots & u_n\end{pmatrix}\begin{pmatrix}v_1 \\ \vdots \\ v_n\end{pmatrix}=u_1v_1 + \cdots + u_nv_n.$$ Notice that the result is just a scalar number (in fact, it is a dot product). Thus, $\mathbf{\tilde{u}}$ is naturally an object that takes a vector and returns a number. The object $\mathbf{\tilde{u}}$ is a linear map that maps vectors to numbers, i.e., $\mathbf{\tilde{u}}:\mathbb{R}^n \rightarrow \mathbb{R}$.
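In numpy terms this is just the row-vector-times-column-vector product; a tiny sketch with arbitrary entries:

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])   # components of the covector u~
v = np.array([4.0, 5.0, 6.0])   # components of the vector v

# u~ v = u_1 v_1 + ... + u_n v_n: a covector eats a vector, returns a number.
print(u @ v)                    # 32.0
```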
We can go one step further. In the above product, $\mathbf{\tilde{u}}$ naturally acts on $\mathbf{v}$ from the left to give a number. But we can also turn this around to say that $\mathbf{v}$ acts on $\mathbf{\tilde{u}}$ from the right to give a number. Thus we can also regard $\mathbf{v}$ as a map that takes covectors to numbers, i.e., $\mathbf{v}:\mathbb{R}^{n*}\rightarrow \mathbb{R}$. This restores the symmetry between the two.
The take-away message so far is that vectors are objects that take covectors to numbers, whereas covectors are objects that take vectors to numbers.
Now, what do matrices do? A matrix $A$ can be viewed in three distinct ways:
i) It takes a vector $\mathbf{v}$ to the vector $A\mathbf{v}$, i.e., $A:\mathbb{R}^n \rightarrow \mathbb{R}^n$.
ii) It takes the covector $\mathbf{\tilde{u}}$ to the covector $\mathbf{\tilde{u}}A$, i.e., $A:\mathbb{R}^{n*} \rightarrow \mathbb{R}^{n*}$
iii) Most importantly for our purposes, however, it takes a vector $\mathbf{v}$ and a covector $\mathbf{\tilde{u}}$ and gives back a number $\mathbf{\tilde{u}}A\mathbf{v}$. Thus we can view matrices as objects that take both a vector and a covector and give back a number, i.e., $A:\mathbb{R}^{n*}\times \mathbb{R}^n \rightarrow \mathbb{R}$.
Notice that the first two properties can be seen as consequences of the last. If we give the matrix a vector, then we are still missing a covector in order to get a number. Thus when a matrix gets a vector, it becomes an object that maps covectors to numbers. These are precisely vectors as we've previously shown. Likewise, if we give the matrix a covector, then we still need a vector to get a number. Thus a matrix maps covectors to things that map vectors to numbers, i.e., covectors.
What we've shown is that the three ways of looking at matrices are equivalent. Symbolically, we can write $$A:\mathbb{R}^n \rightarrow \mathbb{R}^n \cong A:\mathbb{R}^{n*} \rightarrow \mathbb{R}^{n*} \cong A:\mathbb{R}^{n*}\times \mathbb{R}^n \rightarrow \mathbb{R}.\tag{1}$$ A simple way to remember these equivalences is to remember that we can move $\mathbb{R}^{n}$/$\mathbb{R}^{n*}$ from the left of the arrow to the right of the arrow along with adding/removing the $*$.
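Here is a small numpy sketch of the three views (entries arbitrary); the point is just that one and the same $n\times n$ array of numbers supports all three uses:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))   # a matrix, viewed as a (1,1) tensor
v = rng.standard_normal(3)        # a vector
u = rng.standard_normal(3)        # components of a covector

print(A @ v)       # (i)   vector   -> vector
print(u @ A)       # (ii)  covector -> covector
print(u @ A @ v)   # (iii) (covector, vector) -> number
```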
So far this seems like a lot of pointless linear algebraic gymnastics, but in fact we are almost there. Matrices take $\mathrm{(covector,vector)}$ pairs to numbers, but there is another very natural object that does the exact same thing, namely the $\mathrm{(vector,covector)}$ pairs. Given a $\mathrm{(vector,covector)}$ pair $(\mathbf{v},\mathbf{\tilde{u}})$, we can have it act on a $\mathrm{(covector,vector)}$ pair $(\mathbf{\tilde{x}},\mathbf{y})$ as follows: $$(\mathbf{v},\mathbf{\tilde{u}})(\mathbf{\tilde{x}},\mathbf{y}) = (\mathbf{\tilde{x}}\mathbf{v})(\mathbf{\tilde{u}}\mathbf{y}).$$ Notice that we are simply using the fact that the vector $\mathbf{v}$ takes the covector $\mathbf{\tilde{x}}$ to the number $\mathbf{\tilde{x}}\mathbf{v}$ while the covector $\mathbf{\tilde{u}}$ takes the vector $\mathbf{y}$ to the number $\mathbf{\tilde{u}}\mathbf{y}$.
The similarity between $\mathrm{(vector,covector)}$ pairs and matrices is more than skin-deep. Every matrix map can be expressed as a linear combination of these $\mathrm{(vector,covector)}$ pairs. In fact, it is rather trivial to do. For example a matrix $A$ with entries $A_{ij}$ can be written as $$A = \sum_{i,j=1}^nA_{ij}\mathbf{e}_i\mathbf{\tilde{e}}_{j},$$ so the map $A:\mathbb{R}^{n*}\times \mathbb{R}^n \rightarrow \mathbb{R}$ can be equivalently represented as $$A\cong \sum_{i,j=1}^n A_{ij}(\mathbf{e}_{i},\mathbf{\tilde{e}}_j).$$
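A quick numerical check of this decomposition (matrix entries arbitrary): each $\mathbf{e}_i\mathbf{\tilde{e}}_j$ is an outer product, and summing them with weights $A_{ij}$ rebuilds $A$:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 3))
e = np.eye(3)                    # e[i] is the i-th standard basis vector

# Rebuild A as  sum_{i,j} A_ij * (e_i outer e~_j).
A_rebuilt = sum(A[i, j] * np.outer(e[i], e[j])
                for i in range(3) for j in range(3))

print(np.allclose(A, A_rebuilt))   # True
```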
In the present context, we typically don't use brackets $(\ ,\ )$ to denote $\mathrm{(vector,covector)}$ pairs. Instead, we use these neat little circles $\otimes$: $$(\mathbf{v},\mathbf{\tilde{u}}) := \mathbf{v}\otimes \mathbf{\tilde{u}}.$$
The pairs of vector/covectors are the objects that we call tensors. They live inside the tensor space $\mathbb{R}^n \otimes \mathbb{R}^{n*}$. Specifically, these are $(1,1)$ tensors, tensors comprised of a vector and a covector. What we've also shown is that $$\mathbb{R}^{n,n} \cong \mathbb{R}^n \otimes \mathbb{R}^{n*},$$ that is, the vector space of $n\times n$ matrices $\mathbb{R}^{n,n}$ is isomorphic to the $(1,1)$ tensor space $\mathbb{R}^n \otimes \mathbb{R}^{n*}$.
The generalization is now clear. We can consider objects comprised of $r$ vectors and $s$ covectors: $$\mathbf{v}_1 \otimes \cdots \otimes \mathbf {v}_r \otimes \mathbf{\tilde{u}}_1 \otimes \cdots\otimes \mathbf{\tilde{u}}_s.\tag{2}$$ These are $(r,s)$ tensors, multi-linear maps that take $r$ covectors and $s$ vectors, returning a number. In explicit, gory detail: $$\begin{eqnarray}(\mathbf{v}_1 \otimes \cdots \otimes \mathbf{v}_r \otimes \mathbf{\tilde{u}}_1 \otimes \cdots \otimes \mathbf{\tilde{u}}_s)(\mathbf{\tilde{x}}_1,\cdots,\mathbf{\tilde{x}}_r,\mathbf{y}_1,\cdots,\mathbf{y}_s)\\ =(\mathbf{\tilde{x}}_1\mathbf{v}_1)\cdots(\mathbf{\tilde{x}}_r\mathbf{v}_r)(\mathbf{\tilde{u}}_1\mathbf{y}_1)\cdots(\mathbf{\tilde{u}}_s\mathbf{y}_s)\end{eqnarray}.$$
That's it. Tensors are nothing more than multi-linear maps $$\underbrace{\mathbb{R}^{n*} \times \cdots \times \mathbb{R}^{n*}}_{r\ \text{copies}} \times \underbrace{\mathbb{R}^n \times \cdots \times \mathbb{R}^n}_{s\ \text{copies}} \rightarrow \mathbb{R}. \tag{3}$$ You can convince yourself that any multi-linear map of this kind can be written as a linear combination of primitive tensors of the form given in $(2)$, i.e., we have the vector space isomorphism $$\underbrace{\mathbb{R}^n \otimes \cdots \otimes \mathbb{R}^n}_{r\ \text{copies}} \otimes \underbrace{\mathbb{R}^{n*} \otimes \cdots \otimes \mathbb{R}^{n*}}_{s\ \text{copies}} \cong \{\text{multi-linear maps of the form }(3)\}.$$
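In coordinates an $(r,s)$ tensor is just an $(r+s)$-dimensional array, and evaluating the multilinear map $(3)$ is one big contraction. A sketch for $r=1$, $s=2$ with arbitrary entries, checked against the explicit formula above:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 3

# A primitive (1,2) tensor  v (x) u1~ (x) u2~ , stored as an n x n x n array.
v, u1, u2 = rng.standard_normal((3, n))
T = np.einsum('a,b,c->abc', v, u1, u2)

# Feed it one covector x~ and two vectors y1, y2 ...
x, y1, y2 = rng.standard_normal((3, n))
lhs = np.einsum('abc,a,b,c->', T, x, y1, y2)

# ... and compare with (x~ v)(u1~ y1)(u2~ y2).
rhs = (x @ v) * (u1 @ y1) * (u2 @ y2)
print(np.isclose(lhs, rhs))   # True
```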
Of course in practice, tensors are very versatile. Using the same kind of reasoning that gave us equality $(1)$, we can move $\mathbb{R}^n$/$\mathbb{R}^{n*}$ from the left of the arrow to the right. Thus we can have identities of the form $$\begin{eqnarray} \underbrace{\mathbb{R}^{n*} \times \cdots \times \mathbb{R}^{n*}}_{r\ \text{copies}} \times \underbrace{\mathbb{R}^n \times \cdots \times \mathbb{R}^n}_{s\ \text{copies}} \rightarrow \mathbb{R} \\ \cong \underbrace{\mathbb{R}^{n*} \times \cdots \times \mathbb{R}^{n*}}_{r-k\ \text{copies}} \times \underbrace{\mathbb{R}^n \times \cdots \times \mathbb{R}^n}_{s-\ell\ \text{copies}} \rightarrow \underbrace{\mathbb{R}^n \times \cdots \times \mathbb{R}^n}_{k\ \text{copies}}\times \underbrace{\mathbb{R}^{n*} \times \cdots \times \mathbb{R}^{n*}}_{\ell\ \text{copies}}.\end{eqnarray}$$ Thus we can equivalently regard $(r,s)$ tensors as multi-linear maps that take $r-k$ covectors and $s-\ell$ vectors, in turn giving us $k$ vectors and $\ell$ covectors. This is similar to the corresponding properties that matrices/$(1,1)$-tensors had.
Hopefully you can now see that tensors are a very versatile concept that directly generalizes the concept of linear mappings. I shouldn't have to convince you of the utility that such objects have. Ideally, I would close with a practical example, but this answer is already uncomfortably long, so I'll stop here.