There are literally dozens of independent reasons to invent the tensor product, and just about every area of mathematics needs the tensor product for its own reasons (often several reasons). Here are a couple of examples.
Suppose $X$ and $Y$ are topological spaces (metric spaces are fine if you like them better) and consider the rings $C(X)$ and $C(Y)$ of continuous real-valued functions. If you are convinced that products are worthy of consideration, then perhaps you are convinced that it is useful to look at $C(X \times Y)$. It is natural to ask if this can be expressed in terms of $C(X)$ and $C(Y)$; the answer (modulo largely irrelevant technical details) is that $C(X \times Y) = C(X) \otimes C(Y)$.
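Here is a minimal sketch of the simplest case, assuming $X$ and $Y$ are finite sets with the discrete topology (so every function is continuous and the technical details disappear). It checks that an arbitrary function on $X \times Y$ is a finite sum of "pure tensors" $(x, y) \mapsto f(x) g(y)$. The names `pure_tensor` and `decompose` are made up for this illustration.

```python
# Assumption: X and Y are small finite sets; functions are stored as dicts.
X = ["x0", "x1"]
Y = ["y0", "y1", "y2"]

def pure_tensor(f, g):
    """The function (x, y) -> f(x) * g(y) on X x Y."""
    return {(x, y): f[x] * g[y] for x in X for y in Y}

def add(F, G):
    """Pointwise sum of two functions on X x Y."""
    return {p: F[p] + G[p] for p in F}

def decompose(F):
    """Write F as a sum of pure tensors: one indicator function per point of X."""
    pieces = []
    for x0 in X:
        delta = {x: 1 if x == x0 else 0 for x in X}  # indicator of x0 on X
        row = {y: F[(x0, y)] for y in Y}             # F restricted to {x0} x Y
        pieces.append((delta, row))
    return pieces

# An arbitrary function on X x Y.
F = {(x, y): 3 * i - 2 * j + i * j
     for i, x in enumerate(X) for j, y in enumerate(Y)}

reconstructed = {(x, y): 0 for x in X for y in Y}
for f, g in decompose(F):
    reconstructed = add(reconstructed, pure_tensor(f, g))

print(reconstructed == F)
```

For infinite spaces the sums of pure tensors are in general only dense in $C(X \times Y)$ in a suitable sense, which is where the technical details come in.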
Let $V$ be a vector space over $\mathbb{R}$. It is often desirable to construct a complex vector space naturally associated to $V$ (the "complexification" of $V$). Here by "naturally" I mean in a way which is coordinate-free and transparently compatible with linear maps. The solution is to set $V_{\mathbb{C}} = V \otimes \mathbb{C}$ (tensor product over $\mathbb{R}$). This is a special case of the more general phenomenon of "extension of scalars". As a fancy example demonstrating that this really is as useful as I claim, you might check out the Wikipedia page on "Pontryagin classes" (though it might be over your head if you haven't learned much algebraic topology).
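To make this a bit more concrete, here is roughly how the complex structure and the compatibility with linear maps look in formulas (the symbols $\lambda$, $T$, $S$ and $W$ are my own notation for this sketch). Complex scalars act on the second factor, every element can be written uniquely as $v \otimes 1 + w \otimes i$, and a real-linear map $T: V \to W$ extends by acting on the first factor:

$$\lambda \cdot (v \otimes z) = v \otimes \lambda z, \qquad i \cdot (v \otimes 1 + w \otimes i) = -\,w \otimes 1 + v \otimes i,$$

$$T_{\mathbb{C}}(v \otimes z) = T(v) \otimes z, \qquad (S \circ T)_{\mathbb{C}} = S_{\mathbb{C}} \circ T_{\mathbb{C}}.$$

No basis of $V$ appears anywhere, which is the sense in which the construction is coordinate-free.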
One of the reasons why direct sums are important is that they let you turn strange objects into groups. For example, if $G$ is a group and $V$ and $W$ are two representations of $G$ (vector spaces on which $G$ acts nicely), then $V \oplus W$ is also a representation of $G$. So the set of all representations of $G$ has an additive structure, and with a little algebraic magic one can upgrade this structure to a group (don't spend too much time worrying about how you subtract representations). Groups are nice and have lots of their own invariants, but rings are even nicer and have even more invariants. So it would be great if we could define a natural product of representations. You guessed it: the product of $V$ and $W$ is just $V \otimes W$. The set of all representations of $G$ with this structure is the infamous "representation ring" of $G$. This product structure is apparently of paramount importance in quantum mechanics (I don't know why). As another example where the tensor product turns a group into a ring, you might check out the Wikipedia page on "topological K-theory".
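As a small sanity check that $V \otimes W$ really is a representation again, here is a sketch in plain Python. It assumes a specific example that is not in the discussion above: $G = \mathbb{Z}/4$ acting on $\mathbb{R}^2$ by rotations through multiples of $90°$, with $V = W$. In coordinates, $g$ acts on $V \otimes W$ by the Kronecker product of the two matrices; the helper names are made up for this illustration.

```python
def matmul(P, Q):
    """Product of two matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def kron(P, Q):
    """Kronecker product: the matrix of P (x) Q in the product basis."""
    return [[p * q for p in prow for q in qrow] for prow in P for qrow in Q]

def trace(P):
    return sum(P[i][i] for i in range(len(P)))

I2 = [[1, 0], [0, 1]]
R = [[0, -1], [1, 0]]  # rotation by 90 degrees, so R^4 = I

def rho(k):
    """Assumed representation of Z/4 on R^2: k acts as R^k."""
    M = I2
    for _ in range(k % 4):
        M = matmul(M, R)
    return M

def rho_tensor(k):
    """Action of k on the tensor product V (x) V."""
    return kron(rho(k), rho(k))

# The tensor product action is again a homomorphism G -> GL(4).
is_homomorphism = all(
    matmul(rho_tensor(j), rho_tensor(k)) == rho_tensor(j + k)
    for j in range(4) for k in range(4)
)
# Traces (characters) multiply under the tensor product.
characters_multiply = all(
    trace(rho_tensor(k)) == trace(rho(k)) ** 2 for k in range(4)
)
print(is_homomorphism, characters_multiply)
```

The second check reflects the standard fact that the character of $V \otimes W$ is the product of the characters, which is what makes the representation ring computable in practice.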
There are many more examples. If you know about functional analysis, the Schwartz kernel theorem is a tool used to investigate existence questions and regularity properties of partial differential equations, and it can be formulated purely in terms of Grothendieck's theory of topological tensor products. I can't give you any deep reason why the same algebraic gadget has such a diverse array of applications, but I guess that's the way it is. You'll undoubtedly learn more as you keep studying math.
ADDED:
I just noticed the other part of your question, in which you ask about the "lifting" property of the tensor product. If I were forced to give a short explanation of what the tensor product really is, it would be the following. Given two $R$-modules $A$ and $B$, we want to convert $R$-bilinear maps on $A \times B$ into linear maps on some other object. We want to do this because for many purposes it reduces the structure theory of bilinear maps to the (extensive!) structure theory for linear maps. The lifting property that you describe tells us that the tensor product does the job.
But it does more than just "do the job": it does the job in the absolute best way possible. When you learn about most mathematical objects, such as the direct sum of two vector spaces, it is typical to define the object as some set equipped with some structure and then prove that it has certain nice properties. With the tensor product, you should go about it backwards: you should think of the tensor product as an object with certain nice properties and then prove that there actually is an object with all of those properties. This is because the actual construction of the tensor product of two modules is completely unenlightening and completely irrelevant to how you actually use the idea in practice.
I'll be a little less vague and outline how the tensor product should be developed from scratch. Given two $R$-modules $A$ and $B$, define a tensor product of $A$ and $B$ to be a pair $(T, t)$ where $T$ is an $R$-module and $t: A \times B \to T$ is a bilinear map with the property that, given any $R$-module $C$ and any bilinear map $Q: A \times B \to C$, there exists a unique linear map $L: T \to C$ such that $Q = L \circ t$.
Lemma 1: If the tensor product exists, it is unique up to unique isomorphism.
Lemma 2: The tensor product exists.
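Here is a sketch of why Lemma 1 holds, using only the defining property (the primes are my notation for a second candidate). Suppose $(T, t)$ and $(T', t')$ are both tensor products of $A$ and $B$. Applying the property of $(T, t)$ to the bilinear map $t'$, and the property of $(T', t')$ to the bilinear map $t$, gives unique linear maps $L$ and $L'$ with

$$t' = L \circ t, \qquad t = L' \circ t'.$$

Then

$$(L' \circ L) \circ t = L' \circ t' = t = \mathrm{id}_T \circ t,$$

and since the linear map through which $t$ factors is unique, $L' \circ L = \mathrm{id}_T$. The same argument with the roles swapped gives $L \circ L' = \mathrm{id}_{T'}$, so $L$ is an isomorphism, and it is the only one compatible with $t$ and $t'$.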
A finitely generated free Abelian group is isomorphic to $\Bbb Z^r$
where the rank $r$ is the size of a free generating set (a basis). It is clear that $r$
determines the structure of $\Bbb Z^r$.
Conversely, if $\Bbb Z^r\cong\Bbb Z^s$ then $(\Bbb Z/2\Bbb Z)^r\cong(\Bbb Z/2\Bbb Z)^s$,
so $2^r=2^s$ and $r=s$. (For Abelian $G$ and $H$, $G\cong H$ implies $G/2G\cong H/2H$, and $\Bbb Z^r/2\Bbb Z^r\cong(\Bbb Z/2\Bbb Z)^r$ has $2^r$ elements.)
I'll use more convenient notation: in the free abelian group $\mathbf Z^{(X)}$ generated by a set $X$, I'll denote by $[x]$ the element $e_x$, i.e. the map which sends $x$ to $1$ and any $x'\ne x$ to $0$.
This being said, you have a group isomorphism from $\mathbf Z^{(A\times B)}$ to $\mathbf Z^{(B\times A)}$, which sends $[(a,b)]$ to $[(b,a)]$. This map sends the generators of the relations defining the tensor product in the first free group, such as $[(a_1,b)]+[(a_2,b)]-[(a_1+a_2,b)]$ (and likewise the relations expressing additivity in the other variable), onto the generators of the relations defining the tensor product in the second free group, $[(b,a_1)]+[(b,a_2)]-[(b,a_1+a_2)]$. Hence it maps the subgroup $R_{A\times B}$ generated by the first set onto the subgroup $R_{B\times A}$ generated by the second set.
Therefore we have a commutative diagram of abelian groups with exact rows \begin{alignat}{5} 0\longrightarrow &R_{A\times B}\hookrightarrow&&\mathbf Z^{(A\times B)}\longrightarrow A\otimes B\longrightarrow 0 \\ &\quad\downarrow&&\enspace\downarrow\\ 0\longrightarrow &R_{B\times A}\hookrightarrow&&\mathbf Z^{(B\times A)}\longrightarrow B\otimes A\longrightarrow 0 \end{alignat} which induces a morphism from $A\otimes B$ to $B\otimes A$ by the universal property of cokernels (quotients). As the two vertical maps are group isomorphisms, the induced morphism is an isomorphism too.
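If you want to see the key step in a concrete case, here is a small sketch assuming $A = \mathbf Z/2$ and $B = \mathbf Z/3$ (my choice of example). Elements of the free groups are stored as dicts from pairs to integer coefficients, and the code checks that the swap map carries each additivity relation in $\mathbf Z^{(A\times B)}$ to the corresponding relation in $\mathbf Z^{(B\times A)}$. The helper names are made up for this illustration.

```python
from itertools import product

def formal_sum(terms):
    """Collect (coefficient, generator) pairs into a dict, dropping zeros."""
    total = {}
    for coeff, gen in terms:
        total[gen] = total.get(gen, 0) + coeff
    return {gen: c for gen, c in total.items() if c != 0}

def swap(element):
    """Linear extension of [(a, b)] -> [(b, a)]."""
    return {(b, a): c for (a, b), c in element.items()}

m, n = 2, 3                 # assumption: A = Z/2, B = Z/3
A, B = range(m), range(n)

def relation_in_AxB(a1, a2, b):
    """[(a1, b)] + [(a2, b)] - [(a1 + a2, b)] in the free group on A x B."""
    return formal_sum([(1, (a1, b)), (1, (a2, b)), (-1, ((a1 + a2) % m, b))])

def relation_in_BxA(b, a1, a2):
    """[(b, a1)] + [(b, a2)] - [(b, a1 + a2)] in the free group on B x A."""
    return formal_sum([(1, (b, a1)), (1, (b, a2)), (-1, (b, (a1 + a2) % m))])

# The swap map sends every such relation generator to a relation generator.
all_match = all(
    swap(relation_in_AxB(a1, a2, b)) == relation_in_BxA(b, a1, a2)
    for a1, a2, b in product(A, A, B)
)
print(all_match)
```

This only checks the relations that are additive in the $A$ variable; the relations additive in the $B$ variable can be checked in exactly the same way.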