It is essentially impossible to answer the general question of "how does multilinearity come up naturally in physics?" because of the myriad of possible examples that make up the total answer. Instead, let me describe a situation that very loudly cries out for the use of tensor products of two vectors.
Consider the problem of conservation of momentum for a continuous distribution of electric charge and current, which interacts with an electromagnetic field, under the action of no other external force. I will describe it more or less along the lines of Jackson (Classical Electrodynamics, 3rd edition, §6.7) but depart from it towards the end. This will get very electromagneticky for a while, so if you want to skip to the tensors, you can go straight to equation (1).
The rate of change of the total mechanical momentum of the system is the total Lorentz force, given by
$$
\frac{d\mathbf{P}_\mathrm{mech}}{dt}
=\int_V(\rho\mathbf{E}+\mathbf{J}\times \mathbf{B})d\mathbf{x}.
$$
To simplify this, one can take $\rho$ and $\mathbf{J}$ from Maxwell's equations:
$$
\rho=\epsilon_0\nabla\cdot\mathbf{E}
\ \ \ \text{ and }\ \ \
\mathbf{J}=\frac1{\mu_0}\nabla\times \mathbf{B}-\epsilon_0\frac{\partial \mathbf{E}}{\partial t}.$$
(In particular, this means that what follows is only valid "on shell": momentum is only conserved if the equations of motion are obeyed. Of course!)
One can then put these expressions back and, after a nice vector-calculus workout, come up with the following relation:
$$
\begin{align}{}
\frac{d\mathbf{P}_\mathrm{mech}}{dt}
+&\frac{d}{dt}\int_V\epsilon_0\mathbf{E}\times \mathbf{B}d\mathbf{x}
\\ &=
\epsilon_0\int_V \left[
\mathbf{E}(\nabla\cdot \mathbf{E})-\mathbf{E} \times(\nabla \times \mathbf{E}) + c^2 \mathbf{B} (\nabla \cdot \mathbf{B})- c^2 \mathbf{B} \times (\nabla \times \mathbf{B})
\right]d\mathbf{x}.
\end{align}
$$
The integral on the left-hand side can be identified as the total electromagnetic momentum, and differs from the integral of the Poynting vector by a factor of $1/c^2$. To get this in the proper form for a conservation law, though, such as the one for energy in this setting,
$$
\frac{dE_\mathrm{mech}}{dt}
+\frac{d}{dt}\frac{\epsilon_0}{2}\int_V(\mathbf{E}^2
+c^2\mathbf{B}^2)d\mathbf{x}
=
-\oint_S \mathbf{S}\cdot d\mathbf{a},
$$
we need to reduce the huge, ugly volume integral into a surface integral.
The way to do this is, of course, the divergence theorem. However, that theorem is for scalars, and what we have so far is a vector equation. To work further, then, we need to (at least temporarily) work in some specific basis $\{\mathbf{e}_1,\mathbf{e}_2,\mathbf{e}_3\}$, and write $\mathbf{E}=\sum_i E_i \mathbf{e}_i$. Let's work with the electric field term first; after that the results also apply to the magnetic term. Thus, to start with,
$$
\begin{align}{}
\int_V \left[
\mathbf{E}(\nabla\cdot \mathbf{E})-\mathbf{E} \times(\nabla \times \mathbf{E})
\right]d\mathbf{x}
=
\sum_i \mathbf{e}_i
\int_V \left[
E_i(\nabla\cdot \mathbf{E})-\mathbf{e}_i\cdot\left(\mathbf{E} \times(\nabla \times \mathbf{E})\right)
\right]d\mathbf{x}.
\end{align}
$$
These terms should be simplified using the vector calculus identities
$$
E_i(\nabla\cdot \mathbf{E})
=
\nabla\cdot\left(E_i \mathbf{E}\right) - \mathbf{E}\cdot \nabla E_i
$$
and
$$
\mathbf{E} \times(\nabla \times \mathbf{E})
=
\frac12\nabla(\mathbf{E}\cdot\mathbf{E})-(\mathbf{E}\cdot\nabla)\mathbf{E},
$$
which mean that the whole combination can be simplified as
$$
\begin{align}{}
\int_V \left[
\mathbf{E}(\nabla\cdot \mathbf{E})-\mathbf{E} \times(\nabla \times \mathbf{E})
\right]d\mathbf{x}
=
\sum_i \mathbf{e}_i
\int_V \left[
\nabla\cdot\left(E_i \mathbf{E}\right)
-
\mathbf{e}_i\cdot\left(
\frac12\nabla(\mathbf{E}\cdot\mathbf{E})
\right)
\right]d\mathbf{x},
\end{align}
$$
since the terms in $\mathbf{E}\cdot \nabla E_i$ and $\mathbf{e}_i\cdot\left( (\mathbf{E}\cdot\nabla)\mathbf{E}\right)$ cancel. This means we can write the whole integrand as the divergence of some vector field, and use the divergence theorem:
$$
\begin{align}{}
\int_V \left[
\mathbf{E}(\nabla\cdot \mathbf{E})-\mathbf{E} \times(\nabla \times \mathbf{E})
\right]d\mathbf{x}
&=
\sum_i \mathbf{e}_i
\int_V \nabla\cdot\left[
E_i \mathbf{E}
-
\frac12 \mathbf{e}_i E^2
\right]d\mathbf{x}
\\ & =
\sum_i \mathbf{e}_i
\oint_S\left[
E_i \mathbf{E}
-
\frac12 \mathbf{e}_i E^2
\right]\cdot d\mathbf{a}. \tag 1
\end{align}
$$
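The two vector-calculus identities invoked along the way can be verified symbolically. The following is a minimal sketch with sympy; the helper names `div`, `grad`, `curl` and the component functions `Ex, Ey, Ez` are my own, not from the text.

```python
# Symbolic check of the two vector-calculus identities used above.
# The components Ex, Ey, Ez are arbitrary smooth functions of (x, y, z).
import sympy as sp

x, y, z = sp.symbols('x y z')
coords = [x, y, z]
E = sp.Matrix([f(x, y, z) for f in sp.symbols('Ex Ey Ez', cls=sp.Function)])

def div(F):
    return sum(sp.diff(F[i], coords[i]) for i in range(3))

def grad(f):
    return sp.Matrix([sp.diff(f, c) for c in coords])

def curl(F):
    return sp.Matrix([sp.diff(F[2], y) - sp.diff(F[1], z),
                      sp.diff(F[0], z) - sp.diff(F[2], x),
                      sp.diff(F[1], x) - sp.diff(F[0], y)])

# Identity 1: E_i (div E) = div(E_i E) - E . grad(E_i), for each component i
for i in range(3):
    lhs = E[i] * div(E)
    rhs = div(E[i] * E) - (E.T * grad(E[i]))[0]
    assert sp.expand(lhs - rhs) == 0

# Identity 2: E x (curl E) = (1/2) grad(E.E) - (E . grad) E
lhs = E.cross(curl(E))
E_dot_grad_E = sp.Matrix([(E.T * grad(E[i]))[0] for i in range(3)])
rhs = grad((E.T * E)[0]) / 2 - E_dot_grad_E
assert (lhs - rhs).expand() == sp.zeros(3, 1)
```

Both identities hold componentwise, which is exactly the cancellation used to collapse the integrand into a pure divergence.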
In terms of conservation law structure, we're essentially done, as we've reduced the rate of change of momentum to a surface term. However, it is crying out for some simplification. In particular, this expression is basis-dependent, but it is so close to being basis independent that it's worth a closer look.
The first term, for instance, invites a simplification that would look something like
$$
\sum_i \mathbf{e}_i
\oint_S
E_i \mathbf{E}\cdot d\mathbf{a}
=
\oint_S
\mathbf{E}\, \mathbf{E}\cdot d\mathbf{a}
$$
if we could only make sense of an object like $\mathbf{E}\, \mathbf{E}$. Even better, if we could make sense of such a combination, then it turns out that the seemingly basis-dependent combination that would come up in the second term, $\sum_i \mathbf{e}_i\,\mathbf{e}_i$, turns out to be basis independent: one can prove that for any two orthonormal bases $\{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\}$ and $\{\mathbf{e}_1', \mathbf{e}_2', \mathbf{e}_3'\}$, those combinations are the same:
$$
\sum_i \mathbf{e}_i\,\mathbf{e}_i = \sum_i \mathbf{e}_i'\,\mathbf{e}_i'
$$
as long as the product $\mathbf{u}\,\mathbf{v}$ of two vectors, whatever it ends up being, is linear on each component, which is definitely a reasonable assumption.
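That basis independence is easy to check numerically. Here is a small numpy sketch (all identifiers are mine), anticipating the representation of the product $\mathbf{u}\,\mathbf{v}$ as an outer product of coordinate vectors:

```python
# Numerical check: for ANY orthonormal basis {e_i} of R^3, the combination
# sum_i e_i ⊗ e_i is the same object — the identity matrix, once u ⊗ v is
# represented as the outer product u v^T.
import numpy as np

rng = np.random.default_rng(0)

def random_orthonormal_basis():
    # QR decomposition of a random matrix yields an orthogonal Q,
    # whose columns form a random orthonormal basis
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    return [q[:, i] for i in range(3)]

for _ in range(5):
    basis = random_orthonormal_basis()
    total = sum(np.outer(e, e) for e in basis)
    assert np.allclose(total, np.eye(3))
```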
So what, then, should this new vector multiplication be? One key to realizing what we really need is noticing that we haven't yet assigned any real physical meaning to the combination $\mathbf{E}\,\mathbf{E}$; instead, we only ever interact with it by dotting "one of the vectors of the product" with the surface-area element $d\mathbf{a}$. That leaves a vector $\mathbf{E}\,\mathbf{E}\cdot d\mathbf{a}$, which we can then integrate over the surface, and none of this requires any new structure.
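To make that concrete, here is a minimal numpy sketch of the one operation we ever perform on $\mathbf{E}\,\mathbf{E}$, representing it as an outer product; the field and surface-element values are made up for illustration.

```python
# Sketch: realize u ⊗ v as the outer product, so (u ⊗ v) · w = (v · w) u.
import numpy as np

E = np.array([1.0, 2.0, 3.0])       # a sample field vector (invented)
da = np.array([0.5, -1.0, 0.25])    # a sample surface-area element (invented)

EE = np.outer(E, E)                 # stand-in for E ⊗ E
lhs = EE @ da                       # (E ⊗ E) · da
rhs = (E @ da) * E                  # (E · da) E — a plain vector, no new structure
assert np.allclose(lhs, rhs)
```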
Let's then write a list of how we want this new product to behave. To keep things clear, let's give it some fancy new symbol like $\otimes$, mostly to avoid unseemly combinations like $\mathbf{u}\,\mathbf{v}$. We want then,
- a function $\otimes:V\times V\to W$, which takes Euclidean vectors in $V=\mathbb R^3$ into some vector space $W$ in which we'll keep our fancy new objects.
- Combinations of the form $\mathbf{u}\otimes \mathbf{v}$ should be linear in both $\mathbf{u}$ and $\mathbf{v}$.
- For all vectors $\mathbf{w}$ in $V$, and all combinations $(\mathbf{u},\mathbf{v})\in V\times V$, we want the combination $(\mathbf{u}\otimes \mathbf{v})\cdot\mathbf{w}$ to be a vector in $V$. Even more, we want it to be the vector $(\mathbf{v}\cdot\mathbf{w})\mathbf{u}\in V$.
That last one actually looks pretty strong, but there's still room for improvement. For one, it depends on the Euclidean structure, which is not actually necessary: we can make an equivalent statement that uses the vector space's dual.
- For all $(\mathbf{u},\mathbf{v})\in V\times V$ and all $f\in V^\ast$, we want $f_\to(\mathbf{u}\otimes \mathbf{v})=f(\mathbf{v})\mathbf{u}\in V$ to hold, where $f_\to$ simply means that $f$ acts on the factor on the right.
Finally, if we're doing stuff with the dual, we can reformulate that in a slightly prettier way. Since two vectors $\mathbf{u},\mathbf{v}\in V$ are equal if and only if $f(\mathbf{u})=f(\mathbf{v})$ for all $f\in V^\ast$, we can give yet another equivalent formulation:
- For all $(\mathbf{u},\mathbf{v})\in V\times V$ and all $f,g\in V^\ast$, we want $g_\leftarrow f_\to(\mathbf{u}\otimes \mathbf{v})=g(\mathbf{u})f(\mathbf{v})\in \mathbb R$, where $g_\leftarrow$ acts on the factor on the left.
[Note, here, that this last rephrasing isn't really that fancy. Essentially, it is saying that the vector equation (1) is really to be interpreted as a component-by-component equality, and that's not really off the mark of how we actually do things.]
I could keep going, but it's clear that this requirement can be rephrased into the universal property of the tensor product, and that rephrasing is a job for the mathematicians. Thus, you can see the story like this: Upon hitting equation (1), we give the mathematicians this list of requirements. They go off, think for a bit, and come back telling us that such a structure does exist (i.e. there exist rigorous constructions that obey those requirements) and that it is essentially unique, in the sense that multiple such constructions are possible, but they are canonically isomorphic. For a physicist, what that means is that it's OK to write down objects like $\mathbf{u}\otimes \mathbf{v}$ as long as one keeps within the rules of the game.
As far as electromagnetism goes, this means that we can write our conservation law in the form
$$
\frac{d\mathbf{P}_\mathrm{mech}}{dt}
+\frac{d}{dt}\int_V\epsilon_0\mathbf{E}\times \mathbf{B}d\mathbf{x}
=
\oint_S \mathcal T\cdot d\mathbf{a}
$$
where
$$
\mathcal T
=
\epsilon_0\left[
\mathbf{E}\otimes\mathbf{E}+c^2\mathbf{B}\otimes\mathbf{B}
-\frac12\sum_i\mathbf{e}_i\otimes\mathbf{e}_i\left(E^2+c^2 B^2\right)
\right]
$$
is, of course, the Maxwell stress tensor.
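For concreteness, the stress tensor can be evaluated numerically. The following is a hedged numpy sketch in SI units; the sample field values are invented for illustration.

```python
# A numerical sketch of the Maxwell stress tensor
# T = eps0 [ E⊗E + c^2 B⊗B - (1/2)(E^2 + c^2 B^2) I ].
import numpy as np

eps0 = 8.8541878128e-12   # vacuum permittivity, F/m
c = 299792458.0           # speed of light, m/s

def maxwell_stress(E, B):
    I = np.eye(3)
    return eps0 * (np.outer(E, E) + c**2 * np.outer(B, B)
                   - 0.5 * (E @ E + c**2 * (B @ B)) * I)

E = np.array([1.0e3, 0.0, 0.0])    # sample electric field, V/m (invented)
B = np.array([0.0, 1.0e-5, 0.0])   # sample magnetic field, T (invented)
T = maxwell_stress(E, B)

assert np.allclose(T, T.T)         # the stress tensor is symmetric
```

The force through any surface element $d\mathbf{a}$ is then just `T @ da`, exactly the "dot one factor with the surface element" operation that motivated the tensor product.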
I could go on and on about this, but I think this really captures the essence of how and where it happens in physics that a situation is really begging the use of a tensor product. There are other such situations, of course, but this is the clearest one I know.
The first of these is easy:
How are tensors from QM and tensors from linear algebra widely used in geometry related?
They're the same thing, though sometimes QM will choose to look only at some specific subset of tensors, e.g. sets which carry group-theoretic representations of the rotation group.
The apparent conflict arises because there's a wide spectrum of ways to talk about tensors, and you're pulling examples from two complete extremes of that spectrum.
Let's bridge the gap by taking the universal-property tensor product, as in the first understanding,
- Theorem: Given two vector spaces $U$ and $V$ over $F$, there exists a vector space $U\otimes V$ and a bilinear product $\otimes:U\times V\to U\otimes V$ such that for every bilinear $f:U\times V\to F$ there exists a linear $g:U\otimes V\to F$ such that $f(u,v) = g(u\otimes v)$. Moreover, this vector space and mapping are unique up to a canonical isomorphism.
and provide an instantiation of that abstract tensor-product structure:
- Claim. Let $U,V$ be vector spaces over $F$, and choose coordinate representations $i_U:U\to F^n$ and $i_V:V\to F^m$ for them. Then $F^{n\times m}$ and $\otimes: U\times V\to F^{n\times m}$, defined via
$$\otimes:(u,v)\mapsto \left(i_U(u)_ji_V(v)_k\right)_{j,k=1}^{n,m},$$
are an instantiation of the abstract tensor-product as defined above.
Put another way, this just says that you can do tensor products in a coordinate-wise way, in a manner that is reasonably straightforward to work out.
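In numpy, with $F=\mathbb R$ and the coordinate maps taken to be the identity, the Claim's construction is literally the outer product. A small sketch (identifiers mine):

```python
# Coordinate-wise instantiation of the abstract tensor product:
# (u, v) -> the n x m array of products (u_j v_k), i.e. the outer product.
import numpy as np

def tensor(u, v):
    return np.outer(u, v)

u, v = np.array([1.0, 2.0]), np.array([3.0, 4.0, 5.0])
w = tensor(u, v)
assert w.shape == (2, 3)   # an element of F^{n x m} with n = 2, m = 3

# bilinearity in the first slot, as the universal property requires
a = 2.5
u2 = np.array([0.0, -1.0])
assert np.allclose(tensor(a * u + u2, v), a * tensor(u, v) + tensor(u2, v))
```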
Quantum mechanics uses tensors in the second sense, in that a $U\otimes V$-tensor-valued operator is defined as an $(n\times m)$-tuple of operators $\hat w_{jk}:\mathcal H \to\mathcal H$, with the understanding that if we have $U$-vector and $V$-vector operator tuples $\hat u_j$ and $\hat v_k$, we can form their tensor product (where order now matters) as $\hat u_j \hat v_k$.
Of course, now that we've done this, we need to walk back some distance, since we're just trying to talk about $U$-vector and $U\otimes V$-tensor operators, and those spaces don't come equipped with canonical coordinate maps $i_U$ and so on. Thus, we also need to demand that if we change our choices of coordinate maps, then the operator tuples will change to the same linear combinations that they would if they were plain coordinates instead of operators. This is what the usual requirement that 'tensor operators need to transform as tensors' means.
(I should also mention that the traditional treatment is a bit more obtuse than it strictly needs to be. This great answer shows that you can define $U$-vector operators to be simply linear operators $\hat u:\mathcal H \to \mathcal H \otimes U$, and it is easy to extend that formalism to $U\otimes V$-tensor operators defined as linear operators $\hat w:\mathcal H \to \mathcal H \otimes U\otimes V$.)
As you can see, then, quantum mechanics is perfectly happy to talk about tensor operators occupying any abstract tensor product you wish to pull from classical mechanics. However, when one is actually out and about doing quantum mechanics, one usually doesn't care about arbitrary tensor products: we specifically care about tensor products of $\mathbb R^3$ with itself, and we care about how those tensor products interact with the additional structure carried by our vector spaces, including in particular the inner-product structure and, with it, the symmetry group of that structure, the rotation group.
This is where the representation theory comes in, and it does so in the most natural way. Let me phrase this in a general setting:
- Let $U$ and $V$ be vector spaces over $F$, $G$ be a group, and $R:G\to \mathrm{GL}(U)$ and $S:G\to \mathrm{GL}(V)$ be representations of $G$. There is then a natural representation $T:G\to \mathrm{GL}(U\otimes V)$ in the tensor product space, which is uniquely specified by its action on tensor-product vectors,
$$T(g)(u\otimes v) = R(g)(u)\otimes S(g)(v).$$
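In coordinates, this natural representation is just the Kronecker product of the two representing matrices. A numpy sketch (the sample rotation is invented, and I realize $u\otimes v$ as the Kronecker product of coordinate vectors):

```python
# Check the defining property T(g)(u ⊗ v) = R(g)u ⊗ S(g)v, with
# T(g) = R(g) ⊗ S(g) realized as a Kronecker product of matrices.
import numpy as np

def rot_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

R = S = rot_z(0.7)       # R(g) and S(g) for some sample g in SO(3)
T = np.kron(R, S)        # the induced representation on U ⊗ V

u = np.array([1.0, 2.0, 3.0])
v = np.array([-1.0, 0.5, 2.0])

assert np.allclose(T @ np.kron(u, v), np.kron(R @ u, S @ v))
```

This works because of the mixed-product property of the Kronecker product, $(A\otimes B)(C\otimes D)=(AC)\otimes(BD)$.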
Normally, of course, we have $U=V=\mathbb R^3$ and $G=\mathrm{SO}(3)$. Typically, even if the single-factor representations $R$ and $S$ are irreducible, the tensor-product representation will not be irreducible. When we speak of tensors being reducible or irreducible, we're using the word in the representation-theoretic way: a reducible tensor lives in a tensor-product space that carries a reducible representation of $\mathrm{SO}(3)$, while an irreducible tensor lives in a restricted subspace which carries an irreducible representation.
This is also where spherical tensors come in: they are simply a convenient basis for the restricted subspaces that carry irreducible representations. When they are notated as $T_q^{(k)}$, it normally means that you have a tensor-valued operator (i.e. living in some bigger tensor-product space, whose size and number of factors is not that relevant) that's been restricted to a subspace $\mathrm{span} \mathopen{}\left(\{T_{-k}^{(k)}, T_{-k+1}^{(k)},\ldots, T_{k-1}^{(k)}, T_{k}^{(k)}\}\right)\mathclose{}$ that carries the spin-$k$ irreducible representation of $\mathrm{SO}(3)$.
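For rank-2 tensors on $\mathbb R^3$ this reduction can be written out explicitly: the trace, antisymmetric, and symmetric-traceless parts carry the $k=0,1,2$ representations, of dimensions $1+3+5=9$. A numpy sketch with a random sample tensor (identifiers mine):

```python
# Decompose a generic (reducible) rank-2 tensor into its SO(3)-irreducible parts.
import numpy as np

rng = np.random.default_rng(1)
T = rng.normal(size=(3, 3))               # a generic reducible tensor

trace_part = np.trace(T) / 3 * np.eye(3)  # k = 0, dimension 1
antisym_part = (T - T.T) / 2              # k = 1, dimension 3
sym_traceless = (T + T.T) / 2 - trace_part  # k = 2, dimension 5

# the three pieces reassemble the original tensor ...
assert np.allclose(trace_part + antisym_part + sym_traceless, T)

# ... and rotation T -> R T R^T preserves each subspace; e.g. the
# symmetric-traceless part stays symmetric and traceless:
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
R = q * np.sign(np.linalg.det(q))         # a proper rotation in SO(3)
rotated = R @ sym_traceless @ R.T
assert np.isclose(np.trace(rotated), 0.0) and np.allclose(rotated, rotated.T)
```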
In this sense, your final question (why are all tensors in QM spherical tensors?) can be rephrased as follows: why are all tensors in QM separated into irreducible representations of the rotation group? The answer, of course, is that they aren't, and there's nothing intrinsic about QM that requires tensors to live in irreducible representations; it's just that they're more useful, so we use them more often.
OP's candidate definition is a direct transcription of the tensor operator notion used in physics (and e.g. in Sakurai section 3.10) into a manifestly coordinate-independent mathematical construction. Tensor operators are e.g. used in the Wigner-Eckart theorem.
In this answer we suggest the following slight generalization of OP's candidate definition. Let the following five items be given:

1. Let $G$ be a group.
2. Let $H$ be a complex Hilbert space.
3. Let $\rho: G \to GL(V,\mathbb{F})$ be a group representation.
4. Let $R:G \to B(H)$ be a group representation.
5. Let $T:V\to L(H;H)$ be a linear map.
OP's candidate definition may be viewed as a special case of definition (*). For instance, if $\rho_0: G \to GL(V_0,\mathbb{F})$ is a group representation, then one may let $\rho: G \to GL(V,\mathbb{F})$ in point 3 be the tensor product representation $\rho=\rho_0^{\otimes m}$ with vector space
$$V~=~V_0^{\otimes m}~=~\underbrace{V_0\otimes \ldots \otimes V_0}_{m \text{ factors}}.$$
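Concretely, the representing matrix of $\rho=\rho_0^{\otimes m}$ on $V_0^{\otimes m}$ is the $m$-fold Kronecker power of $\rho_0(g)$. A numpy sketch, with an invented $2\times 2$ rotation standing in for $\rho_0(g)$:

```python
# rho(g) = rho_0(g) ⊗ ... ⊗ rho_0(g), m factors, as a Kronecker power.
import numpy as np
from functools import reduce

def tensor_power_rep(rho0_g, m):
    return reduce(np.kron, [rho0_g] * m)

def rot2(theta):
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

rho0_g, rho0_h = rot2(0.3), rot2(0.5)   # sample rho_0(g), rho_0(h)
rho_g = tensor_power_rep(rho0_g, 3)
assert rho_g.shape == (8, 8)            # dim(V_0^{⊗3}) = 2^3

# the homomorphism property survives: rho(g) rho(h) = rho(gh)
assert np.allclose(rho_g @ tensor_power_rep(rho0_h, 3),
                   tensor_power_rep(rho0_g @ rho0_h, 3))
```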