This correct description of multiparticle states via tensor product spaces may have been surprising for folks like Schrödinger, who looked at things from the viewpoint of "wave mechanics", but it was incorporated from the very beginning in "matrix mechanics", the approach to quantum mechanics of Heisenberg and his pals.

After all, the wave functions for a single particle in 3 dimensions
$$\psi(x,y,z)$$
are already elements of the tensor product of 3 copies of spaces of wave functions for a particle in 1 dimension
$$\psi(x).$$
Now, if the number of coordinates in the configuration space is increased from 1 ($x$) not to 3 ($x,y,z$) but to $3N$, we clearly need a wave function that depends on all the variables, e.g. for $N=2$
$$\psi(x_1,y_1,z_1,x_2,y_2,z_2)$$
because all these coordinates are equally good coordinates on the configuration space and we already know that the wave function should be a function defined on the whole configuration space. It would be inconsistent to treat some of the $3N$ coordinates differently from the others. Abstract mechanics, even abstract classical mechanics, doesn't care whether the six coordinates belong to one or two particles. That is just our way of thinking about these degrees of freedom, not an essential qualitative property of the theory.

There isn't any intuition or experiment needed here. What you need to understand is that observables become operators and the commutators of the observables etc. were determined by Heisenberg from the very beginning and they immediately imply the tensor structure. If all the coordinates in $\vec r_1$ and $\vec r_2$ commute with each other, they must act on independent "directions" of a space where the wave function is defined, so the whole space must be 6-dimensional. There's no guess involved.
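As a rough numerical illustration of the paragraph above, here is a minimal sketch that goes in the easy direction: once the operators of particle 1 and particle 2 are placed on separate tensor factors, all the cross commutators vanish automatically. The finite grid, the central-difference momentum, and all variable names are assumptions made for this sketch; the true canonical commutators cannot be realized exactly by finite matrices.

```python
import numpy as np

n = 5                                   # grid points per coordinate (illustrative)
grid = np.linspace(-1.0, 1.0, n)
dx = grid[1] - grid[0]
one = np.eye(n)

# Discretized position operator and a central-difference stand-in for p = -i d/dx (hbar = 1).
x = np.diag(grid)
p = -1j * (np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1)) / (2 * dx)

def comm(a, b):
    """Commutator [a, b]."""
    return a @ b - b @ a

# Particle 1 acts on the first tensor factor, particle 2 on the second.
x1, p1 = np.kron(x, one), np.kron(p, one)
x2, p2 = np.kron(one, x), np.kron(one, p)

# Every operator of particle 1 commutes with every operator of particle 2 ...
cross_terms_vanish = all(
    np.allclose(comm(a, b), 0) for a in (x1, p1) for b in (x2, p2)
)
# ... while x and p of the same particle still fail to commute.
same_particle_nonzero = not np.allclose(comm(x1, p1), 0)

print(cross_terms_vanish, same_particle_nonzero, x1.shape)
```

The combined operators act on an $n^2$-dimensional space, one "direction" per pair of grid points, which is the discrete analogue of a wave function depending on both $x_1$ and $x_2$.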

Again, if one tries to think in Schrödinger's picture and give various wrong materialist interpretations to the wave function, one could end up with different guesses (such as $N$ independent waves in 3 dimensions). But if one actually does the "quantization" of the previously classical system systematically, according to the universal rules, and demands that the observables become linear operators whose commutators are the right ones, the whole theory is completely determined.
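To see how different the two guesses are, one can simply count amplitudes on a grid. The sketch below assumes a hypothetical grid of `n` points per coordinate and `N = 2` particles; it compares "$N$ independent waves in 3 dimensions" with a single wave function on the $3N$-dimensional configuration space, and checks that a generic two-particle wave function is not a single product $\psi_1(\vec r_1)\psi_2(\vec r_2)$.

```python
import numpy as np

n, N = 4, 2                              # hypothetical grid size and particle number
single = n ** 3                          # amplitudes for one particle in 3 dimensions

independent_waves = N * single           # "N separate waves": dimensions add
configuration_space = single ** N        # one wave on configuration space: n**(3N)

# A generic two-particle wave function psi(r1, r2), stored as a matrix
# with rows indexed by r1 and columns indexed by r2.
rng = np.random.default_rng(0)
psi = rng.normal(size=(single, single))
psi /= np.linalg.norm(psi)

# The number of nonzero singular values is the number of product terms
# psi_1(r1) * psi_2(r2) needed to write psi; a single product would give 1.
schmidt = np.linalg.svd(psi, compute_uv=False)
n_product_terms = int(np.sum(schmidt > 1e-12))

print(independent_waves, configuration_space, n_product_terms)
```

The gap between `N * n**3` and `n**(3*N)` is exactly the room needed for correlated (entangled) states of the two particles.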

There is nothing to be adjusted about the quantum description of a system of particles that previously existed in classical mechanics.

Your single wave function in 3 dimensions normalized so that its (squared) norm is 2 instead of 1 is completely equivalent to the wave function for a single particle. Just take a $\psi$ normalized to unity; $\sqrt{2}\psi$ is then normalized in your way. The theory is clearly equivalent and it describes 1 particle, not 2 particles. A more "promising" attempt would be to consider 2 wave functions of 3 variables for 2 particles. However, observables such as $\vec r_1$ and $\vec r_2$ are operators and they must be operators acting on a vector space associated with a single theory. It would make no sense to consider operators acting on different wave functions: that would be like adding apples and oranges, and one could not define things such as products (compositions) of these operators.

## Best Answer

## Tensor product

Writing the Hilbert space as a tensor product $$ \newcommand{\cH}{\mathcal{H}} \cH=\cH_A\otimes \cH_B $$ can be useful when we want to think of $\cH_A$ and $\cH_B$ as two complementary subsystems of the full system. Observables associated with subsystem $A$ act like the identity on the other factor $\cH_B$, and conversely. For example, an observable associated with subsystem $A$ has the form $O_A\otimes 1$. A general observable affects both $\cH_A$ and $\cH_B$. That is, a general observable is a sum of terms of the form $O_A\otimes O_B$ where the operators $O_{A/B}$ act only on $A/B$ respectively.

In particular, if $\cH=\cH_A\otimes \cH_B$, the Hamiltonian in the Schrödinger equation is a sum of terms of the form $O_A\otimes O_B$. Terms of the form $O_A\otimes 1$ and $1\otimes O_B$ describe the dynamics of the $A$ and $B$ subsystems by themselves, and all other terms describe interactions between these subsystems.
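The role of the different kinds of terms can be checked on the smallest possible example, two two-level subsystems. In the sketch below, the specific Hamiltonians, the coupling strength, and the helper names `evolve` and `schmidt_rank` are all assumptions chosen for illustration. With only $O_A\otimes 1$ and $1\otimes O_B$ terms, a product state stays a product state; adding an $O_A\otimes O_B$ term entangles the subsystems.

```python
import numpy as np

sz = np.diag([1.0, -1.0])
sx = np.array([[0.0, 1.0], [1.0, 0.0]])
one = np.eye(2)

H_A = 1.0 * sz                                   # hypothetical subsystem Hamiltonians
H_B = 0.5 * sz
H_free = np.kron(H_A, one) + np.kron(one, H_B)   # O_A (x) 1  +  1 (x) O_B
H_int = 0.3 * np.kron(sx, sx)                    # interaction term O_A (x) O_B

def evolve(H, psi, t):
    """Apply exp(-i H t) to psi using the eigendecomposition of Hermitian H."""
    w, v = np.linalg.eigh(H)
    return v @ (np.exp(-1j * w * t) * (v.conj().T @ psi))

def schmidt_rank(psi, tol=1e-10):
    """Number of nonzero Schmidt coefficients with respect to the A|B split."""
    s = np.linalg.svd(psi.reshape(2, 2), compute_uv=False)
    return int(np.sum(s > tol))

plus = np.array([1.0, 1.0]) / np.sqrt(2)
psi0 = np.kron(plus, plus)                       # product state of A and B

rank_free = schmidt_rank(evolve(H_free, psi0, 1.0))          # stays a product: 1
rank_int = schmidt_rank(evolve(H_free + H_int, psi0, 1.0))   # becomes entangled: 2

print(rank_free, rank_int)
```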

Another example is a non-relativistic particle with spin: we can express the Hilbert space as a tensor product $\cH_X\otimes \cH_S$, where observables associated with the particle's location have the form $O_X\otimes 1$ and observables associated with its spin have the form $1\otimes O_S$. In this case, the different parts are usually called different "degrees of freedom" instead of different "subsystems." In the case of a non-relativistic particle, we can further factorize $\cH_X$ into three factors associated with the three dimensions of space. Again, we would normally call these "degrees of freedom" instead of "subsystems."

Very generally, we can define a "subsystem" or "degree of freedom" as a special collection of observables. The tensor product construction isn't needed for this, but it's often useful. If observables associated with different subsystems (or degrees of freedom) commute with each other, then the tensor product is often useful as a systematic way of mathematically delineating the different sets of observables: each set acts nontrivially on only one of the tensor factors.

The "subsystems" and "degrees of freedom" concepts are just vaguely-delineated special cases of a much more general idea: mutually commuting subsets of the set of observables. The same Hilbert space admits many different tensor-product factorizations. Which one is most useful (if any) depends on which operators we want to represent which physical observables -- the decisions we make when we're defining a model. A similar comment applies to the most common definitions/measures of "entanglement," because they refer to a given tensor product factorization.
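The dependence of "entanglement" on the chosen factorization can be made concrete in a four-dimensional Hilbert space. The sketch below is only an illustration under assumed choices: a second factorization is defined by declaring the vectors $U|ij\rangle$ to be the new product basis, with $U$ built from a Hadamard and a CNOT matrix. The same vector then has Schmidt rank 2 in one factorization and Schmidt rank 1 in the other.

```python
import numpy as np

def schmidt_rank(coeffs, tol=1e-10):
    """Schmidt rank of a 4-component coefficient vector read as a 2x2 array."""
    s = np.linalg.svd(coeffs.reshape(2, 2), compute_uv=False)
    return int(np.sum(s > tol))

# Bell state (|00> + |11>)/sqrt(2) in the standard product basis.
bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)

hadamard = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
cnot = np.array([[1.0, 0.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0, 0.0],
                 [0.0, 0.0, 0.0, 1.0],
                 [0.0, 0.0, 1.0, 0.0]])

# Assumed alternative factorization: the new product basis is U|ij>,
# so the coefficients of a vector in that basis are U^dagger applied to it.
U = cnot @ np.kron(hadamard, np.eye(2))

rank_standard = schmidt_rank(bell)               # entangled in the standard split: 2
rank_rotated = schmidt_rank(U.conj().T @ bell)   # a product in the rotated split: 1

print(rank_standard, rank_rotated)
```

In the rotated factorization, the observables "belonging to the first factor" are $U(O\otimes 1)U^\dagger$, which is a different commuting subset of the same operator algebra.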

Learning about the "split property" in quantum field theory reveals some limitations of the tensor product formulation. The split property is mentioned in this related post: "Should it be obvious that independent quantum states are composed by taking the tensor product?"

The tensor product also has other uses. For example, for a single particle at rest in 3-d space, we can systematically express the spin-$j$ representation for any $j$ by taking the tensor product of $2j$ copies of the spin-1/2 representation and symmetrizing. We can think of this as a special application of the subsystem idea, because a symmetrized collection of $2j$ spin-1/2 particles has total spin $j$.
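The symmetrization statement can be verified numerically for small $j$. The sketch below takes $2j=3$ as an arbitrary illustrative choice, builds the total spin operators on three spin-1/2 factors, and checks that the symmetric subspace has dimension $2j+1$ and that $\vec S^2$ equals $j(j+1)$ on it. The helper names `embed` and `symmetrizer` are made up for this example.

```python
import itertools
import numpy as np

# Spin-1/2 operators (hbar = 1).
SX = np.array([[0, 1], [1, 0]], dtype=complex) / 2
SY = np.array([[0, -1j], [1j, 0]], dtype=complex) / 2
SZ = np.array([[1, 0], [0, -1]], dtype=complex) / 2

def embed(op, k, n):
    """Return op acting on factor k of n spin-1/2 factors, identity elsewhere."""
    out = np.eye(1, dtype=complex)
    for site in range(n):
        out = np.kron(out, op if site == k else np.eye(2))
    return out

def symmetrizer(n):
    """Projector onto the totally symmetric subspace of n two-level factors."""
    dim = 2 ** n
    basis = np.eye(dim).reshape((2,) * n + (dim,))
    perms = list(itertools.permutations(range(n)))
    # Each term permutes the n tensor factors; averaging over all
    # permutations gives the projector onto the symmetric subspace.
    total = sum(np.transpose(basis, p + (n,)).reshape(dim, dim) for p in perms)
    return total / len(perms)

n = 3                                   # 2j copies of spin-1/2, so j = 3/2
j = n / 2
total_spin = [sum(embed(s, k, n) for k in range(n)) for s in (SX, SY, SZ)]
S2 = sum(S @ S for S in total_spin)     # total spin squared
P = symmetrizer(n)

casimir_ok = np.allclose(S2 @ P, j * (j + 1) * P)   # S^2 = j(j+1) on the subspace
dim_symmetric = int(round(np.trace(P)))             # should equal 2j + 1

print(casimir_ok, dim_symmetric)
```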

Axioms I-IV listed in the OP are the same whether or not $\cH$ is written as a tensor product, because those axioms are independent of what representation we use for the Hilbert space.

## Direct sum

Writing the Hilbert space as a direct sum $$ \cH=\cH_1\oplus\cH_2 $$ is useful when we want to focus on a particular subspace of states. Using a non-relativistic single-particle model as an example, $\cH_1$ could consist of states in which a particle's wavefunction has support only within a given region $R$, and $\cH_2$ could consist of states with support in the complement of $R$.

More generally, given any discrete observable (such as the observable that asks "is the particle located in the region $R$ or not?"), we can write $\cH$ as the direct sum of that observable's eigenspaces. A direct-sum decomposition of the Hilbert space corresponds to a block-matrix representation of operators on the Hilbert space.
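Here is a small sketch of the block-matrix picture, assuming a toy model that is not in the original: a particle hopping on 6 lattice sites, with the region $R$ taken to be the first 2 sites. The eigenspaces of the "in $R$ or not?" observable give $\cH_1$ and $\cH_2$, and any operator splits into four blocks accordingly.

```python
import numpy as np

n, m = 6, 2                              # hypothetical: 6 sites, R = first 2 sites

# The discrete observable "is the particle in R?" with eigenvalues 1 and 0.
in_R = np.diag([1.0] * m + [0.0] * (n - m))

# Toy nearest-neighbour hopping Hamiltonian on the lattice.
H = -(np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1))

# Blocks of H with respect to the direct sum H_1 (+) H_2.
H11, H12 = H[:m, :m], H[:m, m:]
H21, H22 = H[m:, :m], H[m:, m:]
reassembled = np.block([[H11, H12], [H21, H22]])

# in_R is block-diagonal. H has nonzero off-diagonal blocks, so it moves
# amplitude between H_1 and H_2 and does not commute with in_R.
commutes = np.allclose(H @ in_R, in_R @ H)

print(H12.shape, np.array_equal(reassembled, H), commutes)
```

Note that the dimensions add here (2 + 4 = 6), in contrast with the tensor product, where they multiply.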

More esoterically, the direct sum is also useful for representing mixed states as vector states: every state, whether pure or mixed, can be expressed as a vector state in a sufficiently large Hilbert space, with the understanding that all observables have a block-diagonal form that doesn't mix the different direct summands with each other. This fact is sometimes useful for proving theorems, and this fact can in turn be proven using the GNS construction.

Again, axioms I-IV listed in the OP are the same whether or not $\cH$ is written as a direct sum, because those axioms are independent of what representation we use for the Hilbert space.
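The representation of a mixed state as a vector state, mentioned above, can be sketched for a two-level system. Everything specific here is an assumption for illustration: the weights, the two pure states, the test observable, and the helper `lift`, which places a copy of an observable in each diagonal block. This is a finite-dimensional sketch in the spirit of the statement, not the general GNS construction.

```python
import numpy as np

# A mixed state rho = p1 |psi1><psi1| + p2 |psi2><psi2| (illustrative values).
p = np.array([0.25, 0.75])
psi1 = np.array([1.0, 0.0], dtype=complex)
psi2 = np.array([1.0, 1.0], dtype=complex) / np.sqrt(2)
rho = p[0] * np.outer(psi1, psi1.conj()) + p[1] * np.outer(psi2, psi2.conj())

# A single vector in the direct sum H (+) H, one weighted component per summand.
Psi = np.concatenate([np.sqrt(p[0]) * psi1, np.sqrt(p[1]) * psi2])

def lift(A):
    """Block-diagonal form A (+) A: the same observable on each direct summand."""
    zero = np.zeros_like(A)
    return np.block([[A, zero], [zero, A]])

# Any Hermitian observable will do; this one is arbitrary.
A = np.array([[0.3, 1.0 - 2.0j], [1.0 + 2.0j, -1.1]])

mixed_value = np.trace(rho @ A)               # Tr(rho A)
vector_value = np.vdot(Psi, lift(A) @ Psi)    # <Psi| A (+) A |Psi>

print(np.isclose(mixed_value, vector_value))
```

The agreement relies on restricting to block-diagonal observables: an operator with off-diagonal blocks would be sensitive to the relative phase between the two components of `Psi`, which the mixed state does not contain.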