(Disclaimer: I am only a high school student and have learned the following mostly on my own. If there are any mistakes, please feel free to correct me!)
An atomic orbital represents the probability distribution* of the location of an electron around the nucleus and is mathematically described by a wave function.
Now what does this mean? Let's start with what an atomic orbital isn't:
- An orbital is not a fixed spatial region or a "container" in which an electron can move around. In quantum mechanics, an electron does not have a specific location until its position is measured.
So what is an atomic orbital?
As mentioned before, an electron doesn't have a fixed position (or momentum, but this seems less relevant to me at this point), so we cannot pin its position down to a single point; it only takes on a definite position when we measure it.
When we measure the position, we find the electron to be more likely at some points than at others. This is what is meant by the probability distribution: for every point in space, it describes the probability of "finding" the electron there when measuring its position. So theoretically, there is a probability that at any point in time, some electron is 100 km away from the atom it belongs to, but this probability is extremely small. (see What is the probability for an electron of an atom on Earth to lie outside the galaxy?)
Now assume that we measure the position of the electron 1000 times and plot the measured positions in a 3-dimensional model of our atom. We will find that in 90% of the cases the electron is in a certain region of space, and this region is usually depicted by the familiar atomic orbital shapes:
![enter image description here](https://i.stack.imgur.com/Ci2x3.png)
(Source)
So the shapes of the orbitals, as they are most often depicted, are usually chosen in such a way that the probability of finding the electron inside the shape (when measuring its position) is at least 90%. However, note that the electron is not constrained to this shape and there is a probability that it is measured outside.
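To make the "90% region" idea concrete, here is a minimal sketch that simulates repeated position measurements. It assumes the simplest possible case, the hydrogen 1s ground state, whose radial probability density in units of the Bohr radius is $P(r)=4r^2e^{-2r}$; the function names are made up for this illustration, and many more than 1000 measurements are used so that the estimate is stable.

```python
import math
import random


def sample_1s_radius(rng):
    """Draw one electron-nucleus distance (in Bohr radii) for hydrogen 1s.

    Assumption: P(r) = 4 r^2 exp(-2 r), which is a gamma distribution
    with shape 3 and scale 1/2.
    """
    return rng.gammavariate(3.0, 0.5)


def radius_enclosing(fraction, n_measurements, seed=0):
    """Radius of the sphere containing `fraction` of the simulated measurements."""
    rng = random.Random(seed)
    radii = sorted(sample_1s_radius(rng) for _ in range(n_measurements))
    index = min(n_measurements - 1, int(fraction * n_measurements))
    return radii[index]


def exact_probability_inside(r):
    """Exact probability of finding the 1s electron within distance r."""
    return 1.0 - math.exp(-2.0 * r) * (1.0 + 2.0 * r + 2.0 * r * r)


r90 = radius_enclosing(0.90, 100_000)
print(f"radius enclosing 90% of measurements: {r90:.2f} Bohr radii")
print(f"exact probability inside that radius: {exact_probability_inside(r90):.3f}")
print(f"exact probability beyond 10 Bohr radii: {1.0 - exact_probability_inside(10.0):.2e}")
```

The last line shows the other half of the statement: the probability of measuring the electron far outside the drawn shape is small but not zero.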
There are some other things to mention about orbitals apart from their "shape". One of these is that every orbital has a certain energy level associated with it. This means that when an electron is in an orbital $A$ it has the exact energy associated with $A$.
If there is another orbital $B$ with a higher energy level than $A$, the electron in $A$ can "jump" to $B$ if it absorbs exactly the amount of energy that separates the energy levels of $A$ and $B$. The most common example is an electron absorbing a photon whose wavelength corresponds to the energy difference between the orbitals. Likewise, electrons can jump to an orbital with lower energy by emitting a photon with the wavelength corresponding to the difference in energy between the orbitals.
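As a rough numerical illustration of the relation between energy difference and wavelength, $\lambda = hc/\Delta E$, the sketch below assumes a hydrogen atom with the simple Bohr-model levels $E_n=-13.6\,\text{eV}/n^2$. That formula and the function names are assumptions chosen for the example, not something that holds for atoms in general.

```python
RYDBERG_ENERGY_EV = 13.605693  # assumed hydrogen ground-state binding energy in eV
HC_EV_NM = 1239.841984         # Planck constant times speed of light, in eV*nm


def hydrogen_level_ev(n):
    """Energy of hydrogen level n in eV (simple Bohr formula, an assumption here)."""
    return -RYDBERG_ENERGY_EV / n ** 2


def photon_wavelength_nm(n_low, n_high):
    """Wavelength of the photon absorbed (or emitted) between two levels."""
    delta_e = hydrogen_level_ev(n_high) - hydrogen_level_ev(n_low)
    return HC_EV_NM / delta_e


print(f"n=1 <-> n=2: {photon_wavelength_nm(1, 2):.1f} nm")
print(f"n=2 <-> n=3: {photon_wavelength_nm(2, 3):.1f} nm")
```

Under these assumptions the first transition comes out at about 121.5 nm (ultraviolet) and the second at about 656 nm (red light), so a larger energy gap means a shorter wavelength.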
Here is a graph showing the relative energy levels of some atomic orbitals:
![enter image description here](https://i.stack.imgur.com/5vMSE.png)
(Source)
I hope this somewhat clears up the confusion.
*As mentioned in the comments, the wavefunction $\psi$ describing an atomic orbital does not directly give the probability density, but the probability amplitude. The probability density can be obtained by $|\psi |^2$ for complex orbitals or $\psi ^2$ for real orbitals.
Best Answer
If you have an operator $D$ with $$D(\Psi+\Phi)=D(\Psi)+D(\Phi),$$ then if $D(\Psi)=0$ and $D(\Phi)=0$, you can also conclude that $D(\Psi+\Phi)=0$. This is the case for the Schrödinger equation, as it reads
$$D(\Psi):=(i\hbar\tfrac{\partial}{\partial t}-H)\Psi=0,$$
where $H$ is linear. For example you certainly have linearity for the derivatives: $$(f(x)+g(x))'=f'(x)+g'(x)$$ and even more so for multiplicative operators: $$V(x)\cdot (f(x)+g(x))=V(x)\cdot f(x)+V(x)\cdot g(x).$$
The books point out that superposition is possible like this to emphasise that the probability waves don't affect each other, which is what lets you build new solutions of the equation out of known ones.
If, in contrast, the Schrödinger equation read
$$D(\Psi):=(i\hbar\tfrac{\partial}{\partial t}-H)\Psi^2=0,$$
which is non-linear because of the $\Psi^2$, then, since $(\Psi+\Phi)^2=\Psi^2+\Phi^2+2\cdot\Psi\cdot\Phi$, you'd have
$$D(\Psi+\Phi)=D(\Psi)+D(\Phi)+D(\sqrt{2\cdot\Psi\cdot\Phi}),$$
and from $\Phi$ and $\Psi$ being solutions ($D(\Psi)=0$ and $D(\Phi)=0$) it would not follow that $\Psi+\Phi$ is a solution too (you only get $D(\Psi+\Phi)=0+0+D(\sqrt{2\cdot\Psi\cdot\Phi})$, which is non-zero in general).
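The difference between the linear and the non-linear case can be checked numerically. The following is only a sketch under simplifying assumptions: the time derivative is left out, $H$ is replaced by a finite-difference version of $-\tfrac{d^2}{dx^2}+V(x)$ with a made-up potential $V(x)=x^2$, and all function names are hypothetical.

```python
import math


def apply_h(values, xs, dx):
    """Apply H f = -f'' + V(x) f on the interior grid points.

    Assumptions: central finite differences for f'' and V(x) = x^2.
    """
    result = []
    for i in range(1, len(values) - 1):
        second_derivative = (values[i - 1] - 2.0 * values[i] + values[i + 1]) / dx ** 2
        result.append(-second_derivative + xs[i] ** 2 * values[i])
    return result


def linear_op(values, xs, dx):
    """Stand-in for the linear operator: H acting on the function itself."""
    return apply_h(values, xs, dx)


def nonlinear_op(values, xs, dx):
    """Stand-in for the non-linear variant: H acting on the squared function."""
    return apply_h([v * v for v in values], xs, dx)


def additivity_gap(op, f, g, xs, dx):
    """Largest |op(f + g) - op(f) - op(g)| over the grid."""
    total = op([a + b for a, b in zip(f, g)], xs, dx)
    separate = [a + b for a, b in zip(op(f, xs, dx), op(g, xs, dx))]
    return max(abs(t - s) for t, s in zip(total, separate))


dx = 0.01
xs = [-5.0 + i * dx for i in range(1001)]
f = [math.exp(-(x - 1.0) ** 2) for x in xs]
g = [math.exp(-(x + 1.0) ** 2) for x in xs]

print("linear gap:    ", additivity_gap(linear_op, f, g, xs, dx))
print("non-linear gap:", additivity_gap(nonlinear_op, f, g, xs, dx))
```

For the linear operator the gap is at the level of rounding error, while for the squared version it is of order one: what is left over is exactly $H$ applied to the cross term $2\cdot\Psi\cdot\Phi$.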
What do you mean by "the physics between them"?
Anyway, as an illustration, if you have a function like $\Psi(x)=A\text{e}^{-(x-3)^2}$, which is a bump located around the point $x=3$, and you add it to a function $\Phi(x)=B\text{e}^{-(x-7)^2}$, which is a bump located around the point $x=7$, then you get a function $$\chi(x):=\Psi(x)+\Phi(x)=A\text{e}^{-(x-3)^2}+B\text{e}^{-(x-7)^2},$$ which has two bumps.
http://www.wolframalpha.com/input/?i=Plot[Exp[-%28x-3%29^2]%2C{x%2C-1%2C11}]
http://www.wolframalpha.com/input/?i=Plot[Exp[-%28x-3%29^2]%2BExp[-%28x-7%29^2]%2C{x%2C-1%2C11}]
The wave functions relate to probability densities, and if you have high probabilities at the point $x=3$ for $\Psi$ and at $x=7$ for $\Phi$, then $\Psi+\Phi$ will tend to describe a situation which has relatively high probabilities at both of these points.
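The two-bump picture can also be checked with a few lines of code. This sketch assumes $A=B=1$ (as in the plots linked above), ignores normalisation, and uses a made-up helper `local_maxima` to locate the peaks of the unnormalised density $\chi(x)^2$ on the interval from $-1$ to $11$.

```python
import math


def psi(x):
    """Bump centred at x = 3 (amplitude A = 1 assumed)."""
    return math.exp(-(x - 3.0) ** 2)


def phi(x):
    """Bump centred at x = 7 (amplitude B = 1 assumed)."""
    return math.exp(-(x - 7.0) ** 2)


def chi(x):
    """Superposition of the two bumps."""
    return psi(x) + phi(x)


def local_maxima(func, start, stop, steps):
    """Grid points where func is strictly larger than at both neighbours."""
    dx = (stop - start) / steps
    xs = [start + i * dx for i in range(steps + 1)]
    ys = [func(x) for x in xs]
    return [xs[i] for i in range(1, steps) if ys[i] > ys[i - 1] and ys[i] > ys[i + 1]]


# chi is real here, so the (unnormalised) probability density is chi(x)**2.
peaks = local_maxima(lambda x: chi(x) ** 2, -1.0, 11.0, 1200)
print("density peaks near x =", [round(p, 2) for p in peaks])
```

The density of the sum has one peak near each of the two original bump positions, which is the "relatively high probabilities at both points" described above.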