Remembering back to my undergrad days, and courtesy of a quick Google: if you define the acoustic potential $\phi$ by:
$$u = - \nabla \phi$$
then the pressure is:
$$p = \rho \frac{\partial\phi}{\partial t}$$
For a plane wave in one dimension the wave equation is:
$$\frac{\partial^2\phi}{\partial x^2} = \frac{1}{c^2} \frac{\partial^2\phi}{\partial t^2}$$
and the solution is:
$$\phi = f(ct \pm x)$$
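for some function $f$. Then $u = \mp f'(ct \pm x)$ and $p = \rho c f'(ct \pm x)$ are both proportional to $f'$, so $u$ and $p$ are in phase (up to a sign set by the direction of travel). However for a spherical wave the wave equation looks rather different: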
$$\frac{\partial^2 (r\phi)}{\partial r^2} = \frac{1}{c^2} \frac{\partial^2 (r\phi)}{\partial t^2}$$
and you get:
$$\phi = \frac {f(ct \mp r)}{r}$$
where if I recall correctly the minus sign means a wave travelling outwards and the plus means a wave travelling inwards. So the radial velocity and the pressure are:
$$u = \frac {f(ct \mp r)}{r^2} \pm \frac {f'(ct \mp r)}{r}$$
$$p = \rho c\frac {f'(ct \mp r)}{r}$$
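It's because $u$ and $p$ have a different $r$ dependence that you get the change in phase for distances below a wavelength or two: $u$ has an extra $1/r^2$ term, proportional to $f$ rather than $f'$, that $p$ lacks, and that term only becomes negligible far from the source.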
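To put rough numbers on that, here is a minimal Python sketch. It assumes a single-frequency outgoing wave, $f(s) = e^{iks}$ with $s = ct - r$ (reading the argument of $f$ as $ct - r$) and $k = 2\pi/\lambda$. That choice of $f$ and the function name `phase_lag_deg` are illustrative assumptions, not part of the derivation above.

```python
import cmath
import math


def phase_lag_deg(r, wavelength):
    """Degrees by which u lags p in an outgoing harmonic spherical wave.

    Assumes f(s) = exp(i*k*s) with s = c*t - r; this choice of f is an
    illustration, not something fixed by the derivation.
    """
    k = 2 * math.pi / wavelength      # wavenumber
    f = 1.0 + 0.0j                    # f evaluated at s = 0 (the lag does not depend on s)
    f_prime = 1j * k * f              # derivative of exp(i*k*s) at s = 0
    u = f / r**2 + f_prime / r        # outgoing-wave velocity: f/r^2 + f'/r
    p_over_rho_c = f_prime / r        # pressure divided by rho*c
    return math.degrees(cmath.phase(p_over_rho_c / u))


for r_in_wavelengths in (0.01, 0.1, 1 / (2 * math.pi), 1.0, 10.0):
    lag = phase_lag_deg(r_in_wavelengths, wavelength=1.0)
    print(f"r = {r_in_wavelengths:6.3f} wavelengths: u lags p by {lag:5.1f} degrees")
```

Under these assumptions the lag works out to $\arctan(1/kr)$: close to 90 degrees right next to the source, 45 degrees at $r = \lambda/2\pi$, and under one degree by ten wavelengths out.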
Phew! Now you're going to ask me to put this into simple physical terms, but you'll have to let me go away and think about that. I guess it's basically because with a spherical wave the acoustic energy falls with distance in a sort of $1/r^2$ way, but for a plane wave the acoustic energy is constant with distance.
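A rough way to make the $1/r^2$ remark concrete, assuming a lossless medium and a source radiating a fixed power $P$ (the symbols $P$, $I$ and $A$ are introduced here for illustration): the same power has to cross every sphere centred on the source, so the intensity is
$$I(r) = \frac{P}{4\pi r^2}$$
whereas a plane wave always crosses the same area $A$, so
$$I = \frac{P}{A}$$
does not depend on distance.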
Based on that video, the difference you're pointing out is between the nice wavefronts from the speaker at 2:04 and the clap shown at the beginning and the end.
It's true that the wave fronts from the speaker (and even the book) give nice "crests" and "troughs" whereas the clap kinda just... is this blob-y thing. There are several potential reasons why these appear differently.
- Harmonics: Speakers (and books hitting tables) produce very "pure" sounds. They hit the air very strongly, deforming very little. This produces a nice, single wavefront (per movement of the item), which you see as a dark line in the Schlieren videos. Hands, on the other hand, are floppy things and hit more "softly," jiggling as they come together. This produces a less "pure" sound, so this translates into a very amorphous wave front. (You can also say that hand-claps have harmonics, whereas the speaker and book have little to no harmonics.)
- Shape: the book and the speaker have a nice square or round shape to them. This produces much "nicer" and clearer wavefronts than our oddly-shaped hands. Flat or round objects make for waves we're generally used to seeing while studying physics.
- Perspective: the book and the speaker both had really ideal ways of setting them up to see the obvious waves. Hands, however, present a challenge. How can you show a non-symmetric 3-d wave on a 2-d screen? At best, you could see a wavefront, but you'll likely just see a blob, especially if the shape of the 3-d wave isn't really spherical. The symmetry of the objects allowed for good perspectives. Clapping hands can lack that kind of symmetry, preventing good shots.
- Power: that speaker or that book hitting the ground may have been more forceful than the man's clap. I suspect a wave produced by more force (with "higher amplitude") appears darker. So, if that speaker produces a louder sound than the hand clap, you'll see darker lines there.
- Camera Tricks: Finally, we should address the fact that, although these images all use the Schlieren technique, they likely have different settings for different shots. If we put the speaker next to that clapping man and filmed them both, maybe the speaker's wavefronts would appear as weak little things, just like the man's clap. This is just a possibility, though, so this reason is much weaker than the others.
The shapes of these wavefronts all have to do with the thing that made them; as far as propagation goes, they all move under the same laws. It's their initial shape, the force with which they were made, and the thing that made them that determine the differences in the waves.
Best Answer
Your book is (a little bit) wrong. Sound is a pressure wave - and a pressure wave exists when molecules move a little bit away from and towards the source (they are longitudinal waves). However, there is no NET movement - in other words, they don't travel with the sound. In that sense, they don't leave their approximate position - but if they didn't move at all, there would be no wave and no sound.
When the entire medium (the volume of air transmitting sound) moves, then the sound wave propagates from point A to point B with the combined velocity of sound in stationary air, plus the velocity of the wind. You can think of an experiment in a train. Measure the speed of sound from one end of the rail car to the other and back. You won't be able to tell whether the train is moving because the air appears to be still in your frame of reference. However, if the train is moving, somebody from outside the train who observes your experiment will see the sound moving "faster" when it is traveling in the direction of the train (it will cross the length of the rail car in the same amount of time that you observed, but it will appear to have covered a greater distance). Similarly they will see it moving "slower" when the sound goes from the front of the train to the back.
When there is no train but the wind is moving the air, the exact same thing is still valid.
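The train experiment can be sketched numerically. This is only an illustration: the car length (20 m), speed of sound (340 m/s), train speed (30 m/s) and the function name `ground_view` are assumed values and names, not figures from the discussion above.

```python
def ground_view(car_length, c, v_train):
    """Return (travel_time, speed_forward, speed_backward) as seen from the ground.

    car_length, c and v_train are illustrative inputs in metres and metres/second.
    """
    travel_time = car_length / c          # measured on board; same for both directions
    # While the pulse crosses the car, the car itself moves v_train * travel_time.
    dist_forward = car_length + v_train * travel_time    # back-to-front pulse
    dist_backward = car_length - v_train * travel_time   # front-to-back pulse
    return travel_time, dist_forward / travel_time, dist_backward / travel_time


t, forward, backward = ground_view(car_length=20.0, c=340.0, v_train=30.0)
print(f"crossing time: {t * 1000:.1f} ms")
print(f"ground speed, with the train: {forward:.1f} m/s")
print(f"ground speed, against the train: {backward:.1f} m/s")
```

The crossing time is the same in both directions, but the distance covered relative to the ground differs, which is where the apparent speeds $c + v$ and $c - v$ come from. Replacing the train speed with the wind speed gives the open-air case.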