The beats are audible at lower frequencies because your ears do in fact pick up phase information, but only at these lower frequencies.
When a sound enters our ear, we magnify it via mechanical oscillations of bones and hydraulic effects, ultimately causing vibration in a thin film in our inner ear called the basilar membrane. Different sections of the basilar membrane vibrate in response to different tones. The basilar membrane is connected to thousands of small hairs, which are themselves connected to mechanically sensitive ion gates. Oscillations of these hairs trigger the ion gates, which then send electrical impulses down neurons to our brains.
Empirically, it is observed that these nerve impulses almost always begin at the peak amplitude of a vibration of the basilar membrane. Thus, if our two ears receive sound with different phase, they will fire nerve impulses at different times, and our brains will have access to phase information.
An interesting demonstration of this was given by Lord Rayleigh in 1907. He theorized that detecting the phase difference between the ears was a key component of our ability to localize sound. When Rayleigh played two tuning forks that were slightly out of tune, so that their relative phase oscillated, he found that the perceived location of the sound oscillated from the left to the right of the listener's head.
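To see why a slight detuning makes the relative phase sweep around, here is a minimal Python sketch assuming two pure tones at hypothetical frequencies of 440 Hz and 441 Hz (illustrative values, not those of Rayleigh's forks). The phase difference grows linearly in time and wraps once per beat period $1/|f_2 - f_1|$.

```python
import math

def phase_difference(f1_hz: float, f2_hz: float, t_s: float) -> float:
    """Phase of tone 2 minus tone 1 at time t, wrapped to [-pi, pi)."""
    dphi = 2 * math.pi * (f2_hz - f1_hz) * t_s
    return (dphi + math.pi) % (2 * math.pi) - math.pi

def beat_period(f1_hz: float, f2_hz: float) -> float:
    """Time for the phase difference to sweep through one full cycle."""
    return 1.0 / abs(f2_hz - f1_hz)

f1, f2 = 440.0, 441.0  # hypothetical tuning-fork frequencies
T = beat_period(f1, f2)
for frac in (0.0, 0.25, 0.5, 0.75):
    t = frac * T
    deg = math.degrees(phase_difference(f1, f2, t))
    print(f"t = {t:.2f} s  phase difference = {deg:+.0f} deg")
```

With a 1 Hz detuning, the relative phase completes one full cycle every second, which is the slow left-right oscillation in perceived location.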
At high frequencies, we lose phase information. This is because of uncertainties in the exact arrival time of a nerve impulse. A typical nerve impulse lasts several milliseconds, so above about 1000 Hz (a period of 1 ms) the uncertainty in arrival time becomes comparable to the period of the sound wave, and the phase information is lost. It turns out that we mostly lose the ability to localize sound in the range 1000-3000 Hz. Above 3000 Hz, different mechanisms related to the acoustic "shadow" of your head allow us to localize sound again.
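A minimal Python sketch of the timing argument, assuming an illustrative ~1 ms spread in impulse timing (a hypothetical round number, not a measured value): it expresses that spread as a fraction of one wave period. Once the ratio approaches 1, the firing time no longer pins down the phase.

```python
def period_s(freq_hz: float) -> float:
    """Period of a pure tone."""
    return 1.0 / freq_hz

def jitter_to_period(freq_hz: float, jitter_s: float) -> float:
    """Timing uncertainty expressed as a fraction of one wave period."""
    return jitter_s / period_s(freq_hz)

JITTER_S = 1e-3  # assumed ~1 ms spread in impulse timing (illustrative)
for f in (100, 300, 1000, 3000):
    r = jitter_to_period(f, JITTER_S)
    print(f"{f:5d} Hz: period {period_s(f) * 1e3:.2f} ms, jitter/period = {r:.1f}")
```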
References:
Wikipedia, "Action potential": http://en.wikipedia.org/wiki/Action_potential
The information about Rayleigh's experiment and firing at the peak of oscillations is from chapter 5 of "The Science of Sound" by Rossing, Wheeler, and Moore.
The previous answer is qualitatively too generous. The maximum frequency is the speed of sound divided by the mean free path, and the limit is gradual, marked by greater and greater attenuation as you approach it rather than by a sharp cutoff like the one for phonons in a solid.
The mean free path in air is about 68 nm and the mean intermolecular spacing is only about 3 nm, while the speed of sound in air is roughly 300 m/s, so the absolute maximum frequency is of order (300 m/s)/(68 nm), about 5 GHz. This is much less than 100 GHz.
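A quick numerical check, as a Python sketch assuming ideal-gas air at 0 °C and 1 atm. The mean spacing is estimated as $n^{-1/3}$ from the number density $n = P/(k_B T)$, and the cutoff as $f_{max} = c/\lambda_{min}$ with $\lambda_{min}$ set to the 68 nm mean free path; the helper names are hypothetical.

```python
K_B = 1.380649e-23  # Boltzmann constant, J/K

def number_density(pressure_pa: float, temperature_k: float) -> float:
    """Ideal-gas number density n = P / (k_B T), in molecules per m^3."""
    return pressure_pa / (K_B * temperature_k)

def mean_spacing_m(pressure_pa: float, temperature_k: float) -> float:
    """Typical distance between neighbouring molecules, n^(-1/3)."""
    return number_density(pressure_pa, temperature_k) ** (-1.0 / 3.0)

def max_frequency_hz(sound_speed_m_s: float, min_wavelength_m: float) -> float:
    """Cutoff estimate f_max = c / lambda_min."""
    return sound_speed_m_s / min_wavelength_m

MEAN_FREE_PATH_M = 68e-9
print(f"mean spacing at STP: {mean_spacing_m(101325.0, 273.15) * 1e9:.1f} nm")
for c in (300.0, 343.0):
    f_max = max_frequency_hz(c, MEAN_FREE_PATH_M)
    print(f"c = {c:.0f} m/s -> f_max ~ {f_max / 1e9:.1f} GHz")
```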
Ultrasound in liquids and solids is not similarly limited, because liquids are much denser than gases, and solids have periodicity. In liquids, you should be able to go to sound waves with a wavelength of about 1 nm, perhaps a little shorter, with a speed of sound in the range of 1500 m/s. This gives (1500 m/s)/(1 nm) = 1500 GHz as the cutoff, much, much higher than in air.
In a solid, the phonon dispersion is periodic in wavevector, since phonons are defined by displacements of a discrete lattice: no wavelength shorter than twice the interatomic distance is meaningful. The maximum frequency is therefore estimated as the speed of sound divided by twice the interatomic distance. This gives about 20,000 GHz as the limiting phonon frequency, again higher, because the speed of sound in solids can be 2-3 times higher than in liquids, and twice the interatomic spacing is about five times smaller than the shortest wavelength in a liquid. So it is safe to put the upper limit of ultrasound in metals at 100,000 GHz, and then only for small-atom metals. If you look at optical phonon bands, you can get frequencies like this over a wide range of modes.
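The liquid and solid estimates follow from the same $f_{max} = c/\lambda_{min}$ rule. A minimal Python sketch: the liquid values are the ones given above, while for the solid an interatomic spacing $a$ = 0.1 nm and a sound speed of 3750 m/s (2.5 times the liquid value) are illustrative assumptions chosen to match the scaling argument, not data for any particular metal.

```python
def max_frequency_hz(sound_speed_m_s: float, min_wavelength_m: float) -> float:
    """Cutoff estimate f_max = c / lambda_min."""
    return sound_speed_m_s / min_wavelength_m

# Liquid: shortest wavelength ~1 nm, c ~ 1500 m/s (values from the text).
f_liquid = max_frequency_hz(1500.0, 1e-9)

# Solid: lambda_min = 2a (zone-boundary phonon).
# Assumed illustrative values: a = 0.1 nm, c = 2.5 x 1500 m/s.
a = 0.1e-9
f_solid = max_frequency_hz(3750.0, 2 * a)

print(f"liquid: ~{f_liquid / 1e9:,.0f} GHz")
print(f"solid:  ~{f_solid / 1e9:,.0f} GHz")
```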
Attenuation buildup
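The speed of sound is determined by how pressure responds to changes in density, $c^{2} = \partial P / \partial \rho$. From this relation alone, Newton derived the speed of sound in a medium (his value for air came out too low because he assumed isothermal compression; Laplace corrected it using the adiabatic relation).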
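A minimal Python sketch of $c^{2} = \partial P/\partial \rho$ for air, assuming ideal-gas sea-level values ($P$ = 101325 Pa, $\rho$ = 1.225 kg/m³, $\gamma$ = 1.4). It differentiates an isothermal law $P \propto \rho$ (Newton's assumption) and an adiabatic law $P \propto \rho^{\gamma}$ numerically; the function names are hypothetical.

```python
import math

P0 = 101325.0  # Pa, sea-level pressure
RHO0 = 1.225   # kg/m^3, sea-level density
GAMMA = 1.4    # ratio of specific heats for air

def isothermal_pressure(rho: float) -> float:
    """P proportional to rho (Newton's assumption)."""
    return P0 * rho / RHO0

def adiabatic_pressure(rho: float) -> float:
    """P proportional to rho**gamma (Laplace's correction)."""
    return P0 * (rho / RHO0) ** GAMMA

def sound_speed(pressure_of_density, rho: float = RHO0, h: float = 1e-6) -> float:
    """c = sqrt(dP/drho), using a central finite difference."""
    dp_drho = (pressure_of_density(rho + h) - pressure_of_density(rho - h)) / (2 * h)
    return math.sqrt(dp_drho)

print(f"isothermal: {sound_speed(isothermal_pressure):.0f} m/s")
print(f"adiabatic:  {sound_speed(adiabatic_pressure):.0f} m/s")
```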
To understand what happens as you get to higher frequencies, one must understand the statistical nature of sound in a medium like air or water. Where a sound-wave has high pressure, there is just a tendency for a few more air molecules or water molecules to be present per cubic nm. This tendency is statistical, so it has shot-noise, due to the discrete nature of atoms. When the number of extra air or water molecules in one wavelength gets to be order 1, the shot noise dominates the pressure relationship, and the linear relation between pressure and response is wrecked.
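To put rough numbers on the shot-noise argument, here is a Python sketch that counts the molecules $N$ in a cube one wavelength on a side and assumes Poisson statistics, so that the relative fluctuation is $1/\sqrt{N}$. The densities are ideal-gas air at 0 °C and 1 atm, and liquid water at 1000 kg/m³; the helper names are hypothetical.

```python
import math

K_B = 1.380649e-23    # Boltzmann constant, J/K
N_A = 6.02214076e23   # Avogadro constant, 1/mol

def molecules_in_cube(number_density_m3: float, side_m: float) -> float:
    """Expected number of molecules in a cube of the given side."""
    return number_density_m3 * side_m ** 3

def relative_shot_noise(n_molecules: float) -> float:
    """Poisson fluctuation sqrt(N)/N = 1/sqrt(N)."""
    return 1.0 / math.sqrt(n_molecules)

n_air = 101325.0 / (K_B * 273.15)   # ideal gas at 0 C, 1 atm
n_water = 1000.0 / 18.015e-3 * N_A  # liquid water, ~3.3e28 per m^3

for name, n in (("air", n_air), ("water", n_water)):
    for lam in (1e-6, 68e-9, 1e-9):
        N = molecules_in_cube(n, lam)
        print(f"{name:5s} lambda = {lam * 1e9:g} nm: N ~ {N:.3g}, "
              f"1/sqrt(N) ~ {relative_shot_noise(N):.2g}")
```

When $1/\sqrt{N}$ is no longer small compared with the fractional density change of the wave, the pressure-density relation is dominated by noise; for air at a 1 nm scale, $N < 1$ and the continuum picture fails entirely.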
This means that such short-wavelength pressure variations are washed out by thermal fluctuations almost immediately (they are already indistinguishable from the fluctuations present in any single realization of the thermal state), so they cannot carry pressure coherently over long distances, unlike long-wavelength pressure waves. You can see this in the attenuation lengths of ultrasound waves: as you go to shorter wavelengths, the attenuation length decreases.
The attenuation coefficient for ultrasound describes how the waves die down over distance. By comparing the attenuation length to the wavelength, one gets an estimate for the maximum frequency at which ultrasound can coherently propagate in a statistical fluid. The attenuation length is less than a wavelength for sound waves in air with wavelengths of order 70 nm, so these waves die out too quickly to propagate as sound. This is how nature enforces the cutoff for statistical fluids.
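A minimal Python sketch of the comparison described above, assuming amplitude decay $e^{-\alpha x}$ with $\alpha$ in nepers per metre. The value $\alpha = 2 \times 10^{7}$ Np/m (an attenuation length of 50 nm) is purely hypothetical, chosen only to illustrate a wave whose attenuation length is shorter than its ~68 nm wavelength.

```python
import math

def wavelength_m(c_m_s: float, f_hz: float) -> float:
    """Wavelength lambda = c / f."""
    return c_m_s / f_hz

def amplitude_after(alpha_np_per_m: float, distance_m: float) -> float:
    """Fraction of the amplitude left after travelling a distance: exp(-alpha x)."""
    return math.exp(-alpha_np_per_m * distance_m)

def wavelengths_per_attenuation_length(alpha_np_per_m: float, c_m_s: float, f_hz: float) -> float:
    """Attenuation length 1/alpha measured in wavelengths."""
    return (1.0 / alpha_np_per_m) / wavelength_m(c_m_s, f_hz)

C_AIR = 300.0          # m/s, rough value used in the text
f = C_AIR / 68e-9      # frequency whose wavelength is ~68 nm
alpha = 2.0e7          # Np/m, hypothetical: 1/alpha = 50 nm

ratio = wavelengths_per_attenuation_length(alpha, C_AIR, f)
left = amplitude_after(alpha, wavelength_m(C_AIR, f))
print(f"attenuation length = {ratio:.2f} wavelengths; amplitude left after one wavelength = {left:.2f}")
```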
Impossibility of Wikipedia attenuation model at high frequencies
The attenuation model on Wikipedia states that the attenuation length (length to e-folding) shrinks inversely as the first power of the frequency. This leads to an attenuation over one wavelength which is independent of the wavelength, and equal to approximately $10^{-4}$ in water.
This model is clearly wrong at high frequencies, since the attenuation length must become comparable to the wavelength at the point where the pressure variations have atomic-scale shot noise, meaning at the nm scale and below.
Unfortunately, I was not able to find either free data or a more accurate attenuation model in a quick search, so I leave the answer as is. If the attenuation depends quadratically on frequency, its coefficient should be such that the attenuation over one wavelength becomes of order 1 when the wavelength is of order 1 nm.
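For a power law $\alpha = k f^{n}$, the attenuation over one wavelength is $\alpha \lambda = k c f^{n-1}$: constant for $n = 1$, growing with frequency for $n = 2$. A minimal Python sketch for water ($c$ = 1500 m/s), assuming the linear coefficient gives the $10^{-4}$ per wavelength quoted above and choosing the quadratic coefficient (an assumption, not fitted data) so that $\alpha\lambda = 1$ at $\lambda$ = 1 nm.

```python
C_WATER = 1500.0  # m/s

def attenuation_per_wavelength(k: float, n: int, f_hz: float, c_m_s: float = C_WATER) -> float:
    """alpha * lambda for alpha = k f^n, i.e. k f^n * (c / f)."""
    return k * f_hz ** n * (c_m_s / f_hz)

# Linear model: 1e-4 per wavelength at every frequency (figure quoted in the text).
k_linear = 1e-4 / C_WATER

# Quadratic model: coefficient chosen (assumption) so alpha*lambda = 1 at lambda = 1 nm.
LAMBDA_0 = 1e-9
k_quad = LAMBDA_0 / C_WATER ** 2

for lam in (1e-3, 1e-6, 1e-9):
    f = C_WATER / lam
    lin = attenuation_per_wavelength(k_linear, 1, f)
    quad = attenuation_per_wavelength(k_quad, 2, f)
    print(f"lambda = {lam:.0e} m: linear {lin:.1e}, quadratic {quad:.1e}")
```

The linear model never reaches order 1, while the quadratic one does so only at the nm scale, as argued above.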
The gory details of this are found in the answer at https://physics.stackexchange.com/a/266046/59023.
Yes, the thing you are looking for is called acoustic impedance, which decreases with decreasing ambient pressure. You may think that a decreasing impedance would allow a sound wave to propagate further, but the reference sound intensity, $I_{o}$, depends upon the characteristic acoustic impedance, $z_{o}$, as: $$ I_{o} = P_{o}^{2}/z_{o} \tag{0} $$ where $P_{o}$ is a constant reference pressure here associated with the hearing threshold, i.e., ~20 $\mu$Pa at 1000 Hz (it's not flat across frequency, but adding the frequency dependence is not necessary to illustrate the main point). The characteristic acoustic impedance is defined as: $$ z_{o} = \rho \ C_{s} \tag{1} $$ where $\rho$ is the mass density and $C_{s}$ is the speed of sound.
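Equations (0) and (1) can be checked against the sea-level numbers quoted further down. A minimal Python sketch, assuming ideal-gas dry air at STP (0 °C, 101325 Pa, molar mass 0.0289644 kg/mol, $\gamma$ = 1.4) for $\rho$ and $C_{s}$; the helper names are hypothetical.

```python
import math

R = 8.314462618     # J/(mol K)
M_AIR = 0.0289644   # kg/mol, dry air
GAMMA = 1.4         # ratio of specific heats
P_REF = 20e-6       # Pa, hearing-threshold reference pressure P_o

def air_density(p_pa: float, t_k: float) -> float:
    """Ideal-gas density rho = P M / (R T)."""
    return p_pa * M_AIR / (R * t_k)

def sound_speed(t_k: float) -> float:
    """Adiabatic sound speed C_s = sqrt(gamma R T / M)."""
    return math.sqrt(GAMMA * R * t_k / M_AIR)

def impedance(rho: float, c: float) -> float:
    """Eq. (1): z_o = rho * C_s."""
    return rho * c

def reference_intensity(z: float, p_ref: float = P_REF) -> float:
    """Eq. (0): I_o = P_o^2 / z_o."""
    return p_ref ** 2 / z

z_stp = impedance(air_density(101325.0, 273.15), sound_speed(273.15))
print(f"STP:   z_o ~ {z_stp:.0f} Pa s/m, I_o ~ {reference_intensity(z_stp):.3e} W/m^2")
print(f"40 km: I_o ~ {reference_intensity(3.462):.3e} W/m^2 (using the quoted z_o = 3.462)")
```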
The point where such a sound wave would experience strong damping is where the collisional mean free path becomes too large to support the oscillations, i.e., when the average time between collisions becomes comparable to the wave period (equivalently, when the collision frequency drops to about the wave frequency). The oscillations would then have no restoring force and would damp out.
For a point source in a weakly damped medium, the intensity of sound decreases as $I\left( r \right) \propto r^{-2}$ while the sound pressure decreases as $P\left( r \right) \propto r^{-1}$. Using a rough estimate of the atmospheric pressure as a function of altitude in Earth's atmosphere, the pressure falls to ~600 Pa by ~43 km.
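The ~600 Pa at ~43 km figure (and the ~5 Pa at ~84 km used at the end) is consistent with an isothermal exponential atmosphere, $P(h) = P_{0} e^{-h/H}$, with a scale height $H \approx$ 8.5 km. That scale height is an assumption about what the rough estimate was; a layered standard-atmosphere model would give somewhat different altitudes. A minimal Python sketch:

```python
import math

P_SEA_LEVEL = 101325.0  # Pa
SCALE_HEIGHT_KM = 8.5   # assumed isothermal scale height

def pressure_pa(h_km: float) -> float:
    """Exponential atmosphere P(h) = P0 exp(-h/H)."""
    return P_SEA_LEVEL * math.exp(-h_km / SCALE_HEIGHT_KM)

def altitude_km(p_pa: float) -> float:
    """Inverse: h = H ln(P0 / P)."""
    return SCALE_HEIGHT_KM * math.log(P_SEA_LEVEL / p_pa)

for p in (600.0, 5.0):
    print(f"{p:6.0f} Pa at ~{altitude_km(p):.1f} km")
```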
Using the table in the answer at https://physics.stackexchange.com/a/266046/59023 for 40 km altitude, the magnitudes of $I_{o}$ and $z_{o}$ are ~1.155 $\times 10^{-10}$ W m$^{-2}$ and ~3.462 Pa s m$^{-1}$, respectively (at sea level and STP, these satisfy $z_{o}$ ~ 428 Pa s m$^{-1}$ and $I_{o}$ ~ 9.346 $\times 10^{-13}$ W m$^{-2}$).
Suppose we start with a sound intensity level of $L_{o}$ = 100 dB and we know that the intensity of the source is given as: $$ I_{src}\left( h \right) = I_{o}\left( h \right) 10^{L_{o}/10} \tag{2} $$ where $h$ is the altitude. So a 100 dB source at sea level would start with $I_{src}\left( 0 \ km \right)$ ~ 9.346 $\times 10^{-3}$ W m$^{-2}$. To maintain the same 100 dB level at ~40 km, the source intensity would have to increase to $I_{src}\left( 40 \ km \right)$ ~ 1.155 W m$^{-2}$, i.e., by a factor of ~124.
The sound level intensity at a distance $r$ from the source is given by: $$ L_{r}\left( h, r \right) = L_{i,src}\left( h \right) + 20 \ \log_{10} \left( \frac{ 1 }{ r } \right) \tag{3} $$ where a 1 m normalizing distance is used and the source sound level intensity relative to sea level is defined as: $$ L_{i,src}\left( h \right) = 10 \ \log_{10} \left( \frac{ I_{src}\left( 0 \ km \right) }{ I_{o}\left( h \right) } \right) \tag{4} $$ You can see that $L_{i,src}\left( 0 \ km \right)$ = 100 dB, as defined, and that $L_{i,src}\left( 40 \ km \right)$ = 79.1 dB.
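Equations (2) and (4) can be checked numerically with the quoted reference intensities. A minimal Python sketch; the two $I_{o}$ values are taken from the linked table as quoted above, not recomputed, and the function names are hypothetical.

```python
import math

I0_SEA = 9.346e-13   # W/m^2, quoted sea-level (STP) reference intensity
I0_40KM = 1.155e-10  # W/m^2, quoted 40 km reference intensity

def source_intensity(i0: float, level_db: float) -> float:
    """Eq. (2): I_src = I_o * 10^(L_o / 10)."""
    return i0 * 10 ** (level_db / 10)

def level_relative_to(i_src_sea: float, i0_h: float) -> float:
    """Eq. (4): L_i,src = 10 log10(I_src(0 km) / I_o(h))."""
    return 10 * math.log10(i_src_sea / i0_h)

i_sea = source_intensity(I0_SEA, 100.0)
i_40 = source_intensity(I0_40KM, 100.0)
print(f"I_src(0 km) ~ {i_sea:.3e} W/m^2, I_src(40 km) ~ {i_40:.3f} W/m^2, ratio ~ {i_40 / i_sea:.0f}")
print(f"L_i,src(0 km) = {level_relative_to(i_sea, I0_SEA):.1f} dB, "
      f"L_i,src(40 km) = {level_relative_to(i_sea, I0_40KM):.1f} dB")
```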
Note that the sound pressure is related to the sound pressure level through: $$ L_{p}\left( r \right) = 20 \ \log_{10} \left( \frac{ P\left( r \right) }{ P_{o} } \right) \tag{5} $$ so $L_{p}\left( r \right)$ = 100 dB corresponds to $P\left( r \right)$ = 2 Pa for $P_{o}$ ~ 20 $\mu$Pa at 1000 Hz, and $L_{p}\left( r \right)$ = 79.1 dB corresponds to $P\left( r \right)$ ~ 0.18 Pa. Equation 5 shows that the level is defined to be zero at the threshold of hearing, but if we take $L_{p}\left( r \right)$ ~ 0.001 dB as the boundary of hearing, then we can estimate how far from a 100 dB source one would need to be to reach this level at an atmospheric pressure of 600 Pa.
For the ~40 km altitude we used before, $L_{r}\left( 40 \ km, r \right)$ falls to ~0.001 dB at $r$ ~ 9 km (~5.6 miles). Note that 100 dB is really loud: a typical subway train arriving at a platform generates only ~90 dB, a chainsaw at ~1 m is about 110 dB, and typical breathing is about 10 dB. So, at an atmospheric pressure equivalent to 40 km altitude above Earth, one would need to be ~2.8 km (~1.8 miles) from a 100 dB source for it to fade to the level of breathing.
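Inverting Eqs. (5) and (3) reproduces the pressures and distances quoted here. A minimal Python sketch, assuming the 79.1 dB source level at ~40 km from Eq. (4) and the 1 m normalizing distance; the function names are hypothetical.

```python
P_REF = 20e-6  # Pa, reference pressure P_o

def pressure_from_level(level_db: float, p_ref: float = P_REF) -> float:
    """Invert Eq. (5): P = P_o * 10^(L_p / 20)."""
    return p_ref * 10 ** (level_db / 20)

def distance_to_level(source_level_db: float, target_level_db: float) -> float:
    """Invert Eq. (3): distance in metres (1 m reference) where L_r drops to the target."""
    return 10 ** ((source_level_db - target_level_db) / 20)

L_SRC_40KM = 79.1  # dB, from Eq. (4)
print(f"100 dB -> {pressure_from_level(100.0):.2f} Pa, 79.1 dB -> {pressure_from_level(79.1):.2f} Pa")
print(f"0.001 dB reached at ~{distance_to_level(L_SRC_40KM, 0.001) / 1e3:.1f} km")
print(f"10 dB (breathing) reached at ~{distance_to_level(L_SRC_40KM, 10.0) / 1e3:.1f} km")
```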
The ambient pressure reaches ~5 Pa in Earth's atmosphere at an altitude of ~84 km. So if you follow the above steps and use the values in the table for 80 km found at https://physics.stackexchange.com/a/266046/59023, you should find your answer or at least a good enough approximation to figure out what can and cannot be occurring.