Please note that the following is all conjectural. I only volunteer it due to the lack of other responses after numerous days, the coolness of the question, and the probable lack of people/references explicitly experienced with this specific topic.
Basic Picture
As a general relation, I'm sure one can correlate the sound volume with the total energy being dissipated, but the noise produced is going to be a (virtually) negligible fraction of that total energy (in general, sound carries very little energy [1]).
To zeroth order, I think it's safe to assume the waterfall produces white noise, but obviously that needs to be modified to be more accurate (i.e. probably pink/brown to first order). Also, by considering the transition from a small/gradual slope to an actual waterfall, I can convince myself that there is definitely a dependence on the height of the fall in addition to the water volume [2].
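To put rough numbers on the energy comparison above, here is a minimal sketch. It assumes the total power released by the fall is simply the gravitational potential energy rate, P = rho * g * Q * h, and it borrows the illustrative figures from the endnotes (a 60 W amp at an assumed 10% efficiency, and 1000 m3/min over 20 m). The function name and constants are my own choices for illustration.

```python
RHO_WATER = 1000.0  # kg/m^3, density of fresh water
G = 9.81            # m/s^2, gravitational acceleration


def waterfall_power_watts(flow_m3_per_min, height_m):
    """Rate of gravitational potential energy release, P = rho * g * Q * h."""
    flow_m3_per_s = flow_m3_per_min / 60.0
    return RHO_WATER * G * flow_m3_per_s * height_m


# Illustrative values taken from the endnotes, not measurements.
fall_power = waterfall_power_watts(1000.0, 20.0)
amp_acoustic_power = 60.0 * 0.10  # 60 W amp at an assumed 10% efficiency

print(f"waterfall power: {fall_power / 1e6:.2f} MW")
print(f"amp acoustic power: {amp_acoustic_power:.1f} W")
print(f"ratio: {fall_power / amp_acoustic_power:.2e}")
```

With these numbers the fall releases roughly 3.3 MW while the amp radiates about 6 W of sound, so even a very loud waterfall would only need to convert a tiny fraction of its power into noise.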
How would height affect the spectrum?
Generally, power spectra exhibit power-law(-like) cutoffs at both high and low frequencies, and I would expect the same thing in this case. In the low-frequency regime, if you start with a smooth flow before the waterfall, there isn't anything to source perturbations larger than the physical size scale of the waterfall itself. So I'd expect a low-frequency cutoff at a wavelength comparable to the waterfall height. In other words, the taller the waterfall, the lower the rumble.
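As a rough illustration of that scaling, the sketch below converts a fall height into a cutoff frequency using f = c / h. It assumes the relevant wavelength is the acoustic wavelength in air and takes c = 343 m/s (air at about 20 °C); both are my assumptions, and the function name is made up for the example.

```python
SPEED_OF_SOUND_AIR = 343.0  # m/s, assumed value for air at about 20 degrees C


def low_cutoff_hz(height_m, speed=SPEED_OF_SOUND_AIR):
    """Frequency whose wavelength equals the waterfall height, f = c / h."""
    return speed / height_m


for height in (5.0, 20.0, 100.0):
    print(f"{height:6.1f} m fall -> cutoff near {low_cutoff_hz(height):6.2f} Hz")
```

Under these assumptions a 20 m fall would cut off near 17 Hz, right at the bottom of human hearing, and taller falls would push the cutoff into the infrasound.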
There also has to be a high-frequency cutoff, if for no other reason than to avoid an ultraviolet catastrophe/divergence. But physically, what would cause it? Presumably the smallest-scale (highest-frequency) perturbations come from flow turbulence [3], and thus would be determined primarily by the viscosity and dissipation of the fluid [4]. Generally such a spectrum falls off like the wavenumber (frequency) to the -5/3 power. But note that this high-frequency cutoff wouldn't seem to change from waterfall to waterfall.
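For a feel of the viscous scale involved, here is a small sketch using the standard Kolmogorov length scale, eta = (nu^3 / epsilon)^(1/4). The kinematic viscosity of water (about 1e-6 m^2/s) is a textbook value; the dissipation rates are pure guesses on my part, chosen only to show how weakly the result depends on them.

```python
NU_WATER = 1.0e-6  # m^2/s, kinematic viscosity of water near 20 degrees C


def kolmogorov_length_m(kinematic_viscosity, dissipation_rate):
    """Kolmogorov length scale eta = (nu**3 / epsilon) ** 0.25, in metres."""
    return (kinematic_viscosity ** 3 / dissipation_rate) ** 0.25


# Dissipation rates in W/kg are illustrative guesses, not measured values.
for epsilon in (0.1, 1.0, 10.0):
    eta = kolmogorov_length_m(NU_WATER, epsilon)
    print(f"epsilon = {epsilon:5.1f} W/kg -> eta = {eta * 1e6:6.1f} micrometres")
```

A hundredfold change in the guessed dissipation rate changes the length scale by only about a factor of three, which fits the remark that this cutoff should not vary much from waterfall to waterfall.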
Overall, I'm suggesting (read: conjecturing) the following:
- Low-frequency exponential or power-law cutoff at wavelengths comparable to the height of the waterfall.
- High-frequency power-law cutoff from a Kolmogorov turbulence spectrum, at a wavelength comparable to the viscous length scale.
- These regimes would be connected by a pink/brown-noise power-law.
- The amplitude of the sound is directly proportional to some product of the flow rate and waterfall height (I'd guess the former term would dominate).
[Figure: example power spectrum (power vs. frequency, both in arbitrary units).]
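A spectrum with the conjectured shape can be sketched numerically as follows. This is only one possible functional form: the exponential low-frequency cutoff, the pink (slope -1) middle section, the smooth break to a -5/3 slope, and the values of `f_low` and `f_high` are all assumptions picked for illustration, in arbitrary units.

```python
import numpy as np


def conjectured_spectrum(freq, f_low, f_high, mid_slope=-1.0, high_slope=-5.0 / 3.0):
    """Arbitrary-unit power spectrum with the three conjectured regimes.

    Below f_low the power is suppressed exponentially, between f_low and
    f_high it follows freq**mid_slope, and well above f_high it steepens
    to freq**high_slope.
    """
    f = np.asarray(freq, dtype=float)
    low_cut = np.exp(-f_low / f)                            # low-frequency cutoff
    mid = (f / f_low) ** mid_slope                          # pink/brown section
    high_break = (1.0 + f / f_high) ** (high_slope - mid_slope)  # steepening
    return low_cut * mid * high_break


freqs = np.logspace(0, 7, 701)  # log-spaced frequencies, arbitrary units
power = conjectured_spectrum(freqs, f_low=10.0, f_high=1.0e4)
peak_freq = freqs[np.argmax(power)]
print(f"spectrum peaks near f = {peak_freq:.1f} (arbitrary units)")
```

Plotting `power` against `freqs` on log-log axes shows the peak near `f_low` and the change of slope around `f_high`; in the picture above, raising the waterfall height would correspond to lowering `f_low`.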
The Answer
I'm sure information can be obtained from the sound. In particular, estimates of its height/size, flow rate, and distance [5]. I'm also sure this would be quite difficult in practice and, for most purposes, just listening and guessing would probably be as accurate as any quantitative analysis ;)
Additional consideration?
I suppose it's possible that water drop(let)s could source additional sound at scales comparable to their own size. That would be pretty cool, but I have no idea how to estimate/guess whether that's important or not. Probably they would only contribute to sound at wavelengths comparable to their size (and thus constrained by the max/min water-drop sizes [6]...).
Water, especially in a mist/spray, can be very effective at damping sound (which is why water was used for sound suppression at space-shuttle launches). I'd assume that this would have a significant effect on the resulting sound for heights/flow volumes at which a mist/spray is produced.
The acoustic properties of the landscape might also be important, i.e. whether the landscape is open (with the waterfall drop-off being like a step-function) or closed (like the drop-off being at the end of a u-shaped valley, etc).
Finally, the additional surfaces involved might be important to consider: e.g. rocks, the surface of the waterfall drop-off, sand near the waterfall base, etc.
Endnotes
1: Consider how much sound a 60 Watt amp produces, and assume maybe a 10% efficiency (probably optimistic). That's loud, yet it carries a small amount of power compared to what a comparably loud waterfall is carrying. The vast majority of the waterfall's energy will end up as heat, turbulence, and bulk motion.
2: I'd also guess that height/volume blend after some saturation point (i.e. 1000 m³/min at 20 m height is about the same as 500 m³/min at 40 m height)... but let's ignore that for now.
3: Turbulence tends to transfer energy from large-scales to small-scales.
See: http://en.wikipedia.org/wiki/Turbulence
4: Figuring out the actual relation for the smallest size scale of turbulence is both over my head and, I think, outside the scope of this 'answer'. But it involves things like the Kolmogorov spectrum and the associated length scale.
5: Distance could be estimated based on a combination of the spectrum and volume level, to disentangle the degeneracy between sound volume and distance.
6: Perhaps the minimum droplet size is determined by it behaving ballistically (instead of forming a mist)?
It isn't possible to create an audio source in mid-air using the method you've described. This is because the two ultrasonic waves would create an audible source if the listener were standing at that spot, but those waves would continue to propagate in the same direction afterwards. You would need, as I point out below, some sort of medium which scattered the waves in all directions to make it seem as if the sound were coming from the point at which you interfered the two waves.
It is possible, however, to make the user perceive the sound as coming from a specific location, but it isn't as easy as the author makes it seem. I can think of two different ways. First of all, as described by @reirab, you can get audio frequencies by interfering two sound waves of high frequency. When they interfere they will generate a beat note whose frequency is the difference between the two frequencies. I.e., if you send a sound beam with frequency $f_1=200\ \text{kHz}$ and another beam with $f_2=210\ \text{kHz}$, the frequency heard in the region where they combine will be $\Delta f=f_2-f_1=10\ \text{kHz}$, which is in the audio band of humans.
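The beat note can be checked numerically with a short sketch. One caveat that I am adding here: a plain sum of the two tones has spectral content only at $f_1$ and $f_2$, so the difference frequency shows up only after some nonlinear step. The squaring below is a stand-in for whatever nonlinearity is actually present (it is an assumption, not a model of the ear or of the air), and the sample rate and duration are arbitrary choices.

```python
import numpy as np


def audible_peak(signal, sample_rate, band=(20.0, 20_000.0)):
    """Return (frequency, amplitude) of the strongest component in the band."""
    n = len(signal)
    freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)
    spectrum = np.abs(np.fft.rfft(signal)) / n
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    idx = np.argmax(spectrum[in_band])
    return freqs[in_band][idx], spectrum[in_band][idx]


f1, f2 = 200_000.0, 210_000.0   # ultrasonic beam frequencies from the text, Hz
sample_rate = 2_000_000         # Hz, comfortably above twice (f1 + f2)
n_samples = 2000                # 1 ms of signal, 1 kHz frequency resolution
t = np.arange(n_samples) / sample_rate

linear_sum = np.cos(2 * np.pi * f1 * t) + np.cos(2 * np.pi * f2 * t)
squared = linear_sum ** 2       # assumed quadratic nonlinearity

_, linear_amp = audible_peak(linear_sum, sample_rate)
beat_freq, beat_amp = audible_peak(squared, sample_rate)

print(f"audible content of the plain sum: amplitude {linear_amp:.2e}")
print(f"audible content after squaring: {beat_freq:.0f} Hz, amplitude {beat_amp:.2f}")
```

The squared signal has a clear component at $f_2-f_1$, while the plain sum has essentially nothing in the audio band.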
There is an additional difficulty. You will need the sound to come out in a well-defined, narrow (collimated) beam, and this is not terribly easy to do. A typical speaker emits sound in all directions. There are many techniques for generating such beams, but one is to use a phased array.
How can you use this to make a person perceive the sound as coming from a specific point?
Sending Two Different Volumes to the Two Ears
What does it mean to perceive sound as coming from a specific location? Our ears are just microphones with cones which accept sound mostly from one direction (excepting low frequencies). A large part of the way we determine where the sound came from is just the relative volume in our two ears. So, you could use the interference effect described above with beams which are narrow enough that you can target each ear. By using two separate sets of beams targeting each ear with different volumes, you could make the person perceive the sound as coming from a specific location; at least as well as a 3D movie makes a person perceive images in 3D.
Hitting a Material Which Scattered the Sound Isotropically
The second method is to use the same interference effect, but this time combining the two beams at a point where a material scatters the sound waves in all directions. I'm going to be honest, I'm not sure how realistic such materials are, but let's assume they exist for now. If you did so, the two sound beams would be scattered with equal amplitude in all directions and the person you are trying to fool would perceive the sound as coming from this point. This method has the advantage of truly sounding to the person as if the sound came from that direction in all respects, including reflections, phasing, etc.
In summary, the idea is definitely possible (maybe there are more ways than I've given), but it isn't as simple as the passage in the book makes it out to be.
Best Answer
You can find everything beautifully explained in this website on brass instruments: https://newt.phys.unsw.edu.au/jw/brassacoustics.html
A short version is that for ultrasonic (>20 kHz) vibrations you would need to increase the air pressure produced by the mouth. That can be done either by pushing more air through the mouth or by making the volume throughput bigger. Alternatively, you could think of the "embouchure" being very tiny to make the pressure very high. (As some quick math, assume a linear relationship between pressure and frequency: with a typical value of 1 kPa to make a trumpet sound at around 1 kHz, maybe 20 times that pressure would lead to ultrasonic sound waves.)
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8313787/ Here you can see that the damage threshold of the eardrum (tympanum) is around 30-100 kPa, so I would speculate that whistling ultrasonically would require such an absurd amount of pressure that you would rupture your eardrums at the same time.
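The back-of-the-envelope scaling can be written out as a short sketch. It assumes, as above, that the required pressure grows linearly with frequency from a reference point of 1 kPa at 1 kHz; that linearity is a guess rather than an established law, and the constant and function names are made up for the example.

```python
REFERENCE_PRESSURE_KPA = 1.0    # assumed blowing pressure for a ~1 kHz note
REFERENCE_FREQUENCY_KHZ = 1.0
EARDRUM_DAMAGE_RANGE_KPA = (30.0, 100.0)  # range quoted from the linked paper


def required_pressure_kpa(frequency_khz):
    """Pressure needed at a given frequency under the assumed linear scaling."""
    return REFERENCE_PRESSURE_KPA * frequency_khz / REFERENCE_FREQUENCY_KHZ


for frequency_khz in (1.0, 20.0, 30.0, 100.0):
    pressure = required_pressure_kpa(frequency_khz)
    in_damage_range = pressure >= EARDRUM_DAMAGE_RANGE_KPA[0]
    print(f"{frequency_khz:6.1f} kHz -> {pressure:6.1f} kPa, "
          f"within quoted damage range: {in_damage_range}")
```

Under this assumption the 20 kHz edge of the ultrasonic range corresponds to about 20 kPa, and the quoted 30-100 kPa damage range corresponds to roughly 30-100 kHz.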
I am neither a clinical expert nor an acoustics expert, so those numbers are just from "fun" calculations based on the linked evidence, but I hope there are more publications about the true range that a person can actually achieve "humanly".
EDIT: It has come to my attention from the comments that I did not clarify precisely what I meant with my analogy to brass instruments.
https://en.wikipedia.org/wiki/Physics_of_whistle
The Wikipedia article is deemed too deep for a Wikipedia article, but it has some interesting aspects of how whistling occurs in humans:
First, it is defined as a fluid dynamics problem:
Regarding human whistling, the Wikipedia article is confusing, but a study by Wilson et al. https://pubs.aip.org/asa/jasa/article-abstract/50/1B/366/745861/Experiments-on-the-Fluid-Mechanics-of-Whistling?redirectedFrom=fulltext points to similar behavior when using a model based on a cylinder with different holes (for input and output).
While it has been noted that my use of the trumpet/brass as an example is far-fetched, the further examples I find are explained more through fluid dynamics, which makes it harder for me to make an informed assessment. I still think a person is unlikely to produce the airflow needed for ultrasonic sound.