Sound waves propagate very similarly to how 'the wave' propagates at baseball stadiums:
http://www.youtube.com/watch?v=H0K2dvB-7WY
At some point something (your vocal cords, a piano string, a speaker) hits a bunch of air particles (atoms, molecules, it really doesn't matter). These particles in turn hit the particles next to them, which hit the ones next to them, and so on. No pressure here simply means the absence of any particles (a vacuum), so there is nothing to pass on the orders to move. This is like 'the wave' in that everyone passes the motion on to the person standing next to them, and if there is no one standing next to you, the wave ends with you. Hearing a sound is the last bunch of air particles next to your eardrum getting the instructions to vibrate, which in turn vibrates your eardrum, and your brain turns this response into the perception of sound.
It isn't possible to create an audio source in mid-air using the method you've described. This is because the two ultrasonic waves would create an audible source if the listener were standing at that spot, but those waves would continue to propagate in the same direction afterwards. You would need, as I point out below, some sort of medium which scattered the waves in all directions to make it seem as if the sound were coming from the point at which you interfered the two waves.
It is possible, however, to make the listener perceive the sound as coming from a specific location, but it isn't as easy as the author makes it seem. I can think of two different ways. First of all, as described by @reirab, you can get audio frequencies by interfering two sound waves of high frequency. When they interfere they generate a beat note whose frequency is the difference between the two frequencies. For example, if you send a sound beam with frequency $f_1=200\ \text{kHz}$ and another beam with $f_2=210\ \text{kHz}$, the frequency heard in the region where they combine will be $\Delta f=f_2-f_1=10\ \text{kHz}$, which is within the range of human hearing.
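As a rough numerical check of the beat-note arithmetic, the sketch below sums two unit-amplitude tones at the example frequencies and compares the result with the same signal written as a fast carrier times a slow envelope. The function names are made up for illustration; only the two frequencies come from the text.

```python
import math

def beat_frequency(f1_hz, f2_hz):
    """Difference frequency of two tones, in Hz."""
    return abs(f2_hz - f1_hz)

def superposed(f1_hz, f2_hz, t):
    """Sum of two unit-amplitude tones at time t (seconds)."""
    return math.sin(2 * math.pi * f1_hz * t) + math.sin(2 * math.pi * f2_hz * t)

def carrier_times_envelope(f1_hz, f2_hz, t):
    """Same signal via sin A + sin B = 2 sin((A+B)/2) cos((A-B)/2)."""
    mean_hz = 0.5 * (f1_hz + f2_hz)        # fast carrier at the average frequency
    half_diff_hz = 0.5 * (f2_hz - f1_hz)   # slow envelope at half the difference
    carrier = math.sin(2 * math.pi * mean_hz * t)
    envelope = 2 * math.cos(2 * math.pi * half_diff_hz * t)
    return carrier * envelope

f1, f2 = 200e3, 210e3  # Hz, the example values from the text
delta_f = beat_frequency(f1, f2)
print(f"beat frequency: {delta_f / 1e3:.0f} kHz")
```

The envelope is a cosine at half the difference frequency, but its magnitude repeats every $1/\Delta f$, which is why the loudness pulses at $\Delta f$.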
There is an additional difficulty. You will need the sound to come out in a well-defined, narrow (collimated) beam, and this is not terribly easy to do. A typical speaker emits sound in all directions. There are many techniques for generating such beams, but one is to use a phased array.
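To give a feel for how a phased array steers a beam, here is a minimal sketch for a uniform linear array, assuming the standard delay rule $\tau_n = n\,d\sin\theta/c$. The element count, spacing, and steering angle are hypothetical values chosen for illustration, not figures from any real device.

```python
import math

SPEED_OF_SOUND = 340.0  # m/s, assumed value for air

def steering_delays(num_elements, spacing_m, angle_deg, c=SPEED_OF_SOUND):
    """Per-element delays (seconds) that steer a uniform linear array.

    angle_deg is measured from broadside (the direction perpendicular
    to the line of emitters). Delays are shifted so none is negative.
    """
    theta = math.radians(angle_deg)
    raw = [n * spacing_m * math.sin(theta) / c for n in range(num_elements)]
    offset = min(raw)
    return [d - offset for d in raw]

# Hypothetical array: 8 emitters, 4 mm apart, steered 20 degrees off broadside.
delays = steering_delays(num_elements=8, spacing_m=0.004, angle_deg=20.0)
for n, d in enumerate(delays):
    print(f"element {n}: delay {d * 1e6:.2f} microseconds")
```

Firing each emitter with its delay makes the individual wavefronts line up along the chosen direction, so the combined output adds up there and partially cancels elsewhere.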
How can you use this to make a person perceive the sound as coming from a specific point?
Sending Two Different Volumes to the Two Ears
What does it mean to perceive sound as coming from a specific location? Our ears are just microphones with cones which accept sound mostly from one direction (excepting low frequencies). A large part of the way we determine where the sound came from is just the relative volume in our two ears. So, you could use the interference effect described above with beams which are narrow enough that you can target each ear. By using two separate sets of beams targeting each ear with different volumes, you could make the person perceive the sound as coming from a specific location; at least as well as a 3D movie makes a person perceive images in 3D.
Hitting a Material Which Scattered the Sound Isotropically
The second method is to use the same interference effect, but this time combining the two beams at a point where a material scatters the sound waves in all directions. I'm going to be honest, I'm not sure how realistic such materials are, but let's assume they exist for now. If you did so, the two sound beams would be scattered with equal amplitude in all directions and the person you are trying to fool would perceive the sound as coming from this point. This method has the advantage of truly sounding to the person as if the sound came from that point in all respects, including reflections, phasing, etc.
In summary, the idea is definitely possible (maybe there are more ways than I've given), but it isn't as simple as the passage in the book makes it out to be.
Best Answer
Sound intensity is measured on the dB scale, which is a logarithmic scale of pressure. The "threshold of hearing" is given by the graph below:
which tells you (approximately) that 0 dB is about "as low as you go" - the "threshold of hearing". Note that sound signal drops off with distance - we will have to take that into account in what follows.
If you suddenly create a vacuum of a certain volume V, then air rushing in to fill the void will create a (negative) pressure wave traveling out - for simplicity's sake let's make the void spherical, and "listen" to the plop at a distance of 1 m (where the observer might be standing when the parchment disappears).
The problem we run into is that the pressure "step" is not a single frequency tone, it's in effect the sum of many frequencies (think Fourier transform) - so we would need to estimate what percentage of the energy is in the audible range.
That's hard to do, and we are talking about magic here - so I am going to simplify. A pressure level of 0 dB corresponds to $2\times 10^{-5}\ \text{Pa}$ - that's a really small pressure.
Parchment is thick - let's say 0.2 mm, or about double the thickness of conventional paper (a stack of 500 sheets is about 5 cm thick, so I estimate that at 0.1 mm per sheet). For a letter size piece of paper, $30 \times 20\ \text{cm}^2$, the volume is $12\ \text{cm}^3$. If that was a sphere, that sphere would have a radius of $\left(\frac{12\ \text{cm}^3}{(4/3) \times \pi}\right)^{1/3} = 1.4\ \text{cm}$.
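The arithmetic for the parchment volume and the equivalent sphere can be checked with a few lines. This is just a sketch of the estimate above; the variable names are illustrative.

```python
import math

# Dimensions taken from the estimate in the text, all in cm.
thickness_cm = 0.02          # 0.2 mm
width_cm, height_cm = 30.0, 20.0

volume_cm3 = width_cm * height_cm * thickness_cm

# Radius of a sphere with the same volume: V = (4/3) * pi * r^3
radius_cm = (volume_cm3 / ((4.0 / 3.0) * math.pi)) ** (1.0 / 3.0)

print(f"volume: {volume_cm3:.1f} cm^3, equivalent radius: {radius_cm:.2f} cm")
```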
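If that sphere was suddenly "gone", an equal volume of air would have to rush in. With $r_1 = 1.4\ \text{cm}$ the radius of the void and $r_0 = 1\ \text{m}$ the distance to the listener, the apparent pressure drop at that distance would be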
$$\begin{align} \Delta P &= \frac{r_1^3}{r_0^3}\times P_{\text{ambient}}\\ &= 0.3\ \text{Pa} \end{align}$$
That is a Very Loud Pop - about 80 dB. Even if we argue that only a small fraction of this pressure ends up in the audible range there is no doubt in my mind you would hear "something".
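The pressure estimate and its conversion to decibels can be reproduced as follows. This is a sketch under stated assumptions: the ambient pressure is taken to be one standard atmosphere (101325 Pa), which the text does not spell out, and the function names are made up for illustration.

```python
import math

P_AMBIENT_PA = 101325.0  # assumption: one standard atmosphere
P_REF_PA = 2e-5          # 0 dB reference pressure quoted in the text

def pressure_drop(void_radius_m, distance_m, p_ambient=P_AMBIENT_PA):
    """Cube-of-radius-ratio estimate of the pressure step used in the text."""
    return (void_radius_m / distance_m) ** 3 * p_ambient

def sound_pressure_level_db(pressure_pa, p_ref=P_REF_PA):
    """Sound pressure level in dB relative to the hearing threshold."""
    return 20.0 * math.log10(pressure_pa / p_ref)

dp = pressure_drop(void_radius_m=0.014, distance_m=1.0)
spl = sound_pressure_level_db(dp)
print(f"pressure step: {dp:.2f} Pa, level: {spl:.0f} dB")
```

With these inputs the level comes out a little above 80 dB, consistent with the rounded figure quoted above.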
So yes, you can hear that parchment disappearing. No problem. Even if some of my approximations are off by a factor of 10 or greater. We have about 4 orders of magnitude of pressure to spare (0.3 Pa against a threshold of $2\times 10^{-5}\ \text{Pa}$).
AFTERTHOUGHT
If you have ever played with a "naked" loudspeaker (I mean outside of the enclosure, so something like this one from greatplainsaudio.com):
you will have noticed that the membrane moves visibly when music is playing - and as you turn the volume down, the movement becomes imperceptible while you can still hear the sound. That, in essence, is what you are doing here. The sound level you are getting would be similar to the sound level recorded when you move a loudspeaker membrane by about 0.2 mm. I can guarantee you would hear it. Might be fun to do the experiment... I'll have to see if I have an old one lying around and I might try it myself.
UPDATE: No time to play with loudspeakers, but I thought I would do the calculation: "what is the smallest movement of air that results in a sound the human ear can hear?"
Again this is going to be approximate. Let's assume an in-ear headphone with an 8 mm membrane coupling into a 3 mm ear hole. Just from the ratio of areas, we can see that the air movement will be amplified - a movement of $x$ by the membrane will move the air in the ear hole by $x\left(\frac{8}{3}\right)^2$. The equation that connects the movement of the membrane to the pressure produced is:
$$\Delta p = (c\rho\omega )s$$
In words: the change in pressure is the product of the speed of sound $c$, the density of air $\rho$, the angular frequency $\omega$, and the displacement amplitude $s$ of the vibration.
Using $c = 340\ \text{m/s}$, $\rho = 1.3\ \text{kg/m}^3$, $\omega = 2\pi\times1\ \text{kHz}$, and $\Delta p = 2\times10^{-5}\ \text{Pa}$ (the limit of audible sound at 1 kHz), we find that
$$s = 7.2\times10^{-12}\ \text{m}$$
And that's before I take the factor $\left(\frac{8}{3}\right)^2$ into account, which would lower the required amplitude to a staggering $1.0\times10^{-12}\ \text{m}$ - that's far smaller than the diameter of an atom (roughly $10^{-10}\ \text{m}$).
You can see the derivation of the above at http://www.insula.com.au/physics/1279/L14.html and if you look for problem # W4 on that page you will find the calculation for a pressure level of 28 mPa at 1 kHz giving 11 nm displacement amplitude. Given that the limit of detectable sound level is about 1000x smaller, my numbers above are quite reasonable.
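For anyone who wants to plug in their own numbers, the displacement estimate can be sketched like this. It simply inverts $\Delta p = (c\rho\omega)s$ with the constants quoted above; the function name is illustrative, and the 28 mPa case uses my constants rather than whatever the linked page assumes, so it only roughly matches the 11 nm quoted there.

```python
import math

C_AIR = 340.0      # m/s, speed of sound used in the text
RHO_AIR = 1.3      # kg/m^3, air density used in the text

def displacement_amplitude(delta_p_pa, freq_hz, c=C_AIR, rho=RHO_AIR):
    """Displacement amplitude s (metres) from delta_p = c * rho * omega * s."""
    omega = 2.0 * math.pi * freq_hz
    return delta_p_pa / (c * rho * omega)

# Threshold of hearing at 1 kHz.
s_threshold = displacement_amplitude(2e-5, 1000.0)

# Membrane movement needed once the 8 mm -> 3 mm area ratio is included.
area_ratio = (8.0 / 3.0) ** 2
s_membrane = s_threshold / area_ratio

# Cross-check against the 28 mPa case mentioned in the text.
s_28mpa = displacement_amplitude(28e-3, 1000.0)

print(f"air displacement at threshold: {s_threshold:.2e} m")
print(f"membrane displacement:         {s_membrane:.2e} m")
print(f"displacement at 28 mPa:        {s_28mpa * 1e9:.1f} nm")
```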
So the real answer to your "headline" question ("how much air needs to be displaced to generate an audible sound") is
The equivalent of one layer of atoms is more than enough
Impressive, how sensitive the ear is. And bats and dogs have even better hearing, I'm told.