The reason you are hearing the train from farther away is more a consequence of the geometry of different spaces than anything else.
It starts with an inversion layer of cold air clinging close to the ground. Just as glass bends light by making light move more slowly through it, an inversion layer of cold air bends sound because sound moves more slowly through cold air (the air molecules themselves move more slowly). So, this inversion layer behaves like the audio equivalent of a large sheet of glass covering the ground and guiding the sound away from the air above it. This is the same principle that allows a fully transparent optical fiber to capture light moving through it and transmit that light for many kilometers with very few losses. (@hwlau already noticed the inversion in the second of his three possible answers.)
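To put rough numbers on the glass analogy, here is a minimal sketch that assumes the standard dry-air approximation for the speed of sound and Snell's law at a sharp boundary between two layers. The temperatures (-10 °C at the ground, 0 °C above) are made-up illustrative values, and a real inversion is a gradual gradient rather than a sharp interface.

```python
import math

def speed_of_sound(temp_c: float) -> float:
    """Approximate speed of sound in dry air (m/s) at temp_c degrees Celsius."""
    return 331.3 * math.sqrt(1.0 + temp_c / 273.15)

def critical_angle_deg(temp_cold_c: float, temp_warm_c: float) -> float:
    """Angle from the vertical beyond which sound in the cold layer is
    totally reflected at a sharp boundary with the warmer layer (Snell's law)."""
    c_cold = speed_of_sound(temp_cold_c)
    c_warm = speed_of_sound(temp_warm_c)
    return math.degrees(math.asin(c_cold / c_warm))

# Hypothetical temperatures, chosen only for illustration.
ground_c, aloft_c = -10.0, 0.0
print(f"c at ground: {speed_of_sound(ground_c):.1f} m/s")
print(f"c aloft:     {speed_of_sound(aloft_c):.1f} m/s")
print(f"critical angle from vertical: {critical_angle_deg(ground_c, aloft_c):.1f} deg")
```

Under these assumptions only sound travelling within roughly ten degrees of horizontal stays trapped, which is exactly the sound that reaches a distant listener.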
On the ground side of the inversion layer, a fresh fall of snow further helps confine and preserve sound because it looks smooth to the long wavelengths of sound. So, even though snow thoroughly jumbles up the much shorter wavelengths of light, which is why it looks white, it looks very different and much more mirror-like to sound.
Put those two together -- refraction on the top and reflection on the bottom -- and you have an example of two-dimensional sound dispersion. By way of contrast, sound dispersion from a train on a summer day that lacks any inversion layer and has sound-absorbing grass on the ground is an example of largely three-dimensional sound dispersion.
So, why is the dimensionality of the sound dispersion important?
Because sound (or any other radiation) disperses at a rate that depends on the number of dimensions of the space into which it is dispersing. If $L_n$ is the perceived loudness of the sound, $s$ is the distance to the sound source, and $n$ is the number of space dimensions into which sound is dispersing, the general equation for how loud the train will sound is:
$L_n = 1/s^{n-1}$
Notice that the lower the number of dimensions, the more slowly the sound disperses. (I discussed this same issue from a slightly different perspective a few months ago in my answer to this question about why objects look smaller when they are farther away.)
This equation explains why optical fibers can transmit light many kilometers without any significant loss of intensity ("loudness"). The dispersion space $n$ for optical fibers is $n=1$, so $L_1 = \frac{1}{s^{1-1}} = \frac{1}{s^0} = 1$. That is, there is no diminution of intensity. The audio equivalent would be a long tube, like the ones they used to use as intercoms in old houses (and still use in some playgrounds).
Now for your inversion layer case, $n=2$ and:
$L_2 = 1/s^{n-1} = 1/s$
But because your perception of train distance was tuned to $n=3$ space, you expected the sounds of the train to diminish at the much faster rate of:
$L_3 = 1/s^{n-1} = 1/s^2$
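The three cases can be compared side by side with a short sketch. The function name is illustrative, and $s$ is treated as a plain number in arbitrary units, just as in the equations above.

```python
def relative_loudness(s: float, n: int) -> float:
    """Relative loudness at distance s when sound spreads into n dimensions."""
    return 1.0 / s ** (n - 1)

# Tabulate the fall-off for a tube (n=1), an inversion layer (n=2), and open air (n=3).
for s in (1, 2, 4, 8, 16):
    row = ", ".join(f"n={n}: {relative_loudness(s, n):.4f}" for n in (1, 2, 3))
    print(f"s={s:>2}  {row}")
```

Each doubling of distance halves the $n=2$ loudness but quarters the $n=3$ loudness, while the $n=1$ value never changes.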
The analysis of how much farther away the train really is turns out to be trickier than it might seem. That's because the model I just described assumes that the energy of a 3D sound source can be compressed into a mathematically precise 2D plane. The physical world just doesn't work that way, since the sound energy in a 3D volume cannot be forced into a true 2D plane without creating infinitely high energy densities in the plane. Why? Well, pretty much for the reason that you cannot compress a 3D volume of air into an infinitely thin 2D plane without creating infinite mass densities. Crossing dimensionalities is often done a bit casually in physics, but one needs to be careful with it.
So, in this case, instead of assuming a simple 2D plane, what you have to do is model the problem by using a "pancake" that more realistically represents the thickness of the inversion layer confining the sound. That allows sound intensities that "look" 3D in the immediate vicinity of the train, but then fall off more in line with the 2D dispersion rule as distances increase to many times the thickness of the inversion layer.
So, everything from this point is obviously guesswork about what happened in your case, but a nice ballpark height for your inversion layer might be 10 meters. Approximating again, that 10 meters also becomes the "unit of equality" for distance from the train at which the sound of the train is perceived as the same in both cases. This approximation should work reasonably well for any more-or-less point source of sound coming from the train, in particular its whistle. So, call this unit of distance $s_u$; for hearing a similar loudness for the whistle, $s_u = s_w = 10\ \text{m} = 0.01\ \text{km}$.
Alas, it gets messier. The sound of the train itself is anything but a point source, since you may be able to hear wheel-on-rail sounds for very long lengths, such as a kilometer for a long train. That also messes up the model and adds even more complexity in the form of orientation and sound delays. So, I'm going to wrap all of that complexity up into a single huge approximation and say that for a long train, the sound of all the train wheels on all the track sounds "about the same" for anyone within a kilometer of the train as it passes by, inversion layer or not. So, the length unit for assessing how train track noises change over distance becomes $s_u = s_t = 1$ km.
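The pancake picture can be sketched as a piecewise rule: 3D spreading out to one layer thickness, 2D spreading beyond it. This is a deliberate simplification with a hard switch at $s = h$ (a real transition would be gradual), and the function names and the 10 m default are illustrative only.

```python
def pancake_loudness(s_m: float, h_m: float = 10.0) -> float:
    """Loudness relative to its value at s = h for sound confined to a layer of thickness h."""
    if s_m <= h_m:
        return (h_m / s_m) ** 2  # close in: still spreading in 3D
    return h_m / s_m             # far out: confined, spreading in 2D

def open_air_loudness(s_m: float, h_m: float = 10.0) -> float:
    """Same normalization, but with unconfined 3D spreading at every distance."""
    return (h_m / s_m) ** 2

for s in (5, 10, 100, 1000, 16000):
    print(f"s={s:>6} m  layer: {pancake_loudness(s):.6f}  open air: {open_air_loudness(s):.8f}")
```

The two curves agree out to the layer thickness and then separate, with the confined sound ahead by a factor of $s/h$.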
The equation now has to be altered slightly so that these "sounds the same" units $s_u$ are factored in:
Actual: $L_2 = (s_u/s)^{n-1} = s_u/s$
Perceived: $L_3 = (s_u/s)^{n-1} = (s_u/s)^2$
Solving for the $s$ distances in terms of a loudness $L$:
Actual: $s_2 = s_u/L$
Perceived: $s_3 = s_u/\sqrt{L}$
The error factor $e$ by which your distance estimate shrinks is then:
$e = \text{(perceived)}/\text{(actual)} = s_3/s_2 = \frac{s_u/\sqrt{L}}{s_u/L} = \sqrt{L} = \sqrt{s_u/s_2}$
For the train whistle, $s_u = s_w = 0.01$ km. With the actual distance $s_2$ in km:
$e_w = \sqrt{s_u/s_2} = \sqrt{0.01/s_2} = 0.1/\sqrt{s_2}$
For the track noise from the entire train, $s_u = s_t = 1$ km. With $s_2$ in km:
$e_t = \sqrt{s_u/s_2} = 1/\sqrt{s_2}$
So, finally, a couple of very rough estimate examples are possible.
Assume the train is actually about $s_2 = 16$ km away. In that case, the whistle sounds like it is $e_w s_2$ km away, or:
$e_w s_2 = 16e_w = 16(0.1/\sqrt{16}) = 0.4$ km away.
In the same case, the train track sound will appear to be $e_t s_2$ km away, or:
$e_t s_2 = 16e_t = 16/\sqrt{16} = 4$ km away.
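Both examples can be reproduced in a few lines. This is a sketch under the same rough assumptions: the listener hears the actual 2D loudness $s_u/s$ but interprets it with a 3D law normalized to the same loudness at $s_u$, which works out to a perceived distance of $\sqrt{s_u \cdot s}$. The function name is illustrative.

```python
import math

def perceived_distance_km(actual_km: float, s_u_km: float) -> float:
    """Distance a 3D-tuned ear infers for a source heard through 2D spreading."""
    loudness = s_u_km / actual_km        # what actually reaches the ear (2D law)
    return s_u_km / math.sqrt(loudness)  # distance at which a 3D law gives that loudness

actual = 16.0  # km, the example distance used in the text
print(f"whistle (s_u = 0.01 km): {perceived_distance_km(actual, 0.01):.2f} km")
print(f"track   (s_u = 1 km):    {perceived_distance_km(actual, 1.0):.2f} km")
```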
So, not only are sounds moving through a winter inversion layer highly deceptive for estimating distances, they can be deceptive in different ways at the same time! A point source such as the train whistle may well sound like it is even closer than the train as a whole -- and both perceptions will sound way, way closer than the actual distance.
It isn't possible to create an audio source in mid-air using the method you've described. This is because the two ultrasonic waves would create an audible source if the listener were standing at that spot, but those waves would continue to propagate in the same direction afterwards. You would need, as I point out below, some sort of medium which scattered the waves in all directions to make it seem as if the sound were coming from the point at which you interfered the two waves.
It is possible, however, to make the user perceive the sound as coming from a specific location, but it isn't as easy as the author makes it seem. I can think of two different ways. First of all, as described by @reirab, you can get audio frequencies by interfering two sound waves of high frequency. When they interfere they will generate a beat note whose frequency is the difference between the two frequencies. For example, if you send a sound beam with frequency $f_1=200\ \text{kHz}$ and another beam with $f_2=210\ \text{kHz}$, the frequency heard in the region where they combine will be $\Delta f=f_2-f_1=10\ \text{kHz}$, which is in the audio band of humans.
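The arithmetic behind the beat note can be checked with a small sketch. It relies on the identity $\sin A + \sin B = 2\sin\frac{A+B}{2}\cos\frac{A-B}{2}$, which shows the sum of the two tones is a carrier at the mean frequency whose amplitude envelope repeats at the difference frequency. The function names are illustrative.

```python
import math

def difference_frequency(f1_hz: float, f2_hz: float) -> float:
    """Beat (difference) frequency of two tones, in Hz."""
    return abs(f2_hz - f1_hz)

def two_tone(t: float, f1: float, f2: float) -> float:
    """Plain sum of two unit-amplitude sine waves at time t (seconds)."""
    return math.sin(2 * math.pi * f1 * t) + math.sin(2 * math.pi * f2 * t)

def carrier_times_envelope(t: float, f1: float, f2: float) -> float:
    """The same signal rewritten as a fast carrier times a slow envelope."""
    mean = (f1 + f2) / 2
    half_diff = (f2 - f1) / 2
    return 2 * math.cos(2 * math.pi * half_diff * t) * math.sin(2 * math.pi * mean * t)

f1, f2 = 200e3, 210e3  # the two ultrasonic frequencies from the text
print(difference_frequency(f1, f2))  # 10000.0
```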
There is an additional difficulty. You will need the sound to come out in a well-defined, narrow (collimated) beam, and this is not terribly easy to do. A typical speaker emits sound in all directions. There are many techniques for generating such beams, but one is to use a phased array.
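As a rough illustration of how a phased array steers a beam, the sketch below computes per-element firing delays for a uniform linear array. The geometry, element count, spacing, and the 343 m/s sound speed are assumptions for illustration and are not taken from the book or from this answer.

```python
import math

def steering_delays(num_elements: int, spacing_m: float,
                    angle_deg: float, c: float = 343.0) -> list:
    """Firing delays (seconds) that steer a uniform linear array
    angle_deg away from broadside (straight ahead)."""
    # Extra path length between neighbouring elements, converted to time.
    step = spacing_m * math.sin(math.radians(angle_deg)) / c
    delays = [i * step for i in range(num_elements)]
    offset = min(delays)  # shift so that no element needs a negative delay
    return [d - offset for d in delays]

# Hypothetical array: 8 emitters spaced 4 mm apart, beam steered 30 degrees.
for i, d in enumerate(steering_delays(8, 0.004, 30.0)):
    print(f"element {i}: {d * 1e6:.2f} microseconds")
```

Changing the angle changes only the delay step between neighbours, which is why such an array can redirect its beam electronically without moving parts.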
How can you use this to make a person perceive the sound as coming from a specific point?
Sending Two Different Volumes to the Two Ears
What does it mean to perceive sound as coming from a specific location? Our ears are just microphones with cones which accept sound mostly from one direction (excepting low frequencies). A large part of the way we determine where the sound came from is just the relative volume in our two ears. So, you could use the interference effect described above with beams which are narrow enough that you can target each ear. By using two separate sets of beams targeting each ear with different volumes, you could make the person perceive the sound as coming from a specific location; at least as well as a 3D movie makes a person perceive images in 3D.
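One simple way to choose the two volumes is a constant-power pan law, a common audio-mixing convention. It is swapped in here purely as an illustration: real localization also depends on timing and spectral cues, and nothing in the book passage specifies this rule. The function name and the `pan` parameter are hypothetical.

```python
import math

def constant_power_gains(pan: float) -> tuple:
    """Left/right amplitude gains for pan in [-1, 1]
    (-1 = fully left, 0 = centre, +1 = fully right)."""
    pan = max(-1.0, min(1.0, pan))        # clamp out-of-range input
    angle = (pan + 1.0) * math.pi / 4.0   # map [-1, 1] onto [0, pi/2]
    return math.cos(angle), math.sin(angle)

for pan in (-1.0, -0.5, 0.0, 0.5, 1.0):
    left, right = constant_power_gains(pan)
    print(f"pan={pan:+.1f}  left={left:.3f}  right={right:.3f}")
```

The squared gains always sum to one, so the total power stays constant while the apparent direction shifts.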
Hitting a Material Which Scattered the Sound Isotropically
The second method is to use the same interference effect, but this time combining the two beams at a point where a material scatters the sound waves in all directions. I'm going to be honest, I'm not sure how realistic such materials are, but let's assume they exist for now. If you did so, the two sound beams would be scattered with equal amplitude in all directions and the person you are trying to fool would perceive the sound as coming from this point. This method has the advantage of truly sounding to the person as if the sound came from that direction in all respects including reflections, phasing, etc.
In summary, the idea is definitely possible (maybe there are more ways than I've given), but it isn't as simple as the passage in the book makes it out to be.
Best Answer
Air nearest the water is cooler than air farther above the water. Because sound travels more slowly in cool air, sound waves passing from the warmer air into the cooler layer are refracted downward toward the ear of someone in a boat.
If the water is calm, its flat surface allows sound waves to travel unobstructed and to reflect from the surface. Instead of dissipating in tall grasses and other obstructions on land, sound waves retain their coherence for longer distances over calm water. Sound waves also may reflect from calm water's surface, bouncing up to the ear.
Additionally, if you are sitting quietly in a still boat on calm water, there is little or no ambient noise to interfere with sound waves coming to you from a distance. So sounds from shore may seem clearer, which you may confuse with loudness.