The reason you are hearing the train farther away is more a consequence of the geometry of different spaces than anything else.
It starts with an inversion layer of cold air clinging close to the ground. Just as glass bends light because light moves more slowly through it, an inversion layer of cold air bends sound because sound moves more slowly through cold air (its molecules move more slowly). So, this inversion layer behaves like the audio equivalent of a large sheet of glass covering the ground and guiding the sound away from the air above it. This is the same principle that allows a fully transparent optical fiber to capture light moving through it and transmit that light for many kilometers with very few losses. (@hwlau already noticed the inversion in the second of his three possible answers.)
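To put rough numbers on "sound moves more slowly through cold air", here is a minimal sketch using the standard ideal-gas approximation for the speed of sound in dry air, $c \approx 331.3\sqrt{1 + T/273.15}$ m/s with $T$ in degrees Celsius. The two temperatures are illustrative assumptions, not measurements from your situation.

```python
import math

def speed_of_sound(temp_c: float) -> float:
    """Approximate speed of sound in dry air (m/s) at temp_c degrees Celsius."""
    return 331.3 * math.sqrt(1.0 + temp_c / 273.15)

# Assumed, illustrative temperatures: cold air near the ground, warmer air above.
c_ground = speed_of_sound(-10.0)
c_aloft = speed_of_sound(0.0)

print(f"near the ground:     {c_ground:.1f} m/s")
print(f"above the inversion: {c_aloft:.1f} m/s")
```

The difference is only a few meters per second, but sound bends toward the slower medium, which is enough to curve upward-heading sound back toward the ground over long distances.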
On the ground side of the inversion layer, a fresh fall of snow further helps confine and preserve sound because it looks smooth to the long wavelengths of sound. So, even though snow thoroughly jumbles up the much shorter wavelengths of light, which is why it looks white, it looks very different and much more mirror-like to sound.
Put those two together -- refraction on the top and reflection on the bottom -- and you have an example of two-dimensional sound dispersion. By way of contrast, sound dispersion from a train on a summer day that lacks any inversion layer and has sound-absorbing grass on the ground is an example of largely three-dimensional sound dispersion.
So, why is the dimensionality of the sound dispersion important?
Because sound (or any other radiation) disperses at a rate that depends on the number of dimensions of the space into which it is dispersing. If $L_n$ is the perceived loudness of the sound, $s$ is the distance to the sound source, and $n$ is the number of space dimensions into which sound is dispersing, the general equation for how loud the train will sound is:
$L_n = 1/s^{n-1}$
Notice that the lower the number of dimensions, the more slowly the sound disperses. (I discussed this same issue from a slightly different perspective a few months ago in my answer to this question about why objects look smaller when they are farther away.)
This equation explains why optical fibers can transmit light many kilometers without any significant loss of intensity ("loudness"). The dispersion space $n$ for optical fibers is $n=1$, so $L_1 = \frac{1}{s^{1-1}} = \frac{1}{s^0} = 1$. That is, there is no diminution of intensity. The audio equivalent would be a long tube, like the ones they used to use as intercoms in old houses (and still use in some playgrounds).
Now for your inversion layer case, $n=2$ and:
$L_2 = 1/s^{n-1} = 1/s$
But because your perception of train distance was tuned to $n=3$ space, you expected the sounds of the train to diminish at the much faster rate of:
$L_3 = 1/s^{n-1} = 1/s^2$
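As a quick numerical check of the three cases, here is a small sketch of the $1/s^{n-1}$ rule. The function name `relative_loudness` and the sample distances are my own illustrative choices, and loudness is in arbitrary units normalized to 1 at $s = 1$.

```python
def relative_loudness(s: float, n: int) -> float:
    """Loudness at distance s for sound spreading into n dimensions: 1 / s**(n - 1)."""
    if s <= 0:
        raise ValueError("distance must be positive")
    return 1.0 / s ** (n - 1)

# Compare fiber-like (n=1), inversion-layer (n=2), and open-air (n=3) spreading.
for s in (1, 10, 100):
    row = ", ".join(f"n={n}: {relative_loudness(s, n):g}" for n in (1, 2, 3))
    print(f"s={s}: {row}")
```

At 100 distance units the $n=2$ sound is 100 times louder than the $n=3$ sound, which is the gap your ears are not calibrated for.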
The analysis of how much farther away the train really is turns out to be trickier than it might seem. That's because the model I just described assumes that the energy of a 3D sound source can be compressed into a mathematically precise 2D plane. The physical world just doesn't work that way, since the sound energy in a 3D volume cannot be forced into a true 2D plane without creating infinitely high energy densities in the plane. Why? Well, pretty much for the reason that you cannot compress a 3D volume of air into an infinitely thin 2D plane without creating infinite mass densities. Crossing dimensionalities is often done a bit casually in physics, but one needs to be careful with it.
So, in this case, instead of assuming a simple 2D plane, what you have to do is model the problem by using a "pancake" that more realistically represents the thickness of the inversion layer confining the sound. That allows sound intensities that "look" 3D in the immediate vicinity of the train, but then fade off more according to the dimensionality diffusion rules as distances increase to many times the thickness of the inversion layer.
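One simple way to sketch the "pancake" idea is a piecewise rule: the sound spreads spherically ($1/s^2$) until it has traveled about one layer thickness $h$, and cylindrically ($1/s$) after that, with the two pieces matched at $s = h$. This is my own crude assumption for illustration, not a solution of the wave equation, and the names `pancake_intensity` and `h` are hypothetical.

```python
def pancake_intensity(s: float, h: float) -> float:
    """Relative intensity at distance s for a confining layer of thickness h (same units)."""
    if s <= 0 or h <= 0:
        raise ValueError("s and h must be positive")
    if s <= h:
        return 1.0 / s**2      # looks 3D close to the source
    return 1.0 / (h * s)       # looks 2D far away; matches the 3D piece at s == h

h = 10.0  # assumed layer thickness in meters, purely illustrative
for s in (1.0, 10.0, 100.0, 1000.0):
    open_3d = 1.0 / s**2
    print(f"s={s:g} m: pancake={pancake_intensity(s, h):.2e}, open 3D={open_3d:.2e}")
```

Beyond the layer thickness, the confined sound is stronger than the open-air sound by a factor of $s/h$ in this toy model.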
So, everything from this point is obviously guesswork about what happened in your case, but a nice ballpark height for your inversion layer might be 10 meters. Approximating again, that 10 meters also becomes the "unit of equality" for distance from the train at which the sound of the train is perceived as the same in both cases. This approximation should work reasonably well for any more-or-less point source of sound coming from the train, in particular its whistle. So, call this unit of distance $s_u$; for hearing a similar loudness for the whistle, $s_u = s_w = 10$ m $= 0.01$ km.
Alas, it gets messier. The sound of the train itself is anything but a point source, since you may be able to hear wheel-on-rail sounds for very long lengths, such as a kilometer for a long train. That also messes up the model and adds even more complexity in the form of orientation and sound delays. So, I'm going to wrap all of that complexity up into a single huge approximation and say that for a long train, the sound of all the train wheels on all the track sounds "about the same" for anyone within a kilometer of the train as it passes by, inversion layer or not. So, the length unit for assessing how train track noise changes over distance becomes $s_u = s_t = 1$ km.
The equation now has to be altered slightly so that these "sounds the same" units $s_u$ are factored in. Measuring distance in units of $s_u$ makes the loudness $L$ equal to 1 at $s = s_u$ in both cases:
Actual: $L_2 = (s_u/s)^{n-1} = s_u/s$
Perceived: $L_3 = (s_u/s)^{n-1} = (s_u/s)^2$
Solving for the $s$ distances in terms of loudness:
Actual: $s_2 = s_u/L$
Perceived: $s_3 = s_u/\sqrt{L}$
The error factor $e$ for how far off your distance estimate was then is:
$e = \text{(perceived)}/\text{(actual)} = s_3/s_2 = \frac{s_u/\sqrt{L}}{s_u/L} = \sqrt{L} = \sqrt{s_u/s_2}$
For the train whistle, $s_u = s_w = 0.01$ km. With $s_2$ in km:
$e_w = \sqrt{s_u/s_2} = \sqrt{0.01/s_2} = 0.1/\sqrt{s_2}$
For the track noise from the entire train, $s_u = s_t = 1$ km. With $s_2$ in km:
$e_t = \sqrt{s_u/s_2} = 1/\sqrt{s_2}$
So, finally, a couple of very rough estimate examples are possible.
Assume the train is actually about $s_2 = 16$ km away. In that case, the whistle sounds like it is $e_w s_2$ km away, or:
$e_w s_2 = 16e_w = 16(0.1/\sqrt{16}) = 0.4$ km away.
In the same case, the train track sound will appear to be $e_t s_2$ km away, or:
$e_t s_2 = 16e_t = 16/\sqrt{16} = 4$ km away.
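The two estimates above can be reproduced with a few lines of code. This is a sketch under the same rough assumptions: loudness is normalized to 1 at the "sounds the same" distance $s_u$, the sound actually falls off as $1/s$, and the listener interprets it as if it fell off as $1/s^2$. Under those assumptions the perceived distance works out to $\sqrt{s_u \cdot s_{actual}}$. The function name `perceived_distance_km` is hypothetical.

```python
import math

def perceived_distance_km(actual_km: float, s_u_km: float) -> float:
    """Distance inferred by a listener tuned to 1/s^2 falloff when the sound
    actually falls off as 1/s, with both laws agreeing at s_u_km."""
    if actual_km <= 0 or s_u_km <= 0:
        raise ValueError("distances must be positive")
    loudness = s_u_km / actual_km          # actual 2D falloff, equal to 1 at s_u
    return s_u_km / math.sqrt(loudness)    # invert the 3D law L = (s_u / s)^2

actual = 16.0  # km, the example distance used above
print(f"whistle:     {perceived_distance_km(actual, 0.01):.2f} km")
print(f"track noise: {perceived_distance_km(actual, 1.0):.2f} km")
```

Changing `actual` or the two $s_u$ guesses shows how sensitive the result is to those ballpark choices.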
So, not only are sounds moving through a winter inversion layer highly deceptive for estimating distances, they can be deceptive in different ways at the same time! A point source such as the train whistle may well sound like it is even closer than the train as a whole -- and both perceptions will sound way, way closer than the actual distance.
Best Answer
Every object moving through air creates a wavefront, like a boat moving through water. The faster the object is moving, the larger the wavefront and its amplitude.
When two wavefronts run into each other, they add their amplitudes. The wavefronts are slightly behind and to the sides of the fronts of the trains. When two trains pass by each other at high speed, the wavefronts collide between the trains, adding together.
Also, as the fronts of the trains pass each other, they are compressing the air between them, which also adds to the amplitude of the wave (it's a compression wave after all), something that cannot occur without another train, or possibly a very close tunnel wall. The sides of the trains, as they are moving quickly, invoke the Bernoulli principle, resulting in low pressure between the trains to contrast with the high pressure around the fronts of the trains.
If the trains are moving fast enough and are the right shape, then a sound similar to a slapping sound will be produced between the trains as they pass. If they are each moving at about half the speed of sound, then the resulting slapping sound would actually be similar to a sonic boom.
This occurs only between the trains, because it is only between the trains that these circumstances occur. On a platform, you won't hear it because the train is blocking the sound. If you could stand between the closely passing trains (somehow), you would hear the sound. If you're on a platform between the trains though, it won't work because the trains are too far apart; their wavefronts don't reach very far.
Here is a link to a slide show that simplifies a long paper written on this very topic. If you would like the whole paper, you can find it here.
Below is an image depicting the solutions to the pressure equations. It is simple to see how the high pressure in front of the red train would push on the windows and body of the blue train. As the front of the red train passes, the quick transition from medium to high to low pressure would likely snap the windows and outer body of the blue train enough to produce the aforementioned "slapping" sound on the inside of the car. This sound is accentuated by the spaces between cars, and especially the front of the train.
To be clear, this paper is probably the best explanation you're going to get on this topic. I've done my best to understand where this described "slapping sound" would come from based on that. The only way I can think of getting more information is to do a more complex simulation or a slow motion video of the outside of the train as the other passes by. It would be cool to see the ripple across the body of the train as the other train passes.
EDIT: To clarify, there are two sources of sound here: 1 - The fronts of the trains hitting each other. And 2 - The pressure difference creating a ripple on the outside of the body of the trains.