I did experiments related to this back in 1994, so it's going to take a bit of recall.
The idea of a flute is that you create standing waves, whose frequency depends on the (variable) geometry. They are standing waves because you fix specific boundary conditions; in particular, $p=0$ at an open end.
Now consider a standing wave in a flute with a wavelength that is a fraction of the flute's length. That means there are several nodes in the middle. If you opened a key at a node, there would be no effect; if you opened one near a node, the pitch would change slightly.
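As a rough numerical sketch of that picture (my own illustration, not from the answer: I assume an idealized open-open pipe of about 66 cm, roughly a concert flute's sounding length, a speed of sound of ~340 m/s, and I ignore all end corrections):

```python
# Idealized open-open pipe: p = 0 at both ends, so mode n has
# wavelength 2L/n, frequency n*C/(2L), and interior pressure nodes
# (where opening a key would have no effect) at x = k*L/n.
C = 340.0  # speed of sound in air, m/s (assumed)

def pipe_modes(length, n_max=3):
    """Return (frequency_hz, interior_node_positions_m) per mode."""
    modes = []
    for n in range(1, n_max + 1):
        freq = n * C / (2 * length)
        nodes = [k * length / n for k in range(1, n)]
        modes.append((freq, nodes))
    return modes

# Assumed ~66 cm pipe, roughly a concert flute's sounding length:
for freq, nodes in pipe_modes(0.66):
    print(f"{freq:6.1f} Hz, interior pressure nodes at {nodes}")
```

The second mode, for example, has one interior pressure node at the midpoint, which is exactly where opening a hole leaves the pitch unchanged.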
There are actually several different things going on. The reason you hear a single tone is resonance. The reason it is usually a higher-frequency tone has to do partly with the falloff in amplification with distance and partly (perhaps mainly) with the frequency response of the microphone/amplifier/loudspeaker.
Let's model the situation with $A$ representing the combination of the microphone, amplifier and loudspeaker, and the effect of the sound traveling through the air from the loudspeaker back to the microphone as $B$. In general $A(\omega)$ will be a function of the frequency $\omega$, and will be almost linear until the output signal gets to be large enough that it gets "clipped". That is, $A$ behaves as a complex multiplier where the magnitude of $A$ is the amplification and the angle of $A$ is the phase shift.
$B$ also behaves linearly, and depends mostly on the distance $r$ of the microphone from the loudspeaker. The signal travels through the air at the speed of sound (~340 m/s), so there is a delay of $\frac{r}{340\,\mathrm{m/s}}$, and $B$ attenuates the signal by some factor, say $1/r^2$. [The original answer included a public-domain picture from Wikimedia Commons here.]
If the input is $x(t)$ and the output is $y(t)$ then we get $ y=A(x+By) $ or
$$
y(t) = \frac{A}{1-AB} x(t)\mathrm{.}
$$
Oscillations occur when $AB = 1$ (exactly). That means the attenuation from $B$ is exactly compensated by the amplification of $A$ (including clipping), and the phase shift from traveling the distance $r$ exactly cancels the phase shift from the amplifier.
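A quick numerical sketch of that blow-up (the gain values here are my own assumptions, not measurements): as the loop gain $AB$ approaches 1 with zero net phase shift, the closed-loop gain $\frac{A}{1-AB}$ diverges.

```python
# Treat A and B as real numbers (zero phase shift) for simplicity.
def closed_loop_gain(A, B):
    """Closed-loop gain A / (1 - A*B), from y = A*(x + B*y)."""
    return A / (1 - A * B)

A = 100.0  # assumed amplifier gain
for B in (0.001, 0.005, 0.009, 0.0099):
    print(f"AB = {A * B:.2f} -> closed-loop gain = {closed_loop_gain(A, B):.0f}")
```

At $AB = 1$ the expression is singular: any input at all, even noise, grows until clipping in $A$ pulls the effective loop gain back down to 1.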
Let's consider the simplest possible case, where the microphone, amplifier and loudspeaker have no phase shift. Then the phase shift for the feedback is based completely on the distance between the loudspeaker and microphone. The speed of sound is about 340 m/s, so a 20 Hz sound wave is about 17 meters long, and you'd need to be about 17 meters from the speaker for 20 Hz (and all the harmonics of 20 Hz) to be in phase.
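Under that zero-phase-shift assumption, the candidate feedback frequencies at distance $r$ are just those whose period divides the travel delay, $f_n = n \cdot \frac{340\,\mathrm{m/s}}{r}$. A minimal sketch:

```python
C = 340.0  # speed of sound, m/s, as in the text

def in_phase_freqs(r, n_max=5):
    """Frequencies whose period divides the travel delay r/C,
    i.e. the candidates for feedback at mic-speaker distance r."""
    return [n * C / r for n in range(1, n_max + 1)]

print(in_phase_freqs(17.0, 3))  # -> [20.0, 40.0, 60.0]
```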
So if you put a microphone 17 meters from the speaker will you get 20Hz feedback? Probably not. You'll probably get one of the higher frequency harmonics. Why? Because the microphone, amplifier and loudspeaker are probably much better at amplifying midrange frequencies than they are at amplifying low and high frequencies.
Look at the frequency response of the Shure SM58 microphone (a popular and widely used vocal microphone). [The original answer included the SM58 frequency-response chart here.]
Most guitar amps look similar. dB is a logarithmic scale, usually $20 \log_{10}\frac{p}{p_\mathrm{ref}}$, so 5 dB is about 1.8x and 10 dB about 3.2x. Some harmonic in the range between about 3 kHz and 8 kHz is probably going to dominate.
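Those ratios come straight from the definition; a one-liner to check them (pressure dB, i.e. the $20\log_{10}$ convention above):

```python
def db_to_ratio(db):
    """Pressure ratio corresponding to a dB value: 10**(db/20)."""
    return 10 ** (db / 20)

print(f"5 dB  -> {db_to_ratio(5):.2f}x")   # ~1.78
print(f"10 dB -> {db_to_ratio(10):.2f}x")  # ~3.16
```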
Note that the phase shift from the amplifier matters a lot, and we didn't incorporate it into our model. For fun I tried out some different distances on my home computer to see what would happen. Because of wiring constraints, the farthest I could get my mic from the speaker was 1.8 meters; in multiple trials I got 383, 379 and 386 Hz, while 1.8 meters corresponds to a minimum frequency of 189 Hz in our model. At 10 cm I got about 3700 Hz (the lowest harmonic at that distance in our model is around 3400 Hz). At 20 cm I got about 2600 Hz, and at 50 cm about 400 Hz; the lowest harmonic at 50 cm would be about 680 Hz according to the model.
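A small check of those readings against the zero-phase-shift model (the distances and measured frequencies are the ones quoted above; the "nearest harmonic" logic is my own):

```python
C = 340.0

def nearest_harmonic(r, f_measured):
    """Harmonic n*C/r of the zero-phase-shift model closest to a
    measured feedback frequency (never below the fundamental C/r)."""
    f1 = C / r
    n = max(1, round(f_measured / f1))
    return n, n * f1

for r, f in [(1.8, 383), (0.10, 3700), (0.20, 2600), (0.50, 400)]:
    n, fn = nearest_harmonic(r, f)
    print(f"r = {r:4} m: measured {f} Hz, nearest model harmonic n={n} at {fn:.0f} Hz")
```

The 1.8 m and 10 cm readings fall roughly near a model harmonic; the 20 cm and 50 cm readings don't, which is consistent with the amplifier phase shift the model leaves out.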
Best Answer
The short answer: "shadowing out" of high frequencies and passive resonance.
The detailed answer: The hands act as "low-pass" filters (they block out the higher frequencies). Almost everywhere has some form of background sound, but we tune it out. We notice changes in noise level and/or frequency. This is why you can "hear the ocean" through a seashell: you only notice "sound" when it changes as the shell is put near your ear.
Why do hands and shells act as low-pass filters? Sound waves have no trouble pushing on and passing through thin solid objects, but how thin counts as "thin" depends on the wavelength: roughly, a one-wavelength-thick slab of air should have as much mass per unit area (kg/m²) as the object the sound has to pass through. Very long wavelength, low-frequency sounds can therefore penetrate the hands.
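To put rough numbers on that rule of thumb (the air density of ~1.2 kg/m³ and the hand modeled as ~3 cm of water-density tissue are my own ballpark assumptions):

```python
C = 340.0             # speed of sound in air, m/s
RHO_AIR = 1.2         # density of air, kg/m^3 (assumed)
HAND = 0.03 * 1000.0  # areal mass of a hand, kg/m^2 (assumed, ~30)

def air_slab_mass(freq):
    """Areal mass (kg/m^2) of a slab of air one wavelength thick."""
    wavelength = C / freq
    return RHO_AIR * wavelength

for f in (50, 500, 5000):
    print(f"{f:4} Hz: one-wavelength air slab = {air_slab_mass(f):7.3f} kg/m^2 "
          f"(hand ~ {HAND:.0f} kg/m^2)")
```

By this crude criterion only the lowest frequencies come anywhere close to matching the hand's areal mass, which is exactly the low-pass behavior described above.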
There is also a resonant-chamber effect: the resonant frequency of an open bottle drops as the neck gets narrower, which is roughly what happens as the hands move closer together. External sounds at or near the resonant frequency will set up strong standing waves in the chamber.
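The textbook Helmholtz-resonator formula shows that trend, $f = \frac{c}{2\pi}\sqrt{\frac{A}{V L}}$ with neck cross-section $A$, cavity volume $V$, and neck length $L$; the bottle dimensions below are assumptions for illustration:

```python
import math

C = 340.0  # speed of sound, m/s

def helmholtz_freq(neck_area, cavity_vol, neck_len):
    """Helmholtz resonance f = (C / (2*pi)) * sqrt(A / (V * L))."""
    return (C / (2 * math.pi)) * math.sqrt(neck_area / (cavity_vol * neck_len))

V = 0.5e-3  # 0.5-liter cavity (assumed)
L = 0.05    # 5 cm neck (assumed)
for d in (0.03, 0.02, 0.01):  # progressively narrower neck
    A = math.pi * (d / 2) ** 2
    print(f"neck {d * 100:.0f} cm wide -> {helmholtz_freq(A, V, L):5.0f} Hz")
```

A narrower neck gives a lower pitch, the same direction as cupping the hands tighter.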
Passive resonance is not amplification; it's energy transfer. A large volume of low-intensity background waves must give up its energy to create a small volume of high-intensity chamber waves. A similar phenomenon happens in an acoustic guitar: the chamber "amplifies" the sound by efficiently robbing energy from the string (via the vibrations in the wood) and giving it to the air.