There are actually several different things going on. You hear a single tone because of resonance. That it is usually a higher-frequency tone has to do partly with the falloff in amplification with distance and partly (perhaps mainly) with the frequency response of the microphone/amplifier/loudspeaker.
Let's model the situation with $A$ representing the combination of the microphone, amplifier and loudspeaker, and the effect of the sound traveling through the air from the loudspeaker back to the microphone as $B$. In general $A(\omega)$ will be a function of the frequency $\omega$, and will be almost linear until the output signal gets to be large enough that it gets "clipped". That is, $A$ behaves as a complex multiplier where the magnitude of $A$ is the amplification and the angle of $A$ is the phase shift.
$B$ also behaves linearly, and depends mostly on the distance $r$ of the microphone from the loudspeaker. The signal will travel through the air at the speed of sound (~340 m/s), so there will be a delay of $\frac{r}{340\,\mathrm{m/s}}$, and $B$ will attenuate the signal by some factor, let's say $1/r^2$. (A public domain feedback-loop diagram from Wikimedia Commons appeared here.)
If the input is $x(t)$ and the output is $y(t)$ then we get $ y=A(x+By) $ or
$$
y(t) = \frac{A}{1-AB} x(t)\mathrm{.}
$$
Sustained oscillation occurs when $AB = 1$ exactly: the attenuation from $B$ is matched by the amplification from $A$ (including clipping), and the phase shift from traveling the distance $r$ exactly cancels the phase shift from the amplifier. If $|AB| > 1$ at some frequency, the signal at that frequency grows until clipping brings the effective loop gain back down to 1.
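To see why the closed-loop gain blows up as the loop gain $AB$ approaches 1, here is a minimal numeric sketch of the formula above (the gain values are made up for illustration):

```python
def closed_loop_gain(A, B):
    """Closed-loop transfer y/x = A / (1 - A*B) from the model above."""
    return A / (1 - A * B)

A = 100.0  # assumed flat amplifier gain (illustrative)
for loop_gain in (0.5, 0.9, 0.99, 0.999):
    B = loop_gain / A  # air-path attenuation that yields this loop gain
    print(loop_gain, abs(closed_loop_gain(A, B)))
# the output amplitude grows without bound as the loop gain A*B -> 1
```

At loop gain 0.9 the output is ten times the open-loop value; at 0.999 it is a thousand times, which is why a real system always ends up clipping.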
Let's consider the simplest possible case where the microphone, amplifier and loudspeaker have no phase shift. Then the phase shift for the feedback is based completely on the distance between the loudspeaker and microphone. The speed of sound is about 340 m/s. That means that a 20Hz sound wave is about 17 meters long, so you'd need to be about 17 meters from the speaker to get 20Hz (and all the harmonics of 20Hz) to be in phase.
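The in-phase frequencies for a given microphone-to-speaker distance follow directly from the wavelength argument above; a quick sketch:

```python
C_SOUND = 340.0  # speed of sound in m/s, as above

def in_phase_frequencies(r_meters, n_max=5):
    """Frequencies whose wavelength fits a whole number of times into r,
    so the travel delay r/c is an integer number of periods."""
    return [n * C_SOUND / r_meters for n in range(1, n_max + 1)]

print(in_phase_frequencies(17.0))  # [20.0, 40.0, 60.0, 80.0, 100.0]
```

At 17 meters the fundamental is 20 Hz and every multiple of 20 Hz is also in phase, which is why the loop has many candidate frequencies to choose from.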
So if you put a microphone 17 meters from the speaker will you get 20Hz feedback? Probably not. You'll probably get one of the higher frequency harmonics. Why? Because the microphone, amplifier and loudspeaker are probably much better at amplifying midrange frequencies than they are at amplifying low and high frequencies.
Look at the frequency response for the Shure SM58 microphone (a popular and widely used voice microphone). (The response plot appeared here; it shows a boost of several dB between roughly 3 kHz and 8 kHz.)
Most guitar amps look similar. dB is a logarithmic scale, usually $20 \log_{10}\frac{p}{p_\mathrm{ref}}$, so 5 dB is about 1.8x and 10 dB is about 3.2x. Some harmonic in the range between about 3 kHz and 8 kHz is probably going to dominate.
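The dB-to-ratio conversion quoted above is easy to check:

```python
def db_to_amplitude_ratio(db):
    """Convert decibels to a linear amplitude ratio: db = 20*log10(p/p_ref)."""
    return 10 ** (db / 20)

print(round(db_to_amplitude_ratio(5), 2))   # 1.78
print(round(db_to_amplitude_ratio(10), 2))  # 3.16
```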
Note that the phase shift from the amplifier matters a lot, and we didn't incorporate it into our model. For fun I decided to try out some different distances on my home computer to see what would happen. Because of wiring constraints, the farthest I could get my mic from the speaker was 1.8 meters. In multiple trials I got 383, 379 and 386 Hz; 1.8 meters corresponds to a minimum frequency of 189 Hz in our model. At 10 cm I got about 3700 Hz (the lowest harmonic at that distance in our model is around 3400 Hz). At 20 cm I got about 2600 Hz, and at 50 cm I got about 400 Hz. The lowest harmonic at 50 cm would be about 680 Hz according to the model.
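One way to compare the measured tones with the model is to snap each measurement to the nearest harmonic of the fundamental $c/r$; a sketch using the distances and readings from the trials above:

```python
C_SOUND = 340.0  # speed of sound in m/s

def nearest_model_harmonic(r_meters, f_measured_hz):
    """Harmonic of the fundamental c/r closest to the measured frequency."""
    f0 = C_SOUND / r_meters
    n = max(1, round(f_measured_hz / f0))
    return n * f0

# (distance in m, measured frequency in Hz) pairs from the trials above
for r, f in [(1.8, 383), (0.1, 3700), (0.2, 2600), (0.5, 400)]:
    print(r, f, round(nearest_model_harmonic(r, f), 1))
```

The 1.8 m reading lands close to the second harmonic (about 378 Hz), but the 50 cm reading (400 Hz) sits well below the 680 Hz fundamental, which is consistent with the amplifier contributing a phase shift the simple model ignores.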
So this is a discharge-pumped laser.
At 18 Torr the laser lines will be quite narrow. Do you know how much bandwidth you have in your current output pulses? I think this is important.
You may have the needed bandwidth (~1/1 µs, i.e. on the order of 1 MHz). If so, then I'll look for a crystal with anomalous dispersion for the compression.
If you do not have the bandwidth then you need to create it (see https://en.m.wikipedia.org/wiki/Bandwidth-limited_pulse). I am thinking that pressure-broadening the spectral lines at 10 µm is the way to go. Without the bandwidth, compression isn't possible.
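For a rough number: the transform-limit estimate is just the time-bandwidth product divided by the pulse duration. A sketch for a 1 µs pulse, assuming a Gaussian shape (0.441 is the standard Gaussian time-bandwidth product):

```python
def transform_limited_bandwidth_hz(pulse_fwhm_s, time_bandwidth_product=0.441):
    """Minimum spectral width (FWHM, Hz) for a pulse of the given duration,
    assuming the stated time-bandwidth product (0.441 for a Gaussian)."""
    return time_bandwidth_product / pulse_fwhm_s

print(transform_limited_bandwidth_hz(1e-6))  # ~4.4e5 Hz, i.e. ~0.44 MHz
```

So a 1 µs pulse needs roughly half a megahertz of coherent bandwidth before compression can even begin.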
(This is the beginning of a full answer; I didn't want the comments to keep expanding.)
Best Answer
The hint given by the interviewer is a red herring. The limitation you're hearing has been part of the phone network since long before digital sampling had any part in the telephone system. And it applies even in a local phone call where the signal is never digitized.
It is related to the fact that the connection from a land-line phone in your house or office back to the "central office" of the phone company is essentially a continuous connection through a pair of wires. There are typically no active circuits involved, such as amplifiers, repeaters, digitizers, or other electronics.
Given the technology of 100 years ago when the phone network was first designed, a connection of this length could really only carry a very limited bandwidth. The engineers who designed the network did numerous experiments to determine just what frequencies needed to be conveyed for people to understand each other's regular speech, and designed the network only to be sure those frequencies were transmitted. They didn't add any costly components to the system if they weren't needed to achieve this goal.
For example they might have used passive filters to "emphasize" high frequencies in circuits that were a bit longer (and so naturally tend to cut out the high frequencies) than average, or to cut off high frequencies in circuits that were shorter than average, to ensure all users get as much as possible the same quality of connections.
Later, when they started using multiplexing to carry multiple voice circuits on a single wire (for inter-city connections, for example), the limited bandwidth allowed them to carry more connections per wire, and at that point the bandwidth limitation would have been deliberately enforced by filtering to ensure that conversations didn't cross-talk with each other.
Finally, when digital sampling and digital transmission were introduced into the network, the sampling theorem limitations discussed in the other answers came into play. Fortuitously, the bandwidth limitations introduced in the early days of analog telephone networks allowed digitization to be done at really low bitrates without degrading the signal quality below what it had been all along, and again this allows more conversations to be carried on a given wire in the network.
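The standard telephony numbers make this concrete: the analog passband tops out around 3.4 kHz, so an 8 kHz sampling rate satisfies the Nyquist criterion, and 8-bit companded samples (the G.711 standard) give 64 kbit/s per voice channel:

```python
voice_band_top_hz = 3400   # upper edge of the classic analog telephone passband
sample_rate_hz = 8000      # standard digital telephony sampling rate
bits_per_sample = 8        # G.711 companded PCM

# Nyquist: the sampling rate must exceed twice the highest frequency carried
assert sample_rate_hz > 2 * voice_band_top_hz

bitrate_bps = sample_rate_hz * bits_per_sample
print(bitrate_bps)  # 64000 bit/s per voice channel
```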
Edit
I want to summarize with a key point that I previously posted in a comment on another answer:
The digital sampling rate (and later, compression methods) used in digital telephony was chosen to match the characteristics of the analog phone network, not the other way around.