Because the frequency of a sound wave is defined as "the number of waves per second."
If you had a sound source emitting, say, 200 waves per second, and your ear (inside a different medium) received only 150 waves per second, the remaining waves 50 waves per second would have to pile up somewhere — presumably, at the interface between the two media.
After, say, a minute of playing the sound, there would already be 60 × 50 = 3,000 delayed waves piled up at the interface, waiting for their turn to enter the new medium. If you stopped the sound at that point, it would still take 20 more seconds for all those piled-up waves to get into the new medium, at 150 waves per second. Thus, your ear, inside the different medium, would continue to hear the sound for 20 more seconds after it had already stopped.
We don't observe sound piling up at the boundaries of different media like that. (It would be kind of convenient if it did, since we could use such an effect for easy sound recording, without having to bother with microphones and record discs / digital storage. But alas, it just doesn't happen.) Thus, it appears that, in the real world, the frequency of sound doesn't change between media.
Besides, imagine that you switched the media around: now the sound source would be emitting 150 waves per second, inside the "low-frequency" medium, and your ear would receive 200 waves per second inside the "high-frequency" medium. Where would the extra 50 waves per second come from? The future? Or would they just magically appear from nowhere?
All that said, there are physical processes that can change the frequency of sound, or at least introduce some new frequencies. For example, there are materials that can interact with a sound wave and change its shape, distorting it so that an originally pure single-frequency sound wave acquires overtones at higher frequencies.
These are not, however, the same kinds of continuous shifts as you'd observe with wavelength, when moving from one medium to another with a different speed of sound. Rather, the overtones introduced this way are generally multiples (or simple fractions) of the original frequency: you can easily obtain overtones at two or three or four times the original frequency, but not at, say, 1.018 times the original frequency. This is because they're not really changing the rate at which the waves cycle, but rather the shape of each individual wave (which can be viewed as converting some of each original wave into new waves with two/three/etc. times the original frequency).
The human voice box produces a fundamental frequency and its harmonics because the mechanism is like that of a relaxation oscillator. However, we have limited control over the relative amplitude of the harmonics (we do have some - that is how we change the "color" of a tone we sing, and the sound of vowels).
In order to produce the Shepard scale, you need to be able to control the relative amplitude of the different harmonics - especially the ratio of the lowest two harmonics. To a limited extent we do this when we change the vowel that we sing - with the "oo" sound having few "really high" harmonics, while the "ah" has lots. For example, from the hyperphysics site we get this image:
![enter image description here](https://i.stack.imgur.com/6mHsI.gif)
showing that there is a lot or harmonic content in the voice. But it's not "evenly distributed" - so if you were to drop by an octave, you are creating a sound that is sufficiently different that you don't really get the feeling that you have an "eternal" scale.
I suspect the most important problem is that you would want to re-introduce the lowest harmonic with a slowly increasing amplitude, so that the note "returns to the lower range" without ever appearing to jump there. But the mechanism of the vocal chords is too simple to allow it.
Incidentally, when sopranos sing very high notes, many people lose the ability to distinguish what vowel they are singing since the harmonics are further apart, and the ear distinguishes between vowels by estimating the shape of the frequency envelope in the range up to a few kHz; when there are very few harmonics in that range, the shape cannot be determined. The "high C" (C7) has a frequency of 2093 Hz, so there might be just a couple of harmonics available to figure out the sound. That makes vowels in the highest register hard to distinguish.
Best Answer
This effect is known as inharmonicity, and it is important for precision piano tuning.
Ideally, waves on a string satisfy the wave equation $$v^2 \frac{\partial^2 y}{\partial x^2} = \frac{\partial^2 y}{\partial t^2}.$$ The left-hand side is from the tension in the string acting as a restoring force.
The solutions are of the form $\sin(kx - \omega t)$, where $\omega = kv$. Applying fixed boundary conditions, the allowed values of the wavenumber $k$ are integer multiples of the lowest possible wavenumber, which implies that the allowed frequencies are integer multiplies of the fundamental frequency. This predicts evenly spaced harmonics.
However, piano strings are made of thick wire. If you bend a thick wire, there's an extra restoring force in addition to the wire's tension, because the inside of the bend is compressed while the outside is stretched. One can show that this modifies the wave equation to $$v^2 \frac{\partial^2 y}{\partial x^2} - A \frac{\partial^4 y}{\partial x^4} = \frac{\partial^2 y}{\partial t^2}.$$ Upon taking a Fourier transform, we have the nonlinear dispersion relation $$\omega = kv \sqrt{1 + (A/v^2)k^2}$$ which "stretches" evenly spaced values of $k$ into nonuniformly spaced values of $\omega$. Higher harmonics are further apart. We can write this equation in terms of the harmonic frequencies $f_n$ as $$f_n \propto n \sqrt{1+Bn^2}$$ which should yield a good fit to your data. Note that the frequencies have no dependence on the amplitude, as you noted, and this is because our modified wave equation is still linear in $y$.
This effect must be taken into account when tuning a piano, since we perceive two notes to be in tune when their harmonics overlap. This results in stretched tuning, where the intervals between the fundamental frequencies of different keys are slightly larger than one would expect. That is, a piano whose fundamental frequencies really were tuned to simple ratios would sound out of tune!