[Physics] Why can’t the human voice produce a Shepard tone?

acoustics, biophysics, frequency, perception, waves

Audio of a Shepard tone on YouTube.

So what is a Shepard tone?

A Shepard tone, named after Roger Shepard, is a sound consisting of a
superposition of sine waves separated by octaves. When played with the
base pitch of the tone moving upward or downward, it is referred to as
the Shepard scale. This creates the auditory illusion of a tone that
continually ascends or descends in pitch, yet which ultimately seems
to get no higher or lower. (Wikipedia)
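To make the definition concrete, here is a minimal sketch of one instant of a Shepard tone as octave-spaced sine components. The names `shepard_components` and `render`, the lowest frequency (20 Hz), the number of octaves (10) and the bell-shaped loudness curve (center and width in octaves) are illustrative assumptions, not a standard recipe. Only the fractional part of `phase` is used, so each component's octave position ramps upward and then wraps around (a sawtooth on a log-frequency scale), while the loudness curve keeps the components near both ends almost silent.

```python
import math

def shepard_components(phase, f_low=20.0, n_octaves=10, center=5.0, width=1.5):
    """Return (frequency_hz, amplitude) pairs for one instant of a Shepard tone.

    All parameters are illustrative assumptions. Only the fractional part of
    `phase` matters, so the tone at phase p and at phase p + 1 is identical.
    """
    p = phase % 1.0
    components = []
    for k in range(n_octaves):
        x = k + p                      # octave position above f_low
        freq = f_low * 2.0 ** x        # components are exactly one octave apart
        # Bell-shaped loudness over log-frequency: quiet at the bottom and top,
        # loudest in the middle of the range.
        amp = math.exp(-((x - center) ** 2) / (2 * width ** 2))
        components.append((freq, amp))
    return components

def render(components, n_samples, sample_rate=44100):
    """Superpose the sine components into raw samples (a static chord)."""
    return [
        sum(a * math.sin(2 * math.pi * f * n / sample_rate) for f, a in components)
        for n in range(n_samples)
    ]
```

For example, `render(shepard_components(0.3), 44100)` gives one second of a single static Shepard chord. Sweeping `phase` slowly from 0 toward 1 moves every component up by one octave, and at that point the set of components lands where it started, which is why the pitch seems to rise forever without getting higher.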

A computer-simulated Shepard tone goes on and on: it literally never ends. We feel (our brain perceives) that the frequency, or the amplitude, or something, is rising gradually, yet after a while the tone seems to repeat, starting again from the same point. So the frequency of the tone seems to change periodically, like a sine wave.

But why can't we produce that tone with the human voice? However hard we try, we cannot. This may be due to exhaustion or lung capacity: our voice seems to saturate at a certain limit, beyond which we cannot produce the sound. Why? If the frequency of the tone changes periodically like a sine wave, we should be able to keep producing the tone from where we started. But that does not happen. Why?

PS: my terminology may be wrong, so feel free to edit it.

Best Answer

The human voice box produces a fundamental frequency and its harmonics, because the mechanism is like that of a relaxation oscillator. However, we have limited control over the relative amplitudes of the harmonics (we do have some: that is how we change the "color" of a tone we sing, and the sound of vowels).
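A small sketch of the difference in spectra: an idealized periodic source (such as the relaxation-oscillator model of the voice above) has partials at every integer multiple of its fundamental, whereas one Shepard chord has partials only at octave multiples. The function names and the integer frequencies are hypothetical, chosen to keep the arithmetic exact; real voices also vary the amplitudes, which this sketch ignores.

```python
def harmonic_partials(f0, f_max):
    """Partials of an idealized periodic source: n * f0 for n = 1, 2, ..."""
    return [n * f0 for n in range(1, f_max // f0 + 1)]

def octave_partials(f0, f_max):
    """Partials of one Shepard-style chord: f0 * 2**k only."""
    out = []
    f = f0
    while f <= f_max:
        out.append(f)
        f *= 2
    return out

voice = harmonic_partials(100, 800)      # 100, 200, ..., 800
shepard = octave_partials(100, 800)      # 100, 200, 400, 800
# Harmonics that a periodic source produces but an octave-only chord lacks.
non_octave = [f for f in voice if f not in shepard]
```

Here `non_octave` holds 300, 500, 600 and 700 Hz: to sound like a Shepard chord, a harmonic source would have to suppress those partials, which goes well beyond the limited control over harmonic amplitudes described above.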

To produce the Shepard scale, you need to be able to control the relative amplitudes of the different harmonics, especially the ratio of the lowest two. To a limited extent we do this when we change the vowel we sing: the "oo" sound has few "really high" harmonics, while "ah" has lots. For example, the HyperPhysics site gives this image:

[Image (HyperPhysics): harmonic content of the voice]

showing that there is a lot of harmonic content in the voice. But it is not "evenly distributed", so if you drop by an octave, you create a sound that is different enough that you don't really get the feeling of an "eternal" scale.
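The octave-drop argument can be checked with a little set arithmetic. The sketch below (hypothetical helper names, integer frequencies chosen for exact arithmetic) compares what changes when a harmonic source drops by an octave with what changes when an octave-spaced chord does.

```python
def harmonic_set(f0, f_max):
    """Frequencies n * f0 up to f_max for an idealized harmonic source."""
    return {n * f0 for n in range(1, f_max // f0 + 1)}

def octave_set(f0, n_octaves):
    """Frequencies f0 * 2**k for an octave-spaced (Shepard-style) chord."""
    return {f0 * 2 ** k for k in range(n_octaves)}

# Harmonic source: drop from 220 Hz to 110 Hz.
voice_before = harmonic_set(220, 1000)
voice_after = harmonic_set(110, 1000)
new_voice_partials = sorted(voice_after - voice_before)

# Octave-spaced chord: drop the base from 100 Hz to 50 Hz.
shepard_before = octave_set(100, 5)
shepard_after = octave_set(50, 5)
changed_shepard_partials = sorted(shepard_before ^ shepard_after)
```

Dropping the harmonic source adds a whole new family of partials (110, 330, 550, 770 and 990 Hz, the odd multiples of the new fundamental), so the timbre changes audibly. Dropping the octave-spaced chord changes only its two extremes (50 and 1600 Hz), which is exactly where a Shepard tone's loudness envelope keeps components nearly silent.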

I suspect the most important problem is that you would need to reintroduce the lowest harmonic with a slowly increasing amplitude, so that the note "returns to the lower range" without ever appearing to jump there. But the mechanism of the vocal cords is too simple to allow this.
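Here is a sketch of the kind of fade-in a Shepard tone relies on, assuming an illustrative bell-shaped loudness curve over octave position (the names `envelope` and `lowest_pair_ratio`, and the center and width values, are hypothetical). In a periodic source the fundamental is always the lowest partial, so there is no quiet component below it that could fade in this way.

```python
import math

def envelope(x, center=5.0, width=1.5):
    """Hypothetical bell-shaped loudness at octave position x (not a standard curve)."""
    return math.exp(-((x - center) ** 2) / (2 * width ** 2))

def lowest_pair_ratio(phase):
    """Amplitude of the lowest component divided by that of the one an octave above."""
    p = phase % 1.0
    return envelope(p) / envelope(p + 1.0)

# A component entering at the bottom starts almost silent and grows steadily
# as it climbs toward the middle of the range (octave positions 0, 0.5, ..., 5).
entering = [envelope(i / 2) for i in range(11)]

# Within one cycle, the ratio of the lowest two components rises smoothly.
ratios = [lowest_pair_ratio(i / 10) for i in range(10)]
```

Both lists increase monotonically: the new bottom component is introduced at well under 1% of full loudness and the balance between the lowest two components shifts gradually, so the listener never hears a jump.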

Incidentally, when sopranos sing very high notes, many listeners can no longer tell which vowel is being sung. The harmonics are further apart, and the ear distinguishes vowels by estimating the shape of the spectral envelope (its formant peaks) in the range up to a few kHz; when only a few harmonics fall in that range, the shape cannot be determined. The soprano "high C" (C6) has a fundamental of about 1047 Hz, so only two or three harmonics fall below 3 to 4 kHz to reveal the envelope. That makes vowels in the highest register hard to distinguish.
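To put numbers on this, the sketch below counts how many harmonics of a note lie at or below a cutoff. It assumes equal temperament with A4 = 440 Hz and MIDI note numbering (60 = C4, 84 = C6, 96 = C7), and takes 3 kHz as an assumed stand-in for "a few kHz"; the helper names are hypothetical.

```python
def note_frequency(midi_note, a4=440.0):
    """Equal-tempered frequency in Hz, with MIDI note 69 = A4."""
    return a4 * 2.0 ** ((midi_note - 69) / 12)

def harmonics_below(f0, cutoff_hz):
    """Harmonics n * f0 (n = 1, 2, ...) at or below cutoff_hz."""
    count = int(cutoff_hz // f0)
    return [n * f0 for n in range(1, count + 1)]

CUTOFF_HZ = 3000.0  # assumed stand-in for "a few kHz"
for name, midi in [("C4", 60), ("C6", 84), ("C7", 96)]:
    f0 = note_frequency(midi)
    n = len(harmonics_below(f0, CUTOFF_HZ))
    print(f"{name}: f0 = {f0:.1f} Hz, {n} harmonic(s) below {CUTOFF_HZ:.0f} Hz")
```

Middle C (about 262 Hz) leaves 11 harmonics in that band to outline the envelope, C6 (about 1047 Hz) leaves only 2, and C7 (about 2093 Hz) only 1.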