[Physics] What are the similarities and differences between speech and music sounds

acoustics, frequency

From an acoustic perspective, I guess speech sounds are produced by varying/manipulating:

  1. resonance (shaping the cavity; something to do with harmony?)
  2. f0 (the length of the string/pipe/whatever)

However, in music, only (2) can be varied/manipulated. That is, the resonance/cavity doesn't change.

As you probably guessed, I'm not a physicist, a musician, or a speech pathologist.
Can you help me better understand the similarities and differences in sound production between speech and musical instruments?

Best Answer

Speech sounds can be either periodic, like "aaah," or nonperiodic, like "sh." Periodic means that the pattern repeats over and over with a certain frequency. Here's a graph of sound pressure versus time for me singing the vowel "ah" at a fixed pitch:

[Figure: sound pressure versus time for the sung vowel "ah" at a fixed pitch]

This kind of graph is referred to as a "time domain" representation of the sound, because it has time on the x axis. Because the waveform repeats and therefore has a definite frequency, the sound also has a definite pitch. A sound like "sh" doesn't have a clearly defined pitch. The periodic sounds are mostly the ones that schoolchildren in the US are taught to call vowels, although some other sounds are periodic too, e.g., "r" (which is used as a vowel in Mandarin, for example).
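As a rough illustration of the periodic/nonperiodic distinction, the sketch below compares a synthetic vowel-like wave with random noise by checking how well each signal matches a copy of itself shifted by one period. The sample rate, the 200 Hz pitch, the harmonic amplitudes, and the helper name `autocorr_at_lag` are all assumptions made up for this example; real speech would be messier.

```python
import math
import random

def autocorr_at_lag(x, lag):
    """Normalized autocorrelation of x at the given lag (1.0 = perfect repeat)."""
    n = len(x) - lag
    num = sum(x[i] * x[i + lag] for i in range(n))
    den = math.sqrt(sum(x[i] ** 2 for i in range(n)) *
                    sum(x[i + lag] ** 2 for i in range(n)))
    return num / den

SAMPLE_RATE = 8000   # samples per second (assumed)
F0 = 200             # pitch in Hz (assumed)
N = 800              # 0.1 s of signal

# "Vowel-like": a fundamental plus one overtone, so it repeats every 1/F0 s.
vowel_like = [math.sin(2 * math.pi * F0 * i / SAMPLE_RATE)
              + 0.5 * math.sin(2 * math.pi * 2 * F0 * i / SAMPLE_RATE)
              for i in range(N)]

# "sh-like": plain random noise, which never repeats.
random.seed(0)
noise_like = [random.gauss(0.0, 1.0) for _ in range(N)]

period = SAMPLE_RATE // F0   # 40 samples per cycle
vowel_score = autocorr_at_lag(vowel_like, period)
noise_score = autocorr_at_lag(noise_like, period)
print(vowel_score, noise_score)
```

The periodic signal scores very close to 1 at a lag of one period, while the noise scores close to 0, which is one simple way to express "has a definite pitch" versus "doesn't."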

Most musical instruments are designed to have a definite pitch, so they produce periodic waves. However, there are also unpitched instruments, such as most drums.

It's also possible to view a sound on a graph where the x axis is frequency rather than time. This is similar to what you'd see on a graphical equalizer, but with higher resolution. Here's a sample, which, IIRC, is also me singing "ah."

[Figure: frequency-domain graph of a sung vowel, showing evenly spaced peaks]

This is called a frequency-domain graph. Whenever the sound is periodic in the time domain, the frequency-domain representation looks like this: an evenly spaced "picket fence." The bottom frequency is called the fundamental. The higher ones, which are integer multiples of it, are called the harmonics. A musician would call these the overtone series. Although all these different frequencies are present, your ear-brain system hears them fused into a single sensation of tone; you don't normally "hear out" the individual overtones.
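The picket-fence pattern can be reproduced numerically. The following is a minimal sketch, not the method used to make the graph above: it builds a periodic signal from a few harmonics of an assumed 200 Hz fundamental (with made-up amplitudes) and runs a direct, slow Fourier transform over it. The function name `dft_magnitudes` and all constants are assumptions for illustration.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Return |X[k]| for k = 0 .. N/2 using a direct (slow) DFT."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        mags.append(abs(acc))
    return mags

SAMPLE_RATE = 8000                   # Hz (assumed)
F0 = 200                             # fundamental in Hz (assumed)
AMPLITUDES = [1.0, 0.5, 0.8, 0.3]    # made-up strengths of harmonics 1..4
N = 800                              # 0.1 s, a whole number of periods

# Sum of sine waves at F0, 2*F0, 3*F0, 4*F0.
signal = [sum(a * math.sin(2 * math.pi * (h + 1) * F0 * i / SAMPLE_RATE)
              for h, a in enumerate(AMPLITUDES))
          for i in range(N)]

mags = dft_magnitudes(signal)
bin_hz = SAMPLE_RATE / N             # 10 Hz between neighboring bins

# Keep only the bins that stand well above the numerical floor.
peak_freqs = [round(k * bin_hz) for k, m in enumerate(mags)
              if m > 0.1 * max(mags)]
print(peak_freqs)   # [200, 400, 600, 800]
```

The peaks land only at whole-number multiples of the fundamental, and their heights follow whatever amplitudes were chosen for the harmonics. That height pattern is the part that differs between a voice and an instrument.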

If you make a graph like this for a musical instrument that produces periodic waves, it will also be a picket fence. However, the pattern of intensity of the peaks will be different. If you look at the graph of me singing, you'll notice that the peaks have an envelope that starts high on the left, then goes down, comes back up, and goes down again. The humps in this envelope are called formants, and they're caused by resonances in the vocal tract. I believe the resonances are roughly analogous to Helmholtz resonances, which are what you get when you blow over the mouth of a beer bottle. Their frequency depends on parameters such as the length of the bottle's neck and the volume of the bottle; this is different from examples like a flute, where the frequency is almost entirely determined by the length of the air column.
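To make the contrast between the bottle and the flute concrete, here is a small sketch using the textbook idealized formulas: f = (c / 2π)·sqrt(A / (V·L)) for a Helmholtz resonator with neck area A, neck length L, and cavity volume V, and f = c / (2L) for a pipe open at both ends. The dimensions below are made-up, roughly bottle-sized numbers, end corrections are ignored, and the function names are hypothetical; the vocal tract itself is only loosely described by either formula.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 C

def helmholtz_frequency(neck_area_m2, neck_length_m, cavity_volume_m3):
    """Ideal Helmholtz resonance: f = (c / 2*pi) * sqrt(A / (V * L))."""
    return (SPEED_OF_SOUND / (2 * math.pi)) * math.sqrt(
        neck_area_m2 / (cavity_volume_m3 * neck_length_m))

def open_pipe_fundamental(length_m):
    """Ideal pipe open at both ends (flute-like): f = c / (2 * L)."""
    return SPEED_OF_SOUND / (2 * length_m)

# Made-up bottle: 9 mm neck radius, 8 cm neck, 0.33 L cavity.
neck_area = math.pi * 0.009 ** 2
full = helmholtz_frequency(neck_area, 0.08, 3.3e-4)

# Same neck, but half the cavity volume.
half = helmholtz_frequency(neck_area, 0.08, 1.65e-4)

print(round(full), round(half))           # smaller cavity -> higher frequency
print(round(open_pipe_fundamental(0.6)))  # depends on the length alone
```

In this idealization, halving the cavity volume raises the Helmholtz frequency by a factor of sqrt(2), whereas the pipe's pitch does not depend on a volume at all, only on its length.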

The different vowel sounds have different formants. The formant structure is what your ear-brain system uses in order to detect that what it's hearing is human speech, that it's a vowel, and which vowel it is.

To change what vowel you're making, you do things like raising and lowering your tongue. The vocal tract contains several different resonating cavities, one of which is the mouth. Oversimplifying a lot, you could imagine that raising your tongue would decrease the volume of your mouth, and if it were acting like a Helmholtz resonator, the decreased volume would cause its resonant frequency to go up (like a smaller beer bottle). If you do this while continuing to sing the same note, the picket fence in the frequency domain will keep its peaks at the same frequencies, but we could imagine (in this simplified analysis) that one of the formants would move upward, so that the relative intensities of the peaks would change.
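The same simplified picture can be written as a toy calculation: the sung note fixes where the harmonics sit, and a single resonance bump (standing in for one formant) decides how strong each harmonic is. Everything here is an assumption for illustration: the flat source spectrum, the Lorentzian-shaped bump, the 100 Hz note, the formant moving from 500 Hz to 800 Hz, and the hypothetical helpers `resonance_gain` and `harmonic_levels`.

```python
def resonance_gain(freq_hz, center_hz, bandwidth_hz):
    """Simple Lorentzian-shaped bump standing in for one formant."""
    half_width = bandwidth_hz / 2
    return 1.0 / (1.0 + ((freq_hz - center_hz) / half_width) ** 2)

def harmonic_levels(f0_hz, n_harmonics, formant_hz, bandwidth_hz=150.0):
    """Map each harmonic frequency to its relative level under one formant."""
    return {k * f0_hz: resonance_gain(k * f0_hz, formant_hz, bandwidth_hz)
            for k in range(1, n_harmonics + 1)}

F0 = 100  # Hz, the sung note (assumed); it stays fixed throughout

tongue_low = harmonic_levels(F0, 20, formant_hz=500)    # lower resonance
tongue_high = harmonic_levels(F0, 20, formant_hz=800)   # resonance moved up

# The picket fence itself does not move...
same_pickets = list(tongue_low) == list(tongue_high)

# ...but the tallest picket does.
strongest_low = max(tongue_low, key=tongue_low.get)
strongest_high = max(tongue_high, key=tongue_high.get)
print(same_pickets, strongest_low, strongest_high)   # True 500 800
```

The harmonic frequencies are identical in both cases; only the envelope over them shifts, which is the change the ear reads as a different vowel.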
