The Science of the Human Voice

Many people pick up a skill during their upbringing that contributes to their success, while others never get the chance to learn it.

Every sound that you hear, including the voice of someone you are talking to, has scientific elements that give it the characteristics you instinctively recognize and remember. Without missing a beat, humans have the ability to recognize and appropriately respond to the soft voice of a loved one, the sound of the wind rustling through the trees, a mosquito buzzing in your ear, or something scary like a gunshot. Of all of the sound-making devices on the planet, the human voice is perhaps the most versatile. The ability to control it is critical to effective human communication. Part of what we’re doing at Ox is breaking the human voice down to fundamental elements so that we can study its effect on communication in the workplace and the development of soft skills. 

A little on the science 😴

All sounds are made up of moving air pressure waves. For an analogy, let’s use something we can see and are familiar with: the waves or ripples in the water when you throw a stone into a lake. These air pressure “waves” travel through the air and ultimately move little hairs in your inner ear called stereocilia. The cells that carry these hairs translate their physical movement into an electrical signal that travels very quickly to your brain. The brain then does the best it can to figure out where the sound is coming from and recalls stored memories to decide how to tell the rest of the body to react, based on everything it knows and remembers. Sounds crazy, right? It’s very real. Hearing, perception, and auditory cognition are just a few of the many extraordinary phenomena that the human body and brain are capable of.

There’s a whole science to sound based on physics which plays a role in how humans make sounds and hear them. The characteristics of the air pressure waves mentioned above are what make every sound different and these can be precisely measured. Music is an easy and familiar application of the science of sound to think about. Most people can recognize when the singer in the group isn’t quite hitting the notes and is “out of tune,” when a drummer is behind the rest of the band and “off-beat,” when the bass player is playing “too loud,” or when the electric guitar player’s tone is “too harsh.” Let’s use those as mental images as we dive into the science of sound applied to the human voice, and how vocal delivery contributes to the development of soft skills desperately needed by companies today.

Are we out of tune? 🎶

The aspect of sound waves that determines whether something is “in tune” has the fancy scientific name frequency. The term pitch is often used interchangeably with frequency because the two are so similar. The only real difference is that pitch is how a frequency is perceived by the brain, which depends on how the sound interacts with the environment around it, whereas frequency is the scientific measurement without the perception piece. Frequency is measured in hertz (Hz) and is given as a number, like 440, which represents how many waves (remember the stone hitting the lake) cycle through in a second. Many people around the world refer to the most commonly used of these “pitches” by the letters A, B, C, D, E, F, and G, plus the “flats” and “sharps” between them, which correspond to the white and black keys on a piano. Every sound has a pitch that’s based on a frequency. Every human voice can produce a number of different pitches, and very good vocal performers, like your favorite singer or actress, have very good control over their vocal cords. Like singers, good business communicators have meticulous control over the pitch of their voice, and they use it to their advantage when communicating in the workplace. At Ox, we’ve observed the importance of pitch control in effective communication. We study verbal responses in our app to find which pitch arrangements good communicators use, and which they avoid, in specific situations.
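To make the link between frequency numbers and note letters concrete, here is a minimal sketch. It assumes the common Western convention of twelve equally spaced notes per octave with the A above middle C tuned to 440 Hz. The function name `nearest_note` is made up for this example, and none of this is meant to describe how Ox measures pitch.

```python
import math

# The twelve note names in one octave, starting from C.
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def nearest_note(frequency_hz, reference_a4_hz=440.0):
    """Return (note name with octave, cents away from that note).

    Assumes equal temperament with A4 = 440 Hz by default.
    A positive cents value means the sound is sharp, negative means flat.
    """
    # Distance from A4 in semitones; each octave doubles the frequency.
    semitones_from_a4 = 12 * math.log2(frequency_hz / reference_a4_hz)
    nearest = round(semitones_from_a4)
    # MIDI note numbering places A4 at 69 and C4 (middle C) at 60.
    midi_number = 69 + nearest
    name = NOTE_NAMES[midi_number % 12]
    octave = midi_number // 12 - 1
    # One semitone is divided into 100 cents.
    cents_off = 100 * (semitones_from_a4 - nearest)
    return f"{name}{octave}", cents_off


print(nearest_note(440.0))  # ('A4', 0.0)
note, cents = nearest_note(450.0)
print(note, round(cents))   # an A4 that is noticeably sharp
```

A singer producing 450 Hz while aiming for 440 Hz lands on the right note name but well above its center, which is roughly what listeners mean by “out of tune.”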

What am I? How do I make you feel? 🎺

Tone is how you can tell the difference between a clarinet, a trumpet, and a voice that are all performing the same note. Every sound is made up of not just one single frequency, but of a rainbow-like myriad of frequencies stacked above it. These frequencies are like the ripples around the rock you threw in the lake; together they make up the air pressure waves experienced by the ear and understood by the brain. The initial splash of the rock, or the most prominently heard pitch, is referred to as the fundamental frequency, whereas the other frequencies (all of the ripples that extend out from the splash) are referred to as overtones. Specific overtones make up a special group called harmonics, whose frequencies are whole-number multiples of the fundamental and which tend to “sound good” together. Just as pitch and loudness are the perceived versions of physical measurements, tone is the term for the human perception of the fundamental frequency, overtones, and harmonics together. The different physical shapes and structures of the clarinet and the trumpet create and move different groups of frequencies, overtones, and harmonics, which make your inner ear hairs react differently, so you perceive the two instruments differently. Great business communicators tend to have physical and mental control over the tone of their voice and use it to their advantage depending on what they are trying to accomplish. Inexperienced communicators often don’t understand the extent to which their vocal performance and perceived vocal tone affect those around them. At Ox, we’re breaking this down, finding patterns in scientifically measured frequencies, including the fundamental frequency, in order to make correlations with desired business outcomes.
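Here is a small sketch of the idea that two sounds can share a fundamental frequency and still have a different tone. It builds two artificial sounds from sine waves at whole-number multiples of a 200 Hz fundamental and then measures how strong each harmonic is. The sample rate, the two amplitude recipes (“mellow” and “bright”), and the function names are all invented for illustration; they are not measurements of a real clarinet, trumpet, or voice, and this is not Ox’s analysis method.

```python
import math

SAMPLE_RATE = 8000  # samples per second (an assumption for this sketch)


def synthesize(fundamental_hz, harmonic_amplitudes, duration_s=0.1):
    """Add up sine waves at 1x, 2x, 3x, ... the fundamental frequency."""
    n_samples = int(SAMPLE_RATE * duration_s)
    samples = []
    for n in range(n_samples):
        t = n / SAMPLE_RATE
        samples.append(sum(
            amp * math.sin(2 * math.pi * fundamental_hz * (k + 1) * t)
            for k, amp in enumerate(harmonic_amplitudes)
        ))
    return samples


def magnitude_at(samples, frequency_hz):
    """Estimate the amplitude of a single frequency (a one-bin Fourier sum)."""
    step = 2 * math.pi * frequency_hz / SAMPLE_RATE
    real = sum(s * math.cos(step * i) for i, s in enumerate(samples))
    imag = sum(s * math.sin(step * i) for i, s in enumerate(samples))
    return 2 * math.hypot(real, imag) / len(samples)


def harmonic_profile(samples, fundamental_hz, count=4):
    """Measured strength of the fundamental and the next few harmonics."""
    return [round(magnitude_at(samples, fundamental_hz * (k + 1)), 3)
            for k in range(count)]


# Two made-up recipes with the same fundamental but different harmonics.
mellow = synthesize(200, [1.0, 0.1, 0.3, 0.05])
bright = synthesize(200, [1.0, 0.6, 0.5, 0.4])

print(harmonic_profile(mellow, 200))
print(harmonic_profile(bright, 200))
```

Both profiles start with the same strength at 200 Hz, so both sounds would be heard as the same note. The differences show up in the higher harmonics, which is the part of the signal we experience as tone.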

Can you hear me? 🦻

Everyone knows that if you can’t hear something on the radio, you need to turn up the volume. Volume is the common word for the scientific term amplitude, which is the magnitude or strength of the air pressure wave (in terms of the wave shape, the height of its peak). Humans perceive amplitude as loudness, much as they perceive frequency as pitch. Very talented vocal performers and business communicators have very good control over their perceived loudness and use it to their advantage depending on the situation. At Ox, we’re analyzing the dynamic change in loudness within responses, trying to make sense of when changes in vocal loudness in business communications are appropriate or expected, and when they are out of place and detract from the message.
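A rough way to put a number on amplitude is to average the strength of a stretch of signal (the root mean square, or RMS) and express it in decibels. The sketch below does that for a test tone and then tracks how the level changes from one window of samples to the next. The sample rate, window size, and function names are assumptions made for this example. Keep in mind that perceived loudness depends on more than this one number, and this is not a description of Ox’s loudness analysis.

```python
import math

SAMPLE_RATE = 8000  # samples per second (an assumption for this sketch)


def sine_wave(frequency_hz, amplitude, n_samples):
    """A plain test tone with the given peak amplitude."""
    return [amplitude * math.sin(2 * math.pi * frequency_hz * n / SAMPLE_RATE)
            for n in range(n_samples)]


def rms(samples):
    """Root mean square: a standard measure of average signal strength."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def level_db(samples, reference=1.0):
    """Signal level in decibels relative to a reference amplitude."""
    # The small floor avoids taking the logarithm of zero for pure silence.
    return 20 * math.log10(max(rms(samples), 1e-12) / reference)


def level_changes(samples, window_size):
    """Change in level, in decibels, from each window to the next."""
    levels = [level_db(samples[i:i + window_size])
              for i in range(0, len(samples) - window_size + 1, window_size)]
    return [later - earlier for earlier, later in zip(levels, levels[1:])]


loud = sine_wave(100, 1.0, 800)
quiet = sine_wave(100, 0.5, 800)

# Halving the amplitude lowers the level by about 6 decibels.
print(round(level_db(loud) - level_db(quiet), 2))  # 6.02
print([round(c, 2) for c in level_changes(loud + quiet, 800)])  # [-6.02]
```

A speaker who drops to half the amplitude mid-sentence would show up here as a sudden step of about minus 6 decibels between two windows.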

Are we off-beat? 🥁

The aspects of groups of sound waves that determine whether a drummer is “on-beat” or “off-beat” are generally referred to as tempo, the rate at which sounds happen (measured in beats per minute, or bpm), and rhythm, the specific pattern of sounds within a certain period of time. Both help us understand the placement of sounds over time. At Ox, we’re measuring the tempo and rhythm of the voice over time in responses, finding patterns in the strategies effective communicators use most and those they avoid.
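If you already know when each sound starts, tempo and rhythm reduce to simple arithmetic on those start times. The sketch below assumes a list of onset times in seconds (for speech these might be syllable starts, but finding them in real audio is a separate problem that isn’t shown here). The function names and the “irregularity” score are invented for illustration and are not Ox’s actual measurements.

```python
import statistics


def inter_onset_intervals(onset_times_s):
    """Gaps, in seconds, between consecutive sound onsets (the rhythm)."""
    return [later - earlier
            for earlier, later in zip(onset_times_s, onset_times_s[1:])]


def tempo_bpm(onset_times_s):
    """Average rate of onsets, expressed in beats per minute."""
    intervals = inter_onset_intervals(onset_times_s)
    return 60.0 / statistics.mean(intervals)


def timing_irregularity(onset_times_s):
    """Spread of the gaps relative to their average (0.0 = perfectly even)."""
    intervals = inter_onset_intervals(onset_times_s)
    return statistics.pstdev(intervals) / statistics.mean(intervals)


steady = [0.0, 0.5, 1.0, 1.5, 2.0]  # one onset every half second
uneven = [0.0, 0.4, 1.0, 1.4, 2.0]  # same span, alternating short and long gaps

print(tempo_bpm(steady))                      # 120.0
print(round(timing_irregularity(steady), 2))  # 0.0
print(round(timing_irregularity(uneven), 2))  # 0.2
```

Both sequences fit the same number of onsets into two seconds, so their tempo is about the same. Only the irregularity score separates the steady pattern from the uneven one, which is the difference between tempo and rhythm in a nutshell.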

Tying it all together 👔

Whenever someone talks, they push air past the vocal cords in their larynx (voice box), producing controlled air pressure vibrations that travel through the mouth and sinus structures, which the speaker moves to shape words. These vibrations then travel through the air into the ears of one or more listeners, where they move the little stereocilia hairs so the sound can be perceived by the brain and acted upon.

A good chunk of human development is learning how to flex the diaphragm and lungs to push air past the vocal cords, and how to control the muscles of the face and mouth to form the sounds that make up words. Some of this comes naturally, and much is learned through imitation. Some kids struggle more than others and need specific practice, like speech therapy, to better control the muscles of their mouth and face, which points to the “nurture,” or teaching, element of learning to control the human voice.

In parallel, or sometimes after the mechanics of speaking are mastered, comes the learning of language, which is the part that parents and educators focus on the most. The tonal elements of the human voice (pitch, tempo and rhythm, loudness, and tone) are rarely taught, though. Most people don’t think to teach them explicitly, so you tend to pick them up only if you were surrounded during development by talented communicators who taught you implicitly, by example, without you knowing. These tonal elements make up a fundamental piece of the skill set most in demand by companies today: soft skills. You might have heard the old saying, “How you say something matters more than what you say.” The tonal elements are half of that, and the choice of words is the other half.

At Ox, we are leveraging the modern, scientific understanding of sound to build cutting-edge tech. We’re breaking down the human voice into understandable pieces to find patterns that we can use to understand and teach the “how you say it” part of human communication. We’re pairing this with rich, job-specific audio training that picks up where parents and educators leave off, covering fundamental skills and using audio analysis, natural language processing, and machine learning to help people learn and practice “what to say” and “how to say it.”

References:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5323431/

https://www.ncbi.nlm.nih.gov/books/NBK11122/