What Is the Hardest Language in the World to Lipread?

There’s a whole lot we don’t know about vision and speech perception.

by Dan Nosowitz February 18, 2020

Lipreading might be more art than science. All illustrations: Michael George Haddad for Atlas Obscura

Lipreading has an almost mystical pull on the hearing population. Computer scientists pour money into automated lipreading programs, forensic scientists study it as a possible source of evidence in criminal investigations, and Seinfeld based an entire episode on its possible (mis)use at parties. For linguists, it is a tremendously controversial and fluid topic of study. What are lipreaders actually looking at? Just how accurate is the understanding of speech on the basis of visual cues alone? What language, of the 6,000-plus distinct tongues in the world, is the hardest to lipread?

This last question, though seemingly simple, resists every attempt to answer it. Every theory runs into brick walls of evidence, the research is limited, and even the basic understanding of what lipreading is, how effective it is, and how it works is laden with conflicting points of view. This question, frankly, is a nightmare.

When we think of lipreading, we generally assume that a lipreader is operating in complete silence, which, thanks to the popularity and technical improvement of cochlear implants and hearing aids, is often not the case. Seattle-based professional lipreader Consuelo González started to lose her hearing at a very young age, at about four and a half. “Over a period of about four years it went down to off the charts,” she says. (We conducted our interview over video chat so she could read my lips.) Today she is profoundly deaf, and without hearing aids, she hears no sound at all. With them, she can pick up some environmental sounds and some tonality in speech, but not enough to understand what is being said to her, at least not without seeing it.

It turns out that we all do a bit of what González does: We use what we see to understand what is being said to us, even when we can’t hear it well. “As far back as the 50s, there were classic studies that showed that people are better at perceiving speech in the presence of background noise if you can see the face of the talker,” says Matthew Masapollo, who studies speech perception at Boston University. When it comes to understanding speech, there is a profound connection between the auditory and visual senses, though that connection is only barely understood. But there are all sorts of weird studies showing just how closely vision is tied to speech perception.

Let’s take the McGurk Effect, named for the researcher who discovered it accidentally in the 1970s. Say there’s a video of a person saying the nonsense syllables “gaga,” but the audio has been swapped for a recording of the same person saying “baba.” Subjects are asked to watch the video and identify which sound is being spoken. Bizarrely, most people perceive, using both eyes and ears, something else altogether: “dada.” The McGurk Effect shows just how strangely tied together sound and image are in speech perception, but it’s hardly the only study of its kind, Masapollo says.

In another study, people holding a tube in their mouths, which forced their lips into an “oh” or “ooh” shape, were better at lipreading that vowel sound when other people spoke it than people who weren’t holding tubes. Another quirk of lipreading: One pair of vowel sounds is easier to lipread than another, even when the “lips” are just a sort of light-dot representation of a mouth. Turn that representation at a slight angle? The quirk is gone. Yet another study found that it is easier to understand someone while watching a video of them speaking, even if the face is blurred or pixelated, perhaps because of head movement or rhythm.


There have also been many studies suggesting that infants gather visual information from the speakers around them before they can talk. Muted videos of people speaking the language a baby is familiar with (the language that will become the baby’s native tongue) hold the baby’s attention longer than videos of people speaking another language. There seems to be some kind of innate lipreading ability in all of us.

There are a couple of different theories about how this ability develops as we grow. One is that visual information is mostly redundant with the speech we hear. Another is that visual cues do provide additional information, but that we just don’t know how to measure and record it. To add to the complexity, there is an awful lot of variability between individuals. Some hearing people let their innate lipreading ability lapse and rely on the audio component of speech, while others make use of visual information all the time. Meanwhile, plenty of sources online claim that lipreading is only about 30 percent accurate.

But that figure doesn’t really apply to people with impaired hearing. González does not know American Sign Language. She relies entirely on lipreading and the little bits of sound she can detect with her hearing aids. Our conversation went very smoothly once I placed my laptop on a stationary surface so the camera would stop bobbing around. At no point did I worry about whether González could understand every word I said.

The commonly cited low success rate for lipreading isn’t based on people like González. Much of the research uses subjects with average hearing; Masapollo, for example, doesn’t work with deaf subjects, and his work is less about lipreading than about the perception of speech. In fact, it’s weirdly difficult to find data on the effectiveness of lipreading as practiced by people who actually rely on it in their daily lives. People with average hearing test at around 10 to 12 percent accuracy, and there’s some suggestion that hearing-impaired people are more like 45 percent accurate. Most of these studies are a little weird, though, because they tend to measure accuracy by whether people can identify isolated words or individual phonemes, the sounds that make up words, like “gah” or “th.” But, according to González, that’s not how lipreading works.

“We don’t lipread sounds,” she says. “Some people think we’re looking at phonemes and stringing them into words. Doesn’t work anything like that. We’re seeing words and putting them together in sentences.” Context is everything for a lipreader, because, well, context is everything for anyone trying to understand speech, regardless of which sense is being used. Whether you can trick a lipreader into confusing the words “ball” and “bull” is not reflective of real-world accuracy, because it’s unlikely that those words would be used in a way that makes it unclear which one is intended: “The dog wants you to throw his bull.”

Not many studies acknowledge any of this, though one from 2000, in the journal Perception & Psychophysics, did. In it, the researchers examined lipreading abilities among people with impaired hearing (IH, in the study) and people with average hearing. Crucially, it assessed the ability to understand entire sentences rather than just phonemes. “Visual speech perception in highly skilled IH lipreaders is similar in accuracy to auditory speech perception under somewhat difficult to somewhat favorable listening conditions,” the researchers concluded. This tracks with what González told me: Lipreading, once you get really good at it, is more or less equivalent to understanding a speaker in a busy restaurant.

Lipreading is sometimes referred to in the deaf community as an “oralist” technique. Oralism refers to an emphasis on interpreting speech rather than on using an alternative form of communication, namely sign language. Most developed countries have experienced a push away from oralism and toward sign language; there are now dozens of different sign languages around the world. To some, it is almost offensive to emphasize lipreading over a form of communication that does not put deaf people at a disadvantage.

Because of that, it’s generally the poor, and often the illiterate, who rely on lipreading. González is a real exception here, as someone who has made a living from lipreading. In countries such as India and China, where illiteracy rates among deaf people are high and resources for teaching are limited, oralism is still popular, but it is not very well documented.

There’s a body of literature on lipreading various non-English languages, but it doesn’t make things any clearer. The problem with trying to find the hardest language to lipread is that lipreading ability varies enormously from person to person, and most of the best lipreaders in each language, the ones who depend on it, don’t show up in studies. So figuring out how things differ from language to language requires an awful lot of guesswork. “It’s sort of hard to know objectively, because our experience with a particular language informs how we attend to certain things,” says Masapollo. “What might be hard for an English speaker might not be hard for a Japanese speaker. It’s highly experience-dependent.”

My first thought was to figure out which phonemes are especially hard to distinguish. A language with more of those must be harder to lipread, right? There is data on this: A Swedish study published in the Journal of Speech, Language, and Hearing Research in 2006 includes a chart of which phonemes were hardest to identify correctly when lipreading. Phonemes that look similar on the lips are sometimes grouped together into “visemes” (though not everyone thinks the divisions are as simple as that makes it sound).

According to that study, the hardest phoneme is one that doesn’t exist in English; it’s the sound between “th” and “d,” common in South Asian languages such as Hindi. (The word “Hindi” itself, pronounced properly, contains the sound.) In general, the hardest sounds to perceive are those for which very little happens on the lips: consonants like “d,” “g,” “n,” and “k,” all part of one viseme. (Naturally, one can often use context to distinguish “dot,” “got,” “not,” and even “knot.”)

But just because sounds are made without much use of the lips doesn’t necessarily mean that they’re invisible to a lipreader, who is perhaps more accurately called a “speech reader.” I sent González a video of a man speaking Juǀʼhoan, which is spoken in parts of Namibia and Botswana. The language has an incredibly large number of consonants, including dozens of clicks. She didn’t find these difficult to detect at all: The lips may not move in quite the same way as for an English consonant, but the movement of the tongue, cheeks, jaw, and neck muscles made them apparent to her. Of course, she doesn’t speak Juǀʼhoan, so she can’t say whether it would be easy or hard to read. Perhaps predictably, there are no studies on lipreading Juǀʼhoan, but it doesn’t seem impossible that a consonant-heavy language would be readable.

Another element of some languages that could make lipreading difficult is tone. Many languages around the world are tonal, meaning that they rely on pitch to convey meaning, almost like a song. English is not tonal in this sense (though we do raise the pitch at the end of a sentence to signal a question), but some languages have large numbers of complex tones that are vital to understanding them. Tones are produced by the larynx and may not require any perceptible motion of the tongue, lips, cheeks, or jaw. Can they be perceived visually?

Cultural factors, from mustaches to covering one’s mouth, can greatly impact the ability to lipread.

Well, yes, we think. A 2008 study published in the Journal of the Acoustical Society of America tested whether speech readers could perceive tones in Mandarin, which has four tones, or five if you include the “neutral tone.” The study tested hearing subjects fluent in Mandarin and found that, at first, they were awful at detecting tones in soundless speech. But after only 45 minutes of training, they improved dramatically. Those sessions taught the subjects to watch the movements of the neck, chin, and head, because certain dips, extensions, and vibrations that can indicate tone turn out to be fairly easy to detect.

Still other languages rely heavily on another class of sounds made with little involvement from the lips: sounds produced with the larynx or far back in the throat. Some languages have a lot of these guttural sounds, including Welsh, Hebrew, and Dutch, all of which have a number of consonants made way in the back of the mouth. The research is inconclusive on whether these sounds are harder to read. Some studies suggest that guttural phonemes are easily confused or harder to detect. But the same caveat applies as with “ball” and “bull”: Just because a skilled lipreader can’t see a sound doesn’t mean they don’t know it’s there.

What might actually make lipreading harder has nothing to do with language: It’s personality. “Some people are very shy, don’t move their faces or lips,” says González. “Not very much in the way of expressions. That makes it more difficult.” As we have seen, lipreading involves much more than the lips: The entire face, the head, even the rest of the body provide critical information. If someone is not very expressive, and so doesn’t provide all of the cues that a skilled lipreader relies on, they can be hard to read, González says.

The studies looking at the overall landscape of visual speech cues are a little tricky to interpret, and they point to a lot of broad generalizations. One study from Kumamoto University found that English speakers rely much more on visual information than Japanese speakers do. Another found that Japanese speakers display much less emotion in their facial expressions than Western Europeans do. Eye contact may be considered somewhat rude in Japan, further reducing the language’s conduciveness to speech reading. There has also been some analysis of a Japanese tendency to cover the mouth with a hand, especially among women while laughing. There isn’t really enough hard data on that to draw any conclusion, but any form of concealing the lips would make an already difficult task much harder. That even applies to beards and mustaches, González says: “A mustache that comes down over your upper lip can affect being able to see what’s going on.”

So where does that leave us on our central question of which language is the hardest to understand from visual cues? “The short answer is, we don’t know,” says Masapollo.

That’s understandable, if not very satisfying, but I have some theories. I nominate Hebrew, due to its combination of a large number of guttural sounds and a large number of speakers who are observant Jews and wear long beards. The hardest phoneme in that Swedish study points to languages such as Hindi, Tamil, and Gujarati, all spoken in India, which also happens to be maybe the most mustachioed country on the planet. Japanese should be considered as well, thanks to a bunch of cultural factors that may weaken the connection between vision and speech. I’ll toss in Chechen, because it has such a startling number of consonants and vowels, around 50 of each, including a whole mess of sounds that are made back in the mouth and throat. At the opposite end of the spectrum is Pirahã, a language spoken by a few hundred people in the Brazilian Amazon. Pirahã has about as few phonemes as a language can possibly have, which I suspect would make a lot of words look the same. It’s also a tonal language and may rely substantially on rhythm; nobody’s quite sure. (It’s one of the most mysterious, controversial languages on the planet.)

There’s no good answer to the question, but then again, there are very few good answers to any questions about lipreading.
