New Study Finds Errors in Automated Transcripts and Captions

Published on Nov 18, 2022 | Audio Transcription

Research Highlights Errors in Automated Transcripts
One in eight people in the United States aged 12 or older (13 percent, or 30 million people) has hearing loss in both ears, based on standard hearing examinations, according to the National Institute on Deafness and Other Communication Disorders. This affects their ability to watch multimedia, as many programs make little sense without audio. Transcription and captioning make such content accessible to these viewers. However, when done with software, audio-to-text conversion can result in significant flaws, according to a new study. Having machine-generated transcripts reviewed by an audio transcription service provider is the best way to overcome this challenge.

Consumer Reports: Most Video Conferencing Apps Have Captioning Mistakes

Researchers at Northeastern University and Pomona College, who partnered with Consumer Reports to test auto-captions in seven popular products, found that the captions contained frequent mistakes. This poses significant challenges for people who are deaf or hard of hearing, or whose first language is not English (www.consumerreports.org, August 2022).

The researchers evaluated captions on products including BlueJeans, Cisco Webex, Google Meet, Microsoft Stream, and Zoom. According to Consumer Reports:

  • There were mistakes in all of the programs, with some getting about 1 in 10 words wrong.
  • The results were worse for second-language English speakers, even those who were fluent. This means that an auto-caption user is less likely to be able to understand people whose native language isn’t English.
  • Gender and first-language status were the only factors that affected the variation in transcription mistakes (other factors, such as the speaker’s age, race and ethnicity, and speech rate, had no impact).

The study also found considerable differences within each tested platform, with Webex having more mistakes than Google Meet. Consumer Reports said Zoom’s “very best transcription had just two errors per 100 words, while at its worst the software mistranscribed nearly every third word.”
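Error rates like these are conventionally measured as word error rate (WER): the word-level edit distance between a reference transcript and the machine output, divided by the length of the reference, so “two errors per 100 words” corresponds to a WER of 0.02. Below is a minimal Python sketch of that standard calculation, using made-up sample strings; it is illustrative only and is not the tooling Consumer Reports used.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits needed to turn the first i reference words
    # into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("the quick brown fox", "the quick brown fax"))  # 0.25 (1 error in 4 words)
```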

How the Companies Responded

The companies’ spokespersons responded to Consumer Reports:

YouTube said that the study results matched up with the company’s “expectations for performance”. They said they were working to ensure that YouTube works better for everyone.

Microsoft confirmed the findings roughly align with its internal testing, which also revealed lower accuracy when transcribing men and second-language English speakers.

Zoom’s reply was: “We’re continuously enhancing our transcription feature to improve accuracy toward a variety of factors, including English dialects and accents.”

Google reported it was working to “improve the accuracy of live captions and translations so even more users can participate and stay engaged using Google Meet”.

Cisco said its auto-caption testing puts Webex ahead of two “best-in-class speech recognition engines”, but did not name these products.

Common Mistakes When Using Speech Recognition

Real-time machine-generated transcription and captions for videos are created by software that combines automatic speech recognition (ASR) technology, machine learning (ML), and artificial intelligence (AI). Speech recognition instantly identifies the spoken words and converts them into text on screen (a minimal code sketch follows the list below). Popular as it is for its speed and cost-effectiveness, automated transcription raises accuracy concerns, as mentioned above. Factors that cause speech recognition mistakes include:

  • Speaker accents and dialects – Voice recognition software trained on American English speakers is likely to make mistakes when transcribing speakers of other English varieties. Consumer Reports found 10 transcription errors per 100 words for non-native English speakers.
  • Multiple speakers – The accuracy of speech recognition drops when multiple speakers are present and being recorded.
  • Fast speech – The software may not be able to transcribe the speech of those who speak quickly or run words together, leading to missed words or phrases.
  • Complex jargon or phrasing – Every business sector has its own terminology and jargon that is not part of standard English, and an automated tool may not be able to render them accurately in text.
  • Background noise – People talking in the background, music, the sound of traffic, and other loud noises will affect the quality of the automated transcript.
  • Speaker’s distance from the microphone – If the speaker positions the microphone too close to the mouth, the software may pick up jumbled speech.
  • Homophones – Speech recognition software tends to misinterpret same-sounding words, for example “their” and “there”.
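
To make the pipeline above concrete, here is a minimal sketch of machine transcription using the open-source openai-whisper package. The file name is a placeholder, and none of the products discussed here necessarily use this engine; the point is that the model sees only the raw audio signal, which is why the factors listed above degrade its output.

```python
# Minimal ASR sketch using the open-source openai-whisper package
# (pip install openai-whisper). "meeting.wav" is a placeholder file name;
# the commercial products discussed above use their own engines.
import whisper

model = whisper.load_model("base")        # small pretrained speech model
result = model.transcribe("meeting.wav")  # convert spoken audio to text
print(result["text"])                     # the machine-generated transcript
```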

Reviewing Automatic Captions and Transcripts Can Ensure Accuracy

Transcripts and captions are essential to extend the accessibility and reach of videos, podcasts, and other multimedia content. While research like Consumer Reports’ will encourage tech companies to improve their speech-recognition systems, partnering with an online transcription service provider is the best bet for ensuring accurate transcripts of work meetings, conferences, lectures, and other important activities.

On its YouTube Help page, Google states: “These automatic captions are generated by machine learning algorithms, so the quality of the captions may vary. We encourage creators to add professional captions first. YouTube is constantly improving its speech recognition technology. However, automatic captions might misrepresent the spoken content due to mispronunciations, accents, dialects, or background noise. You should always review automatic captions and edit any parts that haven’t been properly transcribed”.
