As a provider of legal transcription services, as well as transcription for the academic, media, and finance sectors, we are frequently asked by clients about the advent of speech recognition software and its possibilities. This is a development we follow closely. We are aware that transcription providers may have to play a more significant role as editors and language specialists in the future if the technology becomes highly reliable. In this context, a recent article on pcmag.com is interesting. It highlights how Microsoft's experimental speech recognition software achieved the lowest error rate ever recorded for machine transcription, almost matching that of a human transcriber.
The question is whether such software can match the 99% accuracy provided by reliable university transcription services.
Microsoft's Speech Recognition Software Could Be Promising
Five years ago, the best speech recognition systems still had word error rates of 20 to 25%, whereas Microsoft's software achieved a word error rate (WER) of 5.9%. However, the company admits that its software still cannot transcribe speech perfectly, although the result is a significant achievement for neural network research. The neural language model used in the software learns not only the relationships between sounds but also those between words. What does this mean? The language model can relate words with similar meanings. If you use the word "hurry," for example, it associates it with words such as "rush," "run," and "speed."
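To make these WER figures concrete: WER is conventionally computed as (S + D + I) / N, where S, D, and I are the word substitutions, deletions, and insertions needed to turn the machine transcript into a reference transcript of N words. The Python sketch below is our own illustration, not Microsoft's evaluation code. The function name `word_error_rate` is hypothetical, and the sketch assumes simple lowercase, whitespace-separated words, skipping the punctuation and spelling normalization that real benchmarks apply.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (S + D + I) / N using a word-level Levenshtein edit distance.

    Simplifying assumption: words are lowercased and split on whitespace only.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # dist[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or exact match)
            )
    return dist[len(ref)][len(hyp)] / len(ref)


# One homophone mix-up ("their" -> "there") in a six-word sentence
print(round(word_error_rate("they left their notes over there",
                            "they left there notes over there"), 3))  # 0.167
```

As a rough reading of the numbers above: at a 5.9% WER, a 1,000-word transcript would contain about 59 word errors, while a transcript that is 99% accurate (reading accuracy loosely as 1 minus WER) would contain about 10.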
Researchers have compared speech recognition with other modern means of communication, such as typing text messages, and found that the software often produced results faster and more accurately than people typing. James Landay, a professor of computer science at Stanford University, says, "Speech recognition is something that's been promised to us for decades but it has never worked very well. But we were noticing that in the past two to three years speech recognition was improving a lot."
How Practical Is Speech Recognition Software for Academics?
Speech recognition may evolve and be perfected in the future, but how practical is it today for sectors such as academia? Students who need transcripts of their lecture notes, dissertations, and theses may not find this option very affordable. They would find academic transcription from a professional provider, such as a legal transcription company, very useful.
Research recordings and interview transcripts form the basis of a student's thesis or dissertation. This has made reliable dissertation and interview transcription services much sought after by academics. Many colleges, universities, and technical institutions require audio-to-text conversion of speeches, lectures, seminars, and similar events. Recording interviews or discussions and collecting data are key elements of thesis preparation. The resulting transcripts make it easier to analyze the material, refer back to it later, extract useful data, and create information-rich research content. Transcription service providers with long experience in the field can deliver insightful transcription, maximum accuracy, and minimum turnaround time.
Why Professional Transcription Services Are a Better Option
For now, outsourcing academic transcription is the more practical, reliable, and time-saving option because speech recognition technology, though promising, still has the following disadvantages:
- It is not completely accurate and may misinterpret spoken words. The software cannot always differentiate between homophones such as "their" and "there," and it has problems with acronyms, technical terms, and slang.
- Voice recognition systems can have problems identifying accents and coping with people who speak very fast.
- It may take more time overall, because you have to allow for reviewing, editing, and correcting the errors. Training the software to understand your voice and speech patterns can also take a long time. Moreover, it may not reliably distinguish between multiple speakers or voices.
- The system may not perform well when there is a lot of background noise. Other people talking and other sounds in the recording can lead to errors and mix-ups.