Audio Transcription Time

Transcription of audio recordings should be accurate and complete, and should provide an objective representation of the original spoken words. If you are considering hiring an audio transcription service, you should know that many factors can affect the quality of the work and the time taken to complete it. In fact, planning the recording of interviews, meetings, and other events is crucial to ensure successful and timely transcription. One of the things that clients often want to know is how long it will take to transcribe an hour of audio or video. Let’s take a look at the key elements affecting audio transcription time and quality:

Quality of the audio file: The time taken to transcribe one hour of audio can range from 2 to 10 hours. Poor audio quality is one of the most important issues when it comes to transcription. A poor recording will have more ‘inaudibles’ and take much longer to transcribe. This can occur due to poor equipment, recording without a microphone, poor microphone placement, or a noisy background. Use a good quality microphone if the environment is noisy. Using dynamic microphones will make a difference. Microphone placement is also important. See that the microphone is placed directly in front of the speaker and pointed at them, at a distance of not more than 2 feet. Background noise is another major concern, and this can be addressed by using a quiet room for the recording, paying attention to acoustics, and positioning the recording device at the right distance from the speakers. Generally, experienced transcription service providers will use audio editing tools to transcribe files even if there is some noise on them.
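To put the 2 to 10 hour range in perspective, here is a minimal Python sketch that turns a recording length into a best-case and worst-case estimate of transcription effort. The function name and parameters are made up for illustration, and the two ratios are simply the rough bounds quoted above, not a guarantee from any provider.

```python
def estimate_transcription_hours(audio_minutes, low_ratio=2.0, high_ratio=10.0):
    """Return (best_case_hours, worst_case_hours) for a recording.

    The default ratios are the 2 to 10 hours of work per hour of audio
    mentioned in the article; treat them as rough bounds only.
    """
    if audio_minutes < 0:
        raise ValueError("audio_minutes must be non-negative")
    audio_hours = audio_minutes / 60
    return audio_hours * low_ratio, audio_hours * high_ratio


# Example: a 45-minute interview
best, worst = estimate_transcription_hours(45)
print(f"45 minutes of audio: about {best:.1f} to {worst:.1f} hours of work")
```

A clean recording should land near the lower figure, while a noisy one with many ‘inaudibles’ will drift toward the upper one.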

File format: There are many audio file formats such as .MP3, .M4A, .AAC and more, and audio transcription companies accept files in any type of digital format. Nevertheless, the ideal file formats for audio transcription services are uncompressed (PCM, WAV), lossy (MP3, WMA, AAC), and lossless (FLAC, WMA, ALAC). Though high-resolution, lossless audio files are very large, they are considered the best format for sound quality. MP4 is the standard file format if the recording is made on a phone. MP3 is the most popular format for transcription as it produces small, high-quality sound files.

Speaker coherence: To provide accurate transcription, it’s important that the spoken words are clear. Some types of speech are difficult to transcribe. The transcriptionist can face difficulties in interpreting speech if:

  • There are multiple speakers
  • Speakers talk over each other
  • They speak quickly or have unusual patterns of speech
  • Sentences lack coherence
  • They have heavy regional or unfamiliar accents
  • There are linguistic expressions that need to be translated
  • The speaker switches between two languages
  • The speaker has a low voice and frequently changes pitch

The speech of small children can also be difficult to transcribe. If there are multiple speakers and they need to be identified, more time will be needed to transcribe the file. In fact, the automatic speech recognition (ASR) service Amazon Transcribe comes with a special feature called “speaker identification” to deal with this. This feature allows you to label each individual speaker in audio files with 2-10 speakers.
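For readers curious about what enabling speaker identification looks like in practice, the following is a rough Python sketch that builds a request for Amazon Transcribe's StartTranscriptionJob operation with speaker labels turned on. The job name, bucket path, region, and helper function names are hypothetical, and the 2-10 speaker check simply mirrors the range stated above; the limit the service currently enforces may differ, so check the AWS documentation before relying on it.

```python
def build_diarization_request(job_name, media_uri, max_speakers, language_code="en-US"):
    """Build keyword arguments for a transcription job with speaker labels.

    The 2-10 check reflects the range quoted in the article (an assumption
    here, not a verified service limit).
    """
    if not 2 <= max_speakers <= 10:
        raise ValueError("max_speakers must be between 2 and 10")
    return {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
        "LanguageCode": language_code,
        "Settings": {
            "ShowSpeakerLabels": True,        # turn speaker identification on
            "MaxSpeakerLabels": max_speakers,  # expected number of speakers
        },
    }


def start_job(request, region_name="us-east-1"):
    """Submit the request; needs the boto3 package and AWS credentials."""
    import boto3  # imported here so the builder above works without it
    client = boto3.client("transcribe", region_name=region_name)
    return client.start_transcription_job(**request)


# Hypothetical job name and S3 location, for illustration only
request = build_diarization_request(
    "interview-001", "s3://example-bucket/interview.mp3", max_speakers=3
)
print(request["Settings"])
```

Only the request is built and printed here; calling start_job(request) would actually submit the job and incur AWS charges.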

Subject of the recording: If the transcriptionist needs to research terms and spellings, it can take more time to convert the recording into text. For this reason, technical transcription and academic transcription may need more time to document. If the transcriptionist is unfamiliar with certain words, such as medical or legal terms, or with their pronunciation, it can affect the time taken for transcription.

Special transcription requirements: Transcription time can be greatly affected if there are special transcription requirements such as true verbatim transcripts and timestamps. In verbatim transcription, every verbal utterance in the audio recording has to be captured in text format, exactly the way it is delivered, including the pauses, non-verbal utterances and even silences. Verbatim transcripts are important when transcribing court documents, police interviews and legal documents, as well as for researchers and academic analysts. In addition to getting all the words correct, the transcriptionist would need to document all non-verbal communication such as laughter, pauses, and hand gestures, as well as include false starts, grammatical errors, and repetition of words. External sounds such as a door opening, people walking, or a conversation happening on the side would need to be noted, with timestamps if needed. Verbatim transcription should also include all fillers such as “um”, “ah”, “you know”, etc. These offer insight into the thought process of the speaker.

If you are considering outsourcing your audio transcription work, it is important that you understand the different factors that can impact the quality of the output, the cost, and the time taken to complete it. Companies with extensive experience in providing general business transcription services are generally equipped to handle complex audio and provide documentation solutions with fast turnaround times.