The market research we conduct at Verilogue is somewhat unusual. It relies on the linguistic analysis of actual recorded conversations between physicians and their patients or caregivers. While having the audio is great for detecting tonality and sentiment, the real meat of our analyses requires a highly accurate, verbatim transcript. How could we get the transcripts we needed at a reasonable cost?
Early in our history, circa 2006, this presented a bit of a problem. The holy grail for us was an automated speech recognition (ASR) system that would just spit out what we needed. But due to the multi-speaker, conversational nature of the audio, the oft-present background noise in the exam room, and the fact that our application only produced single-channel audio, the ASR vendors we brought in to peddle their wares had a tough go of it.
In fact, the highest degree of accuracy any vendor exhibited was a paltry 47.4%. The inability to accurately differentiate between speakers made the numbers even worse. Because we require our transcripts to be at least 98% accurate, it seemed as if our dream of a fully automated speech-to-text solution just wasn't in the cards.
So, what did we do? We decided to build the best transcription workflow management system we could: LogueWorks. It's a crowd-sourced system that facilitates transcription, translation, and multimedia processing, serving not only internal Verilogue needs, but also the needs of our external clients. The outputs are highly accurate texts of complex conversational audio, produced in a quick, cost-efficient manner.
As of this posting, we've processed over 17,000 hours of audio. That's 708 days' worth of de-identified (per HIPAA guidelines), conversational audio between physicians and patients with accurate, speaker-annotated transcripts that are time-aligned to the audio. That's a lot of data.
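To make that output a little more concrete, here is a minimal, purely illustrative sketch of what one speaker-annotated, time-aligned transcript segment might look like. The field names and structure are my own assumptions for the example, not Verilogue's actual schema.

```python
from dataclasses import dataclass

# Hypothetical illustration only -- the fields below are assumptions,
# not Verilogue's actual transcript format.
@dataclass
class TranscriptSegment:
    speaker: str       # e.g. "Physician", "Patient", "Caregiver"
    start_time: float  # offset into the audio, in seconds
    end_time: float
    text: str          # verbatim, de-identified utterance

segment = TranscriptSegment(
    speaker="Physician",
    start_time=12.4,
    end_time=15.1,
    text="How have you been feeling since we adjusted the dose?",
)
```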
Since our last round of ASR evaluations, there have been quite a few advancements. Just a few months ago, Microsoft claimed its researchers had reached human parity in conversational speech recognition. More recently, IBM set a speech recognition industry record with a 5.5% word error rate. No doubt there are others out there.
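For readers unfamiliar with the metric, word error rate (WER) is the number of word-level edits (substitutions, insertions, and deletions) needed to turn the ASR output into the reference transcript, divided by the number of words in the reference. Roughly speaking, word accuracy is 1 minus WER, so a 5.5% WER is still short of a 98% accuracy requirement. Below is a minimal Python sketch of the calculation using a standard word-level Levenshtein alignment; it is an illustration, not any vendor's scoring tool.

```python
# Minimal sketch: word error rate via word-level Levenshtein distance.
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn the first i reference words
    # into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

# One substitution out of six reference words -> WER of about 0.167
print(word_error_rate("the patient reports mild chest pain",
                      "the patient report mild chest pain"))
```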
The debate over which is better, ASR or traditional transcription, continues. Do you work with ASR? Comment below; we'd love to hear from you!
Ryan Orr joined Verilogue as the third employee in 2006. As Vice President of Application Development, he is responsible for the design, development, testing, and maintenance of Verilogue’s suite of enterprise applications. Ryan has over 10 years of experience in the pharmaceutical market research industry and over 15 years of experience in software development and technical architecture.