Since 2006, Verilogue has recorded, transcribed, and archived over 150,000 healthcare-based conversations between patients and their physicians in the exam room. Covering more than 150 disease states across more than 50 physician specialties, we have amassed the largest healthcare dialogue database in the world. Our entire dataset, “The VeriCorpus,” has been structured for the development of new machine learning and artificial intelligence applications. The VeriCorpus offers anonymous audio files paired with timestamped verbatim transcripts, along with a series of data points from the patient chart, for model development and training purposes.
VERICORPUS
Verilogue’s A.I. Training Database (ASR & NLP applications)
Over 1,000,000 Minutes of Audio
Each audio package is transcribed verbatim and timestamped to its associated audio (mp3 / mp4) file.
Across 12 Countries and 7 Languages
Conversations are captured in the natural language of participants, and a verbatim transcription and translation are available for each recording.
Over 50 Physician Specialties
We have worked with physicians all over the world to capture and upload both simple and complex patient conversations.
Publications & Medical Journals Featuring Verilogue Data:
Finding Needles in the Right Haystack: Double Modals in Medical Consultations
While naturally-occurring double modals have been exceedingly rare in sociolinguistic interviews, our study represents the very first corpus investigation of double modals through a search of the right ‘haystack’: the nationwide Verilogue, Inc database of recorded and transcribed physician-patient interactions (~85 million words). As a vast source of potentially face-threatening negotiations, the Verilogue corpus provides the ideal speech situation in which to search for low frequency, non-standard syntactic features like the double modal.