Synthetic Healthcare Conversations for ASR

Enabling Ambient Technology Development through Synthetic Healthcare Conversations

Over 2,000 hours of audio data collected and transcribed in a clinical setting

Healthcare is one of the most notable applications of Conversational AI, where the technology is used to streamline provider-patient interactions. Our client, a leading name in healthcare technology, approached Insights AI to enhance their Automatic Speech Recognition (ASR) model so that it could better understand and transcribe multi-speaker conversations in clinical settings. Because privacy regulations made real-world dialogues difficult to acquire, the plan was to create and transcribe synthetic yet realistic interactions between healthcare providers and patients.




Our primary objective was to generate around 2,000 hours of audio recordings, or 12,000 to 24,000 carefully crafted synthetic interactions, covering a range of genders, ages, accents, and medical roles. This dataset, designed to mimic real-world clinical dialogues, was created in strict adherence to privacy regulations such as HIPAA. The synthetic interactions were then used to train and refine our client’s ASR model, significantly improving its handling of real-world conversations in clinical settings.


Regulatory Compliance

Ensuring adherence to privacy laws such as HIPAA while creating realistic yet synthetic healthcare interactions can be challenging.

Data Authenticity and Diversity

Crafting synthetic interactions that accurately mimic real-world clinical dialogues while encompassing a wide range of scenarios, accents, ages, and medical roles demands a meticulous approach and deep domain knowledge.

Quality Assurance

Achieving a high level of transcription accuracy, such as the targeted 95% word accuracy (a Word Error Rate, WER, of at most 5%) and 90% tag accuracy (a Tag Error Rate, TER, of at most 10%), necessitates rigorous quality assurance processes.

Technical Capabilities

Ensuring the technical infrastructure, including the recording and transcription platforms, can handle the volume of data and maintain quality is a significant challenge.

Resource Recruitment & Training

Recruiting individuals with medical backgrounds for role-play, and ensuring they adhere to realistic scenarios while maintaining a natural conversation flow can be quite challenging. Additionally, training transcriptionists to adhere to stringent quality guidelines requires substantial effort and expertise.


Audio Collection & Transcription

  • Scenario Creation: Developed realistic scenarios mirroring common non-urgent conditions encountered in adult family-medicine practices, such as hypertension, diabetes, and pain management.
  • Role-Play: Recruited individuals with medical backgrounds to role-play as healthcare providers and patients, adhering to the provided scenarios and simulating real-world clinical conversations.
  • Recording: Utilized the Insights AI Work Mobile App for capturing audio, ensuring a diverse representation in terms of gender, age, accents, and professional backgrounds among the participants.

Validation and Transcription

Executed validation scripts to ensure the accuracy and quality of the audio files.
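The project's actual validation scripts are not described here, so the following is only a minimal sketch of what such a check might look like. It assumes recordings are WAV files and uses Python's standard wave module; the function name validate_wav, the 16 kHz sample rate, and the 60-second minimum are illustrative assumptions, not project settings.

```python
import wave


def validate_wav(path, expected_rate=16000, min_seconds=60.0):
    """Return a list of problems found in a WAV file (empty list = passed).

    The thresholds are hypothetical defaults chosen for illustration.
    """
    try:
        with wave.open(path, "rb") as wav:
            rate = wav.getframerate()
            frames = wav.getnframes()
            channels = wav.getnchannels()
    except (wave.Error, EOFError, OSError) as exc:
        # Missing, truncated, or non-WAV files end up here.
        return [f"unreadable file: {exc}"]

    problems = []
    duration = frames / rate if rate else 0.0
    if rate != expected_rate:
        problems.append(f"sample rate {rate} Hz, expected {expected_rate} Hz")
    if channels not in (1, 2):
        problems.append(f"unexpected channel count: {channels}")
    if duration < min_seconds:
        problems.append(f"duration {duration:.1f}s is below {min_seconds:.1f}s")
    return problems
```

A batch run would call validate_wav on every file and hold back any recording whose problem list is not empty.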

  • Transcriptions were carried out on the Bhasha platform, adhering to specific guidelines provided, and ensuring verbatim text transcription with precise diarization.
  • Annotated metadata including Speaker ID, Age, Gender, Native Language, and medical training/experience, which was critical for the client’s model training purposes.
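To make the shape of such a deliverable concrete, here is a hedged sketch of a diarized transcript with speaker metadata. The speaker fields mirror the metadata named above, but every class name, field name, and the JSON layout are assumptions for illustration; the client's actual schema is not given.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Speaker:
    # Field names are hypothetical; they mirror the metadata listed above.
    speaker_id: str
    age: int
    gender: str
    native_language: str
    medical_training: str


@dataclass
class Segment:
    # One diarized turn: who spoke, when, and the verbatim text.
    speaker_id: str
    start_s: float
    end_s: float
    text: str


@dataclass
class Transcript:
    speakers: List[Speaker]
    segments: List[Segment] = field(default_factory=list)

    def unknown_speakers(self):
        """Return speaker IDs used in segments but missing from the metadata."""
        known = {s.speaker_id for s in self.speakers}
        return sorted({seg.speaker_id for seg in self.segments} - known)


def to_json(transcript):
    """Serialize the transcript and its metadata to a JSON string."""
    return json.dumps(asdict(transcript), indent=2)
```

A check like unknown_speakers catches segments whose speaker label has no matching metadata entry before a batch is shipped.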

Quality Assurance

Comprehensive quality checks by CQA & PMO enforced the transcription quality targets of 95% word accuracy (a Word Error Rate, WER, of at most 5%) and 90% tag accuracy (a Tag Error Rate, TER, of at most 10%).
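Word accuracy is commonly reported as one minus the Word Error Rate, where WER is the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the reference transcript. The sketch below shows that standard calculation. It assumes simple whitespace tokenization and is not a description of the scoring tool used on this project.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def word_accuracy(reference, hypothesis):
    """Word accuracy expressed as 1 - WER."""
    return 1.0 - word_error_rate(reference, hypothesis)
```

Under this definition, a 95% word accuracy target corresponds to a WER of at most 0.05 on the reviewed sample.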

Data Delivery

  • Structured the data in a clear, organized manner and delivered it in batches, along with detailed batch notes and culture directories.
  • Ensured all data, including audio files, transcriptions, and metadata, were accurately labeled and formatted as per the client’s specifications.

Feedback and Iteration

Established a robust feedback loop with the client to identify any deficiencies, ensuring corrections were made and a complete, accurate dataset was delivered.

Key Achievements

  • Successful collection and transcription of 2,000 hours of synthetic healthcare interactions.
  • Prompt, accurate transcription that contributed significantly to the client’s goal of enhancing their ASR model.
  • Demonstrated Insights AI’s capability in handling large-scale, complex projects with a meticulous approach towards quality and accuracy.


The meticulously executed project facilitated by Insights AI resulted in a rich dataset that significantly contributed to the enhancement of the client’s ASR model. The synthetic interactions created a realistic representation of clinical dialogues, aiding the client in achieving a more robust and reliable speech service for healthcare environments. Through a structured and well-coordinated approach, Insights AI ensured the successful delivery of a complex project within the stipulated timeframe, solidifying its expertise in managing large-scale conversational AI projects in the healthcare domain.

Our collaboration with Insights AI significantly advanced our project in Ambient Technology and Conversational AI within healthcare. Their expertise in creating and transcribing synthetic healthcare dialogues provided a solid foundation, showcasing the potential of synthetic data in overcoming regulatory challenges. With Insights AI, we navigated these hurdles and are now a step closer to realizing our vision of intuitive healthcare solutions.
