
Pioneering Oncology Research with NLP: The Insights AI Breakthrough


In the quest to conquer cancer, data is as vital as determination. At Insights AI, we’re proud to have enabled a major leap in oncology research by helping our client develop a bespoke NLP model that stands as a testament to innovation, precision, and privacy.

Understanding the Challenge

Our client, a leader in healthcare, faced a daunting task: to process a vast array of oncology medical records while balancing meticulous data analysis with stringent privacy standards. The goal was clear: to advance oncology research while staying within the applicable regulatory frameworks.

Crafting the Solution

Our response was to implement a comprehensive strategy encompassing clinical data coverage, rigorous de-identification compliant with HIPAA, and the creation of robust annotation guidelines. These steps ensured the delivery of high-fidelity data annotation and the utmost respect for patient privacy.

Understanding the Healthcare Terminologies

To assist the client in developing a bespoke NLP model, we delved into the unique language and terminologies used in oncology. Our experts understood the nuance and context of oncological discourse.

Data Collection: Navigating the Data Ocean

Our journey with this oncology project was akin to navigating an ocean of data. It was imperative to not only swim through this vastness but also to dive deep and surface the pearls of insight hidden within.

The Annotators: Unsung Heroes of Data Precision

Behind every data point we annotated, there was a team of unsung heroes. Our annotators, trained in the specific needs of oncology data, worked with precision to ensure that every tag and every label was placed with intention. The domain experts identified and categorized the crucial medical entities that are the lifeblood of oncological research. This attention to detail was critical in building a dataset that machines could learn from and doctors could rely on.

Oncology Clinical Note Statement

“Patient Jane Doe was diagnosed with Stage IIIB non-small cell lung cancer (NSCLC), specifically adenocarcinoma, on 03/05/2023. The cancer is located in the right lower lobe of the lung. It is classified as T3N2M0 according to the TNM staging system, with a tumor size of 5 cm x 3 cm. An EGFR exon 19 deletion was identified through PCR analysis of the tumor biopsy specimen. Chemotherapy with Carboplatin AUC 5 and Pemetrexed 500 mg/m² was initiated on 03/20/2023 and is to be administered every 3 weeks. External beam radiation therapy (EBRT) at a dose of 60 Gy in 30 fractions commenced on 04/01/2023. The patient’s treatment is ongoing, and there is no evidence of brain metastases on the recent MRI. The possibility of lymphovascular invasion is yet to be determined, and the patient’s tolerance for the full chemotherapy regimen remains uncertain.”
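To make the idea of entity annotation concrete, here is a minimal Python sketch of how the entities in a note like the one above might be recorded as labeled character spans. It is an illustration only: the `EntitySpan` class, the `annotate` helper, and label names such as `STAGE` or `BIOMARKER` are hypothetical and do not represent the project's actual annotation schema or tooling. The text is an excerpt of the sample note.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntitySpan:
    """A labeled character span inside a clinical note (hypothetical schema)."""
    start: int  # index of the first character of the entity
    end: int    # index one past the last character
    label: str  # entity category, e.g. "STAGE"


# Excerpt of the sample clinical note shown above.
NOTE = (
    "Patient Jane Doe was diagnosed with Stage IIIB non-small cell lung cancer "
    "(NSCLC), specifically adenocarcinoma, on 03/05/2023. "
    "The cancer is located in the right lower lobe of the lung. "
    "It is classified as T3N2M0 according to the TNM staging system, "
    "with a tumor size of 5 cm x 3 cm. "
    "An EGFR exon 19 deletion was identified through PCR analysis "
    "of the tumor biopsy specimen."
)

# Phrases a human annotator might tag, paired with illustrative labels.
LABELED_PHRASES = [
    ("Stage IIIB", "STAGE"),
    ("non-small cell lung cancer", "DIAGNOSIS"),
    ("adenocarcinoma", "HISTOLOGY"),
    ("right lower lobe of the lung", "TUMOR_SITE"),
    ("T3N2M0", "TNM"),
    ("5 cm x 3 cm", "TUMOR_SIZE"),
    ("EGFR exon 19 deletion", "BIOMARKER"),
]


def annotate(text, phrases):
    """Return spans for the first occurrence of each phrase, in text order."""
    spans = []
    for phrase, label in phrases:
        start = text.find(phrase)
        if start == -1:
            continue  # phrase not present; nothing to annotate
        spans.append(EntitySpan(start, start + len(phrase), label))
    return sorted(spans, key=lambda span: span.start)


spans = annotate(NOTE, LABELED_PHRASES)
for span in spans:
    print(f"{span.label:<11} {NOTE[span.start:span.end]!r}")
```

Storing offsets rather than copies of the text keeps each label tied to its exact position in the record, which is one common way to make annotations checkable against the source. A real guideline would also have to cover negated or uncertain findings, such as the statements about brain metastases and lymphovascular invasion in the note.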

Data De-identification: Ethics and Innovation

As we advanced in our NLP capabilities, we remained steadfast in our commitment to ethical standards. De-identifying data was just as important as analyzing it, ensuring that our pursuit of innovation never compromised patient privacy.

On [Date Pattern], at 11:00 am, Mr. [Patient Name], aged [Age], was admitted to [Medical Center Name] for a scheduled hip surgery, previously consulted by his primary care physician Dr. [Physician Name], and attended by [Physician Name] MD. During his stay, he was under the care of [Nurse Practitioner], N.P., and [Nurse Practitioner], R.N., with [Physician Name], P.A., also being consulted. His operation, conducted on the same day as admission, was successful with no complications reported. Following surgery, Mr. [Patient Name] was transferred to Room no. [Room Number], Floor no. [Floor Number], for recovery. During his brief stay, his medical records, including MRN [Medical Record Number] and Account [Account Number], were handled according to the standard protocols of [Nursing Home Name], his previous residence. He was discharged later the same day to the care of [Clinic Name] for further recuperation.
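The placeholder style in the example above can be sketched in a few lines of Python. This is a simplified illustration, not the pipeline used in the project: the regular expressions, the `deidentify` function, and the sample sentence (including its MRN) are made up for the example, and the placeholder tokens are borrowed from the de-identified text above. Pattern matching of this kind catches only well-formatted identifiers, so it would be one layer among several rather than a complete de-identification method.

```python
import re

# (placeholder, pattern) pairs; the patterns are illustrative assumptions
# and cover only a few identifier formats.
PATTERNS = [
    ("[Date Pattern]", re.compile(r"\b\d{2}/\d{2}/\d{4}\b")),
    ("[Medical Record Number]", re.compile(r"(?<=MRN )\d+")),
    ("[Patient Name]", re.compile(r"(?<=Patient )[A-Z][a-z]+ [A-Z][a-z]+")),
]


def deidentify(text):
    """Replace each matched identifier with its bracketed placeholder."""
    for placeholder, pattern in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


# Fictional sample sentence, invented for this sketch.
sample = "Patient Jane Doe (MRN 00123456) was diagnosed on 03/05/2023."
print(deidentify(sample))
# Patient [Patient Name] (MRN [Medical Record Number]) was diagnosed on [Date Pattern].
```

Replacing identifiers with typed placeholders, rather than deleting them, preserves the sentence structure of the note, so the text remains usable for annotation and model training.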

The Insights AI Impact

Through our advanced annotation techniques and NLP application to thousands of pages of oncology-related records, we delivered a highly refined dataset. This dataset has become the cornerstone of the client’s ongoing and future research efforts, aiming to enhance patient outcomes and care delivery efficiency.

A Testament to Our Capability

The success of this project underscores our ability to navigate complex medical data with precision. Our commitment to improving patient care outcomes and accelerating healthcare innovation has been recognized by our clients as instrumental in advancing their NLP capabilities within the oncology domain.


At Insights AI, we’re not just about data; we’re about driving the future of healthcare. As we continue to push the boundaries of what’s possible with AI and machine learning in oncology, we remain dedicated to providing solutions that are not only technologically advanced but also ethically sound and patient-centric. With each dataset, with each model, we are not just processing information; we are shaping the future of cancer care. As leaders in the field, we are excited about the possibilities that our NLP and AI capabilities unlock for healthcare professionals and patients alike.

