New Truveta Language Model advances artificial intelligence unlocking of real-world electronic health record data for medical research

27 Apr 2023

Written by Linda Essex


Truveta launches the first large-language model specifically created to empower researchers to accurately study patient care and outcomes by harnessing the potential in electronic health records.

On 12 April 2023, healthcare data platform company Truveta published a blog post by its CEO, Terry Myerson, introducing the Truveta Language Model (TLM), the first large-language model specifically designed to accurately make electronic health record (EHR) data useful for research.

Speaking exclusively to The Evidence Base, Terry Myerson summarized the unique attributes of TLM: “Truveta Language Model is a large-language, multi-modal AI model for transforming EHR data into billions of clean and accurate data points for health research on patient outcomes with any drug, disease, or device. Unlike general large language models trained only on the public internet and inaccurate in the medical domain, TLM’s healthcare expertise is trained on the largest collection of complete medical records representing the full diversity of the United States. TLM also normalizes EHR data to maximize clinical accuracy and is trained without commercial bias, unlike claims data, which are created by normalizing EHR data to maximize revenue reimbursement. By using clinical expert-led AI to unlock the power of rich healthcare data, researchers can now ask and answer complex medical questions of a real-time, fully transparent view of US health.”

Claims data are the current standard in health research. However, claims data are created by normalizing EHR data to maximize revenue reimbursement for encounters, medications, and labs, which introduces commercial bias into research based on them. In contrast, TLM has been developed to normalize EHR data to maximize clinical accuracy and is therefore trained free of commercial bias.

The unprecedented dataset from Truveta’s health system members, upon which TLM is trained, currently represents more than 80 million patient journeys from throughout the United States, including 5.5 billion diagnoses, 3.1 billion encounters and 2.4 billion medication orders, and is updated daily.

Truveta’s clinical expert annotators labelled tens of thousands of raw clinical terms from the EHR to train TLM to normalize healthcare data for maximum clinical accuracy, and monitored the model’s results as it ran. Over the development period, the normalization accuracy of TLM steadily improved and now exceeds that of human clinical experts.

Truveta has demonstrated the superior clinical accuracy of TLM in comparison with GPT-4. Such general large language models are inaccurate within the medical domain because they are trained on the public internet, which contains no real medical records.

TLM has also been trained to identify and normalize clinical concepts within unstructured clinician notes, reasoning over the entire medical record to account for changes over time. Truveta Data already includes more than 2.5 billion clinician notes and is growing every day.

Further information can be found in the TLM white paper.

Click here to view the press release
