New Truveta Language Model advances artificial intelligence unlocking of real-world electronic health record data for medical research

Written by Linda Essex

Large language model

Truveta launches the first large language model created specifically to empower researchers to accurately study patient care and outcomes by harnessing the potential of electronic health records.

On 12 April 2023, healthcare data platform company Truveta published a blog post by its CEO, Terry Myerson, introducing the Truveta Language Model (TLM), the first large language model designed specifically to make electronic health record (EHR) data accurate and useful for research.

Speaking exclusively to The Evidence Base, Terry Myerson summarized the unique attributes of TLM: “Truveta Language Model is a large-language, multi-modal AI model for transforming EHR data into billions of clean and accurate data points for health research on patient outcomes with any drug, disease, or device. Unlike general large language models trained only on the public internet and inaccurate in the medical domain, TLM’s healthcare expertise is trained on the largest collection of complete medical records representing the full diversity of the United States. TLM also normalizes EHR data to maximize clinical accuracy and is trained without commercial bias, unlike claims data, which are created by normalizing EHR data to maximize revenue reimbursement. By using clinical expert-led AI to unlock the power of rich healthcare data, researchers can now ask and answer complex medical questions of a real-time, fully transparent view of US health.” 

Claims data are the standard used in health research today. However, claims data are created by normalizing EHR data to maximize revenue reimbursement for encounters, medications, and labs, which introduces commercial bias into any research based on them. In contrast, TLM has been developed to normalize EHR data to maximize clinical accuracy and is therefore trained free of commercial bias.

The unprecedented dataset from Truveta’s health system members, upon which TLM is trained, currently represents more than 80 million patient journeys from throughout the United States, including 5.5 billion diagnoses, 3.1 billion encounters and 2.4 billion medication orders, and is updated daily.  

Truveta’s clinical expert annotators labelled tens of thousands of raw clinical terms from the EHR to train TLM to normalize healthcare data for maximum clinical accuracy, and monitored the model's results as it ran. Over the development period, the normalization accuracy of TLM steadily improved and now exceeds that of human clinical experts.

Truveta has demonstrated the superior clinical accuracy of TLM in comparison with GPT-4. General large language models such as GPT-4 are inaccurate within the medical domain because they are trained on the public internet, which contains no real medical records.

TLM has also been trained to identify and normalize clinical concepts within unstructured clinician notes, reasoning over the entire medical record to account for changes over time. Truveta Data already includes more than 2.5 billion clinician notes and is growing every day.

Further details can be found in the TLM white paper.
