Deep real-world evidence: an interview with Ignacio H. Medrano


In this feature, we spoke to Ignacio H. Medrano, Founder and Chief Medical Officer of Savana (NY, USA), about the contribution Savana is making to real-world evidence and how the company is overcoming the challenges of utilizing real-world data in health research.

Ignacio H. Medrano is a consultant neurologist with training in healthcare management and experience in clinical research strategies (formerly responsible for more than 500 researchers). A Singularity University graduate, he is a founder at Mendelian in the UK (AI for diagnosis of rare diseases) and at Savana (Real World Evidence using AI on Electronic Medical Records). Ignacio is an international speaker whose engagements have ranged from the European Commission to the British Royal Academy of Science.

Could you tell us a little bit more about yourself and Savana?

I’m a clinician and consultant neurologist. Throughout 10 years of clinical practice, I became increasingly aware that the huge volume and potential of health data, captured in the electronic health record (EHR) by clinicians in our everyday practice, remained largely trapped and untapped. So, in 2014 I founded Savana with the aim of unlocking the power of health data from EHRs to generate real-world evidence (RWE) for health research. Savana has developed a scientific methodology that applies artificial intelligence (AI) in the form of clinical natural language processing (NLP) to unlock all the clinical value embedded within the unstructured (free-text) elements of de-identified EHRs. Savana does this in a way that is transparent and can be audited by any third party. The clinical variables generated through the Savana methodology can be trusted to support decision making in any clinical or pharmaceutical setting.

What is deep real-world evidence?

Approximately 80% of information captured in the EHR is in the form of free text, such as discharge reports, clinical notes, radiology reports, pathology reports etc. There is an enormous amount of valuable information, in the form of clinical variables, which is captured in these unstructured elements.

Savana’s AI-powered solution automates the extraction of clinical variables from both the unstructured and structured elements of de-identified EHRs using machine learning (ML) and clinical NLP. This approach captures five times more clinical information at scale and frees up researchers to focus on analysis and interpretation of the data rather than on laborious manual data extraction.

Now we can ‘deep dive’ into all, not just a shallow few, of the clinical variables captured in the record associated with an individual patient. This enhances RWE generation because more variables are available for analysis, leading to better confidence in prediction. This is what we term ‘deep’ RWE, and it describes how Savana has advanced the generation of real-world evidence for clinical research.

What are the challenges of mining EHRs for real-world data and how does Savana overcome these?

When I speak with researchers and clinicians, three questions always arise. How do we ensure patient data privacy? Who owns the data? Finally, given that we know the data in EHRs are often variable and incomplete, how do we know that the data we extract are of the high quality required for a particular study?

First, Savana puts patient data privacy at the core of all we do. Savana’s solution incorporates privacy-by-design, and our proprietary, patent-pending Natural Privacy methodology meets all the data privacy audits required by institutional review boards and research ethics committees. The actual data utilized in a study are not the original data from an individual patient. Instead, they are what we term ‘synthetic’ data: a replication of the clinical information that has the same statistical significance but is not identical to the information from individual patients. This probabilistic approach ensures that the conclusions we can extract are correct, while at the individual level it cannot be suggested that we are using personal information without consent.

Second, ownership of the data is simple. Data are always ‘owned’ by the participating hospital site. Savana’s purpose and commitment is to further and accelerate health research, not to be a data broker. Therefore, Savana is the data processor. In undertaking this role, we are actually and contractually bound to meet all requirements of GDPR, HIPAA etc.

The third and final challenge is validating our NLP methodology to ensure studies are utilizing high-quality real-world data. As we all know, EHRs are not perfect. So, how do we account for missing data? Once data have been extracted and aggregated into one database, combining information from all the hospital sites participating in a particular study, Savana applies our novel NLP evaluation methodology to ensure that the transformation has been done properly for a specific group of concepts and variables, and that it is homogeneous across different sites, sources, hospitals and IT systems.
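As a rough illustration of what a per-site quality check of this kind could look like, the sketch below compares extracted variables against a manually annotated reference and reports precision, recall and F1 for each hospital site. It is a minimal sketch under stated assumptions, not Savana's actual evaluation methodology: the choice of metrics, the record layout, the site names, the functions `evaluate_by_site` and `is_homogeneous`, and the `max_f1_gap` threshold are all hypothetical.

```python
from collections import defaultdict


def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from raw counts, using 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def evaluate_by_site(records):
    """Score extraction quality per site.

    `records` is an iterable of (site, in_reference, extracted) tuples, where
    `in_reference` says whether a human annotator found the variable in the
    note and `extracted` says whether the NLP pipeline reported it.
    (Hypothetical layout, assumed for this example.)
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for site, in_reference, extracted in records:
        if in_reference and extracted:
            counts[site]["tp"] += 1
        elif extracted:
            counts[site]["fp"] += 1
        elif in_reference:
            counts[site]["fn"] += 1
    return {site: precision_recall_f1(**c) for site, c in counts.items()}


def is_homogeneous(scores, max_f1_gap=0.1):
    """Treat sites as homogeneous if their F1 scores differ by at most max_f1_gap.

    The threshold is an arbitrary illustrative value, not a published one.
    """
    f1_values = [f1 for _, _, f1 in scores.values()]
    if not f1_values:
        return True
    return max(f1_values) - min(f1_values) <= max_f1_gap


# Toy annotations for two hypothetical sites.
records = [
    ("hospital_a", True, True),
    ("hospital_a", True, True),
    ("hospital_a", False, True),   # reported by the pipeline, absent from the reference
    ("hospital_a", True, False),   # in the reference, missed by the pipeline
    ("hospital_b", True, True),
    ("hospital_b", True, True),
    ("hospital_b", False, False),
    ("hospital_b", True, True),
]

scores = evaluate_by_site(records)
for site, (p, r, f1) in sorted(scores.items()):
    print(f"{site}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
print("homogeneous across sites:", is_homogeneous(scores))
```

In this toy data the two sites score differently, so the homogeneity check fails; in practice such a gap would prompt a closer look at the lower-scoring site before its data are pooled with the rest.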

We have implemented real-world evidence studies across a wide range of therapeutic areas, at scale and across multiple hospital sites, countries and languages.

How does Savana achieve high quality real-world evidence and how is it different or unique to other approaches?

There are many hospitals and healthcare providers already applying NLP to medical records. Technology companies are also doing this. However, drilling down into specialisms such as oncology requires a big effort. High-quality studies require confidence that the inference of one variable in one pathology is reliable across multiple sites, countries and languages, so that we can extract conclusions and make confident decisions. The ability to scale up RWE studies has also been difficult. It was a very manual process, too dependent upon individuals to review and abstract the data in a slow and laborious way.

From conception, Savana has been creating tools and a methodology that our team of 140 data scientists, clinicians, statisticians and epidemiologists can use to ‘industrialize’ the research study process. When we undertake a new pathology, for instance in oncology, and seek to investigate 50 or 100 variables, we can do this at scale across multiple sites, countries and languages. By applying our methodology, we have a tool that is able to predict the quality of the data we are providing every time we undertake a new study, allaying data quality concerns. We can do this very quickly – perhaps in just 3 months – and ensure that these studies stand up to peer review in high-impact journals.

The Savana clinical NLP methodology has recently been published in the Journal of Medical Informatics and demonstrates our ability to generate high-quality real-world data that is linguistically correct and homogeneous. It marks a significant advancement in the use of EHRs for deep RWE studies.

How do you picture the use of real-world evidence derived from health records developing in the future?

The use of RWE in life science is growing rapidly, leading to faster drug development, easier approvals and reduced therapy development costs. Savana adds value through our deep RWE approach.

For instance, the clinical narratives noted by clinicians in the EHR contain the most accurate information about effectiveness, adverse drug effects and other pharmacovigilance data in terms of quantity and quality. This information is a key component of Value Based Contracting. We apply the Savana methodology to read all the accessible data (structured and unstructured) of patients prescribed the drug or treatment of interest. This approach allows assessment of drug efficacy, safety data, and comparative effectiveness of targeted drugs against other treatments in real time, as well as a complete sociodemographic and clinical description of patients under treatment.

Savana can also undertake ‘deep screening’, in order to identify a population of patients who would otherwise be hidden, and ‘deep labelling’, where we clarify and, where possible, broaden indications or guidelines for previously approved therapies and ensure treatment coverage.

Increasingly, we are also seeing how RWE is being applied within health systems to improve patient experience and outcomes. Patient interactions with the healthcare system are often complex and heterogeneous, even within the same healthcare system. Now, we are able to derive a complete set of data from every part of the patient journey in a real setting, such as the emergency department, internal medicine or intensive care. From there, we can start asking and answering much deeper and richer questions that are not limited to a single specialty or moment of care. We can, for instance, begin to compare effectiveness and outcomes of the different and complex patient treatments and journeys.

The analysis of phenotype plays a key role in clinical practice and medical research, and yet phenotypic descriptions in medical publications are often imprecise. The emerging field of precision medicine aims to provide the best available care for each patient based on stratification into disease subclasses with a common biological basis of disease. Clinical notes of the EHR are essential for deep phenotyping and even for finding new phenotypes.

Many institutions are moving beyond the use of classical statistics and regression models toward examining risk scoring using machine learning (ML) to develop predictive models. However, the majority of the predictive models in use today only use structured data derived from International Classification of Diseases codes. The Savana position is clear. When we deep dive into the EHR, accessing the original, unstructured data, we extract all the clinical variables. As a result, the predictive models we create for screening of new patients for a pathology or treatment response operate at a deeper level of detail and performance.


The opinions expressed in this feature are those of the interviewee/author and do not necessarily reflect the views of The Evidence Base® or Future Science Group.

This article is sponsored by Savana.

