Researchers from the Massachusetts Institute of Technology (MA, USA) have developed a protocol for determining the accuracy of models that predict patients' clinical risks. The method could help clinicians avoid prescribing either ineffective or unnecessarily risky treatments.
Clinicians frequently employ risk models to estimate clinical risks, such as a patient's mortality risk after an adverse cardiovascular event. The model's predictions can help inform the most appropriate treatment regimen for a given patient. In this study, researchers from the Massachusetts Institute of Technology (MIT; MA, USA) describe a protocol for determining the accuracy of risk model predictions for individual patients, which could help clinicians avoid prescribing ineffective or unnecessarily risky treatments.
Risk models generate mortality risk scores for patients based on phenotypic parameters such as age. These models, often created by applying machine-learning algorithms to patient datasets, can be invaluable; however, they invariably make inaccurate predictions for some patients, which can lead to clinicians implementing ineffective, or even risky, treatment regimens.
Senior study author Collin Stultz (MIT) explained: “Every risk model is evaluated on some dataset of patients, and even if it has high accuracy, it is never 100% accurate in practice. There are going to be some patients for which the model will get the wrong answer, and that can be disastrous.”
In this study, a team of researchers from MIT, the MIT-IBM AI Lab and the University of Massachusetts Medical School (all MA, USA) detail a novel method for determining the accuracy of risk model predictions for specific patients, which could allow treatment regimens to be streamlined so that the greatest possible proportion of patients receive the most effective therapies.
The protocol generates a risk model 'unreliability score', which is based on a comparison of the risk scores predicted by two models trained on the same dataset. The unreliability score can range from zero to one; the closer the unreliability score is to unity, the less accurate the prediction generated by the risk model.
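The idea of scoring unreliability through model disagreement can be illustrated with a minimal sketch. The snippet below is not the authors' exact formulation; it assumes a simplified definition in which the unreliability score is the absolute difference between the risk probabilities predicted by two models fitted to the same data. The model names and coefficients are purely illustrative.

```python
import math

def risk_model_a(age, heart_rate):
    """Toy logistic risk model (hypothetical coefficients)."""
    z = -8.0 + 0.06 * age + 0.02 * heart_rate
    return 1.0 / (1.0 + math.exp(-z))

def ris_ignore_placeholder():
    pass

def risk_model_b(age, heart_rate):
    """A second toy model trained on the same data; a slightly
    different fit yields slightly different predictions."""
    z = -7.5 + 0.05 * age + 0.025 * heart_rate
    return 1.0 / (1.0 + math.exp(-z))

def unreliability_score(age, heart_rate):
    """Disagreement between the paired models, bounded in [0, 1]:
    0 means the models agree exactly; values near 1 mean maximal
    disagreement, flagging a prediction as unreliable."""
    return abs(risk_model_a(age, heart_rate) - risk_model_b(age, heart_rate))
```

In this toy setup, patients for whom the two fits diverge most (e.g. those far from the bulk of the training data) receive the highest scores, mirroring the intuition that model disagreement signals an unreliable prediction.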
In their paper published in npj Digital Medicine, the researchers illustrate how their method may be employed by applying their protocol to a commonly used risk prediction model — the Global Registry of Acute Coronary Events (GRACE) risk score — for determining patients' mortality risks within 6 months of experiencing an acute coronary syndrome.
The GRACE registry encompasses data on more than 40,000 individuals. In the study, researchers demonstrate that patients within the GRACE registry for whom the risk model predictions have associated high unreliability scores form a group for which the risk model is less accurate.
Stultz stated: “…if you look at patients who have the highest unreliability scores — in the top 1% — the risk prediction for that patient yields the same information as flipping a coin. For those patients, the GRACE score cannot discriminate between those who die and those who don’t. It’s completely useless for those patients.”
Importantly, the method allows for the unreliability of a risk prediction model to be evaluated without the researchers accessing the original dataset with which a model was trained. This is critical, as Stultz noted, as “…there are privacy issues that prevent these clinical datasets from being widely accessible to different people.”
The team is currently working to produce a user interface that clinicians could routinely employ to evaluate whether a given patient's GRACE score is reliable. The researchers also hope, in the future, to fundamentally improve the reliability of risk model predictions by making it easier to retrain models with data that are more representative of a wider patient population.
Myers PD, Ng K, Severson K et al. Identifying unreliable predictions in clinical risk models. NPJ Digit. Med. doi:10.1038/s41746-019-0209-7 (2020) (Epub ahead of print).