In this interview, Shirley Wang (Brigham and Women’s Hospital, MA, USA) discusses her presentation from ISPOR 2019 (18–22 May, New Orleans, LA, USA) on replicable and robust database evidence.
Please can you introduce yourself and your institution?
I am an Assistant Professor at Brigham and Women’s Hospital, Harvard Medical School (MA, USA), where a lot of my work is focused on research reproducibility and transparency.
What interests you about working on reproducible database research?
It is core to the scientific method; it’s going back to basics. We need to start here as a foundation for innovative and impactful ‘real world’ evidence from databases, which has the potential to inform healthcare policy, regulatory, reimbursement, clinical and other decisions.
What do you define as replicable and robust database research?
There are different kinds of reproducibility. For example, analytic reproducibility is achieved by being able to get the same result when you have the same data and same code used to conduct a study: you push the button and you get the same results. Another kind of reproducibility is direct replication, which involves being able to use the same data source and reported methods to independently recreate the study cohort and analysis findings.
Studies can be independently and directly replicated only when the methods are clear enough that other researchers understand how the results were generated. This is important because without clarity about how the evidence was generated, it is difficult to assess validity or study quality.
Finally, there’s conceptual replicability, which focuses more on the robustness of evidence to alternative implementation choices. Here it is about understanding how and why evidence converges or diverges for investigators that may be asking the same research question, but conduct their studies using different data, different populations or alternative methods.
Currently, are most database studies robust and replicable?
I think there’s always room for improvement. The project we are working on right now, involving direct replication of 150 published database studies, is helping us identify the areas where there is the greatest need for improvement, areas that frequently have higher levels of ambiguity in reporting that make it difficult for us to closely replicate those studies.
What are those areas of greatest need?
I think it’s critical to be very clear about how you generate the analytic cohort or dataset upon which you could perform your analysis, because it is difficult to replicate an analysis result when the underlying cohort can’t be replicated. For database studies, we are using routinely collected healthcare encounter data that wasn’t collected for research purposes. The source data is organized by calendar time but analytic datasets are typically anchored in patient event time. How you get from the calendar time anchored source data to the patient event time anchored analytic data often involves a lot of steps, and you really need to understand those steps to be able to assess the validity of any given database study.
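To make the step from calendar time to patient event time concrete, here is a minimal sketch in Python. It is not taken from the project described in the interview: the records, the codes (`DRUG_A`, `DX_DIABETES`, `DX_MI`), the choice of first exposure as the index date, and the 180-day baseline and 365-day follow-up windows are all hypothetical assumptions chosen for illustration. The point is only that each of these choices changes the cohort, so each needs to be reported.

```python
from datetime import date

# Hypothetical calendar-time source records: (patient_id, service_date, code).
SOURCE = [
    ("p1", date(2018, 1, 10), "DX_DIABETES"),
    ("p1", date(2018, 3, 1), "DRUG_A"),
    ("p1", date(2018, 6, 15), "DX_MI"),
    ("p2", date(2017, 11, 5), "DRUG_A"),
    ("p2", date(2018, 2, 20), "DRUG_A"),
    ("p3", date(2018, 4, 2), "DX_DIABETES"),
]

# Assumed design choices; a report would need to state each one explicitly.
EXPOSURE_CODE = "DRUG_A"
BASELINE_WINDOW = (-180, -1)   # days relative to the index date
FOLLOW_UP_WINDOW = (1, 365)    # days relative to the index date


def index_dates(records, exposure_code):
    """Return each patient's first exposure date (the assumed cohort entry)."""
    first = {}
    for pid, day, code in records:
        if code == exposure_code and (pid not in first or day < first[pid]):
            first[pid] = day
    return first


def in_window(offset, window):
    """True when a day offset falls inside an inclusive window."""
    return window[0] <= offset <= window[1]


def build_analytic_rows(records, exposure_code=EXPOSURE_CODE):
    """Re-anchor calendar-dated records on each patient's index date."""
    idx = index_dates(records, exposure_code)
    rows = {
        pid: {
            "patient_id": pid,
            "index_date": day,
            "baseline_diabetes": False,
            "outcome_mi_day": None,
        }
        for pid, day in idx.items()
    }
    for pid, day, code in records:
        if pid not in idx:
            continue  # never exposed, so not in the cohort
        offset = (day - idx[pid]).days  # event time: days since index
        if code == "DX_DIABETES" and in_window(offset, BASELINE_WINDOW):
            rows[pid]["baseline_diabetes"] = True
        if code == "DX_MI" and in_window(offset, FOLLOW_UP_WINDOW):
            previous = rows[pid]["outcome_mi_day"]
            if previous is None or offset < previous:
                rows[pid]["outcome_mi_day"] = offset
    return [rows[pid] for pid in sorted(rows)]


if __name__ == "__main__":
    for row in build_analytic_rows(SOURCE):
        print(row)
```

In this toy data, patient `p3` has no exposure record and never enters the cohort, while `p2`'s second dispensing is ignored because the index date is defined as the first. Changing either rule, or the window lengths, would yield a different analytic dataset from the same source data.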
What do you think the challenges will be in making database studies more transparent?
I think one of the challenges is that it’s not clear to the vast majority of people involved in generating or consuming evidence from database studies what needs to be reported and how to report in a way that reduces ambiguity. Having some guideposts there would be helpful. It’s about communication — making sure that your reporting is clear and unambiguous so that it can be heard and interpreted by the reviewer or the decision maker. Therefore, I think we need to be more clear about what to be clear about!
Do you see any sort of trends in the quality of data, for example the US vs internationally?
Three of the four data sources that we used were US-based and one of them was from the UK. In terms of data quality, they are all used a lot for research purposes, so they are well-known. However, it’s not just about the quality of the source data, it’s also the quality of the methods that are used to get your analytic data, and that can vary quite a bit depending on who’s doing the research.
In your opinion, who do you think should be providing guidance on this?
I think it’s got to be a collective effort, because people need to be engaged in it from different stakeholder organizations to create common expectations, incentives and the culture to make changes in how database research is conducted. That’s not going to happen in a vacuum or in one group alone: you really need to have every player at the table for this to happen.
Is the future bright? Are we moving in the right direction?
I think so; I certainly hear a lot of excitement and agreement over the fundamentals. There are a lot of parallel initiatives that are going on. I think we are at a tipping point where the moment is ripe for change and a lot of things are coming together. I am hopeful that in the next few years, we will make some huge steps forward in terms of confidence in use of valid and reproducible ‘real world’ evidence from databases to provide insights for regulators, payers and other healthcare decision makers.