UK Biobank achieves landmark with release of world’s largest genetic dataset

Written by Katie McCool

Alliance for Genomic Discovery

UK Biobank’s achievement marks a new era in healthcare research, offering a wealth of real-world genetic information to shape the future of personalized medicine and transformative healthcare solutions on a global scale. 

In a significant milestone for healthcare research, UK Biobank has unveiled the largest-ever single set of sequencing data after five years of intensive effort and an investment exceeding £200 million. This achievement involves whole genome sequencing of 500,000 volunteers, establishing a global benchmark for comprehensive health data. Approved researchers worldwide can now access this vast genetic dataset via the UK Biobank Research Analysis Platform (UKB-RAP). Stripped of identifying details, the dataset offers global researchers unprecedented insights and the potential to apply artificial intelligence and machine learning to precision medicine. Funded by Wellcome, UKRI, and leading biopharmaceutical companies, this project is poised to shape the future of global health research.

The dataset is a resource for advances in biomedicine and personalized health care, and approved researchers reach it through the secure, cloud-based UKB-RAP. This marks the first time that a globally accessible resource with the computing power and storage required to analyze data of this magnitude has been made available to researchers. With UK Biobank already having over 30,000 registered users from more than 90 countries, the impact is expected to be far-reaching.

This extensive dataset, combined with information gathered by UK Biobank over 15 years – including health records, imaging data, and lifestyle information – paints the most detailed portrait of human health to date. Professor Sir Rory Collins, Principal Investigator at UK Biobank, emphasized this, stating:

“This is a veritable treasure trove for approved scientists undertaking health research, and I expect it to have transformative results for diagnoses, treatments, and cures around the globe.” 

The landmark dataset is poised to fuel applications in artificial intelligence and machine learning. John Reed of Johnson & Johnson highlighted the potential for these advancements to streamline clinical development and progress toward tailored health care: “This landmark dataset will enable us to leverage the power of artificial intelligence and machine learning for rapidly identifying novel disease targets and helping researchers predict how a candidate medicine might impact certain subpopulations of patients, based on their genetics. This could pave the way for more efficient clinical development and drive progress toward precision medicine.”

Four biopharmaceutical companies – Amgen, AstraZeneca, GSK and Johnson & Johnson – provided the funding for this project in addition to Wellcome and UKRI. Following the completion of the sequencing, they collaborated to process and jointly analyze the genomes, using Illumina's DRAGEN pipeline on AWS infrastructure to convert the raw data into a unified genetic dataset. This unified dataset enhances the scientific significance of the data, aids in the identification of rare genetic variations, and facilitates comparisons with other large-scale population health studies.

The four pharmaceutical companies plan to share summary statistical analyses, including genome-wide association results, streamlining access to critical insights and sparing researchers the costly and difficult task of analyzing raw data.
