Insights from the largest study of patient ethnicity data in England

27 Feb 2024

Written by Katie McCool

Phase 1 of the Ethnicity, Equity, and AI study used electronic health records (EHRs) to analyze patient ethnicity data, aiming to create a comprehensive research resource and to highlight the importance of accurate ethnicity recording within the NHS. The study focused on patients affected by the COVID-19 pandemic, a cohort covering about 93% of England’s patients. It examined how ethnicity is categorized within the NHS, identified contradictions and gaps in ethnicity records, and determined which populations are most likely to lack ethnicity data.

Within the NHS, ethnicity is recorded using two main coding systems: SNOMED-CT concepts and NHS ethnicity codes. SNOMED-CT offers detailed breakdowns with 489 distinct categories, while NHS ethnicity codes provide broader classifications with 19 categories. Healthcare professionals record patient ethnicity in EHRs using one of these systems. Researchers often collapse ethnicity further into six categories to increase sample sizes, but this simplification can compromise research accuracy by overlooking specific ethnic groups with unique health profiles and concealing crucial insights. Detailed analysis of ethnicity is vital for shaping healthcare policies and treatments that reduce inequities effectively. PHI Oxford underscored the importance of this, stating:

“There is much more detailed ethnicity data available than the typical six groups the population is broken down into, and using it will make health research, healthcare technologies and ultimately health care better for all.”

The study utilized three linked datasets:

Primary care data: General Practice Extraction Service (GPES) Data for Pandemic Planning and Research (GDPPR)
Hospital admissions data: Hospital Episode Statistics for admitted patient care (HES-APC)
Mortality information: Office for National Statistics (ONS)

Recorded ethnicities included White (77.3%), Asian/Asian British (9.8%), Black/Black British (3.6%), Other Ethnic Groups (3.6%), Mixed (2.2%), and Unknown ethnicity (3.2%). After linking with HES-APC, the percentage of individuals without an ethnicity recorded decreased from 16.7% to 6.1%.
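The effect of linkage on missing ethnicity data can be illustrated with a minimal Python sketch. It assumes a simple rule in which a hospital record is used only when the primary care record has no ethnicity; the study's actual linkage method is not described here, and the patient IDs, codes, and function names below are hypothetical toy values, not study data.

```python
from typing import Dict, Optional

Records = Dict[str, Optional[str]]


def fill_missing_ethnicity(primary: Records, hospital: Records) -> Records:
    """Keep the primary care value; fall back to the hospital value only if it is missing."""
    return {
        patient_id: code if code is not None else hospital.get(patient_id)
        for patient_id, code in primary.items()
    }


def missing_share(records: Records) -> float:
    """Fraction of patients with no ethnicity recorded."""
    if not records:
        return 0.0
    return sum(code is None for code in records.values()) / len(records)


# Hypothetical toy records (None means no ethnicity recorded).
primary_care = {"p1": "code_1", "p2": None, "p3": "code_3",
                "p4": None, "p5": None, "p6": "code_4"}
hospital = {"p2": "code_2", "p4": None, "p5": "code_5"}

linked = fill_missing_ethnicity(primary_care, hospital)
before = missing_share(primary_care)  # 3 of 6 patients missing
after = missing_share(linked)         # 1 of 6 patients missing
print(f"missing before linkage: {before:.1%}, after: {after:.1%}")
```

In this toy example only the patient with no ethnicity in either source remains unrecorded after linkage.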

Among its key findings, the study revealed that 1 in 10 patients had missing ethnicity records, often because they selected ‘prefer not to say’. Those with missing data were typically younger, male, and had fewer recorded co-occurring conditions. The Public and Patient Involvement (PPI) panel highlighted that assuming a patient’s ethnicity disregards patient preferences and could reinforce existing biases in healthcare data. As part of this project, the ‘Be Proud of Your Ethnicity’ campaign was launched to encourage patients to voluntarily provide their ethnicity, aiding accurate research. This is particularly important because, “when recorded, ethnicity is often inaccurately coded, especially for groups other than the predominant group(s) in a given population.”

The study revealed that patients in England self-identify across 250 different ethnicity sub-groups within the SNOMED concepts, meaning that not all 489 available categories were used for the patients in this dataset.

Inconsistencies in ethnicity codes were noted in approximately 12% of patient profiles, possibly reflecting changes in individuals’ perception of their ethnicity or the use of outdated NHS ethnicity codes based on older census data. To make the 19 NHS ethnicity codes more representative for research, the study suggested improving how detailed SNOMED codes are mapped to them.
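As a rough illustration of what an inconsistent patient profile means in practice, the Python sketch below flags patients whose records carry more than one distinct ethnicity code. This is an assumed, simplified definition of inconsistency, not the study's own method, and the patient IDs, codes, and function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def find_conflicts(
    records: Iterable[Tuple[str, Optional[str]]]
) -> Dict[str, List[str]]:
    """Return the distinct codes for each patient who has more than one."""
    codes_by_patient = defaultdict(set)
    for patient_id, code in records:
        if code is not None:  # missing entries are gaps, not conflicts
            codes_by_patient[patient_id].add(code)
    return {
        patient_id: sorted(codes)
        for patient_id, codes in codes_by_patient.items()
        if len(codes) > 1
    }


# Hypothetical toy records: (patient ID, recorded ethnicity code).
records = [
    ("p1", "code_a"), ("p1", "code_a"),  # repeated but consistent
    ("p2", "code_a"), ("p2", "code_b"),  # two different codes
    ("p3", None), ("p3", "code_c"),      # one missing, one recorded
]

conflicts = find_conflicts(records)
print(conflicts)
```

Treating a missing entry as a gap rather than a conflict is a design choice in this sketch; a stricter definition would give a higher inconsistency rate.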

Accurately identifying ethnicity in healthcare data is crucial because “biased ethnicity knowledge could potentially lead to biased healthcare decision-making and to patients receiving inappropriate or no care.” The study emphasized the need for researchers to include ethnicity in their analyses and highlighted the value of SNOMED concepts in capturing ethnic diversity comprehensively. Inclusivity was prioritized, with active involvement from a panel of PPI members and a broader stakeholder group representing diverse ethnicities.

Click here to view the press release
