
Ethnicity Coding in English Health Service Datasets


The Covid-19 pandemic has demonstrated that the limited availability and quality of ethnicity data are reducing understanding of ethnic inequalities and the ability to identify effective responses. Making effective use of currently available ethnicity data and improving the quality of those data are vital for identifying and addressing ethnic disparities in health.

For this report we have analysed the quality and consistency of ethnicity coding within widely used health datasets, in order to inform users of ethnicity data and identify the actions needed to improve the quality of the underlying data. Along with providing insights for data users, the report sets out recommendations for policy-makers and organisations that generate and regulate health data. 

Key Findings  

We found that, overall, the proportion of health records containing the patient’s ethnicity code was high: 87% of the more than 17 million inpatient spells in 2019/20 had a valid ethnic group recorded, a slightly higher proportion than for outpatient attendances (83% of over 96 million) and A&E attendances (86% of over 19 million). In addition, 8.5% of inpatient records had a code of ‘not stated’, which, although a permitted code, is not useful for analysis purposes, and 8.8% of inpatient spells had an ‘other’ ethnic group coded. These proportions have increased since 2010/11, from 6.1% (‘not stated’) and 7.2% (‘other’ ethnic groups).
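
As a rough illustration of how a data user might tabulate coding quality in their own extract, the sketch below classifies each record's ethnicity code and reports the share of records in each class. The code sets and the function names (classify, coding_quality) are assumptions made for this example and are not taken from the report; the letter codes are assumed to follow the NHS ethnic category lettering and should be checked against the data dictionary for the dataset in use.

```python
from collections import Counter

# Assumed code sets, for illustration only. Verify against the
# data dictionary of the dataset being analysed.
SPECIFIC_GROUPS = {"A", "B", "D", "E", "F", "H", "J", "K", "M", "N", "R"}
OTHER_GROUPS = {"C", "G", "L", "P", "S"}  # 'any other ...' categories
NOT_STATED = {"Z"}
NOT_KNOWN = {"99"}

CLASSES = ("specific", "other", "not_stated", "not_known", "invalid")


def classify(code):
    """Map a raw ethnicity code to one of the quality classes."""
    code = (code or "").strip().upper()
    if code in SPECIFIC_GROUPS:
        return "specific"
    if code in OTHER_GROUPS:
        return "other"
    if code in NOT_STATED:
        return "not_stated"
    if code in NOT_KNOWN:
        return "not_known"
    return "invalid"  # blank or unrecognised codes


def coding_quality(codes):
    """Return the proportion of records falling into each quality class."""
    counts = Counter(classify(c) for c in codes)
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in CLASSES}
    return {name: counts[name] / total for name in CLASSES}


if __name__ == "__main__":
    sample = ["A", "A", "S", "Z", "99", "", "X", "H"]
    for name, share in coding_quality(sample).items():
        print(f"{name}: {share:.1%}")
```

Reporting ‘other’, ‘not stated’, ‘not known’ and invalid codes separately, rather than as a single ‘missing’ figure, keeps visible the distinctions that the findings above draw between them.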

Importantly, records without ethnicity codes were not distributed evenly between ethnic groups. For most ages, specific minority ethnic groups were under-represented in health data when compared with national population estimates by ethnic group, while ‘other’ ethnicity codes were over-represented. Further, analysis of the consistency of coding for the same individual indicated that records of patients from minority ethnic groups were less likely to be recorded consistently over time or have a specific code. ‘Other’, ‘not stated’, ‘not known’ and invalid codes were not uniformly distributed between ethnic groups. Excluding these missing ethnicity data from analysis is likely to introduce bias in the results, and impacts most on minority ethnic patients’ records.

There were differences in coding according to patient and service characteristics, which indicate that there are systemic factors that impact on data quality. Furthermore, a third of patients with multiple contacts (as an inpatient, outpatient or A&E attendee) had inconsistent ethnicity codes. Inconsistent codes disproportionately impacted on minority ethnic groups.  
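
A consistency check of this kind could be sketched as follows. This is a minimal example under stated assumptions, not the method used for the report: the record layout (pairs of patient identifier and ethnicity code) and the function name inconsistent_share are hypothetical, and a patient is treated as inconsistent if any two of their records carry different codes, including a mix of a specific code and ‘not stated’.

```python
from collections import defaultdict


def inconsistent_share(records):
    """Share of patients with multiple contacts whose codes disagree.

    records: iterable of (patient_id, ethnicity_code) pairs, one per
    contact (assumed layout for this sketch).
    """
    codes_by_patient = defaultdict(list)
    for patient_id, code in records:
        codes_by_patient[patient_id].append(code)

    # Only patients with more than one contact can be inconsistent.
    multi_contact = [c for c in codes_by_patient.values() if len(c) > 1]
    if not multi_contact:
        return 0.0

    inconsistent = sum(1 for codes in multi_contact if len(set(codes)) > 1)
    return inconsistent / len(multi_contact)


if __name__ == "__main__":
    sample = [
        ("p1", "A"), ("p1", "A"),               # consistent
        ("p2", "H"), ("p2", "Z"),               # specific vs not stated
        ("p3", "M"),                            # single contact, excluded
        ("p4", "S"), ("p4", "S"), ("p4", "L"),  # two different codes
    ]
    print(f"{inconsistent_share(sample):.1%} of multi-contact patients")
```

A stricter or looser definition (for example, ignoring ‘not stated’ when a specific code also exists) would give a different figure, so the definition used should be reported alongside the result.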


To improve the analysis of ethnicity using existing health data, we recommend the following: 

  • NHS Digital regularly publishes data on the quality of ethnicity coding within the Data Quality Maturity Index, and this should also include the proportion of records coded as not known, not stated, an ‘other’ group and ‘any other ethnic group’.
  • The UK Statistics Authority should review the quality of ethnicity coding within health statistics, in order to identify and make recommendations for improving the quality and consistency of data. 
  • Analyses of health care activity should routinely include the ethnic dimension, and consider and report on the quality of coding. 
  • Analysis methods to address data quality issues in analysis of ethnic differences should be clearly described and, where appropriate and feasible, the methodology developed by Public Health England for reassigning ethnicity in health records should be used. 

To improve the quality of source data on ethnicity in the future, we recommend the following:  

  • The Health Inequalities Improvement Programme at NHS England and NHS Improvement should work with NHS Digital and the NHS Race and Health Observatory on developing and implementing guidance for ethnicity coding in the NHS, in keeping with priority 3 of the NHS England and NHS Improvement operational guidance. Guidance needs to cover NHS-funded care, wherever this is provided, and include protocols for asking patients their ethnicity and recording it in health records, using the updated 2021 census categories. 
  • Integrated care system leaders should use their role in reducing inequalities to improve the quality of ethnicity coding in health records, ensuring that the updated guidance on ethnicity coding is implemented, learning from local partners and spreading best practice in data quality and analysis.
  • Boards and leaders of NHS providers and commissioners, and GP practices, should take ownership of the quality of ethnicity coding for their patients, ensure that the updated guidance is implemented, routinely monitor the quality of coding, identify how it can be improved, and put in place actions to achieve this. Once guidance on ethnicity coding is available, all health care providers should endeavour to record, update or correct ethnicity coding in all patient records.
  • The Care Quality Commission should incorporate the assessment of the quality of ethnicity coding in its inspections and ratings, and address independent providers’ poor-quality coding, taking action where the data suggest possible shortfalls and a failure to implement the updated guidance.