Ethnicity, comorbidity, socioeconomic status, and their associations with COVID-19 infection in England: a cohort analysis of UK Biobank data

Objectives: Recent data suggest higher COVID-19 rates and severity in Black, Asian, and minority ethnic (BAME) communities. The mechanisms underlying such associations remain unclear. We aimed to study the association between ethnicity and risk of COVID-19 infection and disentangle any correlation with socioeconomic deprivation or previous comorbidity. Design: Prospective cohort. Setting: UK Biobank linked to Hospital Episode Statistics (HES) and COVID-19 tests until 14 April 2020. Participants: UK Biobank participants from England, excluding drop-outs and deaths. Main measures: COVID-19 infection based on a positive PCR test. Ethnicity was self-reported and classified using Office of National Statistics groups. Socioeconomic status was based on index of multiple deprivation quintiles. Comorbidities were self-reported and completed from HES. Analyses: Multivariable Poisson analysis to estimate incidence rate ratios of COVID-19 infection according to ethnicity, adjusted for socioeconomic status, alcohol drinking, smoking, body mass index, age, sex, and comorbidity. Results: 415,582 participants were included, with 1,416 tested and 651 positive for COVID-19. The incidence of COVID-19 was 0.61% (95% CI: 0.46%-0.82%) in Black/Black British participants, 0.32% (0.19%-0.56%) in other ethnicities, 0.32% (0.23%-0.47%) in Asian/Asian British, 0.30% (0.11%-0.80%) in Chinese, 0.16% (0.06%-0.41%) in mixed, and 0.14% (0.13%-0.15%) in White. Compared with White participants, Black/Black British participants had an adjusted relative risk (RR) of 3.30 (2.39-4.55), Chinese participants 3.00 (1.11-8.06), Asian/Asian British participants 2.04 (1.36-3.07), other ethnicities 1.93 (1.08-3.45), and mixed ethnicities 1.07 (0.40-2.86). Socioeconomic status (adjusted RR 1.93 (1.51-2.46) for the most deprived), obesity (adjusted RR 1.04 (1.02-1.05) per kg/m2) and comorbid hypertension, chronic obstructive pulmonary disease, asthma, and specific renal diseases were also associated with increased risk of COVID-19. Conclusions: COVID-19 rates in the UK are higher in BAME communities, those living in deprived areas, obese patients, and patients with previous comorbidity. Public health strategies are needed to reduce COVID-19 infections among the most susceptible groups.


Background
By the end of 22 April 2020, illness caused by SARS-CoV-2 infection, also known as COVID-19, had been diagnosed in 133,495 people and killed 18,100 people in the United Kingdom (UK). As the virus spreads nationally and internationally, data continue to arise on subpopulations most at risk of contracting this disease and developing its severest forms. Weekly data from the UK Office for National Statistics (ONS) [1] suggest that COVID-19 was responsible for 6,213 deaths in England and Wales in the week ending 10 April 2020, equivalent to 33.6% of all deaths observed. Although almost 40% of the casualties were patients aged 75 to 84 years old, there is increasing evidence of life-threatening forms of COVID-19 in younger age groups. Almost 5% of COVID-19-related ICU admissions in Italy have been under 40 years old, and another 36% have been 40 to 60 years old. Overall, 82% of these patients have been men. [2] Concerns have been raised recently that Black, Asian, and minority ethnic (BAME) communities in the UK might be at higher risk of COVID-19 mortality, including BAME healthcare professionals. [3] There is, however, a scarcity of data on whether this excess mortality is related to higher susceptibility to infection, higher risk of lethal forms of the disease, or poorer prognosis. Emerging data from seroprevalence studies [4] and US hospital electronic medical records [5] do not support these hypotheses, suggesting geographic heterogeneity.
Preliminary data also suggest a higher rate of infections in low-income households [6] and greater severity and rates of complications and death in older people, men, and people with specific comorbidities, such as hypertension. [7][8][9] Understanding the association between socioeconomic deprivation and the risk of COVID-19 infection is key to devising targeted, efficient interventions to limit its transmission in the community.
We obtained data from the UK Biobank, a large UK cohort, to unravel the associations between ethnicity, socioeconomic status, and the risk of COVID-19 infection.

Study design and setting
We used a prospective cohort study. All participants who were registered in UK Biobank [10] and living in February 2018 and who had not withdrawn permission to use their data by 7 February 2020 were included. Those residing in Scotland, Wales, or Northern Ireland were excluded as test data were only available for England.
The UK Biobank study recruited more than 500,000 participants from 37 to 73 years old through postal invitation to those who are registered with the UK National Health Service, living in England, Scotland, and Wales. Demographics, lifestyle, disease history, and physiological measurements were collected via questionnaires, physical measurements and interviews in baseline assessments . [10,11] Participants gave informed consent for data linkage to medical records. The participants tend to be healthier than the general UK population. [12] Participants were followed-up using COVID-19 test data from 16 March 2020 to 14 April 2020. Data was linked to COVID-19 tests (see Study Outcome) and hospital inpatient data from Hospital Episode Statistics (HES).

Study outcome
The main outcome was SARS-CoV-2 infection based on a PCR test. This information was retrieved from UK Biobank linkage to Public Health England COVID-19 test data.
We obtained data on who was tested, test results, and whether the test sample was taken in an inpatient setting. Patients were considered positive if one or more of the tests performed were positive for SARS-CoV-2. Participants were considered inpatients if one of their samples was marked as from an inpatient setting.

Exposures and measurements
The main exposure variable was self-reported ethnicity, collected during the participant's UK Biobank recruitment visit using a touch-screen questionnaire. This information was classified using Office of National Statistics groups into Asian/Asian British, Black/Black British, Chinese, mixed, White, and other groups. "Prefer not to answer" and "Don't know" responses were grouped together.
Socioeconomic status was based on the index of multiple deprivation (IMD) collected at recruitment into the UK Biobank. IMD England 2010 index, rank, and deciles were used to stratify participants into IMD quintiles.
Sex and year of birth were acquired from the National Health Service Central Register (NHSCR) at recruitment, but in some cases were updated by the participant. BMI was calculated by the UK Biobank using physical measurements collected at the recruitment visit. Alcohol drinking and smoking were self-reported during the same appointment.
Where information on ethnicity, IMD, sex, year of birth, BMI, alcohol use, or smoking were missing at baseline, information collected at the further 3 visits was used to maximise completeness.
We selected the comorbidities that are most likely to be risk factors for COVID-19 based on previous literature [13,14] and assessed them using pre-specified ICD-10 codes from linked HES data from the end of March 1992 to the end of March 2017. We included hypertensive disease (I10-I15), diabetes mellitus (E10-E14), ischaemic heart diseases (I20-I25), other forms of heart disease including hearth failure (I30-I52), chronic lower respiratory diseases including COPD and asthma (J40-J47), and renal failure (N17-N19).
This research was conducted using the UK Biobank Resource under Application Number 46228. Although the original application was unrelated to COVID-19 work, an exception was made to allow these linked data to be used for COVID-19 research without further applications, to maximise the speed of the proposed study [15].

Statistical analysis
We calculated the proportion or mean and standard deviation of each exposure variable for the population as a whole and stratified by test status (not tested, tested, and tested positive). We computed cumulative incidence of testing and positive tests for each ethnic background group and IMD quintile. We fitted a multivariable Poisson model to estimate the risk ratios of SARS-CoV-2 infection according to ethnicity, socioeconomic status, and the presence or absence of the included comorbidities. All models were further adjusted for age, sex, alcohol drinking, smoking, and body mass index (BMI). Patients with missing data in any regressor were excluded from the multivariable regression. We did not test for interactions or perform subgroup analyses due to limited statistical power.

Results
After excluding 69,773 people due to country of residence, 17,151 due to death, and 13 due to opt outs, we included 415,582 UK Biobank participants in our sample. Of these, 1,416 (0.34%) had been tested and 651 (0.16%) had tested positive for COVID-19 infection by 14 April 2020. Detailed baseline characteristics of all participants and stratified by COVID-19 testing and infection status are reported in Table 1.
There were also noticeable disparities in the incidence of COVID-19 infections according to ethnicity, again with the highest rates among Black/Black British participants (0.61% (0.46%-0.82%)) and the lowest rates among White participants (0.14% (0.13%-0.15%)). Again, similar patterns were observed when the analysis was restricted to inpatient diagnoses.
Participants with any missing information were excluded from further analysis, excluding 2,815 (0.68%). Multivariable analysis showed that socioeconomic deprivation and comorbidity were independently associated with an increased risk of COVID-19 infection (Figure 2).
Using the first IMD quintile (least deprived) as the reference group, the adjusted relative risk of infection increased monotonically with IMD quintile, reaching almost double the risk in the fifth, most deprived quintile (relative risk: 1.93 (95% CI: 1.08-3.45). All of the tested comorbidities except history of ischaemic heart disease were independently associated with an increased risk of COVID-19 infection (history of hypertension, non-ischaemic heart disease, chronic respiratory disease, and renal impairment).
High body weight was also independently associated with COVID-19 infection risk, with an adjusted relative risk of 1.04 (1.02-1.05) per kg/m 2 increase in BMI. We did not find an association with self-reported alcohol drinking or smoking status (data not shown).

Discussion
To our knowledge, this is the first study of the association between sociodemographics, ethnicity, and COVID-19 infection risk in the UK. In a cohort of over 415,000 participants, BAME communities appeared to be at higher risk of COVID-19 infection. is urgently needed.
Our study confirms previous findings on common comorbidities associated with COVID-19, including hypertension, diabetes, overweight/obesity, heart disease, chronic respiratory disease, and renal impairment. In a recent study of 5,700 participants, [13] the most common comorbidities observed among people admitted with COVID-19 to 12 hospitals in New York City were hypertension (56.6%), obesity (41.7%), and diabetes (33.8%). Similar findings have been reported elsewhere. [14] Our study has several limitations. We did not have data on region of residence.
According to ONS, [1] there are geographic differences in mortality attributable to COVID-19 in the UK, with the highest rates currently in London and the West Midlands.
Heterogeneous distribution of ethnic groups and socioeconomic status across the country could partially explain the observed effect.
Misclassification of COVID-19 infection due to differential testing across ethnic or socioeconomic strata may have affected the accuracy of our results. However, our data were collected when only severe cases were tested, and our findings were similar when restricted to inpatient testing, suggesting that even with these potential misclassifications, the observed associations remain clinically relevant. There was also potential for misclassification of comorbidity. However, we used linked electronic medical records to minimise recall bias and collect the most recent comorbidities.
The observational nature of this study makes the results susceptible to confounding and prevents us from inferring causality. We also lacked power to explore interactions or perform stratified analyses between the studied socioeconomic factors. There is a need to further explore the synergistic effects between these and with other health determinants on COVID-19 risk.
This study also has strengths, particularly surrounding the data used. We used a large sample with little missing data, links to hospital data, and self-reported information, which is considered a reliable source for ethnicity. [23] We also had very reliable information on COVID-19 diagnoses, based on linked official data on PCR tests. This dataset helped to create a clear picture of how ethnicity and deprivation affect COVID-19 risk. As the UK Biobank is not a representative sample of the UK population, [12] absolute risks should be interpreted carefully. However, the adjusted relative risks shown here provide an unambiguous signal of the disparities in COVID-19 incidence across the UK.
In conclusion, we found a strong association between ethnicity and the risk of COVID-19 infection in the UK, independent of other sociodemographic factors and comorbidity.
In addition, socioeconomic deprivation and common chronic conditions (hypertension, obesity, diabetes, chronic respiratory disease, heart failure, and renal failure) were independently related to an increased risk of COVID-19 disease. Public health strategies to control the current pandemic should take these factors into account to best mitigate the spread of disease and mortality related to COVID-19 infection.

Transparency declaration
The lead author affirm that the manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned have been explained.

Ethical approval
The National Health Service's National Research Ethics Service approved the collection and use of UK Biobank data.

Funding and study sponsors
The research was partially supported by the National Institute for Health Research