Abstract
Background Information on risk factors for COVID-19 is sub-optimal. We investigated demographic, lifestyle, socioeconomic, and clinical risk factors, and compared them to risk factors for pneumonia and influenza in UK Biobank.
Methods UK Biobank recruited 37–70 year olds in 2006–2010 from the general population. The outcome of confirmed COVID-19 infection (positive SARS-CoV-2 test) was linked to baseline UK Biobank data. Incident influenza and pneumonia were obtained from primary care data. Poisson regression was used to study the association of exposure variables with outcomes.
Findings Among 428,225 participants, 340 had confirmed COVID-19. After multivariable adjustment, modifiable risk factors were higher body mass index (RR 1.24 per SD increase), smoking (RR 1.38), slow walking pace as a proxy for physical fitness (RR 1.66) and use of blood pressure medications as a proxy for hypertension (RR 1.40). Non-modifiable risk factors included older age (RR 1.10 per 5 years), male sex (RR 1.64), black ethnicity (RR 1.86), socioeconomic deprivation (RR 1.26 per SD increase in Townsend Index), longstanding illness (RR 1.38) and high cystatin C (RR 1.24 per 1 SD increase). The risk factors overlapped with pneumonia somewhat; less so for influenza. The associations with modifiable risk factors were generally stronger for COVID-19, than pneumonia or influenza.
Interpretation These findings suggest that modification of lifestyle may help to reduce the risk of COVID-19 and could be a useful adjunct to other interventions, such as social distancing and shielding of high risk.
Funding British Heart Foundation, Medical Research Council, Chief Scientist Office.
Introduction
COVID-19, caused by SARS-CoV-2 infection, includes a spectrum of morbidity from asymptomatic infection 1 to severe pneumonia in patients presenting for medical care 2. The COVID-19 pandemic 3 has led to concerted research efforts to identify people at greatest risk of developing the infection and progressing to critical illness and dying. Predictors of disease severity include older age, smoking, diabetes, hypertension, kidney disease, COPD, and previous cardiovascular disease 4–7. In addition it is becoming clear that other risk factors might include obesity and low physical fitness 8,9 However, many studies investigate risk factors for disease progression to death or critical illness among hospitalised patients7 as opposed to healthy comparators.
Identifying the risk factors for COVID-19 is important in terms of identifying factors that can be modified to reduce risk, as well as identifying non-modifiable risk factors that can help identify high risk groups who require shielding and targeting for testing, and eventual vaccine and anti-viral therapies. Pneumonia is a life-threatening complication of COVID-19 infection 10. Established major risk factors for community-acquired pneumonia include many of the emerging risk factors for COVID-19 11. As COVID-19 is caused by SARS-CoV-2 viral infection, it may have risk factors for contraction of the disease that are common to other respiratory virus conditions such as Influenza.
UK Biobank is a large prospective, deeply-phenotyped, population-based cohort study carried out in the UK 12. Over the study period, testing for COVID-19 in England was conducted in A&E departments and in-hospital. These data were provided by Public Health England (PHE) and linked to UK Biobank baseline data. We aimed to establish modifiable and non-modifiable risk factors for confirmed COVID-19. We also aimed to compare these risk factors to risk factors for the incident pneumonia over a similar time span.
Methods
UK Biobank was conducted via 22 assessment centres across England, Scotland, and Wales between March 2006 and December 2010 and recruited 502,624 participants aged 37 to 73 years. The present study was restricted to participants living in England for whom COVID-19 test results were available. Death data were available to the end of January 2018 in England. To reduce bias, we excluded from the study all participants known to have died COVID-19 pandemic. Baseline biological measurements were recorded and touch-screen questionnaires were administered according to a standardised protocol. 12,13 UK Biobank received ethical approval from the North West Multi-Centre Research Ethics Committee (REC reference: 11/NW/03820). All participants gave written informed consent before enrolment in the study, which was conducted in accord with the principles of the Declaration of Helsinki. This study was performed under UK Biobank application number 7155.
Outcomes
Results of COVID-19 tests for UK Biobank participants were provided by PHE 7. Data provided by PHE included the specimen date, specimen type (e.g. upper respiratory tract), laboratory, origin (whether evidence from microbiological record that the participant was an inpatient or not) and result (positive or negative). Data were available for the period 16th March 2020 to 14th April 2020. Results were available for 2,724 tests conducted on 1,474 individuals. Confirmed COVID-19 infection (primary outcome) was defined as at least one positive result in the context of an in-hospital or accident and emergency (A&E) test. Any positive test result was used as a sensitivity analysis.
Pneumonia was defined based on ICD-10 codes J12-J18, and influenza based on J09-J11, converted into Read Codes using the UK Biobank’s look-up table. Incident pneumonia and influenza, occurring after 1st January 2016 (an arbitrary date taken to mimic the time lag from baseline exposure measurement to incident COVID-19 infection, while also obtaining sufficient case numbers) was obtained from a 41% sample of participants with available data from primary care. A sensitivity analysis was conducted using cases after 1st January 2015 and demonstrated consistent data. An analysis was also conducted exploring risk factors for all cases of pneumonia and influenza after baseline.
Exposures
Exposures were measured at the baseline assessment visit between 2006 and 2010. Ethnicity, smoking, alcohol consumption, physician-diagnosed prevalent conditions and medication use were self-reported. For the present analyses, ethnicity was coded as white, south Asian, black, or mixed/other. Smoking status was categorised into never versus former/current smoking. Systolic and diastolic blood pressures were measured at the baseline visit, preferentially using an automated measurement, but using manual measurement where this was not available, and average of available measures used. Area-level socioeconomic deprivation was assessed by the Townsend score (incorporating measures of unemployment, non-car ownership, non-home ownership and household overcrowding) corresponding to the participants’ home postcode. Higher scores on the Townsend score represent greater socioeconomic deprivation. Self-reported walking pace was rated by each participant as slow, steady/average, or brisk 14.
The definition of baseline diabetes included self-reported type 1 or type 2 diabetes, those with a primary or secondary hospital diagnoses relating to diabetes at baseline (ICD-10 codes E10-E14.9), and those who reported using diabetes medications. Baseline cardiovascular disease was defined as self-reported myocardial infarction, stroke, or transient ischaemic attack. Cancer, longstanding illness, and depression were self-reported on touchscreen questionnaire. Other previous health complaints including asthma, rheumatoid arthritis, CKD, systemic lupus erythematosus (SLE), sleep apnoea, COPD, pneumonia, bronchitis (including bronchitis, bronchiectasis, and emphysema), and other respiratory diseases (including interstitial lung disease, asbestosis, pulmonary fibrosis, alveolitis, respiratory failure, pleurisy, pneumothorax, other respiratory condition) which were derived from self-report at nurse interview. Some conditions did not have sufficient numbers of cases for multivariable analysis, but univariable results are presented for completeness. Statin (categorised to include other cholesterol lowering medications) and blood pressure medication use were also recorded from self-report, with blood pressure medication being used as a proxy for baseline diagnosed hypertension.
BMI is the ratio of the measured body mass in kg divided by height squared measured in metres. Height was measured using a Seca 202 height measure. Weight and whole-body fat mass and fat free mass were measured to the nearest 0.1 kg using the Tanita BC-418 MA body composition analyser. Socks and shoes were removed when height was measured Grip strength was measured using a Jamar J00105 hydraulic hand dynamometer and the mean was derived from the right and left hand values expressed in kilograms 15.
Lung function was assessed by spirometry using a Vitalograph Pneumotrac 6800 spirometer (Vitalograph, Buckingham, UK). Participants did not perform spirometry if they answered yes to unsure to the following: chest infection in the last month, history of detached retina, heart attack, surgery to eyes, chest or abdomen in last 3 months, history of collapsed lung, pregnancy, or currently on medication for tuberculosis. The aim was to record two acceptable blows from a maximum of three attempts. The spirometer software compared the acceptability of the first two blows and, if acceptable (defined as a ≤5% difference in FVC and FEV1), the third blow was not required. The mean observation was taken for both measures.
Blood collection sampling procedures for the study have been previously described and validated.16 Biochemical assays were performed at a dedicated central laboratory on around 480,000 samples. Further details of these measurements can be found in the UK Biobank Data Showcase and Protocol (http://www.ukbiobank.ac.uk). For the present study we included total cholesterol, HDL cholesterol, rheumatoid factor, cystatin C, glycated haemoglobin (HbA1C), C-reactive protein (CRP), differential white blood cell count, and red cell distribution width as exposures of interest.
Statistics
Mean and standard deviations were reported for continuous outcomes except for biomarkers, where median and interquartile ranges (IQR) were reported. Poisson regression with robust ‘sandwich’ standard errors were used to study the associations of exposure variables with confirmed COVID-19 and pneumonia. Poisson regressions were used because they provide risk ratio (RR) which is easier to interpret and robust error estimation ensures accurate inference 17. Three adjustment schemes were considered: model 0 - univariate (i.e. no adjustment), model 1 – adjusted for age, sex, ethnicity, and deprivation index, and model 2 – further adjusted for behavioural (smoking and alcohol drinking) and physical (adiposity, blood pressure, spirometry, and physical capability) factors that were found to be significant in model 1. To avoid multicollinearity, if there were similar factors that were significant in model 2, only the one with strongest association was chosen. This included, for example, choosing one from, BMI, BMI categories, body fat-free mass, body fat mass, and body fat percent, or one from FEV1, FVC, and FEV1/FVC. For continuous variables, the linearity of exposure-outcome associations were tested using penalised cubic splines in generalised additive model (GAM) 18. Nonlinearity was tested using likelihood ratio test comparing a model with the exposure fitted on a spline with a model assuming a linear exposure-outcome relationship. P-value for nonlinearity < 0.05 suggest evidence against the linearity assumption. Spline smoothness was chosen using generalised cross validation 19. Population attributable fractions (PAFs) were calculated to determine the relative contribution of each risk factor to the overall number of confirmed COVID-19 cases within UK Biobank. Another Poisson model which included all significant factors in model 2 was fitted to estimate the mutually adjusted RRs. These RRs were biased towards null because of over adjustment bias but were used to construct the PAFs to ensure the PAF estimates did not overlap or exceed 100%. In general, two-tailed P-values < 0.05 were considered statistically significant. Analyses were conducted in R Statistical Software version 3.5.3 with the package “mgcv”.
Results
Of 445,857 participants in England, 428,225 were alive during the available follow-up period. Complete data on covariates were available for 285,817 (66.7%) participants. Primary care data on incident pneumonia were available in 117,948 (41.3%) participants (Supplementary Fig 1). At 1st March 2020, the age range of eligible participants was 48-85 years, and time elapsed from baseline was median 10.97 years (IQR 10.36-11.55 years). Of these participants, 898 received at least one COVID-19 test, and 400 had at least one positive result, with 340 positive results conducted in hospital or A&E (primary outcome).
Univariable risk factors for incident COVID-19
Key univariable potentially modifiable risk factors for confirmed COVID-19 included current and former smoking (RR 1.62), higher BMI and body fat (RR 2.29 for obesity), treated hypertension (RR 2.07), higher systolic and diastolic blood pressure, and slow walking pace as a proxy for fitness (RR 2.40) (Table 1). Among non-modifiable risk factors were older age (particularly the over 65s at baseline, corresponding to over 75 in 2020), male sex (RR 1.56), black ethnicity (RR 2.74), socioeconomic deprivation (RR 1.37 per 1 SD increase in Townsend Index) (Table 1). COVID-19 was only weakly associated with low FEV1/FVC ratio (RR 0.90 per 1 SD increase). Among baseline comorbidities, general long standing illness (RR 1.88), baseline diabetes (RR 2.07), baseline CVD (RR 2.19), sleep apnoea (RR 3.91), statin use (RR 1.85), higher inflammatory markers (particularly white blood cell count; RR 1.20 per 1 SD increase), and higher cystatin C (RR 1.45 per 1 SD increase) were also associated with confirmed COVID-19 (Table 1). Risk factor associations were similar when all 400 test positive COVID-19 cases were considered, rather than only in-hospital positive tests (supplementary table 1)
Univariable risk factors for incident pneumonia
Of the 117,837 participants with primary care data, 259 had pneumonia recorded after 2016. Of the modifiable risk factors pneumonia was associated with BMI and slow walking pace, but not smoking, blood pressure or blood pressure medication use. Incident pneumonia was more common in participants who were over 55 years at baseline, women (RR 0.61 among men), and socioeconomically deprived participants (RR 1.37 per 1 SD increase in Townsend Index), (Table 1). Pneumonia was also more common in those with low grip strength, and lower FEV and FVC, and FEV/FVC ratio (Table 1). Among comorbidities, baseline diabetes, CVD and cancer were approximately twice as common in those who developed pneumonia (Table 1), and any longstanding illness was also strong associated with pneumonia. Among blood biomarkers, poorer renal function as measured by higher cystatin-C, higher HbA1C, higher CRP and white blood cell count (including neutrophils, monocytes, and lymphocytes) and higher red cell distribution width were all associated with pneumonia. Ethnicity was not associated with pneumonia.
When investigating risk factors for pneumonia over the full follow-up time from baseline, risk factors were generally similar although there was increased power due to a higher number of incident cases (Supplementary table 1). In this analysis smoking was weakly associated with pneumonia (RR 1.17) as was blood pressure medication use (RR 1.15). Similar findings were found when we studied pneumonia occurring after 2015 (Supplementary Table 2).
Univariable risk factors for incident influenza
Of the 117,837 participants with primary care data, 111 had influenza recorded after 2016. Those who developed influenza were generally more socioeconomically deprived, more likely to have bronchitis at baseline, and less likely to take statins. No other risk factors showed strong associations (Table 1). When all influenza cases occurring after baseline were considered, evidence of statistically significant associations became stronger due to increased power but cases were more common in slightly younger people and in women, and smoking, walking pace, and BMI was less strongly associated with influenza cases than for COVID-19, and blood pressure medication was not at all associated with influenza (Supplementary Table 1).
Multivariable models for COVID-19 and pneumonia
After multivariable adjustment for sociodemographic factors, the modifiable risk factors for confirmed COVID-19 included smoking (RR 1.44 (95% CI 1.15-1.79)), higher BMI (RR 1.33 per SD increase (95% CI 1.21-1.46)) and other measures of body fat, diastolic blood pressure, as well as blood pressure medication, FEV1 and FVC (but not FEV1/FVC ratio), and slow walking pace (RR 2.04 (95% CI 1.49-2.79)) (Table 2). Other risk factors were older age (RR 1.15 per 5 years (95% CI 1.08-1.24)), male sex (RR 1.54 (95% CI 1.24-1.91)), black ethnicity (RR 2.07 (95% CI 1.16, 3.71)), socioeconomic deprivation (RR 1.35 per SD increase in Townsend Index (95% CI 1.23-1.49)). Among comorbidities, risk factors included longstanding illness (RR 1.65 (95% CI 1.32-2.06)), diabetes, CVD and statin use. Among blood biomarkers, risk factors included lower total and HDL-cholesterol, and higher cystatin C (RR 1.36 per 1 SD increase (95% CI 1.23-1.49)), HbA1C (RR 1.20 per 1 SD increase (95% CI 1.11-1.30)), CRP and white blood cell count (Table 2).
After adding BMI, blood pressure, FEV1 and walking pace as covariates, modifiable risk factors for COVID-19 admission continued to include smoking (RR 1.38 (95% CI 1.11-1.73)), higher BMI (RR 1.24 (95% CI 1.12-1.38)), use of blood pressure lowering medication as a proxy for hypertension (RR 1.40 95% CI 1.08-1.82) and slow walking pace (RR 1.66 (95% CI 1.20-2.30)). Other risk factors included older age (RR 1.10 per 5 year increase (95% CI 1.02-1.36)), male sex (RR 1.64 (95% CI 1.25-2.14)), black ethnicity (RR 1.86 (95% CI 1.03, 3.36)), socioeconomic deprivation (RR 1.26 (95% CI 1.14-1.40)), longstanding illness (RR 1.38 (95% CI 1.09-1.74)), lower HDL-cholesterol (RR 0.82 (95% CI 1.72-0.95)) and higher cystatin C (RR 1. 24 (95% CI 1.12-1.37)) (Table 2).
Nonlinear associations of continuous variables with COVID-19 and pneumonia are shown in Figure 1 (and supplementary table 2). Age was associated with COVID-19 admission curvilinearly, where participants aged 40-60 years shared similar risk and participants aged over 60 years had exponentially increased risk with age. Associations were similar when adjusting for body fat percentage rather than BMI (Supplementary Table 3).
Directly contrasting model 2 for COVID-19 and pneumonia (Figure 2), the risk factors in common for both conditions were older age, higher BMI and slow walking pace, although the latter two associations less strongly associated with pneumonia than for COVID-19. Highlighting specific differences, smoking and treated blood pressure were only associated with COVID-19. Pneumonia was more common in women, and socioeconomic deprivation and cystatin-C were not associated with pneumonia. Low FEV1 was independently associated with pneumonia, but not with COVID-19. Pneumonia also showed an association with baseline cancer.
Population attributable risks
Factors that were significant in Table 2 were mutually adjusted to compare their population attributable fractions for COVID-19 and pneumonia (Table 3). Among potentially modifiable risk factors smoking accounted for 12.8% of COVID-19 cases that occurred within the UK Biobank population, treated BP with 5.0% of cases, slow walking pace with 2.8%, and BMI with 0.6% (total 21.2%). In contrast, none of these factors were large contributors to pneumonia cases within UK Biobank. (Table 3)
Discussion
In this population-based study, we found that confirmed COVID-19 infection was associated with a number of modifiable risk factors, a trend that less apparent for pneumonia and influenza. In particular, the associations of smoking, BMI (and body fat), hypertension, and physical fitness (as measured by slow walking pace) with COVID-19 are of note; even when such factors were measured a decade before infection they potentially accounted for one-fifth of UK Biobank confirmed COVID-19 cases. The other independent risk factors for COVID-19 infection included older age, male sex, black ethnicity, socioeconomic deprivation, longstanding illness, and reduced renal function as measured by cystatin C, the latter also notable given renal complication in severe COVID-19.
It is importance to recognise the competing risk for health of lockdown and social distancing. Social distancing reduces viral transmission, but also has consequences for lifestyle. Previous data suggests that physical fitness can be rapidly lost when activity levels decrease 20,21, and this will also result in an increase in BMI 22,23. Further, social distancing may increase loneliness, depression, and psychological stress in some people and consequently adversely affect eating habits 24 and other health behaviours. Anecdotal evidence from our clinics and media suggest many are struggling with overeating. This study strongly suggests public health guidance should focus on reducing the risk of severe complications of COVID-19 by advocating a healthy lifestyle during the ongoing pandemic, not just for general cardiovascular and metabolic health, but also to help to protect against COVID-19 infection, used alongside other public health interventions.
Understanding the actual causes of disease, rather than markers that simply correlate with exposures, is clearly a key issue to consider. The independent association of socioeconomic deprivation with COVID-19 after multiple adjustment may be explained by the accumulation of earlier life socioeconomic adversities which can result in less physiological reserve and more multimorbidity 24,25. It may also be linked to more overcrowding, reduced social distancing, and potential exposure to greater viral load. Asthma, diabetes and high blood pressure, which have been shown to be associated with a higher risk of severe COVID-19 outcomes 26 also showed some trends to be associated with COVID-19 in this dataset. While substantial focus of COVID-19 research has been its apparently more aggressive symptoms and disease progression in older people, it is important to recognise that people who are older have less cardiorespiratory reserve to cope with COVID-19 infection. Older age is also associated with more hypertension and diabetes, poorer lung function, and greater relative fat mass 24.
Previous reports have suggested that obesity or excess ectopic fat deposition may be a unifying risk factor for severe Covid-19 infection 27,28, reducing both protective cardiorespiratory reserve as well as potentiating the immune dysregulation that appears, at least in part, to mediate the progression to critical illness in a proportion of COVID-19 patients. Our analysis bears out these hypotheses. Once BMI or body fat was adjusted for, inflammatory markers were no longer associated with incident COVID-19. This suggests proinflammatory markers, arising from increased adipose deposition, are probably acting as a marker for body fat which seems to be an adverse risk factor for severe COVID-19 27–30. Furthermore, obesity enhances thrombosis 31, which is relevant given the association between severe COVID-19 and pro-thrombotic disseminated intravascular coagulation and high rates of venous thromboembolism 32,33, as well as the association with D-dimer seen in other reports 34. D-dimer was not measured in blood samples in UK Biobank.
Given that pneumonia is a critical clinical complication of COVID-19, it is important to understand how risk factors for COVID-19 related pneumonia differ from “classic” community acquired pneumonia 11. Our comparison between other common respiratory diseases and COVID-19 suggests that identification of at-risk groups based on our understanding of other respiratory diseases may be inadequate - a more refined approach to risk stratification based on COVID-19 specific risks is needed, and future data should focus on some of the exposures we identify to achieve this.
The strengths of our study include the large cohort size at an age relevant to more severe COVID-19 symptoms, and biochemistry assays performed in a single dedicated central laboratory. We were also able to extensively explore emerging and novel risk factors for COVID-19, while simultaneously comparing to risk factors for pneumonia identified to primary care providers. Limitations include that UK Biobank is not representative of the whole UK population,35 (our data focuses on England specifically) although this is generally not a concern in investigating risk associations36. Care should be taken in generalising the PAF estimates. These related to cases occurring within the UK Biobank population, but are not directly applicable to the general population where the prevalence of risk factors is different. However, they allowed a direct comparison between COVID-19 and pneumonia in the same cohort. Due to the under-representation of non-white ethnicities in UK Biobank, we have limited power to explore important interactions by ethnic group (although Black ethnicity was associated with the outcome), and we recognise this as an important risk factor. Ascertainment bias, including differential healthcare seeking, differential testing and differential prognosis may explain some differences in outcomes given poor coverage of testing in the UK. It is also likely that cases will generally be at the more severe end of the clinical spectrum by using hospitalised cases as the outcome, although use of admission to ITU would be additionally informative once sufficient case numbers accrue. Despite this, we still observe many similar risk factors to incident pneumonia, which is more likely to have close to complete case ascertainment in primary care records, although, as discussed, other risk factors seem more strongly linked to COVID-19 infections. Exposures were measured several years before the development of the outcomes, and misclassification of the exposures will likely bias our results to the null. However, this also serves to illustrate that the risk factors were generally present many years before development of the disease, and as is well known, risk factors tend to track with age. We have also been unable to fully exclude all deaths that occurred prior to the pandemic, due to lack of up-to-date linkage to mortality records.
In conclusion, these early data from UK Biobank suggest risk factors for confirmed COVID-19 infection differ in some important ways from risk factors for pneumonia, being more common in males than females, in lower SES, and with stronger associations with ethnicity, CV risk markers, prior smoking and adiposity. Such findings suggest possible merit in advocating improvements in lifestyle as an additional measure to reduce the risk of COVID-19 alongside existing public health measures such as social distancing and shielding of high risk groups. have implications for health advice targeted at the public to lessen risks during this pandemic
Data Availability
UK Biobank data can be requested by bona fide researchers for approved projects, including replication, through https://www.ukbiobank.ac.uk/
Contributors
PW, JPP and NS conceived the idea for the paper. FH conducted the analysis. All authors contributed to the interpretation of the findings. PW, FH, CCM and NS jointly wrote the first draft. All authors critically revised the paper for intellectual content and approved the final version of the manuscript.
Declarations of interest
JPP is a member of the UK Biobank Steering Committee. Apart from the funding acknowledged below, we declare no other competing interests.
Acknowledgements
The work in this study is supported by the British Heart Foundation Centre of Research Excellence Grant RE/18/6/34217. CLN acknowledges funding from a Medical Research Council Fellowship (MR/R024774/1). SVK acknowledge funding from the Medical Research Council (MC_UU_12017/13) and Scottish Government Chief Scientist Office (SPHSU13). SVK also acknowledges funding from NRS Senior Clinical Fellowship (SCAF/15/02).