Abstract
A greater understanding of factors associated with good health may help increase longevity and healthy life expectancy. Here we report associations between multiple health indicators and sociodemographic (age, sex, ethnicity, education, income and deprivation), psychosocial (loneliness and social isolation), lifestyle (smoking, alcohol intake, sleep, BMI, physical activity and stair climbing) and environmental (air pollution, noise and greenspace) factors, using data from 307,378 UK Biobank participants. Low income, being male, neighbourhood deprivation, loneliness, social isolation, short or long sleep duration, low or high BMI and smoking was associated with poor health. Walking, vigorous-intensity physical activity and more frequent alcohol intake was associated with good health. There was some evidence that airborne pollutants (PM2.5, PM10, and NO2) and noise (Lden) were associated with poor health, though findings were inconsistent in adjusted models. Our findings highlight the multifactorial nature of health, the importance of non-medical factors, such as loneliness, healthy lifestyle behaviours and weight management, and the need to examine efforts to improve health outcomes of individuals with low income.
Introduction
Substantial improvements in human health and significant increases in average life expectancy are amongst the main achievements of civilization over the past two centuries1. Life expectancy at birth in England and Wales has doubled from about 40 years in 1850 to more than 80 years in 20132. Future life expectancy is projected to increase across 35 industrialised nations, with a median increase in life expectancy at birth of approximately 3 years for women and 4 years for men in the United Kingdom between 2010–20303.
Many adults, however, now spend a substantial portion of their lives with late-life morbidities and decreased sensory, motor and cognitive functioning, which may lead to lower quality of life4,5. More than 80% of the population of England and Wales aged 85 and above report a disability and 50% require care or help with daily activities6. Nevertheless, some individuals experience very little functional decline in old age and rate their health as good or excellent7. A greater understanding of the factors associated with good health may help increase longevity and healthspan, i.e. the length of time that a person is healthy8.
The health effects of lifestyle factors such as smoking9, diet10, excessive alcohol intake11 and physical activity12 are well established. However, the number and type of potential confounders that have been adjusted for in previous analyses vary widely13, and few studies have jointly examined sociodemographic, psychosocial, lifestyle and environmental factors. In addition, research has primarily focused on predicting mortality or disease incidence instead of overall health status, despite its potential to provide insights into the factors associated with increased healthspan.
The UK Biobank study provides an unprecedented data resource to investigate determinants of health and ageing trajectories. One of its strengths is that data collection was not merely focused on established predictors of health and disease but also included relevant non-medical factors such as social isolation and loneliness, as well as road traffic noise and ambient air pollution14,15.
The aim of this study was to explore potential predictors of health status. More specifically, we examined how health status was associated with (i) sociodemographic characteristics and psychosocial factors, (ii) lifestyle factors, and (iii) environmental exposures, both independently and jointly. We also examined, as a secondary aim, whether there was evidence of similar associations between these factors and (i) long-standing illness and (ii) self-rated health.
Methods
Study population
The UK Biobank is a prospective study of >500,000 UK residents aged 37–73 at baseline, recruited between 2006–2010. Details of the study rationale and design have been reported elsewhere16. Briefly, individuals registered with the UK National Health Service (NHS) and living within a 25- mile (∼40 km) radius of one of 22 assessment centres were invited to participate (9,238,453 postal invitations sent). At the baseline assessment, participants completed questionnaires and were interviewed by nurses regarding sociodemographic characteristics, health behaviours and their medical history. A small subset of participants completed repeat measurements: first revisit between 2012–2013 (20,344 participants); second revisit as part of the UK Biobank Imaging Study between 2014–2019 (43,190 participants). Participants consented to use of their de-identified data. See Supplement-e1 for the UK Biobank data fields used.
Exposures
We considered individual-level sociodemographic characteristics including age at baseline assessment, sex, ethnicity (White, Asian, Black, Chinese, mixed-race or other), highest educational or professional qualification (four categories: (1) College/University degree; (2) A levels/AS levels or equivalent, NVQ/HND/HNC or equivalent, other professional qualifications; (3) O levels/GCSEs or equivalent, CSEs or equivalent; (4) none)17 and gross annual household income (<£18,000, £18,000– 30,999, £31,000–51,999, £52,000–100,000 or >£100 000). We also included the Index of Multiple Deprivation for England which is a small-area level measure derived from data on income deprivation, employment deprivation, health deprivation and disability, education skills and training deprivation, barriers to housing and services, living environment deprivation and crime18. Higher values on the index reflect greater deprivation.
Loneliness was assessed using two questions: “Do you often feel lonely?” (no=0/yes=1) and “How often are you able to confide in someone close to you?” (almost daily to about once a month=0/once every few months to never or almost never=1). Individuals who received a sum score of 2 were classified as lonely19. Social isolation was assessed using three questions: “Including yourself, how many people are living together in your household?” (living alone=1), “How often do you visit friends or family or have them visit you?” (less than once a month=1) and “Which of the following [leisure/social activities] do you attend once a week or more often?” (none of the above=1). Individuals who received a sum score of 2–3 were classified as socially isolated19.
Smoking status was assessed using two questions summarising current and past smoking behaviour. Individuals who responded “Yes, on most or all days” or “Only occasionally” to current tobacco smoking were coded as “current”. Individuals who responded “Smoked on most or all days” or “Smoked occasionally” to past tobacco smoking were coded as “former”. Individuals who responded “No” to current tobacco smoking and “Just tried once or twice” or “I have never smoked” to past tobacco smoking were coded as “never”. Alcohol intake frequency was assessed using one question: “About how often do you drink alcohol?”. Response options included “Daily or almost daily”, “Three or four times a week”, “Once or twice a week”, “One to three times a month”, “Special occasions only” and “Never”. Sleep duration was assessed using one question: “About how many hours sleep do you get in every 24 hours? (please include naps)”. Values below 1 hour or above 23 hours were rejected, and participants were asked to confirm values below 3 hours or above 12 hours. Body mass index (BMI) was calculated as weight divided by height squared (kg/m2). Weight measurements were obtained with a Tanita BC-418 MA body composition analyser. Standing height measurements were obtained using a Seca 202 height measure. Physical activity was assessed using the International Physical Activity Questionnaire (IPAQ) short form20. Specifically, we included data on the number of days per week spent walking, engaging in moderate-intensity physical activity (e.g. “carrying light loads, cycling at normal pace”) or engaging in vigorous-intensity physical activity (i.e. “activities that make you sweat or breathe hard such as fast cycling, aerobics, heavy lifting”) for ≥10 minutes continuously. Daily frequency of stair climbing at home was assessed using one question: “At home, during the last 4 weeks, about how many times a day do you climb a flight of stairs? (approx 10 steps)”. These data were only collected from individuals who indicated that they were able to walk. Response options included “None”, “1–5 times a day”, “6–10 times a day”, “11–15 times a day”, “16– 20 times a day” and “More than 20 times a day”.
Exposure to airborne pollutants was estimated using a Land Use Regression model developed as part of the European Study of Cohorts for Air Pollution Effects (ESCAPE) project21,22. We included annual average concentration of particulate matter with an aerodynamic diameter of <2.5 μm (PM2.5) and <10 μm (PM10), as well as nitrogen dioxide (NO2) modelled at participants’ residential addresses for the year 2010. Residential road traffic noise was modelled for the year 2009 using the Common Noise Assessment Methods (CNOSSOS-EU) algorithm23,24. We used Lden (day-evening-night noise level) which is an annual average 24-hour sound pressure level in decibels with a 10-decibel penalty added between 11pm and 7am. This penalty has previously been added in epidemiological analyses to account for annoyance/sleep disruption at night25. The percentage of the home location classed as greenspace, as a proportion of all land use types, was modelled using 2005 data from the Generalized Land Use Database for England (GLUD)26 for the 2001 Census Output Areas in England. Each residential address was allocated a circular distance buffer of 1000m, representing wider-area greenspace.
Health status
Data on 81 cancer and 443 non-cancer illnesses (past and current) were ascertained through touchscreen self-report questionnaire and confirmed during a verbal interview by a trained nurse. In order to provide a single health indicator (“health status”) based on a previously defined algorithm, we used a classification developed by the Reinsurance Group of America (RGA) in which an experienced underwriter classified each illness according to whether it was “likely acceptable for standard life insurance”27. Participants were thus classified as healthy or unhealthy based on their reported cancer and non-cancer illnesses (Supplement-e1; Supplementary file).
Two secondary health outcomes were assessed. Firstly, whether patients had a long-standing illness, disability or infirmity was assessed using the question “Do you have any long-standing illness, disability or infirmity?” to which individuals could respond “Yes” or “No”. Secondly, participants’ perceived health was assessed using the question “In general how would you rate your overall health?”. Response options included “Poor”, “Fair”, “Good” and “Excellent”. These health outcomes will be termed “long-standing illness” and “self-rated health”.
Exclusion criteria
Women who were pregnant at the time of assessment were excluded from the analysis based on the assumption that lifestyle patterns change during pregnancy28. Participants for whom their genetic sex, inferred from the relative intensity of biological markers on the Y and X chromosomes, and self- reported sex did not match were also excluded.
Statistical analyses
Participants with baseline data on all explanatory variables and health indicators were selected for cross-sectional analyses. Participants with missing data or who responded “do not know” or “prefer not to answer” were excluded.
Logistic regression analyses were used to estimate associations between sociodemographic characteristics, psychosocial factors, lifestyle factors and environmental exposures with health status and long-standing illness. Ordinal logistic regression analyses were used to estimate associations between these factors and self-rated health. Across these outcomes, poor health was the reference group. In defining the reference groups for categorical explanatory variables, we focused on interpretability of the results for middle aged and older UK residents.
We fitted three incremental models using the baseline UK Biobank data: Model 1–individual explanatory variables; Model 2–individual explanatory variables plus age and sex; Model 3–all explanatory variables. All models were fitted in the analytical sample without missing data to ensure that any differences between models were not due to inclusion of different participants. We calculated odds ratios and Bonferroni-adjusted (∼99.9%) confidence intervals. Adjusted p-values were calculated to account for multiple testing within each model (i.e., typically for 39 tests). Two methods were used: (1) Bonferroni and (2) Benjamini & Hochberg29, all two-tailed with α = .05. We report in-text the most conservative correction at which statistical significance was reached. If the unadjusted p- value was greater than .05, in which case neither Bonferroni nor Benjamini & Hochberg-adjusted p- values would reach criteria for statistical significance, we report the unadjusted p-value. Multicollinearity was assessed using generalised variance inflation factors30.
To compare the magnitudes of association between the explanatory variables and health indicators, we calculated standardised regression coefficients by rescaling all variables included in Model 3 to have a mean of zero and by dividing the coefficients of numeric variables with more than two values by twice their standard deviation31.
To assess whether any associations between explanatory variables and health indicators were modified by sex or age, we stratified analyses by sex and by age at baseline assessment (<65 years and ≥65 years) and assessed potential age- and sex-interactions by adding cross-product terms to Model 3. The choice of age strata was based on the current UK retirement age.
We used ordinal logistic regression analyses to model associations between the explanatory variables at baseline and self-rated health at the first and second revisit (t1 and t2, respectively). We applied the same modelling strategy, except that we additionally adjusted for the number of days between baseline and follow-up assessment.
As variable selection and classification might impact results, we conducted multiple additional analyses to examine related variables, categorised continuous variables according to health guidelines or previously determined cut-offs and tested the robustness of our findings to additional exclusion criteria. We repeated the main analysis (i) with sleep duration as categorical variable (<7, 7-8 [reference] and ≥9 hours of sleep/day)32; (ii) with BMI as categorical variable (<18.5, 18.5-24.9 [reference] and >24.9 kg/m2); (iii) with body fat percentage (estimated by electrical bio-impedance measurement) instead of BMI as measure of body composition; (iv) excluding individuals who reported that they stopped drinking alcohol because of “Illness or ill health” or “Doctor’s advice”; (v) sub-dividing those individuals who reported that they never drink alcohol into current and lifetime abstainers; (vi) with current tobacco smoking only (three categories: “Yes, on most or all days”, “Only occasionally” and “No”); (vii) with Metabolic Equivalent Task (MET) minutes per week for walking, moderate physical activity and vigorous physical activity33 instead of the number of days per week spent engaging in these activities for ≥10 minutes; (viii) reducing the greenspace percentage circular distance buffer from 1000m to 300m to examine nearby greenspace; (ix) with air pollution and noise exposure estimates dichotomised according to WHO recommendation thresholds34,35: PM2.5 ≤10 μg/m3, PM10 ≤20 μg/m3, NO2 ≤40 μg/m3 and Lden ≤53 decibels; (x) restricting analyses to participants assessed after 31 December 2008 (regarding noise estimates) and restricting analyses to participants assessed after 31 December 2009 (regarding air pollution estimates); (xi) restricting analyses to those individuals who had lived at their current address for at least 10 years; (xii) truncating continuous explanatory variables at the 1st and 99th %ile of the distribution.
Statistical analyses were conducted using R (version 3.5.0).
Results
Study population
Of the N = 502,521 UK Biobank participants, 5.32% (n = 26,757) had missing health data and 33.51% (n = 168,386) had missing data on explanatory variables or did not meet our inclusion criteria. Hence our analytical sample included n = 307,378 adults (Supplement-e2).
Descriptive statistics are presented in Supplement-e3–e6. There were no substantial differences between the full and analytical sample. Of the participants in the analytical sample, 8.18% (n = 25,142) reported at least one cancer and 73.30% (n = 225,312) at least one non-cancer illness (Supplement-e4). Approximately two thirds of participants (69.04%, n = 212,201) were classified as healthy, while 30.96% (n = 95,177) were unhealthy. A similar percentage of participants (30.5%, n = 93,757) reported having a long-standing illness (72.9% agreement with health status). Finally, 3.60% (n = 11,066) rated their health as poor, while 19.25% (n = 59,169), 59.44% (n = 182,699) and 17.71% (n = 54,444) rated their health as fair, good and excellent, respectively.
Regression analyses
We found weak correlations between most continuous explanatory variables and moderate to strong correlations between the environmental exposures (Supplement-e5). Generalised variance inflation factor (VIF) values for Model 3 were within the range of 1.02–2.62 for 19/21 explanatory variables. The VIFs for NO2 and PM2.5 were 6.08 and 4.27, respectively. Excluding NO2 from the model reduced the highest VIF to 2.42 (for PM2.5). Fitting Model 3 without NO2 or with each air pollutant separately had little impact on their associations with health status. As such, we report the results for Model 3 that included all explanatory variables. Unless indicated otherwise, results presented below correspond to multivariable-adjusted odds ratios and Bonferroni-adjusted (∼99.9%) confidence intervals from Model 3. A simplified overview of our findings is presented in Supplement-e7.
Increased age was associated with lower odds of favourable health status (OR = 0.953, 99.9% CI 0.951-0.955, pBonf.<0.001). For every 1-year increase in age, the log-odds of being healthy decreased by −0.048 (99.9% CI −0.050 to −0.046). Men had lower odds of being healthy (OR = 0.88, 99.9% CI 0.861-0.909, pBonf.<0.001). Individuals with high income had higher odds of being healthy (OR = 1.05, 99.9% CI 1.010-1.094, pBonf. = 0.003 [£52,000–100,000 vs £31,000–51,999]), while those with lower levels of income had lower odds of being healthy (e.g. OR = 0.74, 99.9% CI 0.715-0.776, pBonf.<0.001 [<£18,000 vs £31,000–51,999]). Increased neighbourhood deprivation was associated with lower odds of being healthy (OR = 0.995, 99.9% CI 0.994-0.996, pBonf.<0.001). Compared to Whites, individuals of Black, Chinese, Mixed-race, and “other” ethnic background had higher odds of being healthy in Model 1, but only Chinese ethnicity was associated with higher odds of being healthy across all models (Model 3: OR = 1.83, 99.9% CI 1.365-2.511, pBonf.<0.001). Compared to individuals without educational or professional qualification, participants with any qualification had higher odds of being healthy in Model 1 and after adjustment for age and sex (Model 2). However, there was only limited evidence of associations between highest qualification and health in Model 3 (Table-1).
Loneliness was associated with lower odds of favourable health status (OR = 0.81, 99.9% CI 0.770- 0.859, pBonf.<0.001). Socially isolated individuals also had lower odds of being healthy, but there was greater attenuation in the strength of association in Model 3 (OR = 0.95, 99.9% CI 0.907-0.995, pBonf. = 0.013) (Table-1).
Longer sleep duration was associated with lower odds of favourable health status (OR = 0.97, 99.9% CI 0.963-0.987, pBonf.<0.001). For every 1-hour increase in sleep duration, the log-odds of being healthy decreased by −0.03 (99.9% CI −0.038 to −0.013). Walking frequently and engaging in frequent vigorous physical activity was also associated with higher odds of being healthy (OR = 1.01, 99.9% CI 1.003-1.018, pBonf.<0.001 and OR = 1.03, 99.9% CI 1.024-1.040, pBonf.<0.001, respectively). For every one-day increase in ≥10 minutes of vigorous physical activity, the log-odds of favourable health status increased by 0.03 (99.9% CI 0.024-0.039). Although moderate physical activity was associated with higher odds of being healthy in Model 1–2, we did not find evidence of an association in Model 3 (OR = 1.00, 99.9% CI 0.996-1.010, punadj. = 0.13). Frequent daily stair climbing at home was associated with higher odds of being healthy (ranging from OR = 1.07, 99.9% CI 1.015-1.129, pBonf. = 0.002 to OR = 1.19, 99.9% CI 1.110-1.272, pBonf.<0.001 [1-5 times/day and >20 times/day vs none, respectively]). We found some evidence that frequent alcohol intake was associated with higher odds of being healthy (e.g. OR = 1.06, 99.9% CI 1.020-1.010, pBonf.<0.001 [3-4 vs 1-2 times/week]), although for daily/almost daily alcohol drinking only in Model 2 (OR = 1.06, 99.9% CI 1.023-1.104, pBonf.<0.001), while infrequent alcohol intake was associated with lower odds of favourable health status (ranging from OR = 0.91, 99.9% CI 0.871-0.957, pBonf.<0.001 to OR = 0.63, 99.9% CI 0.595- 0.664, pBonf.<0.001 [1-3 times/month and never vs 1-2 times/week, respectively]). Higher BMI was associated with lower odds of being healthy (OR = 0.968, 99.9% CI 0.966-0.971, pBonf.<0.001). For every one-unit increase in kg/m2, the log-odds of being healthy decreased by −0.03 (99.9% CI −0.035 to −0.029). Past and current tobacco smoking was associated with lower odds of being healthy (OR = 0.79, 99.9% CI 0.770-0.815, pBonf.<0.001 and OR = 0.75, 99.9% CI 0.718-0.787, pBonf.<0.001, respectively) (Table-2).
In our analyses of environmental exposures, higher PM2.5 concentration was associated with lower odds of favourable health status (OR = 0.97, 99.9% CI 0.941-0.991, pBonf.<0.001). PM10 was also associated with lower odds of being healthy in Model 1–2, but there was no evidence of an association between PM10 and health status in Model 3 (OR = 1.002, 99.9% CI 0.993-1.010, punadj. = 0.47). NO2 was associated with lower odds of being healthy in Model 1–2. However, in Model 3, there was limited evidence that NO2 was associated with higher odds of being healthy (OR = 1.004, 99.9% CI 0.999-1.008, pBH = 0.009). Ambient sound level was associated with lower odds of being healthy in Model 1–2, but there was no evidence of an association with health status in Model 3 (OR = 0.998, 99.9% CI 0.995-1.002, punadj. = 0.09). Finally, we did not find consistent evidence of an association between percentage greenspace within a 1000m circular distance buffer and health status (Table-3).
Results for long-standing illness and self-rated health are described in the Supplementary text and presented in Supplement-e8–e10. Model 3 results across health indicators are presented in Figure-1–3.
Sociodemographic characteristics and psychosocial factors associated with health indicators. Confidence interval plot (odds ratio ± Bonferroni-adjusted (∼99.9%) confidence intervals) for Model 3 (i.e. including all explanatory variables). GCSEs = general certificate of secondary education. *Indicates reference group for categorical explanatory variables. For categorical explanatory variables the odds ratios for self-rated health indicate the changes in odds of reporting better self-rated health associated with the explanatory variable group relative to the reference group. Odds ratios for continuous explanatory variables for self-rated health indicate proportional odds ratios for a 1-unit increase in the explanatory variable on level of self-rated health. Annual household income groups: very low (<£18 000), low (£18 000–30 999), middle (£31 000–51 999), high (£52 000–100 000) and very high (>£100 000). ‘GCSEs’ also includes O levels and certificate of secondary education (CSE). ‘A levels’ also includes national vocational qualification (NVQ), higher national diploma (HND), higher national certificate (HNC) and ‘other professional qualifications’.
Lifestyle factors associated with health indicators. Confidence interval plot (odds ratio ± Bonferroni-adjusted (∼99.9%) confidence intervals) for Model 3 (i.e. including all explanatory variables). BMI = body mass index. *Indicates reference group for categorical explanatory variables. For categorical explanatory variables the odds ratios for self-rated health indicate the changes in odds of reporting better self-rated health associated with the explanatory variable group relative to the reference group. Odds ratios for continuous explanatory variables for self-rated health indicate proportional odds ratios for a 1-unit increase in the explanatory variable on level of self-rated health.
Environmental exposures associated with health indicators. Confidence interval plot (odds ratio ± Bonferroni-adjusted (∼99.9%) confidence intervals) for Model 3 (i.e. including all explanatory variables). PM = particulate matter; NO2 = nitrogen dioxide; Lden = day-evening-night noise level. Odds ratios for self-rated health indicate proportional odds ratios for a 1-unit increase in the explanatory variable on level of self-rated health.
Magnitude of associations
Standardised regression coefficients for health status are presented in Figure-4. Most notably, the magnitude of association between BMI (β = −0.30, 99.9% CI −0.33 to −0.27) or very low income (β = - 0.29, 99.9% CI −0.34 to −0.25) and health status was comparable to that of current smoking (β = −0.29, 99.9% CI −0.33 to −0.24). There was also a substantial difference in magnitude of association with health status between the very low and low income groups (β = −0.29, 99.9% CI −0.34 to −0.25, and β = −0.08, 99.9% CI −0.12 to −0.05, respectively). The magnitude of association between loneliness and health status was substantially larger than that of social isolation (β = −0.21, 99.9% CI −0.26 to −0.15, and β = −0.05, 99.9% CI −0.10 to −0.01, respectively). Finally, associations between environmental exposures and health status were relatively small compared to those observed for sociodemographic, psychosocial and lifestyle factors (Supplement-e11).
Horizontal bar plot for health status. β estimates and Bonferroni-adjusted (∼99.9%) confidence intervals. Model 3 regression coefficients rescaled to have a mean equal to zero and, for numeric variables with more than two values, divided by two standard deviations.
Associations with long-standing illness and self-rated health compared to health status were stronger for household income, sex, neighbourhood deprivation, loneliness, social isolation, walking frequency, vigorous physical activity, BMI and stair climbing frequency (Supplement-e11–e12).
Stratified and interaction analyses
Descriptive statistics of the analytical sample stratified by sex and age group are presented in Supplement-e13–e14.
The odds of being healthy were lower for men than for women in the lowest income group (pBonf.(interaction)<0.001). The association between increased age and lower odds of being healthy was stronger in men (pBonf.(interaction)<0.001). Asian women had higher odds of being healthy (pBH(interaction) = 0.007). The association between neighbourhood deprivation and lower odds of being healthy was stronger in men (pBonf.(interaction)<0.001). Men who were lonely had lower odds of being healthy (pBonf.(interaction) = 0.008) and social isolation was associated with lower odds of being healthy only in men (pBH(interaction) = 0.006). Longer sleep duration was associated with lower odds of being healthy only in men (pBonf.(interaction)<0.001). Reporting frequent alcohol intake was associated with higher odds of being healthy only in men (pBH(interaction) = 0.023 for 3-4 times/week and pBH(interaction) = 0.039 for daily/almost daily). The association between higher BMI and lower odds of being healthy was stronger in men (pBonf.(interaction)<0.001). The association between former smoking and health status was stronger in men (pBonf.(interaction)<0.001), while the association between current smoking and health status was stronger in women (pBH(interaction) = 0.005). PM2.5 was associated with lower odds of being healthy only in men (pBH(ineraction) = 0.011). Finally, we found that NO2 was associated with higher odds of favourable health status only in women (pBH(interaction) = 0.024) (Supplement-e15–e16).
The association between very low income and lower odds of being healthy was stronger in participants below the age of 65 than in participants aged 65 and above (pBonf.(interaction)<0.001). Reporting an income of £52,000–100,000 was associated with higher odds of being healthy only in participants younger than 65 (pBH(interaction) = 0.016). Men aged 65 and above had lower odds of being healthy (pBonf.(interaction)<0.001). Loneliness and social isolation were associated with lower odds of being healthy only in individuals younger than 65 (pBH(interaction) = 0.024 and pBonf.(interaction) = 0.011, respectively). The association between walking frequency and higher odds of being healthy was stronger in individuals aged 65 and above (pBonf.(interaction)<0.001). Climbing stairs 1-5 times/day was associated with favourable health status only in individuals younger than 65 (pBonf.(interaction)<0.001). There was some evidence that the association between never drinking alcohol and poor health status was stronger in individuals younger than 65 (pBonf.(interaction) = 0.002). The association between higher BMI and lower odds of being healthy was stronger in individuals aged 65 and above (pBH(interaction)<0.001). Finally, the association between current smoking and poor health status was stronger in individuals younger than 65 (pBH(interaction) = 0.038) (Supplement-e17–e18).
Although there were some differences, stratified analyses of long-standing illness and self-rated health indicated a large degree of consistency with the results observed for health status (Supplement- e19–e26).
Longitudinal analyses
We repeated our analyses of self-rated health in participants with follow-up data collected between 2012–2013 (n = 16,058; mean follow-up = 4.26 years, SD = 0.87) and between 2014–2019 (n = 32,617; mean follow-up = 8.57 years, SD = 1.64). Descriptive statistics and full results are presented in Supplement-e27–e29. The findings were fully consistent with cross-sectional analyses for very low and high to very high levels of income, sex, loneliness, sleep duration, walking frequency, vigorous physical activity, infrequent alcohol intake, BMI and smoking status. Although many results were consistent across timepoints, we found some differences between cross-sectional and prospective analyses for low income, age, neighbourhood deprivation, ethnicity, highest qualification, social isolation, moderate physical activity, stair climbing frequency, regular alcohol intake and environmental exposures.
Additional analyses
Descriptive statistics are presented in Supplement-e30.
For sleep duration and BMI, we found evidence of non-linearity in the association with health status (Supplement-e31). Hence, we also examined these explanatory variables as categorical variables. Compared to individuals with optimal sleep duration (7-8 hours/day), those who slept <7 or ≥9 hours/day had lower odds of being healthy (OR = 0.90, 99.9% CI 0.868-0.923, pBonf.<0.001 and OR = 0.70, 99.9% CI 0.667-0.736, pBonf.<0.001, respectively). We also found that low (<18.5 kg/m2) and high (>24.9 kg/m2) BMI was associated with lower odds of being healthy, compared to a BMI of 18.5-24.9 kg/m2 (OR = 0.70, 99.9% CI 0.584-0.852, pBonf.<0.001 and OR = 0.85, 99.9% CI 0.827- 0.877, pBonf.<0.001, respectively). There was no evidence of substantial departure from a linear association with health status for all other numerical variables with more than two values (Supplement-e31). Repeating the main analysis with body fat percentage as measure of body composition led to similar conclusions as for BMI: higher fat percentage was associated with lower odds of being healthy (n = 307,202; OR = 0.978, 99.9% CI 0.975-0.980, pBonf.<0.001). Excluding individuals who reported that they had stopped drinking alcohol due to illness or their doctor’s advice (n = 2,790) attenuated the association between never drinking alcohol and health status (n = 304,588; OR = 0.74, 99.9% CI 0.701-0.790, pBonf.<0.001). When sub-dividing individuals who reported that they never drink alcohol into current and lifetime abstainers, the association with health status was stronger in current abstainers (n = 307,346; OR = 0.53, 99.9% CI 0.491-0.568, pBonf.<0.001 and OR = 0.75, 99.9% CI 0.698-0.810, pBonf.<0.001, respectively). Restricting analyses to current smoking, we found that regular smoking and smoking only occasionally was associated with lower odds of being healthy, although the magnitude of association was stronger for regular smoking (n = 307,372; OR = 0.81, 99.9% CI 0.766-0.848, pBonf.<0.001 and OR = 0.90, 99.9% CI 0.835-0.980, pBonf. = 0.002, respectively). Examining MET minutes instead of the number of days per week spent walking, engaging in moderate or vigorous physical activity led to similar results (n = 268,674; data not shown).
When we examined air pollution and noise exposure estimates dichotomised according to WHO recommendation thresholds, we found some evidence that being exposed to ≤10 μg/m3 PM2.5 on average annually was associated with higher odds of being healthy (OR = 1.03, 99.9% CI 0.997- 1.062, pBH = 0.005). There was no evidence of an association with health status for exposures to ≤20 μg/m3 PM10, ≤40 μg/m3 NO2 or ≤53 decibels Lden in Model 3. We also did not find evidence of a threshold in the association between PM10 or NO2 and health status when including quintiles of these exposures in Model 3 (data not shown). Restricting analyses to participants assessed after 2008 (regarding noise estimates; n = 182,342) or 2009 (regarding air pollution estimates; n = 60,521) or to those who had lived at their current address for at least 10 years (n = 209,239) did not lead to different conclusions regarding the associations between environmental exposures and health status. There was some evidence that percentage greenspace within a 300m circular distance buffer was associated with health status (n = 307,378; OR = 0.999, 99.9% CI 0.9985-1.0001, pBH = 0.009). Truncating continuous explanatory variables at the 1st and 99th %ile did not materially change our results (n = 278,065; data not shown).
Discussion
Increased age was strongly associated with unfavourable health status and reporting a long-standing illness. However, older participants generally rated their health positive, which is consistent with several smaller studies36,37, though not all38,39. Ageing might be the single most important factor underlying disease40, with an almost universally-accepted expectation of declining health as people get older. As attainable health states shift with age41, older participants might evaluate their health more favourable, despite higher rates of illness and disability.
Although women, on average, report more illnesses, disabilities, and limitations in daily life42-44, one of the most robust findings in human biology is that they live longer45. Our findings are consistent with results from the Newcastle 85+ cohort study in which women rated their health more favourably than men42. However, other studies reported that women rated their health less favourably than men46-48, or did not find evidence of sex differences in self-rated health49,50. Sex differences in health status could result from differences in the frequency of specific illnesses, sex-specific reporting patterns, or biological and social factors48. Discrepancies in findings between studies might reflect differences in age group or socio-cultural factors.
High income and low levels of neighbourhood deprivation were associated with better health, broadly consistent with previous studies51-53. Notably, we found only a small difference in the strength of association with health status and long-standing illness between the high-income groups. The difference between the low-income groups, however, was substantial, supporting previous findings which suggested a non-linear association between family income and mortality54. For self-rated health, we also found evidence of substantial differences between the high-income groups.
Our study provides limited evidence that education was associated with favourable health status once other factors had been accounted for, although higher levels of qualification were associated with better self-rated health, consistent with previous research38. A recent UK Biobank analysis showed that remaining longer in school causally reduced participants’ risk of diabetes and mortality55. A potential explanation for why we did not find a consistent pattern in the full model is that differences in health status result from educated individuals engaging in healthier lifestyle behaviours56 that we had accounted for, or they could be due to socioeconomic or genomic factors.
Social isolation and loneliness were associated with poor health. The strength of association was greater for loneliness, particularly in men and individuals below the age of 65. Social isolation and loneliness are not always correlated57,58 and represent different aspects of social relations (scarcity of contact with others and discrepancies between the need for, and the fulfilment of, social interaction, respectively). In a meta-analysis of 70 studies, social isolation, loneliness, and living alone were associated with a 26-32% increased mortality risk14. There was no evidence of differences in mortality between these measures, although the strength of association was greater in individuals below the age of 65. A recent UK Biobank analysis found that socially isolated and lonely individuals had an increased risk of death, but only social isolation predicted all-cause mortality in a joint model19. The discrepancy with our finding (loneliness was more strongly associated with poor health) might reflect differences in outcome measures (general health in the present study vs mortality in previous investigations).
Long sleep duration, higher BMI as well as past and current smoking was associated with poor health. Sleeping less than 7 hours/day was also associated with poor health, consistent with a meta-analysis that provided evidence of a U-shaped association between sleep duration and all-cause mortality32. A BMI outside the optimal range of 18.5-24.9 kg/m2 was also associated with poor health.
Physical activity is a key lifestyle factor recommended for primary and secondary prevention of chronic health conditions59, and is associated with lower mortality risk60. In this study, walking frequency, especially in individuals aged 65 and above, stair climbing, and engaging in vigorous physical activity was associated with good health. Moderate physical activity was associated with better self-rated health, especially in men. A study in middle-aged British men found evidence of an association between vigorous, but not moderate, physical activity and reduced mortality61. Reviews of the literature report mixed findings on the relative contributions of moderate and vigorous physical activity62,63, with some evidence suggesting stronger associations for vigorous activity64.
A more frequent drinking pattern was associated with good health. Alcohol drinking often occurs in a social context and might therefore constitute a proxy for social wellness, supported by the finding that non-drinkers tend to be characterised by poor psychosocial health and low socioeconomic status. Moderate drinkers also perform more sports than lifelong abstainers65. Drinking less frequent than 1-2 times/week was associated with poor health and could not be fully accounted for by excluding individuals who had discontinued alcohol intake for health reasons or because of their doctor’s advice. However, the association with health status was stronger for current abstainers, suggesting that some individuals are non-drinkers in later life due to illness66. Current abstainers who quit drinking for health reasons could exaggerate poor health outcomes associated with not drinking alcohol67. Those who never drink might also differ from current drinkers in other characteristics68.
Road traffic noise and ambient air pollution are environmental risk factors for health. While we found some evidence that higher levels of all airborne pollutants were associated with poor health, only PM2.5 was associated with unfavourable health status and long-standing illness after adjustment for other factors. Higher levels of NO2 were associated with less favourable self-rated health. A joint analysis of UK Biobank, HUNT, and EPIC-Oxford data did not find evidence of an association between NO2 and any disease outcome, although both PM2.5 and PM10 were associated with higher incident cardiovascular disease69. Adjustment for physical activity and neighbourhood deprivation in the present study could have contributed to differences in findings between studies. Few individuals in our study were exposed to high levels of PM10 and NO2, which might explain why we did not find strong associations between these pollutants and health. The only exception was PM2.5, for which 45% of our sample were exposed to levels above the WHO recommendation threshold of ≤10 μg/m3. Positive associations between PM2.5 and self-rated health and between NO2 and favourable health status might represent spurious associations or statistical artefacts; we are not aware of any mechanisms linking higher levels of pollution to good health. While higher levels of residential noise were associated with poor health in Model 1-2, there was no evidence of an association after adjustment for other factors. Our findings are consistent with a previous analysis that did not find evidence of an independent association between Lden and cardiovascular disease, ischemic heart disease or cerebrovascular disease69.
We found some evidence that greenspace was associated with better self-rated health, consistent with a recent meta-analysis70, although not after adjustment for other factors. There were also some inconsistencies in the prospective analyses and regarding health status and long-standing illness. Several previous studies examining associations between greenspace and health did not adjust for physical activity and most did not include other environmental exposures such as air pollution or noise70. These differences could explain why we did not find consistent associations in this study.
Strengths and limitations
The large sample size enabled high precision in the estimation of associations and it allowed us to explore a range of explanatory variables, sex and age-specific associations, interactions and additional analyses with further classification of explanatory variables. Many of the findings from previous studies that we replicated in our analyses were conducted in smaller populations and with one exposure at a time, often providing limited insight into the multifactorial nature of health. One of the contributions of this study is that it allows for systematic comparisons of a broad range of factors associated with health. We examined three health indicators with slightly different ascertainment. While our findings were fairly consistent across these outcomes, we also identified differences between objective and subjective health. Overall health is arguably what matters most to people, and exploring factors associated with health indicators that transcend traditional disease-boundaries could represent an effective strategy to increase longevity and improve healthy life expectancy. Previous studies have often used composite scores of healthy lifestyles13 in which individual behaviours were weighted equally, even though the strength of their respective associations with health might differ. In the present study, we estimated associations separately for each lifestyle factor.
Nevertheless, our study has limitations. Cross-sectional analyses prevent examination of temporal sequences. Reverse causality such as poor health and disability leading to unhealthy lifestyle, less favourable socioeconomic outcomes, fewer social interactions or living in more deprived and polluted areas needs to be considered, except for fixed characteristics such as sex and ethnicity. Associations reported here cannot be interpreted as causal effects, given that several variables included in Model 3 could lie on the causal pathway of other variables. Model 3 might also be over-adjusted. Hence, we have focussed on associations that are consistent across Model 1-3 in the main body of the text and highlighted inconsistencies. As many explanatory variables were self-reported, we cannot exclude the possibility that social desirability might have influenced responses. For example, previous research has shown that up to half of participants who claim to never have had an alcoholic drink reported alcohol intake during previous surveys71. For most participants, explanatory variables were measured at a single time point, which could increase the potential for measurement error and did not allow us to model changes across the life course. However, there is evidence that lifestyle is fairly stable in this age group, with few people newly adopting a healthy lifestyle in middle age72. There might be some misclassification in the reporting of medical illnesses. However, participants were asked to report illnesses that had been diagnosed by a doctor and these were confirmed during a nurse-led interview.
Generalisability
The overall response rate of the UK Biobank was low (5.5%) and compared with non-responders, participants were older, more likely to be female and lived in less deprived neighbourhoods. Participants were also less likely to be obese, to smoke, to never drink or to drink alcohol daily and they reported fewer health conditions compared with data from a nationally representative survey73. While the UK Biobank states that “valid assessment of exposure-disease relationships are nonetheless widely generalizable and do not require participants to be representative of the population at large”74, concordant with the finding that there is little evidence of considerable bias due to non-participation in epidemiological research75, the magnitude of exposure-disease associations may depend on prevalence of effect modifiers76. A recent empirical investigation comparing the UK Biobank with data from 18 prospective cohort studies with conventional response rates showed that the direction of risk factor associations were similar, although with differences in magnitude77.
Implications
Our findings, while partly confirmatory in nature, shed light on some of the behavioural, psychosocial and environmental pathways to health and are thus relevant to public health policies aimed at promoting health in later life. Public health and medicine could put greater focus on non-medical factors such as loneliness, further encourage healthy lifestyle behaviours and weight management, and examine efforts to improve the health outcomes of individuals with low income. Our findings support the view that promoting individual responsibility for one’s health (e.g. engaging in healthy lifestyle behaviours) and policy-level commitments (e.g. reducing long-term exposure to environmental pollutants78 or improving neighbourhoods) both need to be considered in population health. While associations between lifestyle and health might reflect a life-long commitment to healthy behaviours, it is not too late to newly adopt healthy behaviours later in life72. Prevention, in addition to drug discovery and disease treatment, should be the top priority for health policy.
Data Availability
The data used in the present study are available to all bona fide researchers for health-related research that is in the public interest, subject to an application process and approval criteria. Study materials are publicly available online at http://www.ukbiobank.ac.uk.
Authorship contributions
CML acquired the studentship funding. JM and CML conceived the idea of the study; JM acquired the data; JM carried out the statistical analysis; JM and CML interpreted the findings; JM wrote the manuscript. CJR and CML critically reviewed the manuscript and JM revised the manuscript for final submission. All authors read and approved the final manuscript. JM had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Competing interests
JM receives studentship funding from the Biotechnology and Biological Sciences Research Council (BBSRC) and Eli Lilly and Company Limited. CML is a member of the Scientific Advisory Board of Myriad Neuroscience. CJR does not declare any potential conflict of interest.
Ethics
Ethical approval for the UK Biobank study has been granted by the National Information Governance Board for Health and Social Care and the NHS North West Multicentre Research Ethics Committee (11/NW/0382). No project-specific ethical approval is needed. Data access permission has been granted under UK Biobank application 45514.
Data sharing statement
The data used in the present study are available to all bona fide researchers for health-related research that is in the public interest, subject to an application process and approval criteria. Study materials are publicly available online at http://www.ukbiobank.ac.uk.
Acknowledgments
JM acknowledges current studentship funding from the Biotechnology and Biological Sciences Research Council (BBSRC) (ref: 2050702) and Eli Lilly and Company Limited in support of this work. CML is part-funded by the National Institute for Health Research (NIHR) Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and Kings College London. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health and Social Care. This project made use of time on Rosalind HPC, funded by Guy’s & St Thomas’ Hospital NHS Trust Biomedical Research Centre (GSTT-BRC), South London & Maudsley NHS Trust Biomedical Research Centre (SLAM-BRC), and Faculty of Natural Mathematics & Science (NMS) at King’s College London.