Abstract
Background Genetic and lifestyle factors are related to the risk of cancer, but it is unclear whether a healthy lifestyle can offset genetic risk. Our aim was to investigate this for 13 cancer types using data from the UK Biobank prospective cohort.
Methods In 2006-2010, participants aged 37-73 years were assessed and followed until 2015-2019. Analyses were restricted to those of European ancestries with no history of malignant cancer (n=195,822). Polygenic risk scores (PRSs) were computed for 13 cancer types and these cancers combined (‘overall cancer’), and a healthy lifestyle score was calculated from current recommendations. Relationships with cancer incidence were examined using Cox regression, adjusting for relevant confounders. Interactions between HLI and PRSs were assessed.
Results There were 15,240 incident cancers during the 1,926,987 person-years of follow-up (median follow-up= 10.2 years). After adjusting for confounders, an unhealthy lifestyle was associated with a higher risk of overall cancer [lowest vs highest tertile hazard ratio (95% confidence interval) = 1.32(1.26, 1.37)] and eight cancer types. The greatest increased risks were seen for cancers of the lung [3.5(2.96,4.15)], bladder [2.03 (1.57, 2.64)], and pancreas [1.98 (1.54,2.55)]. Positive additive interactions were observed, suggesting a healthy lifestyle may partially offset genetic risk of colorectal, breast, and pancreatic cancers, and may completely offset genetic risk of lung and bladder cancers.
Conclusions A healthy lifestyle is beneficial for most cancers and may offset genetic risk of some cancers. These findings have important implications for those genetically predisposed to these cancers and population strategies for cancer prevention.
Introduction
Cancer is the second leading cause of death worldwide (1). Genetic and lifestyle factors play an important role in the aetiology of cancer. While the heritability of cancer overall has been estimated to be 33% (2), individual genetic variants typically have little impact. However, when assessed collectively using a polygenic risk score, a greater number of these genetic variants can substantially increase the likelihood of developing some cancers. Indeed, a high genetic risk (top 20%) has been reported to account for up to 30% of cases, although this varies by cancer type (3, 4).
An estimated 30-50% of all cancer cases could be prevented through healthy lifestyle behaviours, such as eating a healthy diet, being physically active, maintaining a healthy body weight, and avoiding tobacco and alcohol (1). These lifestyle factors often coexist, creating issues in estimating independent relationships with disease outcomes. Thus, research has assessed lifestyle factors collectively in a healthy lifestyle score (5). While there is strong evidence that an overall healthy lifestyle reduces risk of colorectal and breast cancers, the evidence for other cancer types is less clear (6-8).
Recent evidence suggests that genetic risk of cancer may be offset by lifestyle factors (9). The strongest evidence has been obtained from studies on breast and colorectal cancer, which suggest a healthy lifestyle may reduce the risk regardless of genetic risk (10-12), and may even be of greater benefit for those at high genetic risk (13, 14). However, these relationships have not been explored for other cancer types. In this study, we investigate whether a healthy lifestyle can offset genetic risk and assess associations with 13 different types of cancer - bladder, breast, colorectal, kidney, lymphocytic leukaemia, lung, melanoma, non-Hodgkin lymphoma (NHL), oral cavity/pharyngeal, ovarian, pancreatic, prostate, and uterine cancer.
Methods
Study design and study population
This prospective cohort study utilizes data from the UK Biobank, a large population-based cohort of over 500,000 adults aged 37-73 years and living in the United Kingdom at the time of the initial assessment in 2006-2010 (15). Participants attended one of 22 assessment centres across England, Scotland and Wales and completed an assessment including a questionnaire, anthropometric measures, and collection of biological samples. UK Biobank was approved by the National Health Service North West Multi-centre Research Ethics Committee (11/NW/0382), the National Information Governance Board for Health and Social Care in England and Wales, and the Community Health Index Advisory Group in Scotland. All participants provided written informed consent.
To reduce the effects of population stratification, our analyses were restricted to participants of white European ancestries, and we included participants with no history of any malignant cancer at baseline and sufficient data available for the lifestyle factors and other covariates of interest (Figure 1).
Polygenic risk scores (PRSs)
We calculated PRSs for each participant for 13 cancer types based on single-nucleotide polymorphisms (SNPs) selected by Graff et al (3). Briefly, they identified a set of independent SNPs associated with cancer risk for each cancer type from previously published cancer genome-wide association studies. SNP selection was restricted to those that are common and available in the UK Biobank data. SNPs with the strongest associations with the broadest phenotype were preferentially selected, with an overall linkage disequilibrium r2 <0.3 to ensure independence (see Supplementary Table S1 for the number of SNPs used in the PRS calculations. See Graff et al for a full list of genetic variants (3)).
We calculated PRSs as the sum of the number of risk alleles multiplied by the log of the odds ratio (OR) for each SNP, as implemented in PLINK software (16) using the following formula: where n = the number of SNPs in the set of independent SNPs for that cancer type, xil is the number of risk alleles (0, 1 or 2) for the ith individual at the lth SNP, and ORl is the estimated OR for the lth SNP using a logistic regression.
Each PRS was z-standardized, with higher scores representing a greater genetic risk, and then categorized as low (lowest tertile), intermediate (middle tertile), and high (highest tertile) risk.
A PRS for overall cancer was also derived as the sum of the z-standardized PRSs for the 13 cancer types, each weighted according to the distribution of incident cases of these cancers reported by Cancer Research UK in 2017 (https://www.cancerresearchuk.org/health-professional/cancer-statistics/incidence#heading-Four).
Healthy lifestyle index
A healthy lifestyle index (HLI) was constructed based on the World Cancer Research Fund/American Institute of Cancer Research (WCRF/AICR) cancer prevention recommendations and standardized scoring system (full details of the calculation of the HLI are in Supplementary Table S2) (5). Baseline anthropometric and touchscreen questionnaire information was available for five of the eight recommendations included in the 2018 WCRF/AICR score (healthy weight; physical activity; wholegrain, fruit and vegetable intake; red and processed meat intake; alcohol consumption). Smoking status (categorized as never, former, or current smoking) was also included in the HLI, given its relevance as a modifiable lifestyle risk factor for multiple cancers. The HLI ranged from 0 to 6, with higher scores indicating a greater adherence to a healthy lifestyle. The HLI was then split into tertiles and categorized as unfavourable (HLI range 0-3), intermediate (HLI range 3.25-3.75), and favourable (HLI range 4-6) lifestyle.
Cancer outcome ascertainment
Malignant cancer diagnoses for each cancer type were identified using the ninth and tenth revisions of the International Classification of Diseases (ICD-9 and ICD-10) codes, phenotype, and tumour behaviour information obtained through linkage to national cancer registries in England, Wales, and Scotland. Incident cancer events were those diagnosed after baseline assessment. Cancer information was complete up to 31 July 2019 for England and Wales, and 31 October 2015 for Scotland. Hence, follow-up time was defined as being from the date of baseline assessment to the earliest of date of diagnosis, date of death (obtained via linkage to death registries) or 31 July 2019 for England and Wales residents and 31 October 2015 for Scotland residents.
Statistical analysis
Baseline characteristics were summarized for the total study population, those diagnosed with any of the 13 cancers examined (referred to as “overall cancer”), and individually for each cancer type. Characteristics were summarized as percentages for all variables for greater interpretability.
Cox proportional hazard regression models were used to investigate associations between PRS categories, HLI categories, and incidence of overall cancer and separately for each cancer. Incidence of cancers that are sex-specific were assessed in the relevant sex only (female cancers – ovarian, uterine, breast; male cancers – prostate). Breast cancer analyses were further restricted to post-menopausal women only (self-reported at baseline). All models were adjusted for age at baseline (continuous); sex (where relevant); assessment centre; socioeconomic status (Townsend Index - continuous); education (none listed, GCSE/CSE or equivalent, A levels or equivalent, college/university or other professional training); household income (<18,000, 18,000 to 30,999, 31,000 to 51,999, 52,000 to 100,000, >100,000); birth location (north south coordinate and east west coordinate – both in deciles), and population stratification measured by the first 40 principal components. Trend p-values were calculated by using the PRS and HLI categorical variables as pseudo-continuous variables in the models. Subsequent analyses adjusting for additional covariates specific to each cancer type were also performed (see Supplementary Table S3 for a list of the additional covariates and definitions). Additional analyses exploring the risk of cancer in the top 5% of PRS were conducted using Cox regression with the bottom tertile of PRS as the reference and adjusting for the same covariates.
For each cancer outcome, we investigated multiplicative interactions between lifestyle and genetic risk by using a likelihood ratio test to compare Cox models with and without an interaction term between PRS categories and HLI categories. Additive interactions were assessed by the relative excess risk due to interaction (RERI), which were calculated from the Cox models with the interaction term (17). Observed interactions were further explored visually using a forest plot produced from a multivariable Cox proportional hazards model with a combined genetic and lifestyle risk variable (9 categories with low genetic risk and favourable lifestyle as the reference). As smoking is a strong risk factor for lung cancer, we performed sensitivity analyses removing smoking from the HLI and including it as a confounder for this outcome. Proportionality of hazards assumptions were verified using Schoenfeld residuals and visual inspection of log-log and residuals plots. Two-sided p-values less than or equal to 0.05 were considered as evidence of an association. A p-value threshold of 0.0036 (calculated as 0.05/number of cancer outcomes) was applied to interaction models to account for multiple testing. All analyses were performed using Stata SE version 16 (StataCorp, College Station, TX). Results are presented in order of power determined from the number of cases for each cancer type.
Results
A summary of the baseline characteristics of the 195 822 participants included in the analysis, and for those diagnosed with any of the 13 types of cancer during the follow up period, are provided in Table 1. Over the 1 926 987 person-years of follow up (median [interquartile range] length of follow up = 10.2 [9.4-10.9] years), there was a total of 15 240 incident cases of the 13 cancer types of interest. The cancers with the highest number of incident cases were prostate cancer, colorectal cancer, and post-menopausal breast cancer. Generally, risk of cancer was higher in men, those of older age, and those who have lower levels of education and income. An unfavourable lifestyle and a high genetic predisposition were also associated with a higher incidence of most, but not all, cancers.
After adjusting for confounders, an unhealthy lifestyle was associated with a higher risk of overall cancer [lowest vs highest tertile of HLI hazard ratio (HR) 95% confidence interval (95%CI) = 1.32 (1.26, 1.37)], colorectal cancer [1.42 (1.28, 1.59)], post-menopausal breast cancer [1.42 (1.27, 1.59)], lung cancer [3.50 (2.96, 4.15)], kidney cancer [1.91 (1.51, 2.42)], uterine cancer [1.63 (1.31, 2.04)], pancreatic [1.98 (1.54, 2.55)], bladder cancer [2.03 (1.57, 2.64)], and oral cavity/pharyngeal cancer [1.69 (1.31, 2.18)] (Figure 2). There was no association between HLI and melanoma, NHL, ovarian cancer, and lymphocytic leukaemia. A weak association in the opposite direction was observed between HLI and prostate cancer risk [0.86 (0.79, 0.94)] (Figure 2). Further adjustment for other potential confounders specific to cancer type had little effect on the estimates presented (Supplementary Table S4).
A higher PRS was associated with higher risk of overall cancer and all cancer types assessed, except oral cavity/pharyngeal cancer (Figure 2). The cancers with the greatest increased risk in those with a high PRS compared to those with a low PRS were colorectal cancer [2.04 (1.82, 2.27)], melanoma [2.03 (1.75, 2.34), pancreatic cancer [2.26 (1.77, 2.89)], and lymphocytic leukaemia [2.45 (1.78, 3.36)]. Again, further adjustment for other cancer-specific potential confounders had little impact on the hazard ratios (Supplementary Table S4). Further exploration of genetic risk found those in the top 5% of PRS for prostate, colorectal, melanoma, and pancreatic cancers had an estimated 2.5-fold, 2.5-fold, 3-fold, and 4-fold increased risk, respectively (Table 2).
There were no multiplicative interactions observed between HLI and PRS for any of the cancer outcomes (p>0.05 for all outcomes, Figure 2). However, additive interactions were observed for colorectal, breast, lung, pancreatic, and bladder cancers (Supplementary Table S5). Forest plots of risk with a combined lifestyle/genetic risk variable were constructed to visually depict these interactions. These plots show a greater increased risk with a less favourable lifestyle in those with a higher genetic risk for these cancers, with a favourable lifestyle completely offsetting the increased risk from genetic factors for lung and bladder cancers (Figures 3).
We further investigated the association between lifestyle and lung cancer by removing smoking status from the HLI and including it as a confounder. In this analysis, an unhealthy lifestyle was associated with a higher risk of lung cancer [lowest vs highest tertile of HLI without smoking HR (95%CI) = 1.23 (1.06, 1.42), p=0.0055]. There was no multiplicative interaction between HLI (without smoking) and smoking status for lung cancer (p=0.393). The additive interaction between HLI and PRS was attenuated [unfavourable lifestyle/high genetic risk vs favourable lifestyle/low genetic risk RERI=0.44 (0.04, 0.85) p=0.033; ptrend=0.021].
Discussion
In this study, an unfavourable lifestyle and a high genetic risk were independently associated with an increased risk of several types of cancer. There were no multiplicative interactions between lifestyle and genetic risk, but additive interactions were observed. These findings suggest a healthy lifestyle may be of greater benefit in those with a high genetic susceptibility to colorectal, breast, and pancreatic cancers, and may completely offset genetic risk for lung and bladder cancers. Our findings are consistent with the limited comparable research for breast (10, 13) and colorectal cancers (11, 12, 14) and is novel for the other cancer types examined here. Communicating these results to groups with a high genetic risk of cancer may help alleviate any distress experienced due to awareness of their increased risk, and the appreciation of some control over the genetic risk is likely to be empowering and potentially supportive of positive behavioural changes.
A higher genetic risk of cancer was associated with an increased risk of overall cancer and 12 cancer types – prostate, colorectal, breast (post-menopause), lung, melanoma, NHL, kidney, uterine, pancreatic, bladder, ovarian, and lymphocytic leukaemia. The highest genetic risk was observed for pancreatic cancer, with those in the top 5% having a 4-fold higher risk compared to those in the bottom tertile of genetic risk. This increase is similar in magnitude to BRCA1 and BRCA2 gene variants and breast cancer risk, which are far less common (prevalence of 0.2-0.3%) and trigger more frequent cancer screening (18). Pancreatic cancer is one of the leading causes of cancer death that is typically diagnosed at a late stage when the 5-year survival rate is less than 10% (19). Screening for pancreatic cancer is recommended for those deemed high-risk (20), and our results suggest a PRS could be used as an additional tool to assist in the identification of those at high risk. We also found that those with a poor lifestyle had an increased risk of pancreatic cancer and this increase in risk was greater in those with a high genetic risk. Therefore, those with a high genetic susceptibility to pancreatic cancer may benefit more from a healthy lifestyle than their low genetic risk counterparts. This adds value to the identification of high-risk groups and highlights opportunities for prevention intervention programs.
An unhealthy lifestyle was associated with a higher risk of overall cancer and eight cancer types - colorectal, postmenopausal breast, lung, kidney, uterine, pancreatic, bladder, and oral cavity/pharyngeal cancers. These findings are consistent with previously published research for breast and colorectal cancers and add to the limited and inconclusive research on overall healthy lifestyle and other cancers (6, 7). In addition to pancreatic cancer, we also observed additive interactions for colorectal, breast, lung, and bladder cancers, suggesting the harmful effects of a poor lifestyle are higher with increased genetic risk and a healthy lifestyle may partially or fully offset genetic risk of these cancers. These findings reinforce the current public health message that living a healthy lifestyle including not smoking, avoiding alcohol, consuming a healthy diet, maintaining a healthy body weight, and engaging in regular physical activity, reduces cancer risk.
We found no relationship between a healthy lifestyle and risk of melanoma, NHL, ovarian cancer, and lymphocytic leukaemia. These results were not unexpected; none of the lifestyle factors included in the HLI have been conclusively linked with the risk of lymphocytic leukaemia, melanoma, or NHL, and only obesity is positively associated with ovarian cancer risk (8). For melanoma, although evidence indicates higher alcohol intake may increase risk, greater physical activity levels have also been associated with increased risk, which is likely related to higher sun exposure levels (21, 22).
We found a healthy lifestyle was associated with a higher risk of prostate cancer. This finding is inconsistent with the majority of previous research suggesting either a negative association or no relationship for combined or individual components of a healthy lifestyle (6, 23-28), and may be the result of healthy volunteer bias.
This large prospective cohort study has some limitations. The length of the follow up and age range of the study population has limited the number of incident cancer cases. Consequently, some statistical models may not have been adequately powered to observe a modifying effect of genetic risk on healthy lifestyle-cancer relationships. There is likely to be measurement errors in the components of the lifestyle score as they are almost all measured via self-report; however, this is likely to attenuate the findings to the null. We also cannot rule out the possibility of residual confounding, although we adjusted for multiple covariates. The measure of genetic risk was limited by the SNPs included in the PRS, which may not be exhaustive. Lastly, this study was conducted in adults of European ancestries, so the relevance of these findings to populations of other ethnicities is unclear.
In summary, our findings indicate individuals can reduce their cancer risk by adhering to the WCRF/AICR healthy lifestyle recommendations and, for some cancers, those who are genetically susceptible to cancer can partially or fully offset their increased risk by living healthily. Our findings also highlight the potential benefit of using a PRS to identify those with a high genetic risk of pancreatic cancer, and possibly other cancers, as PRSs are further refined. PRSs could be used in conjunction with other tools to increase the accuracy of identifying high-risk individuals who may then undergo regular screening and be targeted by prevention intervention strategies. This prospect requires further investigation using cohort studies conducted across various ethnic populations.
Data Availability
This research utilizes data from the UK Biobank resource (application number 20175) which is available directly from the UK Biobank upon submission of a data request proposal. See https://www.ukbiobank.ac.uk.
Declarations
Ethics approval
This study is a secondary analysis of UK Biobank data. The UK Biobank cohort study was approved by the National Health Service North West Multi-centre Research Ethics Committee (11/NW/0382), the National Information Governance Board for Health and Social Care in England and Wales, and the Community Health Index Advisory Group in Scotland. All participants provided written informed consent.
Author contributions
Data acquisition: Lee, Hypponen; Concept and design: Byrne, Boyle, Benyamin, Hypponen; Data analysis: Byrne, Boyle, Lee, Benyamin; Data management: Byrne, Ahmed, Benyamin; Drafting of the manuscript: Byrne; Obtaining funding: Boyle, Benyamin, Hypponen. All authors were involved in the interpretation of the results and the review and final approval of the manuscript. Drs Byrne and Boyle had full access to all the data and take full responsibility for the integrity of the data and the accuracy of the analysis.
Data availability
This research utilizes data from the UK Biobank resource (application number 20175) which is available directly from the UK Biobank upon submission of a data request proposal. See https://www.ukbiobank.ac.uk.
Supplementary material
Supplementary material is available online.
Funding
This work was supported by Tour de Cure [RSP-013-18/19]; and the Australian Government’s Medical Research Future Fund (MRFF) as part of the Rapid Applied Research Translation program. This project is part of the work being undertaken by Health Translation SA [MRF9100005].
Conflict of Interest
None declared