Smoking Interaction with a Polygenic Risk Score for Reduced Lung Function

Importance: Risk of airflow limitation and chronic obstructive pulmonary disease (COPD) is influenced by combinations of cigarette smoking and genetic susceptibility, yet it remains unclear whether gene-by-smoking interactions contribute to quantitative measures of lung function.
Objective: To determine whether smoking modifies the association of a polygenic risk score (PRS) with reduced lung function.
Design: United Kingdom (UK) Biobank prospective cohort study.
Setting: Population cohort.
Participants: UK citizens of European ancestry aged 40-69 years, with genetic and spirometry data passing quality control metrics.
Exposures: PRS, self-reported pack-years of smoking, ever- versus never-smoking status, and current- versus former-/never-smoking status.
Main Outcomes and Measures: Forced expiratory volume in 1 second (FEV1)/forced vital capacity (FVC). We tested for interactions with models including the main effects of PRS, different smoking variables, and their cross-product term(s). We also compared the effects of pack-years of smoking on FEV1/FVC for those in the highest versus lowest decile of predicted genetic risk for low lung function.
Results: We included 319,730 individuals (24,915 with moderate-to-severe COPD). The PRS and pack-years were significantly associated with lower FEV1/FVC, as was the interaction term (β [interaction] = -0.0028 [95% CI: -0.0029, -0.0026]; all p < 0.0001). A stepwise increment in estimated effect sizes for these interaction terms was observed per 10 pack-years of smoking exposure (all p < 0.0001). There was evidence of significant interaction of the PRS with ever/never smoking status (β [interaction] = -0.0064 [95% CI: -0.0068, -0.0060]) and with current/not-current smoking (β [interaction] = -0.0091 [95% CI: -0.0097, -0.0084]). For any given level of pack-years of smoking exposure, FEV1/FVC was significantly lower for individuals in the tenth compared to the first decile of genetic risk (p < 0.0001). For every 20 pack-years of smoking, those in the top compared to the bottom decile of genetic risk showed nearly a twofold reduction in FEV1/FVC.
Conclusions and Relevance: COPD is characterized by diminished lung function, and our analyses suggest there is substantial interaction between genome-wide PRS and smoking exposures. While smoking has negative effects on lung function across all genetic risk categories, the effects of smoking are greatest in those with higher predicted genetic risk.

Introduction
Chronic obstructive pulmonary disease (COPD) is characterized by airflow obstruction, traditionally defined by a low forced expiratory volume in 1 second (FEV1)/forced vital capacity (FVC), and cigarette smoking is the greatest environmental risk factor 1,2. Only a minority of smokers develop COPD 3,4, and genetic factors account for some of this variation in susceptibility, with ~40% of the variability in spirometric measures of pulmonary function attributed to genetic variation 5-7. Therefore, it has long been thought that airflow obstruction may develop partially as the result of gene-by-smoking interactions.
Despite the important contribution of both smoking and genetic factors to lung function, compelling evidence for gene-by-smoking interactions has been limited. Genome-wide interaction studies have identified a handful of spirometric- and COPD-associated loci that appear to interact with smoking status 8-14, suggesting that at least a portion of the variability in spirometric measures of lung function may be attributable to gene-by-smoking interactions. A major challenge in identifying gene-by-smoking interactions on lung function and risk of COPD is that individual genetic variants tend to have small effect sizes and account for little of the phenotypic variability in lung function, which diminishes the power to detect gene-by-smoking interactions. Pooling individual variants from genome-wide association studies (GWASs) into a single genetic risk score can account for a greater proportion of phenotypic variability 15-20 and should improve power to detect interactions.
Genetic risk scores have been used to investigate gene-by-environment interactions in psychiatric 21 and cardiovascular diseases 22. Aschard and colleagues 23 were unable to detect individual single nucleotide polymorphism (SNP)-by-smoking interactions for FEV1/FVC for 26 variants identified as significant in a genome-wide joint meta-analysis of SNP-by-smoking associations of pulmonary function 14; however, when the authors summed these variants to create a genetic risk score, they found evidence of interaction between the genetic risk score and ever smoking status 23. By contrast, Shrine et al. performed the largest GWAS of lung function to date, developed a genetic risk score including estimated effects of 279 variants showing significant effects on lung function, and reported no evidence of interaction between this genetic risk score and ever smoking status 19, though the authors did observe an interaction of the genetic risk score with ever smoking status on moderate-to-severe COPD. We constructed a polygenic risk score (PRS) based on GWASs of FEV1 and FEV1/FVC 19 that explained more of the variability in lung function than seen with the 279-variant risk score used by Shrine et al. (~30% versus <10%) 20, and here we further investigate gene-by-smoking interactions.
We hypothesized that multiple measures of smoking exposure would significantly modify the effect of this genome-wide PRS on FEV1/FVC (i.e. its association with lower lung function) in the United Kingdom (UK) Biobank population-based cohort.

Methods

Study population
We included participants from the UK Biobank, a cohort of over 500,000 individuals aged 40-69 years 24. All participants provided written informed consent and study protocols were approved by local institutional review boards/research ethics committees. Participants were excluded if spirometry or genetic data did not meet quality control standards; further details on the impact of these inclusion and exclusion criteria are shown in Figure S1. Quality control (QC) of spirometric data has been previously described 18,19,24. Briefly, to determine lung function, FEV1 and FVC were derived from the spirometry volume-time series data at the time of study enrollment, as previously reported 19.
Genotyping was performed as previously described 19, using the Axiom UK BiLEVE and Axiom Biobank arrays (Affymetrix, Santa Clara, California, USA), and genotypes were imputed to the Haplotype Reference Consortium version 1.1 panel (accepting imputation accuracy R² > 0.5). We dropped variants with minor allele frequency < 0.01 and those showing deviation from Hardy-Weinberg equilibrium (p < 1e-6). We used only subjects of European ancestry based on a combination of self-reported ethnicity and k-means clustering of principal components of genetic ancestry, as previously reported 19.

Statistical analyses
All analyses were done in R version 4.0.3 (www.r-project.org). Normality of continuous variables was assessed by visual inspection of histograms. Results are reported as mean ± standard deviation or median [interquartile range], as appropriate. Differences in continuous

Overview of study design
The primary outcome was the FEV1/FVC ratio, as clinical COPD is characterized by airflow obstruction (FEV1/FVC < 0.7), and severity is graded based on decrements in FEV1 % predicted 1,2.
We first assessed whether three measures of smoking exposure (see below) modified the effect of a PRS on quantitative measures of FEV1/FVC. We then considered the joint effects of smoking exposures and being in the highest versus lowest decile of the PRS (i.e. highest versus lowest categories of predicted genetic risk). We examined "norms of reaction" for the relationship between pack-years of smoking and FEV1/FVC for those in the tenth compared to the first decile of predicted genetic risk.

Smoking exposures
We examined three measures of cigarette smoking exposure: 1) pack-years of smoking, 2) ever- versus never-smoking status, and 3) current- versus former-/never-smoking status. All smoking information was obtained by self-report. Pack-years of smoking was examined as a continuous and as a categorical variable (pack-year categories: included those who reported current smoking, and 'former smokers' included non-current smokers who smoked 100 or more cigarettes in their lifetime.

Polygenic risk score for lung function
A polygenic risk score (PRS) for lung function was calculated as previously described 20. Briefly, this PRS was based on genome-wide association results for FEV1 and FEV1/FVC in UK Biobank and SpiroMeta 19, and was developed using a penalized regression framework accounting for linkage disequilibrium 25. PRSs were calculated for FEV1 and FEV1/FVC, and then summed into a composite PRS, which was scaled and centered. The PRS was oriented such that a higher PRS was associated with lower FEV1 and FEV1/FVC.
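The final step described above (summing the two trait-specific scores, then centering and scaling the sum) can be illustrated with a minimal Python sketch. It is not the authors' pipeline, which was run in R: the function names, the toy input values, and the use of the population standard deviation are assumptions made for illustration, and the sketch assumes both input scores are already oriented so that higher values predict lower lung function.

```python
from statistics import mean, pstdev


def standardize(values):
    """Center values to mean 0 and scale them to unit standard deviation."""
    mu = mean(values)
    sd = pstdev(values)  # assumption: population SD; the paper does not specify
    if sd == 0:
        raise ValueError("cannot scale a constant score")
    return [(v - mu) / sd for v in values]


def composite_prs(prs_fev1, prs_ratio):
    """Sum two per-person scores, then center and scale the sum.

    Both inputs are assumed (hypothetically) to be oriented so that a
    higher score predicts lower lung function.
    """
    if len(prs_fev1) != len(prs_ratio):
        raise ValueError("score lists must have the same length")
    summed = [a + b for a, b in zip(prs_fev1, prs_ratio)]
    return standardize(summed)


# Toy scores for four hypothetical participants (not study data).
scores = composite_prs([0.2, -0.1, 0.4, -0.5], [0.1, 0.0, 0.3, -0.4])
```

After this step the composite score has mean 0 and unit variance, so a regression coefficient for the PRS can be read as the change in outcome per standard deviation of the score.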

Interaction analyses
We performed multivariable linear regressions of FEV1/FVC on main effects of the combined PRS, smoking exposure, and cross-product interaction terms. We included the covariates age, age², sex, height, genotyping array, and the first 10 principal components of genetic ancestry in the linear regression model. Age was scaled and centered prior to squaring. We also performed stratified analyses amongst those in the lowest and highest deciles of the PRS, separately for never and ever smokers.
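Written out, the regression described above has roughly the following form. This is a sketch in notation introduced here for illustration: S_i (the smoking measure for participant i), z_i (the vector of covariates listed above), and the coefficient labels do not appear in the original text.

```latex
\mathrm{FEV_1/FVC}_i = \beta_0 + \beta_G\,\mathrm{PRS}_i + \beta_S\,S_i
  + \beta_{G \times S}\,(\mathrm{PRS}_i \times S_i)
  + \boldsymbol{\gamma}^{\top}\mathbf{z}_i + \varepsilon_i
```

Under this form, the expected change in FEV1/FVC per unit of PRS at smoking level S is β_G + β_{G×S}·S, so a nonzero cross-product coefficient β_{G×S} indicates that smoking modifies the effect of the PRS.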
Gene-by-environment interaction has been defined as deviation from either an additive or a multiplicative model. Therefore, we additionally examined the joint effects of smoking and PRS to assess departure of the observed joint effect from the effect expected under an additive model. We focused on comparing the 1st and 10th deciles of PRS, as previously done 20. We created a categorical variable with mutually exclusive strata formed by the cross-classification of smoking and PRS (10th vs. 1st decile). The reference category was the group with the lowest relative smoking exposure (e.g. pack-years ≤ 10) in the first PRS decile. We then constructed multivariable linear regression models to evaluate the effects of this categorical variable on FEV1/FVC. The expected effect for those in the highest decile with the highest smoking exposure was estimated under an additive model, calculated by summing the estimated effect size for the group in the lowest decile with the highest smoking exposure and the estimated effect size for the group in the highest decile with the lowest smoking exposure.
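The additive-model comparison described above reduces to simple arithmetic on three stratum coefficients, each estimated relative to the reference group (lowest smoking exposure, first PRS decile). The sketch below is illustrative only: the function names are hypothetical and the coefficient values are made up, not taken from the study.

```python
def expected_additive_effect(beta_smoking_only, beta_prs_only):
    """Joint effect expected if the two risk factors simply add.

    beta_smoking_only: lowest PRS decile, highest smoking exposure.
    beta_prs_only:     highest PRS decile, lowest smoking exposure.
    """
    return beta_smoking_only + beta_prs_only


def departure_from_additivity(beta_joint, beta_smoking_only, beta_prs_only):
    """Observed joint effect minus the effect expected under additivity."""
    expected = expected_additive_effect(beta_smoking_only, beta_prs_only)
    return beta_joint - expected


# Made-up coefficients on the FEV1/FVC scale (NOT study results).
beta_smoking_only = -0.02  # heavy smoking, lowest genetic risk decile
beta_prs_only = -0.03      # light smoking, highest genetic risk decile
beta_joint = -0.08         # heavy smoking, highest genetic risk decile

expected = expected_additive_effect(beta_smoking_only, beta_prs_only)
excess = departure_from_additivity(beta_joint, beta_smoking_only, beta_prs_only)
```

A negative departure, as in this toy example, would mean the observed joint reduction in FEV1/FVC exceeds the sum of the two separate reductions.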

Norms of Reaction
A "norm of reaction" describes the relationship between a phenotype and environmental exposure for a given genotype 27. We assessed norms of reaction for pack-years of smoking and FEV1/FVC for those in the lowest and highest deciles of predicted genetic risk. We plotted pack-years of smoking versus FEV1/FVC, stratifying by lowest and highest deciles of genetic risk. We then compared the slopes of these lines with an Analysis of Covariance (ANCOVA) using the rstatix R package 28. For the purposes of clinical interpretability, we fit multivariable linear regression models to assess the effect of 20 pack-years of smoking on FEV1/FVC for those in the highest and lowest deciles of predicted genetic risk, adjusting for the covariates detailed above.
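As a rough illustration of what comparing norms of reaction involves, the sketch below fits an unadjusted least-squares line of FEV1/FVC on pack-years within each genetic risk stratum and rescales each slope to a 20 pack-year change. It is a simplification under stated assumptions: it omits the covariate adjustment and the ANCOVA slope test used in the study, the function name is hypothetical, and the data are synthetic values chosen only to make the arithmetic easy to follow.

```python
def ols_slope(x, y):
    """Slope of the simple least-squares line of y on x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with >= 2 points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x has no variation")
    return sxy / sxx


# Synthetic, noise-free toy data (NOT study results).
pack_years = [0, 10, 20, 30, 40]
ratio_lowest_decile = [0.78 - 0.0005 * p for p in pack_years]
ratio_highest_decile = [0.75 - 0.0009 * p for p in pack_years]

slope_low = ols_slope(pack_years, ratio_lowest_decile)
slope_high = ols_slope(pack_years, ratio_highest_decile)

# Change in FEV1/FVC associated with 20 pack-years in each stratum.
effect_per_20_low = 20 * slope_low
effect_per_20_high = 20 * slope_high
```

A steeper negative slope in the highest decile corresponds to a different norm of reaction: the same smoking exposure is associated with a larger drop in FEV1/FVC.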
As sensitivity analyses, we repeated these analyses in ever smokers and in a dataset excluding all related individuals; to select unrelated individuals, we removed at least one individual from each related pair with kinship coefficient > 0.0625, favoring the inclusion of COPD cases. We also applied three transformations to reported pack-years of smoking (natural log, scaling and centering, and rank normalization) to ensure that the effects of interaction terms were not due to misspecification of the main effects of smoking.
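The three pack-year transformations named above can be sketched in Python as follows. Several details are assumptions, since the text does not specify them: the offset of 1 added before taking the log (so that zero pack-years is defined), the use of the population standard deviation, and the particular rank-to-quantile formula (rank - 0.5)/n with average ranks for ties. All function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, pstdev


def log_transform(values, offset=1.0):
    """Natural log after adding an offset (assumed here to handle zeros)."""
    return [math.log(v + offset) for v in values]


def scale_and_center(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        raise ValueError("cannot scale a constant variable")
    return [(v - mu) / sd for v in values]


def rank_normalize(values):
    """Rank-based inverse normal transform; tied values share an average rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the tie run
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    normal = NormalDist()
    # (rank - 0.5) / n keeps every quantile strictly between 0 and 1.
    return [normal.inv_cdf((r - 0.5) / n) for r in ranks]


# Toy pack-year values for five hypothetical participants.
toy_pack_years = [0, 5, 20, 20, 60]
logged = log_transform(toy_pack_years)
scaled = scale_and_center(toy_pack_years)
normalized = rank_normalize(toy_pack_years)
```

Refitting the interaction model with each transformed exposure is one way to check that a significant cross-product term is not simply absorbing nonlinearity in the main effect of smoking.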
Results

Characteristics of study participants
Characteristics of study participants are shown in Table 1. We included 319,730 participants; 24,915 participants met criteria for moderate-to-severe COPD (Global Initiative for Chronic Obstructive Lung Disease (GOLD) spirometry grades 2-4 1), 38,713 had preserved ratio with impaired spirometry (PRISm) 29, and 256,102 met criteria for GOLD spirometry grades 0/1.

Interaction of a polygenic risk score with smoking
The PRS was weakly correlated with pack-years of smoking (r = 0.041, p < 0.0001; Figure S2). The relationship between PRS and FEV1/FVC stratified by pack-years of smoking categories is illustrated in Figure 1. In multivariable analyses (Table 2) Considering pack-years of smoking as a continuous variable (Table S1), the cross-product interaction term was also associated with FEV1/FVC (β [interaction] = -0.0028 [95% CI: -0.0029, -0.0026], p < 0.0001). We also performed three transformations of pack-years of smoking, and the PRS*pack-years interaction term was significant in each instance (all p < 0.0001, Table S2).
The relationships between the PRS and FEV1/FVC stratified by ever- vs. never- and current- vs. former-/never-smoking status are shown in Figures S3A and S3B, respectively. Ever-smoking and the PRS*ever-smoking status interaction term were significantly associated with FEV1/FVC (both p < 0.0001, Table S3). Similarly, current smoking status and the PRS*current-smoking
In stratified analyses, we observed similar results between PRS and smoking exposures (Table S5, Figure S4). Additionally, ever smoking status and the PRS*ever-smoking status interaction term were significantly associated with FEV1 (Table S6), but a non-significant difference for ever smokers in the highest genetic risk decile (Table S7).

Norms of reaction for highest versus lowest predicted genetic risk deciles
In Figure 3, we show different norms of reaction for the effects of pack-years of smoking on

Discussion
In this study of over 300,000 UK Biobank participants, we found that three measures of smoking exposure modified the effect of a polygenic risk score (PRS) on a quantitative measure of lung function (FEV1/FVC). As expected, smoking was detrimental to lung function across all categories of predicted genetic risk. For any given level of pack-years of smoking exposure, however, those at highest genetic risk showed lower FEV1/FVC than those with the lowest predicted genetic risk. The effects of heavy smoking and being in the highest decile of predicted genetic risk were greater than would be expected based on the additive effects of both risk factors. These results support the idea that diminished pulmonary function (a measure of airflow obstruction) is, at least partially, due to gene-by-smoking interactions, and that those in higher genetic risk categories are more susceptible to the deleterious effects of smoking.
Compared to previous studies, our study included more participants, leveraged a more powerful measure of genetic predisposition for low lung function (i.e. PRS), examined three different measures of smoking exposure (pack-years, ever smoking, current smoking), and examined 'norms of reaction' for those in the highest compared to the lowest deciles of predicted genetic risk. Our findings are consistent with Aschard et al. 23, who reported an interaction between ever- By contrast, Shrine et al. 19 constructed a genetic risk score from 279 variants shown to influence lung function, but did not observe any evidence of interaction with ever-smoking status on
Smoking was detrimental even to those with low predicted genetic risk, and the effects were greater for those with high predicted genetic risk. For any given level of pack-years of smoking, those in the highest decile had lower FEV1/FVC compared to those in the lowest decile of predicted genetic risk. These findings are in contrast to observations in cardiovascular disease, where the association between smoking and coronary heart disease was greater for those in the lowest compared to the highest tertile of a PRS 22. This difference may reflect that many individuals can develop coronary disease in the absence of cigarette smoking, and that smoking is a greater risk factor for those with low polygenic risk for coronary disease. Meanwhile, airflow obstruction primarily occurs in the setting of cigarette smoking exposure. Furthermore, those with low predicted genetic risk and high smoking exposure had similar risk for low FEV1/FVC as those with high genetic risk and low smoking exposure. Taken together, these results emphasize that abstaining from smoking is crucial to preventing obstructive lung disease regardless of an individual's predicted genetic risk, and that those in the highest risk groups might benefit from intensive smoking cessation measures with respect to the phenotypes examined in this study.
Our results suggest that the PRS includes variants that represent biological pathways by which smoking exerts deleterious effects. Some of these variants may act to confer resilience 34,35 or susceptibility to the effects of cigarette smoke. Further investigation into the role of specific variants in susceptibility to cigarette smoke, including other "omics" data types and network analytic techniques, could help elucidate mechanisms of resilience and susceptibility. For example, the effect of occupational exposures on FEV1 was modified by rs9931086 in SLC38A8, and network analyses suggested that inflammatory processes involving CTLA-4, HDAC, and PPAR-alpha may provide mechanistic links for the observed interaction 36; however, this was a small study that needs replication.

Strengths of this study include the use of a large volunteer cohort, the use of the most powerful measure of genetic risk for low lung function available to date (i.e. a genome-wide PRS), and the comparison of individuals at extremes of predicted genetic risk. Limitations inherent to the study design include that the UK Biobank is a single cohort observed in cross-section. Examining the effects of gene-by-smoking interactions on incident COPD should be pursued. We were not able to model the time-varying effects of smoking exposure. The PRS was partially developed using samples from UK Biobank, leading to overfitting of the PRS with respect to spirometric measures; while this issue should not affect interaction assessments, these results should ideally be replicated in future studies. However, the strong effect sizes and robustness to stratified and transformed analyses do lend confidence to our results. We included European-ancestry participants only because the PRS was derived solely from Europeans. Identification of causal variants and genetic prediction in single-ancestry populations demonstrate limited portability to multi-ancestry populations 37,38. The 279 lung function variants from Shrine et al. 19 were curated to ensure variants for smoking behavior were excluded, but the PRS used in the current study included ~2.5 million variants and was not similarly curated. Including variants that are causal for smoking behavior could bias the interaction term 39; however, there was a very weak correlation between this PRS and smoking exposure in UK Biobank, and previously no correlation with smoking was observed in case-control cohorts 20, suggesting that the PRS used in the current study largely reflects the genetics of lung function. Finally, we emphasize that smoking cessation is the main preventive intervention for airflow obstruction regardless of genetic susceptibility.
Our interaction findings further support the idea that more intensive smoking cessation efforts are warranted for those with high genetic susceptibility to diminished lung function. The effectiveness of targeted, genetically informed smoking cessation interventions is unclear, though there is evidence that knowledge of genetic risk for alpha-1 antitrypsin deficiency (AATD) can increase smoking cessation 40. Clinical utility of the PRS will depend on dissecting biological mechanisms of susceptibility to the harmful effects of smoking.
In conclusion, diminished FEV1/FVC and airflow obstruction, which are characteristic of COPD, may be partially attributable to gene-by-smoking interactions. As expected, smoking was harmful across all genetic risk groups, but worse for those in the highest decile of predicted genetic risk. Large-scale replication and further investigations into mechanisms of interaction are needed.

medRxiv preprint (not certified by peer review), this version posted March 29, 2021; doi: https://doi.org/10.1101/2021.03.26.21254415. Made available under a CC-BY 4.0 International license. The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.