Multi-omic association study implicates PPP1R13B in DNA methylation-mediated genotype and smoking exposure effects on decreased lung function in urban children

Impaired lung function in early life is associated with the subsequent development of chronic respiratory disease. Most genetic associations with lung function have been identified in adults of European descent and therefore may not represent those most relevant to pediatric populations and populations of different ancestries. In this study, we performed genome-wide association analyses of lung function in a multiethnic cohort of children (n=1035) living in low-income urban neighborhoods. We identified one novel locus at the TDRD9 gene in chromosome 14q32.33 associated with percent predicted forced expiratory volume in one second (FEV1) (p=2.4x10-9; {beta}z= - 0.31, 95% CI= -0.41- -0.21). Mendelian randomization and mediation analyses revealed that this genetic effect on FEV1 was partially mediated by DNA methylation levels at this locus in airway epithelial cells, which were also associated with environmental tobacco smoke exposure (p=0.015). Promoter-enhancer interactions in airway epithelial cells revealed chromatin interaction loops between FEV1-associated variants in TDRD9 and the promoter region of the PPP1R13B gene, a stimulator of p53-mediated apoptosis. Expression of PPP1R13B in airway epithelial cells was significantly associated the FEV1 risk alleles (p=1.26x10-5; {beta}=0.12, 95% CI=0.06-017). These combined results highlight a potential novel mechanism for reduced lung function in urban youth through increased airway epithelial apoptosis resulting from both genetics and smoking exposure.


INTRODUCTION
Reduced lung function is a hallmark of asthma and chronic obstructive pulmonary disease (COPD).
Lung function measures, such as forced expiratory volume in one second (FEV1) and forced vital capacity (FVC), are strong predictors of future all-cause mortality [1][2][3][4][5][6] . Impairment of lung function often begins in early life 7-10 , with lower lung function in infancy being a risk factor for the development of asthma in childhood 11 and COPD in late adulthood 12 .
Genetic factors contribute to differences in lung function among individuals, with heritability estimates ranging from 0.50 for FEV1 to 0. 66  Therefore, genetic risk factors discovered to date may not reflect those most relevant to high-risk populations, which can further exacerbate health disparities 53,54 . Identifying genetic variants and epigenetic variation associated with lung function in high-risk, multiethnic, pediatric populations may provide more direct insights into the early development of impaired lung function.
In this study, we analyzed measures of lung function from the Asthma Phenotypes in the Inner City (APIC) 55,56 and Urban Environment and Childhood Asthma (URECA) cohorts 57 , which consist of children living in low-income neighborhoods in 10 U.S. cities. We performed wholegenome sequencing (WGS) on 1,035 participants from APIC and URECA (ages 5-17 years; 67% non-Hispanic Black, 25% Hispanic; 66% with doctor-diagnosed asthma) and performed a GWAS with FEV1 and the FEV1/FVC ratio. We then performed expression quantitative trait locus (eQTL) and methylation quantitative trait locus (meQTL) mapping in airway epithelial cells and peripheral blood mononuclear cells (PBMCs) from a subset of the URECA children. We further tested for  . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
We then analyzed cg03306306 methylation in PBMCs collected at age 7 (n=169) 65 from URECA children to evaluate whether the genotype and lung function associations observed in NECs were shared with blood cells. In PBMCs, we observed no correlation between the rs10220464 risk allele and cg03306306 methylation (Fig 4c), nor was there an association between cg03306306 methylation and FEV1 (Fig 4d). These results indicate that cg03306306 methylation dynamics in the airway epithelium are not present in peripheral blood cells.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) Smoking exposure is associated with DNA methylation at the TDRD9 locus DNA methylation at the TDRD9 locus had previously been associated with maternal smoking during pregnancy 66,67 . Therefore, we tested for associations between smoking exposure (Fig S5) and DNA methylation at this locus in the URECA children. Methylation at cg03306306 in NECs was significantly associated with nicotine metabolite (cotinine) levels in urine collected at ages 7-10 years (p=0.015; b=0.03, 95% CI=0.01-0.05; Fig 5). Methylation at cg03306306 in PBMCs from age 7 was not associated with urine cotinine levels. To determine if there was an interaction effect between genotype and smoking exposure on DNA methylation and/or lung function, we repeated the cotinine association tests in URECA with the addition of an interaction term to assess if the genotype effect differed between individuals with low and high exposures to smoking. There were no significant genotype-by-smoking exposure interaction effects on methylation levels in NECs or PBMCs in URECA, nor were there any significant methylation-by-smoking effects on FEV1 (Fig S6). There was modest evidence for a genotype-by-smoking exposure interaction effect on FEV1 in the combined APIC and URECA sample, but this did not reach statistical significance (p=0.06, Fig S7). . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)

Genetic effects on lung function are mediated by DNA methylation
To determine if DNA methylation at the TDRD9 locus had a causal effect on lung function, we performed a Mendelian randomization (MR) analysis using two-stage least squares (2SLS) regression. In the first stage, cg03306306 methylation levels in NECs were regressed on an instrument composed of four meQTLs for cg03306306 (rs11160777, rs137961671, rs7143936, rs11160776; Methods). In the second stage, FEV1 was regressed on the predicted DNA methylation values generated from the first stage regression, thereby yielding a causal effect estimate of cg03306306 methylation on FEV1. Urine cotinine levels were included as a covariate in both stages.
The causal effect of cg03306306 methylation on FEV1 was statistically significant (p=0.020). We further performed a bootstrapped mediation analysis to test whether the rs10220464 risk allele effect on FEV1 was mediated by DNA methylation. The indirect effect of rs10220464 on FEV1 via cg03306306 methylation was significant (b=-0.61, 95% CI=-1.58--0.04). These results indicate that the effect of the FEV1-associated genotype at the TDRD9 locus is partially mediated through its impact on nearby DNA methylation levels.

Gene expression and promoter-enhancer interactions implicate PPP1R13B
Trait-associated variants and DNA methylation often affect the transcriptome by influencing the expression of one or more neighboring genes 68,69 . Identifying these correlations can help infer causal mechanisms 70 . Therefore, we next explored the relationship between the genotype for the lead FEV1 variant rs10220464 and the expression of genes within 1 Mb in NECs and PBMCs from URECA children. Notably, the rs10220464 genotype was not associated with TDRD9 expression levels in these cells (NECs: p=0.60, b= 0.116; PBMCs: p=0.91, b= 0.014). Of the 27 genes that were evaluated (Table S4), rs10220464 was significantly associated with the expression of only one gene, PPP1R13B (Protein Phosphatase 1 Regulatory Subunit 13B; FDR q=2.77.x10 -4 ; p=1.26x10 -5 ; b=0.12, 95% CI=0.06-017 ; Fig 6a), in NECs. PPP1R13B expression levels were also the most strongly associated of the 27 genes with methylation at cg03306306 in NECs (p=0.018; b=0.10, 95% CI=0.02-018; Fig 6b). PPP1R13B expression in NECs, however, was not associated with FEV1 or smoking exposure (Fig S8).
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted June 27, 2022. ; The transcription start site of PPP1R13B resides 87 kb from rs10220464 and 152 kb from cg03306306, suggesting long-range interactions between the FEV1-associated genotype and the promoter of PPP1R13B. To determine whether any of the FEV1-associated GWAS variants at the TDRD9 locus resided in regions that physically interacted with the promoters of cis-genes, we evaluated chromatin interactions in lower airway (bronchial) epithelial cells (BECs) 71 , assessed by promoter-capture Hi-C. Forty-two of the GWAS variants resided in regions that interacted with the promoters of 9 different genes expressed in NECs (Fig 7; Table S5). The gene most frequently mapped to these variants was PPP1R13B, with 15 variants located in 3 different interaction loops.
Moreover, the strongest observed interaction was between a region containing 4 FEV1-associated variants and the PPP1R13B promoter (CHiCAGO score = 9.38; Table S5), suggesting that this region is an enhancer for PPP1R13B expression. This putative enhancer region is located just 2.21 kb from cg03306306.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted June 27, 2022.

Fig 7. Promoter-enhancer interactions at TDRD9 locus in nasal epithelial cells.
Promoter-to-enhancer chromatin interactions captured by Hi-C in nasal epithelial cells from URECA at age 11 are displayed as grey arcs. SNPs associated with FEV1 (p<1x10 -5 ) are marked by blue lines in the top row according to their genomic position. The lead FEV1 SNP, rs1022464, is highlighted in yellow. CpG sites associated with rs1022464 (FDR<0.05) are displayed as green markers below the genes, with cg03306306 highlighted in green. Chromatin Interactions containing SNPs associated with FEV1 (p<1x10 -5 ) are highlighted in blue. Magenta arcs highlight interactions between the PPP1R13B promoter and regions containing FEV1 SNPs and/or rs1022464-associated CpG sites. FEV1, forced expiratory volume in one second; SNPs, single nucleotide polymorphisms; meQTL, methylation quantitative trait locus; pcHi-C, promoter capture Hi-C.

DISCUSSION
Using whole-genome sequence variant calls in an asthma-enriched cohort of predominantly African-American children raised in urban environments, we identified a genotype at the TDRD9 locus associated with lower FEV1 % predicted. This genotype effect was partially mediated by DNA methylation in airway epithelial cells, which were also correlated with smoking exposure. Data from RNA-sequencing and promoter-capture Hi-C in airway epithelial cells suggested that these FEV1associated genetic and epigenetic variations influence the expression of the PPP1R13B gene through long-range interactions.
The PPP1R13B gene encodes a protein that promotes apoptosis, a form of programmed cell death, via its interaction with the tumor suppressor p53 and is often referred to by its alias ASPP1 (apoptosis-stimulating protein of p53 1) 72 . In response to oncogenic stress, PPP1R13B translocates to the nucleus, where it enhances the transcriptional activity of p53 on specific target genes relevant to apoptosis 73,74 . Exposure to smoking and fine particulate matter induces epithelial apoptosis in the lung via p53 [75][76][77] . PPP1R13B may also promote apoptosis in a p53-independent manner by inhibiting autophagy in response to upregulation by EGR-1 (early growth response protein 1) 78 . EGR-1 mediates stress-induced proinflammatory responses in the airway epithelium and contributes to the pathogenesis of COPD [79][80][81][82][83][84] . Within the lung, PPP1R13B is indeed predominantly expressed in epithelial cells, particularly in alveolar type 2 cells, and less so in immune cells and fibroblasts 85,86 .
However, Cheng and colleagues studied PPP1R13B function in lung fibroblasts and found that it was upregulated following SiO2 exposure, where it promoted fibroblast proliferation and migration through endoplasmic reticulum stress and autophagy pathways 87 . Overall, these studies suggest that PPP1R13B plays a key role in maintaining tissue homeostasis by regulating apoptosis and autophagy in response to environmental stimuli 73,88,89 . The specific function(s) of this gene in the airway epithelium and its potential impact on lung development remain to be elucidated. PPP1R13B expression was not directly associated with lung function or urine cotinine levels in the URECA children, but its cofactors 78,90 have been found previously to be upregulated in smokers with COPD 80,91 . Given its association with lung function alleles in our study, its expression in the lung epithelium, and its purported functions in autophagy and apoptosis pathways, additional study of PPP1R13B in lung development is warranted, particularly in the context of adverse environmental stimuli, many of which are enriched in low-income urban environments.
In NECs, PPP1R13B gene expression was significantly associated with DNA methylation levels at the cg03306306 CpG site in TDRD9. Methylation at the TDRD9 locus was previously reported to correlate with specific environmental exposures 66,67,92 and with TDRD9 expression in blood 66,93 . TDRD9 is lowly expressed in the lung but is detected in alveolar macrophages and in monocytes 85,86 . Interestingly, the gene was among the most differentially expressed genes in alveolar . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted June 27, 2022. macrophages in smokers relative to non-smokers 94 , and its knockdown in TDRD9-expressing lung carcinomas resulted in increased apoptosis 95 . Its expression was not correlated with the rs10220464 genotype in URECA NECs or PBMCs, but rs10220464 is an eQTL for TDRD9 expression in whole blood in GTEx data 96 , with the minor allele associated with lower TDRD9 expression. Although evidence from this study points to PPP1R13B in the airway epithelium, we can't exclude the possibility that TDRD9 or other genes could contribute to the locus' influence on lung function via other tissues.
The correlations of FEV1 and rs10220464 with cg03306306 methylation and PPP1R13B expression in NECs were absent in PBMCs. Although global DNA methylation patterns between tissues are highly correlated 97 , tissue-specific differentially methylated regions are more likely to be functional, particularly if they are positively correlated with gene expression 98 . The TDRD9 locus has not been identified in epigenome-wide association studies of lung function 44,99-103 , but these measured DNA methylation from blood, which may be an insufficient proxy for methylation in the lungs 104 .
Indeed, previous studies have found that DNA methylation profiles in NECs are significantly more predictive of pediatric asthma than those in PBMCs 105,106 . Furthermore, epigenetic biomarkers can change with age. For example, epigenetic markers for lung function in adults do not replicate in children 100 .

Mendelian randomization and mediation analyses indicated that methylation at cg03306306
in NECs mediated the rs10220464 genotype effect on FEV1, but there was residual correlation between rs10220464 and FEV1, signifying that the genotype effect was only partially mediated by cg03306306. The FEV1 association signal at the TDRD9 locus included many variants in high LD across a 200 kb region that could be independently contributing to function. Some of the variants lie in different long-range enhancers (Fig 7; Table S5) 58 . It is also possible that one or more correlated variants were not included because they failed quality control standards. In addition, due to the limited sample size of the WGS cohort, we excluded rare variants (MAF<0.01) from consideration, which could contribute to the signal at this locus. Additional fine mapping and/or functional studies are needed to identify the causal variant(s) and full mechanism of action.
If increased methylation at cg03306306 leads to lower FEV1 in response to environmental stressors, one might expect to see interaction effects with smoking exposure on FEV1. We did not detect any such interactions, although our analyses in that regard were likely underpowered given our observed effects and sample sizes 36 . Furthermore, because this study was limited to children living in low-income urban neighborhoods, environmental risk factors are likely to be more prevalent than in the general population 55-57 . Additionally, such exposures are not necessarily ubiquitous across all the different neighborhoods and communities represented in this sample, and although environmental tobacco smoke exposure was examined, many other environmental factors were not considered in this study.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 27, 2022. ; https://doi.org/10.1101/2022.06.24.22276830 doi: medRxiv preprint Another limitation of our study was the relatively small size for a GWAS. This likely contributed to the lack of statistically significant replication for previously identified lung function loci 23 , considering that the observed effects were correlated with results of prior GWAS. However, the APIC and URECA cohorts represent understudied, high-risk, pediatric populations that likely harbor distinct genetic and environmental risk factors compared to older, primarily European ancestry cohorts included in previous GWAS of lung function 14-20,23 . The findings of this study have yet to be replicated in an independent cohort, and should therefore be considered preliminary; however, it is possible that these associations would differ in populations with dissimilar ancestry, age, exposures, and/or asthma risk.
There are additional caveats to consider when interpreting our findings. First, this study integrated data from two cohorts with different recruitment criteria, asthma definitions, and ancestral compositions. To control for potential population stratification, we used the first ten principal components (PCs) of ancestry to adjust lung function values and then included the ancestry PCs as fixed effects in the GWAS models (Methods). The linear mixed models also included a genetic relatedness matrix as a random effect to account for residual population structure. Because children with asthma have lower lung function overall ( Table 1) and their lung function is more affected by certain environmental exposures 107-109 , we adjusted for asthma status in the GWAS, as others have done previously 110-113 . The likelihood of discovering lung function variants with consistent effects in asthmatics and non-asthmatics was thereby increased, although genetic determinants of lung function may differ by asthma status 114 . The significant genotype effect at the TDRD9 locus, however, remained when asthma was excluded as a covariate. Second, some of the analyses used data collected at different timepoints. For example, most of the urine cotinine and spirometry measures were collected at age 10, but the samples used for the NEC DNA methylation and RNA-seq analyses were collected at age 11. Because DNA methylation and gene expression can change over time 40,115-117 , their values at age 11 may not be fully representative of exposures at age 10. Finally, the promotercapture Hi-C data were from lower airway (bronchial) epithelial cells, whereas the DNA methylation and RNA-seq data were generated from upper airway (nasal) epithelial cells. Although there are transcriptomic differences between epithelial cells from each compartment, their respective profiles are highly correlated 118-122 , and the use of NECs as a proxy for the lower airway epithelium has been validated for both gene expression and epigenetic studies 120-123 .
Our study identified a novel avenue through which genetic risk and environmental exposures can affect lung development in children raised in low-income urban neighborhoods. Further research into this pathway may yield mechanistic insights into the early development of impaired lung function, perhaps leading to interventions that can help reduce the high incidence and morbidity of chronic respiratory diseases in socioeconomically disadvantaged children.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Study Population and Phenotypes
We analyzed samples and phenotypes from two National Institutes of Allergy and Infectious Diseases (NIAID)-funded asthma studies conducted by the Inner-City Asthma Consortium Lung function was assessed using spirometry. Lung function measures used in this study for APIC participants were taken at the study entry visit (V0). For URECA, measurements from age 10 were used when available; otherwise, the most recent measurement after age 5 was used (Table S6).
Asthma status was assigned according to study-specific criteria. For APIC, asthma was defined by a doctor's diagnosis of asthma and short-acting beta-agonist use in the year prior 55 . For URECA, asthma status was determined either by doctor diagnosis, lung function reversibility, or symptom recurrence 125 . The 2012 Global Lung Initiative reference equations 126 were applied to generate percent predicted estimates for FEV1 and Z-scores for FEV1/FVC ratio. Urine cotinine levels were measured using NicAlert immunochromatographic assays, which report results on a scale of 0-6 according to different cotinine concentration ranges 127 . For URECA, urine cotinine results were available at age 10 for most participants (n=391); otherwise, assays from age 8 (n=29) or age 7 (n=2) were used. This study utilized DNA methylation and RNA-seq data generated for other URECA studies; therefore, the number of samples included in each analysis varied and was limited by data availability (Table S7).
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Whole-Genome Sequencing and Data Processing
DNA was extracted from peripheral blood (APIC, URECA) or cord blood (URECA) and quantified using an Invitrogen Qubit 3 Fluorometer. DNA quality was assessed using the Thermo Scientific NanoDrop One spectrophotometer and confirmed using an Agilent TapeStation system. DNA was processed in batches of 60 using the Illumina Nextera DNA Flex library prep kit with unique dual adaptors. Each set of 60 libraries was sequenced over two NovaSEQ S4 flowcells. Whole-genome sequencing was performed by the University of Chicago Genomics Facility using the Illumina NovaSEQ6000, which generated 150 bp paired-end reads. Sequencing data processing followed the Broad Institute's Genome Analysis Toolkit (GATK) best practices for germline short variant discovery, as implemented in the harmonized pipeline used by the New York Genome Center for TOPMed 128,129 . Reads were aligned to the GRCh38 human reference genome (including alternate loci and decoy contigs) using BWA-MEM (Burrows-Wheeler Aligner; v0.7.17). Aligned reads further underwent duplicate removal (Picard MarkDuplicates; v2.8.1) and base quality score recalibration (GATK BaseRecalibrator; v3.8) against known sites (dbSNP138, known indels, and Mills and 1KG gold standard indels) provided in the GATK resource bundle. Read alignment metrics were calculated using Picard CollectWgsMetrics (v2.8.1) for all aligned reads and for aligned reads with base quality and mapping quality ≥ 20. DNA contamination levels were estimated using VerifyBamID2 (v1.0.6) 130 . Samples with estimated DNA contamination >0.05 were removed from consideration. Samples with poor coverage (<50% of the genome with ≥20x depth) were also removed from further consideration. To identify potential sample swaps, WGS samples were validated using independent genotyping arrays.

QC Array for Sample Validation
To identify potential WGS sample swaps, we independently genotyped the APIC and URECA participants using the Illumina QC Array-24 BeadChip. SNPs were tested for Hardy-Weinberg Equilibrium (HWE) within each self-identified ancestry group using the chi-square test and removed if they deviated from HWE (Bonferroni-adjusted p<0.05) within at least one ancestry. SNPs with call rates <0.98 were also removed. Samples with total variant call rates <0.95 were not used. Array data with incorrect or indeterminate sex according to X-chromosome heterozygosity rates (Plink v1.90) were also not used 131 . For fourteen of the sequenced URECA samples, we used results from the Illumina Infinium CoreExome+Custom array for sample validation, which were generated and controlled for quality as described by McKennan and colleagues 132 . WGS and array genotypes were tested for concordance using VerifyBamID (v1.1.3) 133 . WGS samples that were not validated with array data were not included in genetic analyses.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Variant Calling and Quality Control
Variant calls were generated using GATK HaplotypeCaller (v4.1.3.0), accounting for contamination estimates, for single nucleotide variants and short insertions, deletions, and substitutions. Sample genotypes were joined using GATK GenomicsDBImport and GenotypeGVCFs over the genomic intervals defined in the GATK WGS calling region interval list provided in the GATK resource bundle. Genotypes with read depth (DP) <10 or quality scores (GQ) <20 were set as missing. Sites with ≥0.1 missingness were then removed from consideration. Variants with minor allele frequencies >0.05 were tested for accordance with HWE, accounting for population structure 134 . Sites with common variants that deviated from structural HWE (P<1x10 -6 ) were removed from consideration.
Sites with quality by depth ratios (QD) <4 or >34 were also removed, as we observed declines in variant transition/transversion (TS/TV) ratios beyond these bounds (Fig S10). Variant site quality was further evaluated using machine-learning-based Variant Quality Score Recalibration (VQSR).

Ancestry Estimation
Ancestry principal components (PCs) were calculated on the intersect of high quality singlenucleotide variants (SNVs) genotyped in the WGS data and several reference panels from the 1000 Genomes Project (1KG; n=156) 137  intervals. Ancestry PCs were calculated, accounting for subject relatedness, using PC-Air 140 and PC-Relate 141 . Initial kinship estimates were produced using KING 142 . Kinship and PCs were iteratively estimated using PC-Relate and PC-Air, respectively, until estimates for the top 5 PCs stabilized . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 27, 2022.
Because sample relatedness can lead to biased admixture estimates 140,143 , admixture was estimated for each WGS sample separately.

Quantitative Trait Association Testing
Quantitative traits were adjusted for confounding factors and normalized using a two-stage approach 144,145 . First, each trait was regressed on age, sex, asthma status, and the first 10 PCs of ancestry. The residuals were then rank-normalized using an inverse normal transformation. In the second stage, the normalized residuals were considered outcome variables in the GWAS, adjusting for the same covariates as in the first stage. Genome-wide association testing was performed for all high-quality common variant calls (MAF≥0.01) using a linear mixed model, as implemented in GEMMA 146 , with subject relatedness included as a random effect. Individuals without a confirmed asthma status (n=127) were excluded from trait association testing. The threshold we applied for genome-wide significance was P≤2.5x10 -8 , based on a 5x10 -8 GWAS threshold and further accounting for two tests.

DNA Methylation Analysis
DNA from NECs was collected at age 11 from 287 URECA participants and assessed for genomewide methylation patterns using the Illumina Infinium Human Methylation EPIC Beadchip. DNA methylation levels from PBMCs at age 7 in URECA were collected and processed as previously described 65 . MeQTL analysis was performed using Matrix eQTL 147 . NEC DNA methylation levels were adjusted globally for sex, array, plate, collection site, DNA concentration, percent ciliated epithelial cells, percent squamous cells, and ancestry PCs 1-3. Principal components analysis was then performed on the residual methylation levels, and the first three PCs were included as covariates in the meQTL association tests. Additional methylation PCs were not included in association tests, as they were significantly correlated with asthma phenotypes. Associations with FDR-adjusted P<0.05 were considered significant. MeQTL analysis with the PBMC data included sex, collection site, plate, ancestry PCs 1-3, and eight latent factors 148 (protecting for FEV1 at age 7) as covariates.
To test CpG site methylation associations with lung function in NECs, we performed linear regressions on the most recent FEV1 measures, with age, sex, ancestry PCs 1-3, and methylation PCs 1-3 as covariates. For the PBMC analysis, we set FEV1 at age 7 as the dependent variable, with sex, collection site, plate, ancestry PCs 1-3, and latent factors included as covariates.
For association testing with smoking exposures, we ran linear regressions for DNA methylation and lung function in NECs and PBMCs, as described above, with the addition of . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 27, 2022. ; https://doi.org/10.1101/2022.06.24.22276830 doi: medRxiv preprint cotinine concentrations as a predictor. We further tested for smoking-by-genotype interaction effects on DNA methylation and lung function using these models by adding an interaction term (cotinine concentration : rs10220464 genotype). Proportions of explained variance were calculated by squaring partial correlation coefficients of regression model predictors 149 . One sample from one sibling pair was removed from all methylation analyses to prevent confounding due to relatedness.

Mendelian Randomization and Mediation Analysis
To assess the causal effects of DNA methylation on lung function, we performed one-sample Mendelian randomization analysis. We applied a two-stage least squares (2SLS) regression to URECA samples with WGS and DNA methylation data (n=285) using ivreg 150 . DNA methylation levels in NECs at the cg03306306 CpG site were first adjusted for methylation PCs 1-3 and used as the endogenous, exposure variable. The adjusted and normalized FEV1 values from the GWAS were set as the dependent outcome variable. Urine cotinine levels were included as an exogenous covariate (included in both stages). The instrumental variables were chosen from a set of candidate SNPs that were at least nominally associated with cg03306306 methylation with p<0.15. Clustering of pairwise linkage disequilibrium values between these SNPs revealed six distinct haplotypes (Fig S9). To ensure instrument exogeneity, each candidate SNP was tested for association with FEV1 after conditioning on cg03306306 methylation and urine cotinine, and SNPs associated with p<0.05 were removed from consideration. Of the remaining candidate SNPs, one was chosen from each haplotype, resulting in an instrument composed of 4 SNPs (rs11160777, rs137961671, rs7143936, rs11160776). Instrument relevance was validated using the F test, endogeneity using the Wu-Hausman test, and instrument exogeneity using the Sargon test. Mediation analysis was conducted with ROBMED 151 . FEV1 (% predicted) was set as the dependent variable, adjusted cg03306306 methylation as the mediator, and rs10220464 as the independent variable. Age at FEV1 measurement, sex, asthma status, ancestry PCs 1-3, and urine cotinine levels were included as covariates. The indirect effect of rs10220464 on FEV1 via cg03306306 methylation was estimated using 100,000 bootstrap resamples.

Gene Expression Analysis
We analyzed gene expression in NECs and PBMCs from the URECA birth cohort using RNA-seq.
The NEC data were derived from 324 children (156 females, 168 males) at age 11 years at the time of sample collection, and the PBMC data were derived from 132 (78 males, 54 females) PBMC children aged 2 years at the time of collection. Sequencing reads were mapped and quantified using STAR (v2.6.1) 152 and samples underwent trimmed means of M-value (TMM) normalization and voom transformation 153 . Genes with <1 count per million mapped reads (CPM) were removed from . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 27, 2022. ; https://doi.org/10.1101/2022.06.24.22276830 doi: medRxiv preprint analysis. For eQTL association testing in NECs we corrected for sex, the first three ancestral PCs, collection site, epithelial cell proportion, sequencing batch, and 14 latent factors 148 using limma 154 . In PBMCs, we corrected for sex, the first three ancestral PCs, collection site, and 19 latent factors.

Chromatin Interaction Analysis
Chromatin interactions were assessed using promoter capture Hi-C 155,156 in ex vivo human BECs from 8 adult lung donors, including 4 with asthma. The data was processed and analyzed as previously described 71,157 . Chromosomal interactions were evaluated using the CHiCAGO algorithm 158 . Interactions with CHiCAGO scores ≥5 were considered significant 158 . Genetic variants within 1 kb of a given interacting fragment were considered part of the chromatin loop. Genes that were not expressed in NECs were not included in the analysis.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 27, 2022.  . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 27, 2022. . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 27, 2022. ; . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 27, 2022. ; . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 27, 2022. . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 27, 2022. . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 27, 2022.  . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.   . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.     . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)  . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 27, 2022. . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)   . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 27, 2022.

Fig S9. Intercorrelation of Mendelian randomization candidate instrument SNPs in URECA.
Instrumental variables were chosen from a set of candidate SNPs that were at least nominally associated with cg03306306 methylation with p<0.15. The correlation values between these SNPs are shown, clustered using Ward's method. The four SNPs used for the instrument are highlighted. URECA, Urban Environment and Childhood Asthma.  rs10220464  rs72712900  rs55938939  rs28391043  rs145867870  rs8010286  rs12147936  rs4906397  rs4906399  rs11160777  rs7160557  rs9671921  rs4900607  rs12100528  rs10132127  rs10132153  rs10135296  rs61319604  rs72714904  rs4906401  rs4906402  rs1951585  rs11160776  rs10133836  rs2368994  rs7143936  rs11625697  rs2005081   . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted June 27, 2022. ; https://doi.org/10.1101/2022.06.24.22276830 doi: medRxiv preprint