Novel KITLG regulatory variants are associated with lung function in African American children with asthma
==========================================================================================================

* Angel CY Mak
* Satria Sajuthi
* Jaehyun Joo
* Shujie Xiao
* Patrick M Sleiman
* Marquitta J White
* Eunice Y Lee
* Benjamin Saef
* Donglei Hu
* Hongsheng Gui
* Kevin L Keys
* Fred Lurmann
* Deepti Jain
* Gonçalo Abecasis
* Hyun Min Kang
* Deborah A. Nickerson
* Soren Germer
* Michael C Zody
* Lara Winterkorn
* Catherine Reeves
* Scott Huntsman
* Celeste Eng
* Sandra Salazar
* Sam S Oh
* Frank D Gilliland
* Zhanghua Chen
* Rajesh Kumar
* Fernando D Martínez
* Ann Chen Wu
* Elad Ziv
* Hakon Hakonarson
* Blanca E Himes
* L Keoki Williams
* Max A Seibold
* Esteban G. Burchard

## ABSTRACT

Baseline lung function, quantified as forced expiratory volume in the first second of exhalation (FEV1), is a standard diagnostic criterion used by clinicians to identify and classify lung diseases. Using whole genome sequencing data from the National Heart, Lung, and Blood Institute TOPMed project, we identified a novel genetic association with FEV1 on chromosome 12 in 867 African American children with asthma (p = 1.26 × 10−8, β = 0.302). Conditional analysis within 1 Mb of the tag signal (rs73429450) yielded one major and two weaker independent signals within this peak. We explored statistical and functional evidence for all variants in linkage disequilibrium with the three independent signals and identified 9 variants as the most likely candidates responsible for the association with FEV1. Hi-C data and eQTL analysis demonstrated that these variants physically interacted with *KITLG* and that their minor alleles were associated with increased expression of the *KITLG* gene in nasal epithelial cells. Gene-by-air-pollution interaction analysis found that the candidate variant rs58475486 interacted with past-year SO2 exposure (p = 0.003, β = 0.32).
This study identified a novel protective genetic association with FEV1, possibly mediated through *KITLG*, in African American children with asthma. KEYWORDS * GWAS * African American * FEV1 * gene-by-environment interaction * air pollution ## INTRODUCTION Asthma, a chronic pulmonary condition characterized by reversible airway obstruction, is one of the hallmark diseases of childhood in the United States (World Health Organization 2017). Asthma is also the most disparate common disease in the pediatric clinic, with significant variation in prevalence, morbidity, and mortality among U.S. racial/ethnic groups (Oh et al. 2016). Specifically, African American children carry a higher asthma disease burden compared to their European American counterparts (Akinbami et al. 2014; Akinbami 2015). Forced expiratory volume in the first second (FEV1), a measurement of lung function, is a vital clinical trait used by physicians to assess overall lung health and diagnose pulmonary diseases such as asthma (Johnson and Theurer. 2014). We have previously shown that genetic ancestry plays an important role in FEV1 variation and that African Americans have lower FEV1 compared to European Americans regardless of asthma status (Kumar et al. 2010; Pino-Yanes et al. 2015). The disparity in lung function between populations may explain disparities in asthma disease burden. Understanding the factors that influence FEV1 variation among individuals with asthma could lead to improved patient care and therapeutic interventions. Twin and family-based studies estimate that the heritability of FEV1 is as high as 81%, supporting a strong contribution by genetic factors in FEV1 variation (Chatterjee and Das. 1995; Hukkinen et al. 2011; Palmer et al. 2001; Sillanpaa et al. 2017; Tian et al. 2017; Yamada et al. 2015). Genome-wide association studies (GWAS) of FEV1, including among individuals with asthma, have identified several variants that contribute to lung function (Li et al. 2013; Liao et al. 
2014; Repapi et al. 2010; Soler Artigas et al. 2011; Soler Artigas et al. 2015; Wain et al. 2017). Most of these previous GWAS, however, were performed in adult populations of European descent, and their results may not generalize across populations or across the life span of an individual (Carlson et al. 2013; Martin, A. R. et al. 2017; Wojcik et al. 2019). Previous GWAS results are also limited due to their reliance on genotyping arrays. In particular, variation in non-coding regions of the genome is not adequately covered by many genotyping arrays because they were not designed while taking into account the population-specific genetic variability of all populations (Kim, M. S. et al. 2018; Zhang and Lupski. 2015). Whole genome sequencing (WGS) is a newer technology that captures nearly all common variation from coding and non-coding regions of the genome and is unencumbered by genotype array design constraints and differences in linkage disequilibrium patterns among populations. To date, no large-scale WGS studies of lung function have been performed in African American children with asthma (Martin et al. 2017). In addition to genetics, FEV1 is a complex trait that is significantly influenced by both genetic variation and environmental factors, such as air pollution (Chatterjee and Das. 1995; Hukkinen et al. 2011; Palmer et al. 2001; Sillanpaa et al. 2017; Tian et al. 2017; Yamada et al. 2015). Exposure to ambient air pollution has been consistently associated with poor respiratory outcomes, including reduced FEV1 (Barraza-Villarreal et al. 2008; Brunekreef and Holgate. 2002; Ierodiakonou et al. 2016; Wise 2019). We previously showed that exposure to sulfur dioxide (SO2), an air pollutant emitted by the burning of fossil fuels, is significantly associated with reduced FEV1 in African American children with asthma in the SAGE II study (Neophytou et al. 2016). 
Because the genetic variants associated with FEV1 thus far do not account for the majority of its estimated heritability, considering gene-environment interactions, specifically gene-by-air-pollution, may improve our understanding of lung function genetics (Moore 2005; Moore and Williams. 2009). Here, we performed a genome-wide association analysis using WGS data to identify common genetic variants associated with FEV1 in African American children with asthma in SAGE II and investigated the effect of gene-by-air-pollution (SO2) interactions on FEV1 associations.

## METHODS

### Study population

This study examined African American children 8 to 21 years of age with physician-diagnosed asthma from the Study of African Americans, Asthma, Genes & Environments (SAGE II). All SAGE II participants were recruited from the San Francisco Bay Area. The inclusion and exclusion criteria have been described in detail previously (Oh et al. 2012; White et al. 2016). Briefly, participants were eligible if they were 8-21 years of age, self-identified as African American, and had four African American grandparents. Study exclusion criteria included the following: 1) any smoking within one year of the recruitment date; 2) 10 or more pack-years of smoking; 3) pregnancy in the third trimester; 4) history of lung diseases other than asthma (for cases) or chronic illness (for cases and controls). Baseline lung function, defined as forced expiratory volume in the first second (FEV1), was measured by spirometry prior to administering albuterol as previously described (Oh et al. 2012).

### TOPMed whole genome sequencing data

SAGE II DNA samples were sequenced as part of the Trans-Omics for Precision Medicine (TOPMed) whole genome sequencing (WGS) program (Taliun et al. 2019). WGS was performed at the New York Genome Center and Northwest Genomics Center on a HiSeq X system (Illumina, San Diego, CA) using a paired-end read length of 150 base pairs (bp), with a minimum of 30x mean genome coverage.
DNA sample handling, quality control, library construction, clustering and sequencing, read processing and sequence data quality control are described in detail in the TOPMed website (TOPMed 2019). Variant calls were obtained from TOPMed data freeze 8 VCF files corresponding to the GRCh38 assembly. Variants with a minimal read depth of 10 (DP10) were used for analysis unless otherwise stated. ### Genetic principal components, global ancestry, and kinship estimation Genetic principal components (PCs), global ancestry, and kinship estimation on genetic relatedness were computed using biallelic single nucleotide polymorphisms (SNPs) with a PASS flag from TOPMed freeze 8 DP10 data. PCs and kinship estimates were computed using the PC-Relate function from the GENESIS R package (Conomos et al. 2015; Conomos et al. 2016) using a workflow available from the Summer Institute in Statistical Genetics Module 17 course website (Summer Institute in Statistical Genetics 2019). African global ancestry was computed using the ADMIXTURE package (Alexander et al. 2009) in supervised mode using European (CEU), African (YRI) and Native American (NAM) reference panels as previously described (Mak, A. C. Y. et al. 2018). ### FEV1 GWAS Non-normality of the distribution of FEV1 values was tested with the Shapiro-Wilk test in R using the shapiro.test function. Since FEV1 was not normally distributed (p = 1.41 × 10−8 for FEV1 and p = 1.05 × 10−8 for log10 FEV1), FEV1 was regressed on all covariates (Age, sex, height, controller medications, sequencing centers, and the first 5 genetic PCs) and the residuals were inverse-normalized. These inverse-normalized residuals (FEV1.res.rnorm) were the main outcome of the discovery GWAS. The controller medication covariate included the use of inhaled corticosteroids (ICS), long-acting beta-agonists (LABA), leukotriene inhibitors and/or an ICS/LABA combo in the 2 weeks prior to the recruitment date. 
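The residual transformation described above can be illustrated with a rank-based inverse normal transform. This is a minimal sketch rather than the authors' code: the helper name `inverse_normalize` and the rank offset of 0.5 are assumptions (the text does not state which offset was used), and the input residuals are made-up values.

```python
from statistics import NormalDist


def inverse_normalize(residuals, offset=0.5):
    """Rank-based inverse normal transform (hypothetical helper).

    Each value is replaced by the standard normal quantile of its
    rank-based percentile (rank - offset) / n. Tied values receive
    the average of the ranks they span.
    """
    n = len(residuals)
    order = sorted(range(n), key=lambda i: residuals[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and residuals[order[j + 1]] == residuals[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - offset) / n) for r in ranks]


# Made-up residuals from a covariate regression, for illustration only.
example = [0.31, -0.12, 0.05, -0.40, 0.05]
transformed = inverse_normalize(example)
```

The transform preserves the ordering of the residuals while forcing their distribution toward a standard normal, which is why it is a common choice when a phenotype fails a normality test.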
Genome-wide single variant analysis was performed on the ENCORE server ([https://github.com/statgen/encore](https://github.com/statgen/encore)) using the linear Wald test (q.linear) originally implemented in EPACTS ([https://genome.sph.umich.edu/wiki/EPACTS](https://genome.sph.umich.edu/wiki/EPACTS)) and TOPMed freeze 8 data (DP0 PASS) with a MAF filter of 0.1%. All pairwise relationships with degree 3 or more relatedness (kinship values > 0.044) were identified, and one participant of the related pair was subsequently chosen at random and removed prior to analysis. All covariates used to obtain FEV1.res.rnorm were also included as covariates in the GWAS as recommended in a recent publication (Sofer et al. 2019). The association analysis was repeated using untransformed FEV1 and FEV1 percent predicted (FEV1.perc.predicted). FEV1 percent predicted was defined as the percentage of measured FEV1 relative to predicted FEV1 estimated by the Hankinson lung function prediction equation for African Americans (Hankinson et al. 1999). A secondary analysis that included smoking-related covariates (smoking status and number of smokers in the family) was performed in PLINK 1.9 (version 1p9_2019_0304_dev) (Chang et al. 2015; Purcell and Chang. 2013). Regional association results were plotted using LocusZoom 1.4 (Pruim et al. 2010) with a 500 kilobase (Kb) flanking region. Linkage disequilibrium (R2) was estimated in PLINK 1.9. The function effectiveSize in the R package CODA was used to estimate the actual effective number of independent tests and CODA-adjusted statistical and suggestive significance p-value thresholds were defined as 0.05 and 1 divided by the effective number of tests, respectively (Duggal et al. 2008). We compared the CODA-adjusted statistical significance threshold and the widely used 5 × 10−8 GWAS genome-wide significance threshold (Pe’er et al. 2008) and selected the more stringent threshold for genome-wide significance. 
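The threshold selection rule can be made concrete as follows. This is a minimal sketch, assuming the effective number of independent tests has already been estimated (for example, with `effectiveSize`); the function name `gwas_thresholds` and the example value of `n_eff` are hypothetical and chosen only for illustration.

```python
def gwas_thresholds(n_eff, conventional=5e-8, alpha=0.05):
    """Return (genome-wide, suggestive) p-value thresholds.

    The adjusted significance threshold is alpha / n_eff and the
    suggestive threshold is 1 / n_eff. The genome-wide threshold is the
    more stringent (smaller) of the adjusted and conventional values.
    """
    if n_eff <= 0:
        raise ValueError("n_eff must be positive")
    adjusted = alpha / n_eff
    suggestive = 1.0 / n_eff
    return min(adjusted, conventional), suggestive


# Hypothetical effective number of tests, not a value from the study.
genome_wide, suggestive = gwas_thresholds(2_000_000)
```

With more than one million effective tests, the adjusted threshold falls below 5 × 10−8 and is therefore the one selected; with fewer, the conventional threshold is retained.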
The following WGS quality control steps were applied to all reported variants from ENCORE to ensure WGS variant quality: (1) The variant had VCF FILTER = PASS; (2) Variant quality was confirmed via manual inspection on the BRAVO server based on TOPMed freeze 5 data (University of Michigan and NHLBI TOPMed. 2018); (3) Variants were reanalyzed with linear regression using PLINK 1.9 by applying the arguments --mac 5 --geno 0.1 --hwe 0.0001 using TOPMed freeze 8 DP10 PASS data. To determine if the rs73429450 association with FEV1 was only identifiable using whole genome sequencing data, we repeated the linear regression association analysis on signals that passed the genome-wide significance threshold using PLINK 1.9 and genotype data generated with the Axiom Genome-Wide LAT 1 array (Affymetrix, Santa Clara, CA, dbGaP phs000921.v1.p1). These array genotype data were imputed into the following reference panels: 1000 Genomes phase 3 version 5, Haplotype Reference Consortium (HRC) r1.1, the Consortium on Asthma among African-ancestry Populations in the Americas (CAAPA) and the TOPMed freeze 5 panels on the Michigan Imputation Server (Das et al. 2016). It should be noted that 500 SAGE II subjects were part of the TOPMed freeze 5 reference panel. Ninety-eight GWAS FEV1-associated loci were retrieved from GWAS catalog version 1.0.2-associations\_e93\_r2019-01-31 (Buniello et al. 2019) using the trait name “Lung function (FEV1)”. Overlap between these 98 loci with 20 Kb flanking regions and the discovery GWAS FEV1 associations (p < 0.001) was detected using the BEDTools intersect tool (Quinlan and Hall. 2010).

### Conditional analysis

Conditional analysis was performed to identify all independent signals in a GWAS peak using PLINK 1.9. All TOPMed freeze 8 DP10 variants within 1 megabase (Mb) of the tag association signal and with association p-value of 1 × 10−4 or smaller in the discovery GWAS were included in the analysis. Variants were first ordered by ascending p-value.
A variant was considered to be an independent signal if the association p-value after conditioning (conditional p-value) on the tag signal was smaller than 0.05. Newly identified independent signals were included with the tag signal for conditioning on the next variant. ### Region-based association analysis Region-based association analyses were performed in 1 Kb sliding windows with 500 bp increments in a 1 Mb flanking region of the tag GWAS signal using the SKAT_CommonRare function from the SKAT R package v1.3.2.1 (Ionita-Laza et al. 2013). Default settings were used with method = “C” and test.type = “Joint”. A minor allele frequency (MAF) threshold of 0.01 was used as the cutoff to distinguish rare and common variants. Variants were annotated in TOPMed using the WGSA pipeline (Liu et al. 2016). Since SKAT imputes missing genotypes by default by assigning mean genotype values (impute.method=”fixed”), we chose to use low coverage genotypes instead of SKAT imputation, and hence, TOPMed freeze 8 DP0 variants with a VCF FILTER of PASS were included in the analysis. The function effectiveSize in the R package CODA (Plummer et al. 2006) was used to estimate the effective number of independent hypothesis tests for accurate Bonferroni multiple testing corrections. P-value thresholds for statistical significance and suggestive significance were defined as 0.05 and 1 divided by the effective number of tests, respectively (Duggal et al. 2008). If a region was suggestively significant, region-based analyses were repeated with functional variants and/or rare variants (MAF <= 0.01) to assess contribution of common, rare and/or functional variants. Region-based analyses using rare variants only were performed using SKAT-O (Lee et al. 2012). The WGSA annotation filters used to define functional variants are provided in File S1 (Supplementary Text 1). 
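The sliding-window layout used for the region-based tests can be sketched as below. This is an illustrative enumeration only, not the authors' pipeline: the function name `sliding_windows`, the half-open interval convention, and the handling of the region boundaries are assumptions. The center position is the reported coordinate of rs73429450 (chr12:88846435).

```python
def sliding_windows(center, flank=1_000_000, size=1_000, step=500):
    """Yield half-open (start, end) windows tiling center +/- flank.

    Windows are `size` bp wide and advance by `step` bp, so consecutive
    windows overlap by size - step bp.
    """
    region_start = max(center - flank, 0)
    region_end = center + flank
    start = region_start
    while start + size <= region_end:
        yield start, start + size
        start += step


# 1 Kb windows with 500 bp increments around the tag signal.
windows = list(sliding_windows(88_846_435))
```

Because adjacent windows share half of their positions, the resulting tests are correlated, which is the motivation for estimating an effective number of independent tests before applying a Bonferroni correction.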
To study the contribution of individual variants to a region-based association p-value, drop-one variant analysis was performed by repeating the region-based analysis multiple times, dropping one variant at a time.

### Functional annotations and prioritization of genetic variants

The Hi-C Unifying Genomic Interrogator (HUGIN) (Ay et al. 2014; Martin, J. S. et al. 2017; Schmitt et al. 2016) was used to assign potential gene targets to each variant. ENCODE annotations (ENCODE Project Consortium 2011; ENCODE Project Consortium 2012) were based on overlap of the variants with functional data downloaded from the UCSC Table Browser (Karolchik et al. 2004). These data included DNase I hypersensitivity peak clusters (hg38 wgEncodeRegDnaseClustered table), transcription factor ChIP-Seq clusters (hg38 encRegTfbsClustered table) and histone modification ChIP-Seq peaks (hg19 wgEncodeBroadHistoneStdPk tables). For DNase I hypersensitivity and transcription factor binding sites, we focused on blood, bone marrow, lung and embryonic cells. For histone modification ChIP-Seq, we focused on H3K27ac and H3K4me3 modifications in human blood (GM12878), bone marrow (K562), lung fibroblast (NHLF), and embryonic stem cells (H1-hESC). The LiftOver tool (Hinrichs et al. 2006) was used to convert genomic coordinates from hg19 to hg38. Candidate cis-regulatory elements (ccREs) were a subset of representative DNase hypersensitivity sites with epigenetic activity further supported by histone modification (H3K4me3 and H3K27ac) or CTCF-binding data from the ENCODE project. Overlap of variants with ccREs was detected using the Search Candidate cis-Regulatory Elements by ENCODE (SCREEN) web interface (ENCODE Project Consortium 2011; ENCODE Project Consortium 2012). Prioritization of genetic variants was based on the presence of statistical, functional and/or bioinformatic evidence as described in the Diverse Convergent Evidence (DiCE) prioritization framework (Ciesielski et al. 2014).
The priority score of each variant was obtained by summing the number of evidence present for that variant. ### Replication of GWAS associations All replication analyses were performed in subjects with asthma. Replication of GWAS FEV1 associations was attempted on TOPMed whole genome sequencing data generated from four cohorts. These cohorts included Puerto Rican (n=1,109) and Mexican American (n=649) children in the Genes-Environments and Admixture in Latino Americans (GALA II) study (Oh et al. 2012), African American adults in the Study of Asthma Phenotypes and Pharmacogenomic Interactions by Race-Ethnicity (SAPPHIRE, n=3,428) (Levin et al. 2014) and African American children in Genetics of Complex Pediatric Disorders (GCPD-A, n=1,464) study (Ong et al. 2013). Age, sex, height, controller medications and the first 5 PCs were used as covariates. Additionally, replication of GWAS FEV1 associations was attempted using data of black UK Biobank subjects who had asthma (n=627) while adjusting for age, sex, height and the first 5 principal components. Asthma status was defined by ICD code or self-reported asthma. UK Biobank genotype data was generated on Affymetrix UK BiLEVE axiom or UK Biobank Axiom array and imputed into the Haplotype Reference Consortium, 1000G and UK 10K projects (Bycroft et al. 2018; Canela-Xandri et al. 2018). Additional details on the UK Biobank study and the replication procedures are available in File S1 (Supplementary Text 2). ### RNA sequencing and expression quantitative trait loci (eQTL) analysis Whole-transcriptome libraries of 370 nasal brushings from GALA II Puerto Rican children with asthma were constructed by using the Beckman Coulter FX automation system (Beckman Coulter, Fullerton, CA). Libraries were sequenced with the Illumina HiSeq 2500 system. Raw RNA-Seq reads were trimmed using Skewer (Jiang et al. 2014) and mapped to human reference genome hg38 using Hisat2 (Kim, D. et al. 2015). 
Reads mapped to genes were counted with htseq-count using the UCSC hg38 GTF file as reference (Anders et al. 2015). Cis-expression quantitative trait locus (eQTL) analysis of *KITLG* was performed as described in the Genotype-Tissue Expression (GTEx) project version 7 protocol (GTEx Consortium et al. 2017) using age, sex, BMI, global African and European ancestries and 60 PEER factors as covariates.

### Gene-by-air-pollution interaction analysis

We hypothesized that the effect of genetic variation on lung function in our study population may differ by the levels of exposure to SO2 (Neophytou et al. 2016). To test for an interaction between a genetic variant and SO2, an additional multiplicative interaction term (variant × SO2 exposure) was included in the original GWAS model (see Methods section “FEV1 GWAS”). The SO2 estimates used in the interaction analysis were first-year, past-year, and lifetime exposure to ambient SO2, which were estimated as described previously (Neophytou et al. 2016). Residuals of FEV1 were plotted against exposure to SO2 and stratified by the number of copies of the minor allele. Residuals of FEV1 were obtained as described in the Methods section “FEV1 GWAS”.

### Data availability

Local institutional review boards approved the studies (IRB# 10-02877). All subjects and legal guardians provided written informed consent. TOPMed whole genome sequencing data are available on dbGaP with the accession number: phs000921.v4.p1. The remaining sequence data are available upon request. Supplemental materials available at figshare.

## RESULTS

### Novel lung function associations

Subject characteristics of the 867 African American children with asthma included in this study are shown in Table 1, and the distribution of their FEV1 values is in Figure S1. The CODA-adjusted statistical significance threshold (2.10 × 10−8) was selected as the genome-wide significance level.
According to this threshold, one SNP on chromosome 12 (chr12:88846435, rs73429450, G>A) was associated with FEV1.res.rnorm (Figure 1, p = 9.01 × 10−9, β = 0.801). The association between rs73429450 and lung function remained statistically significant when the analysis was repeated using untransformed FEV1 (p = 1.26 × 10−8, β = 0.302) as the outcome variable. The association between rs73429450 and lung function was suggestive using FEV1.perc.predicted (p = 1.69 × 10−7, β = 0.100). Thirty-nine variants with association p < 0.001 overlapped with 14 out of 98 previously reported FEV1-associated loci and their 20 Kb flanking regions (Table S1). When including all variants with association p < 0.05, we found overlap with all 98 previously reported loci. Our top FEV1 association, rs73429450, did not overlap with any previously reported locus in either scenario and is therefore a novel association with FEV1 in this study population.

Table 1. Descriptive characteristics of 867 African American children with asthma included in this study.

Figure 1. Manhattan and LocusZoom plots from genome-wide association study of lung function*. (A) Manhattan plot from genome-wide association study of lung function* using linear regression in ENCORE. Red horizontal line: CODA-adjusted genome-wide significance p-value of 2.10 × 10−8. Blue horizontal line: CODA-adjusted suggestive significance p-value of 4.19 × 10−7. (B) LocusZoom plot of rs73429450 (chr12:88846435) and 500 Kb flanking region. Colors show linkage disequilibrium in the study population. * FEV1.res.rnorm was used as the phenotype for the association testing.
Secondary analysis that included covariates correcting for smoking status and number of smokers in the family showed that smoking-related factors were not significantly associated with FEV1 in our pediatric SAGE cohort: using 657 out of 867 individuals with available smoking-related covariates, the FEV1.res.rnorm association p-values before and after including the smoking-related covariates were 2.01 × 10−6 and 1.89 × 10−6, respectively. Neither smoking status (p = 0.27) nor number of smokers in the family (p = 0.54) was significant. Conditional analysis was performed on 45 variants with association p < 1 × 10−4 located within 1 Mb of the strongest association signal (rs73429450). Two weaker independent signals (rs11312747, rs58475486) were identified (Table S2). The minor allele frequency of rs73429450 in continental populations from the 1000 Genomes Project (1000G) is 3% in Africans (AFR) and < 1% in Admixed Americans (AMR), Europeans (EUR) and Asians (EAS and SAS) (1000 Genomes Project Consortium et al. 2015). The variant rs73429450 was not included on the Affymetrix LAT1 genotyping array on which SAGE participants were previously genotyped. To determine if the rs73429450 association with FEV1 was only identifiable using whole genome sequencing data, we attempted to reproduce our results by imputing the genotype of rs73429450 in 851 SAGE participants with available array data using 1000G phase 3 (n = 2,504), HRC r1.1 (n = 32,470), CAAPA (n = 883) and TOPMed freeze 5 (n = 62,784) reference panels. Our results remained statistically significant when using the 1000G phase 3 (p = 4.97 × 10−8, β = 0.79, imputation R2 = 0.95) and TOPMed freeze 5 (p = 1.22 × 10−8, β = 0.80, imputation R2 = 0.98) reference panels, but lost statistical significance when rs73429450 genotypes were imputed using the HRC (p = 4.35 × 10−7, β = 0.68, imputation R2 = 0.94) and CAAPA (p = 1.95 × 10−7, β = 0.80, imputation R2 = 0.71) reference panels.
Region-based association analysis including all variants conditioned on the association signal from rs73429450 was performed in its 1 Mb flanking region (chr12:87846435-89846435). No windows were significantly associated after Bonferroni multiple testing correction (p < 2.81 × 10−4, Figure S2), but 20 windows were suggestively associated with FEV1.res.rnorm (p < 5.62 × 10−3, Table S3). Two of the 20 windows re-tested using only functional variants were suggestively significant (regions 4 and 16). Both of these windows were no longer suggestively significant after removing the common variants, indicating that the association signal from these regions was mostly driven by common variants. Further investigation of region 16 using drop-one analysis on the 2 rare and 1 common functional variants confirmed the major contribution of the common variant, rs1895710, as shown by the large increase in p-value when it was dropped (Table S4). The signal was also driven to a lesser extent by the singleton, rs990979778. Drop-one analysis was not performed on region 4 because it contained only 1 common and 1 rare variant. A Hi-C assay couples a chromosome conformation capture (3C) assay with next-generation sequencing to capture long-range interactions in the genome. We identified a statistically significant long-range chromatin interaction between the GWAS peak and the KIT ligand (*KITLG*, also known as stem cell factor, *SCF*) gene in human fetal lung fibroblast cell line IMR90 (Table S5). The long-range interaction detected in human primary lung tissue was not significant, suggesting that the potential long-range interactions are specific to tissue type or developmental stage.

### Potential regulatory role of FEV1-associated variants on KITLG expression

To further elucidate potential regulatory relationships between the GWAS association peak and *KITLG*, we analyzed whether variants in the peak were eQTLs of *KITLG* in previously published whole blood RNA-Seq data available from the same study participants (Mak, Angel CY et al. 2016). The whole blood RNA-Seq data, however, did not yield evidence of *KITLG* expression, consistent with results in GTEx. We subsequently used RNA-Seq data from nasal epithelial cells of 370 Puerto Rican children with asthma from the GALA II study, and found that five out of 45 variants were eQTLs of *KITLG* (Table S6). While Puerto Ricans are a different population than African Americans, they are both admixed populations with substantial African genetic ancestry, and therefore could share eQTLs. All five eQTLs corresponded to one signal in a region with strong linkage disequilibrium (r2 > 0.8, markers 25, 26, 28, 29 and 32 in Figure 2).

Figure 2. Integration of statistical and functional evidence for variant prioritization. Numbers and different shades of black in the LD plot represent LD in R2. The three independent signals identified in the conditional analysis are marked with *. Nasal eQTL, variants that are eQTLs of *KITLG* in nasal epithelial cells. ccREs, candidate cis-regulatory elements in the SCREEN registry. ENCODE, DNase I hypersensitivity site and/or transcription factor ChIP-Seq overlapping with the variants. UK Biobank, SAPPHIRE, GCPD-A, replication results using Blacks in UK Biobank and African Americans in the SAPPHIRE and GCPD-A cohorts (R = replicated at p < 0.05; F = flip-flop association at p < 0.05). Candidate, candidate variants prioritized because of the presence of two or more lines of evidence or because they are nasal eQTLs.
+ indicates presence of evidence. ### Replication of genetic association with FEV1 Subject Characteristics of our four replication cohorts (SAPPHIRE, GCPD-A, UK Biobank and GALA II) are shown in Table S7. We attempted to replicate the association of the 45 SNPs in our primary FEV1 GWAS in each cohort. We used 0.05 as the suggestive p-value threshold and 0.0167 as the Bonferroni-corrected p-value threshold after correcting for 3 independent signals (see conditional analysis in Results Section). A total of 20 variants were replicated at p < 0.05 with consistent direction of effect in black UK Biobank participants; 14 variants in SAPPHIRE and 2 variants in GCPD-A were significant but had an opposite direction of effect (Table S8). We attempted to replicate the FEV1.res.rnorm association in Mexican American (n = 649) and Puerto Rican (n = 1,109) children with asthma from the GALA II study. In Mexican Americans, we excluded 19 variants with MAF < 0.1% and associations for the remaining 26 variants did not replicate (Table S9). In Puerto Ricans, we observed the same protective effect in 38 of the 45 variants in the locus, but the associations were not statistically significant (Table S9). ### Incorporating statistical and functional evidence for candidate variant prioritization We combined and summarized all functional evidence for the top 45 variants, along with eQTL findings from nasal epithelial RNA-Seq and replication results (Figure 2, Table 2). To facilitate interpretation of the variant association with FEV1, the effect sizes and p-values of both FEV1 (β and p) and FEV1.res.rnorm (βnorm and pnorm) associations are reported in Table 2. CADD functional prediction score and ENCODE histone modification ChIP-Seq peaks in embryonic, blood, bone marrow, and lung-related tissues were also examined but not reported because none of the variants had a CADD score greater than 10 and none overlapped with histone modification sites. 
After summing the number of lines of evidence present for each variant to obtain its priority score, rs73440122 received the highest score of 3 based on replication in the UK Biobank, overlap with a DNase I hypersensitivity site in B-lymphoblastoid cells (GM12865) and overlap with an SPI1 binding site in acute promyelocytic leukemia cells. Eight other variants were prioritized because they had a score of 2 or more or evidence of being an eQTL for *KITLG* in nasal epithelial cells (Table 2, score marked with * or # respectively). These nine candidate variants were selected for gene-by-air-pollution interaction analyses.

Table 2. Integration of statistical and functional evidence for variant prioritization.

### Gene-by-air-pollution interaction of rs58475486

We previously found that first year of life and lifetime exposure to SO2 were associated with FEV1 in African American children (Neophytou et al. 2016). We investigated whether the effect of the nine prioritized genetic variants associated with lung function varied by SO2 exposure (first year of life, past year, and lifetime exposure). Since the nine variants represent three independent signals (see conditional analysis in the Results section), the Bonferroni-corrected p-value threshold was set to p = 0.0056 (correction for nine tests; three signals and three exposure periods to SO2).
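The arithmetic behind the two Bonferroni thresholds quoted in the Results can be checked directly. This is a small sketch; the helper name `bonferroni_threshold` is ours and the family-wise alpha of 0.05 is assumed from the thresholds reported in the text.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test p-value threshold for a family-wise error rate of alpha."""
    return alpha / n_tests


n_signals = 3            # independent signals from the conditional analysis
n_exposure_periods = 3   # first year of life, past year, lifetime SO2

# Interaction analysis: three signals tested against three exposure periods.
interaction_threshold = bonferroni_threshold(n_signals * n_exposure_periods)

# Replication analysis: three independent signals.
replication_threshold = bonferroni_threshold(n_signals)
```

Rounded to four decimal places these give 0.0056 and 0.0167, matching the thresholds used for the interaction and replication analyses, respectively.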
We observed a single statistically significant interaction between the T allele of rs58475486 and past year exposure to SO2 that was positively associated with FEV1 (p = 0.003, β = 0.32, Table 3, Figure 3A). Interestingly, six of the remaining eight variants also displayed interaction effects with past year exposure to SO2 that were suggestively associated (p < 0.05) with FEV1 (Table 3). We also found a suggestive interaction of the C allele of rs73440122 with first year exposure to SO2 that was associated with decreased FEV1 (p = 0.045, β = −0.32, Figure 3B).

[Table 3.](http://medrxiv.org/content/early/2020/02/23/2020.02.20.20019588/T3) Gene-and-environment analysis on FEV1

![Figure 3.](https://www.medrxiv.org/content/medrxiv/early/2020/02/23/2020.02.20.20019588/F3.medium.gif)

[Figure 3.](http://medrxiv.org/content/early/2020/02/23/2020.02.20.20019588/F3) Gene-by-environment interaction analysis on FEV1. FEV1 residuals are the residuals after FEV1 was regressed on the covariates age, sex, height, controller medications, sequencing centers, and the first 5 genetic PCs. FEV1 residuals were plotted against (A) past year exposure to SO2 stratified by the number of copies of the T allele of rs58475486 and (B) first year of life exposure to SO2 stratified by the number of copies of the C allele of rs73440122.

## DISCUSSION

Variant rs73429450 (MAF = 0.030) was identified as the strongest association signal with FEV1. Each additional copy of the protective A allele of rs73429450 was associated with a 0.3 L increase of FEV1. We did not find any statistically significant contribution of rare variants to the association signal from a 1 kb sliding window analysis in the 1 Mb flanking region centered on rs73429450.
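To put the per-allele effect and the allele frequency in perspective, the following back-of-the-envelope sketch estimates how many of the 867 children would be expected to carry the A allele and what an additive model implies for each genotype. It assumes Hardy-Weinberg proportions, which the study does not state and which may not hold exactly in an admixed population, and it uses the reported β = 0.302 as litres per allele copy. The function names are hypothetical.

```python
def expected_genotype_counts(maf: float, n: int) -> dict[int, float]:
    """Expected number of people with 0, 1, or 2 copies of the minor allele,
    assuming Hardy-Weinberg proportions (an assumption, not a study result)."""
    q = maf
    p = 1.0 - q
    return {0: n * p * p, 1: n * 2.0 * p * q, 2: n * q * q}


def additive_shift(beta: float, copies: int) -> float:
    """Expected change in FEV1 (litres) under an additive per-allele model."""
    return beta * copies


N_CHILDREN = 867   # discovery sample size reported in the abstract
MAF = 0.030        # minor allele frequency of rs73429450
BETA = 0.302       # reported per-allele effect on FEV1

counts = expected_genotype_counts(MAF, N_CHILDREN)
for copies in (0, 1, 2):
    print(
        f"{copies} copies: about {counts[copies]:.1f} children, "
        f"expected FEV1 shift {additive_shift(BETA, copies):+.3f} L"
    )
```

Under these assumptions roughly 50 children would be heterozygous and fewer than one would be homozygous for the A allele, so the association signal would rest almost entirely on heterozygous carriers.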
We were surprised to identify a novel common variant (MAF = 0.030) associated with lung function using whole genome sequence data in a population that was previously analyzed for associations with lung function using genotype array data. Further investigation revealed that our discovered variant, rs73429450, was not captured by the LAT 1 genotyping array, and the association with lung function depended on the reference panel used to impute the variant into our population. More surprisingly, our statistically significant finding was only found to be suggestively significant using data imputed from the CAAPA reference panel (p = 1.95 × 10−7, β = 0.80). Of the imputation reference panels that we assessed, CAAPA is one of the more relevant reference panels for our study population because it is based on African populations in the Americas. However, we note that the effect size estimated from CAAPA-imputed data was comparable to that generated from WGS data. While whole genome sequencing data is usually praised for enabling analysis of rare-variant contributions to phenotype variability, our results show the utility of whole genome sequencing data for the reliable analysis of common variants as well in the absence of relevant imputation panels.

Although rs73429450 had the lowest p-value from our whole genome sequencing association analysis, we did not find the required amount of functional evidence to prioritize this marker for inclusion in downstream gene-by-air-pollution analyses. Another variant, rs73440122, was in moderate to strong linkage disequilibrium (r2 = 0.76) with rs73429450 and had a similar MAF (0.027) in our study population, but was only suggestively associated with FEV1 in our association analysis (p = 2.08 × 10−7, Table 2).
In contrast to rs73429450, there were multiple lines of evidence suggesting the functional relevance of rs73440122: rs73440122 received the highest priority score, its association with FEV1 replicated in black UK Biobank participants, and it was one of the most likely drivers of FEV1 variability among individuals, possibly mediated through *KITLG*. Bioinformatic interrogation of rs73440122 revealed that the variant overlapped with a ccRE (SCREEN accession EH37E0279310), a DNase I hypersensitivity site, and SPI1 ChIP-Seq clusters that were indicative of a candidate open chromatin gene regulatory region. The transcription factor SPI1, which is expressed in lung, spleen and whole blood, was previously linked to fibrotic diseases and tissues (Wohlfahrt et al. 2019) and *KITLG* in erythroid progenitor cells (Quang et al. 1995), but there is no existing evidence of its role in lung function. Although rs73440122 itself is not an eQTL of *KITLG*, it is located in a region that physically interacted with *KITLG* based on Hi-C data in fetal lung fibroblast cells. Additionally, five neighboring FEV1-associated variants were identified as eQTLs of *KITLG*, although they appeared to represent an independent signal (r2 < 0.2). Overall, these results support regulatory interactions between this region and *KITLG*.

*KITLG*, a ligand of the KIT tyrosine kinase receptor, is expressed in lung and its decreased expression was previously observed in patients with chronic obstructive pulmonary disease (Bhattacharya et al. 2009). Inactivation of KIT signaling led to airspace enlargement and contributed to declining lung function in mice (Lindsey et al. 2011). Polymorphisms of *KITLG* have been previously associated with susceptibility to moderate-to-severe bronchopulmonary dysplasia, a chronic inflammatory lung disease that affects preterm infants (Huusko et al. 2015).
These lines of evidence support the potential contribution of our novel locus to lung function phenotype variation mediated through *KITLG*, especially in patients with inflammatory conditions like asthma.

Gene-by-environment interactions likely account for a portion of the “missing” heritability of many complex phenotypes (Moore and Williams 2009). We hypothesized that a significant portion of the heritability of lung function was due to gene-by-air-pollution (SO2) interaction effects. The interaction between rs58475486 and past year exposure to SO2 that was significantly associated with lung function supports our hypothesis. The T allele of rs58475486 is common (8-14%) in African populations and showed a protective effect on lung function in the presence of past year SO2 exposure. SNP rs58475486 is located in a ccRE (SCREEN accession EH37E0279296) and a *FOXA1* binding site in the A549 lung adenocarcinoma cell line. *Foxa1* has a known compensatory role with *Foxa2* during lung morphogenesis in mice (Wan et al. 2005). Deletion of both *Foxa1* and *Foxa2* inhibited cell proliferation, epithelial cell differentiation, and branching morphogenesis in fetal lung tissue. Further functional validation of the effect of rs58475486 on the binding affinity of FOXA1 is necessary to confirm whether the role of *FOXA1* in this ccRE is important for *KITLG* regulation and lung function.

The higher frequency of the protective alleles of both rs73440122 and rs58475486 in African populations appears to contradict previous findings that African ancestry was associated with lower lung function (Kumar et al. 2010). One possible explanation for this seeming inconsistency is that FEV1 is a complex trait whose variation is influenced by many genetic variants of small to moderate effect sizes whose influence on lung function may vary by exposure to environmental factors.
We found suggestive evidence that the interaction between rs73440122 and first year exposure to SO2 reverses the positive association of rs73440122 with lung function to a negative one (Table 3). When assessed independently, our genetic association analysis showed that the protective A allele of rs73440122 was associated with higher lung function. However, with increasing levels of SO2 exposure in the first year of life, increasing copies of the A allele of rs73440122 were associated with decreased lung function. Air pollution is known to negatively impact lung function, and we have previously shown that the deleterious effects of air pollution on lung phenotypes may be significantly increased in African American children compared to other populations experiencing the same amount of exposure (Nishimura et al. 2013). It has also been reported that Latino and African American populations often live in neighborhoods with high levels of air pollution (Mott 1995). The increased susceptibility to negative pulmonary effects from air pollution exposure, coupled with the disproportionate exposure to air pollution experienced by the African American population, may also contribute to the lower lung function seen in this population despite the presence of protective alleles.

One limitation of this study is that the FEV1 genetic association and the eQTL analyses with *KITLG* were performed in different populations due to data availability constraints. Although we did not have RNA-Seq data from lung tissues from our study subjects, we previously demonstrated that there is a high degree of overlap in gene expression profiles between nasal and bronchial epithelial cells (Poole et al. 2014). The direction of effect of the association was the same in GALA II Puerto Rican children with asthma but not statistically significant. This may be due in part to the significantly lower African ancestry in Puerto Ricans compared to African Americans.
We replicated 20 of 45 variants in black UK Biobank subjects and observed conflicting “flip-flop” associations in African Americans from the SAPPHIRE and GCPD-A studies. In the past, flip-flop associations were deemed spurious results. The traditional association testing approach studies the effect of each variant on phenotype independently, which increases the chance of detecting flip-flop associations between studies. Differences in study design, sampling variation that leads to variation in LD patterns, and lack of consideration of other disease-influencing genetic and/or environmental factors are all potential causes of flip-flop associations (Kraft et al. 2009; Lin et al. 2007). Hence, it is not surprising to observe flip-flop associations when gene and environment interactions were detected at our FEV1 GWAS locus. It was previously shown that flip-flop associations can occur between and within populations even in the presence of a genuine genetic effect (Kraft et al. 2009; Lin et al. 2007).

Further functional analysis is thus required to validate the relationship between the candidate variants, *KITLG* and FEV1. This may include reporter assays to validate potential enhancer or repressor activity and CRISPR-based editing assays to validate the regulatory role of the candidate variants on *KITLG*. Although literature exists describing KIT signaling for lung function in mice (Lindsey et al. 2011), additional knockout experiments in a model animal system may be necessary to confirm the causal effect of *KITLG* on lung function.

The average concentration of ambient SO2 exposure in our participants (Table 1) was lower than the National Ambient Air Quality Standards. It is possible that SO2 acted as a surrogate for other unmeasured toxic pollutants emitted from local point sources.
Major sources of SO2 in the San Francisco Bay Area during the recruitment years of 2006 to 2011 included airports, petroleum refineries, gas and oil plants, calcined petroleum coke plants, electric power plants, cement manufacturing factories, chemical plants, and landfills (United States Environmental Protection Agency 2008; United States Environmental Protection Agency 2011). The Environmental Protection Agency’s national emissions inventory data also showed that these facilities emit volatile organic compounds, heavy metals (lead, mercury, chromium, arsenic), formaldehyde, ethyl benzene, acrolein, 1,3-butadiene, 1,4-dichlorobenzene, and tetrachloroethylene into the air along with SO2. These chemicals are highly toxic and inhaling even a small amount may contribute to poor lung function. Another possibility is that exposure to SO2 captured unmeasured confounding socioeconomic factors.

This study identified a novel protective allele for lung function in African American children with asthma. The protective association with lung function intensified with increased past year exposure to SO2. Our findings showcase the complexity of the relationship between genetic and environmental factors impacting variation in FEV1, highlight the utility of WGS data for genetic research of complex phenotypes, and underscore the importance of including diverse study populations in our exploration of the genetic architecture underlying lung function.

## Data Availability

Local institutional review boards approved the studies (IRB# 10-02877). All subjects and legal guardians provided written informed consent. TOPMed whole genome sequencing data are available on dbGaP with the accession number: phs000921.v4.p1. The remaining sequence data are available upon request. Supplemental materials are available at figshare.

## ACKNOWLEDGEMENTS

The Genes-Environments and Admixture in Latino Americans (GALA II) Study, the Study of African Americans, Asthma, Genes and Environments (SAGE) Study and E.G.B.
were supported by the Sandler Family Foundation, the American Asthma Foundation, the RWJF Amos Medical Faculty Development Program, the Harry Wm. and Diana V. Hind Distinguished Professor in Pharmaceutical Sciences II, the National Heart, Lung, and Blood Institute (NHLBI) [R01HL117004, R01HL128439, R01HL135156, X01HL134589]; the National Institute of Environmental Health Sciences [R01ES015794]; the National Institute on Minority Health and Health Disparities (NIMHD) [P60MD006902, R01MD010443], the National Human Genome Research Institute [U01HG009080] and the Tobacco-Related Disease Research Program [24RT-0025]. MJW was supported by the NHLBI [K01HL140218]. JJ and BEH were supported by the NHLBI [R01HL133433, R01HL141992]. KLK was supported by the NHLBI [R01HL135156-S1], the UCSF Bakar Institute, the Gordon and Betty Moore Foundation [GBMF3834], and the Alfred P. Sloan Foundation [2013-10-27] grant to UC Berkeley through the Moore-Sloan Data Science Environment Initiative. ACW was supported by the Eunice Kennedy Shriver National Institute of Child Health and Human Development [1R01HD085993-01]. The SAPPHIRE study was supported by the Fund for Henry Ford Hospital, the American Asthma Foundation, the NHLBI [R01HL118267, R01HL141485, X01HL134589], the National Institute of Allergy and Infectious Diseases [R01AI079139], and the National Institute of Diabetes and Digestive and Kidney Diseases [R01DK113003]. The GCPD-A study was supported by an Institutional award from the Children’s Hospital of Philadelphia and by the NHLBI [X01HL134589]. Part of this research was conducted using the UK Biobank Resource under Application Number 40375. We would like to thank UK Biobank participants and researchers who contributed or collected data. Whole genome sequencing (WGS) for the Trans-Omics in Precision Medicine (TOPMed) program was supported by the National Heart, Lung and Blood Institute (NHLBI). 
WGS for “NHLBI TOPMed: Gene-Environment, Admixture and Latino Asthmatics Study” (phs000920) and “NHLBI TOPMed: Study of African Americans, Asthma, Genes and Environments” (phs000921) was performed at the New York Genome Center (3R01HL117004-02S3) and the University of Washington Northwest Genomics Center (HHSN268201600032I). WGS for “NHLBI TOPMed: Study of Asthma Phenotypes & Pharmacogenomic Interactions by Race-Ethnicity” (phs001467) and “Genetics of Complex Pediatric Disorders - Asthma” (phs001661) was performed at the University of Washington Northwest Genomics Center (HHSN268201600032I). Centralized read mapping and genotype calling, along with variant quality metrics and filtering were provided by the TOPMed Informatics Research Center (3R01HL-117626-02S1; contract HHSN268201800002I). Phenotype harmonization, data management, sample-identity QC, and general study coordination were provided by the TOPMed Data Coordinating Center (3R01HL-120393-02S1; contract HHSN268201800001I). We gratefully acknowledge the studies and participants who provided biological samples and data for TOPMed. WGS of part of GALA II was performed by New York Genome Center under The Centers for Common Disease Genomics of the Genome Sequencing Program (GSP) Grant (UM1 HG008901). The GSP Coordinating Center (U24 HG008956) contributed to cross-program scientific initiatives and provided logistical and general study coordination. GSP is funded by the National Human Genome Research Institute, the National Heart, Lung, and Blood Institute, and the National Eye Institute. The TOPMed imputation panel was supported by the NHLBI and TOPMed study investigators who contributed data to the reference panel. The panel was constructed and implemented by the TOPMed Informatics Research Center at the University of Michigan (3R01HL-117626-02S1; contract HHSN268201800002I). 
The TOPMed Data Coordinating Center (3R01HL-120393-02S1; contract HHSN268201800001I) provided additional data management, sample identity checks, and overall program coordination and support. We gratefully acknowledge the studies and participants who provided biological samples and data for TOPMed. The authors wish to acknowledge the following GALA II and SAGE study collaborators: Shannon Thyne, UCSF; Harold J. Farber, Texas Children’s Hospital; Denise Serebrisky, Jacobi Medical Center; Rajesh Kumar, Lurie Children’s Hospital of Chicago; Emerita Brigino-Buenaventura, Kaiser Permanente; Michael A. LeNoir, Bay Area Pediatrics; Kelley Meade, UCSF Benioff Children’s Hospital, Oakland; William Rodríguez-Cintrón, VA Hospital, Puerto Rico; Pedro C. Ávila, Northwestern University; Jose R. Rodríguez-Santana, Centro de Neumología Pediátrica; Luisa N. Borrell, City University of New York; Adam Davis, UCSF Benioff Children’s Hospital, Oakland; Saunak Sen, University of Tennessee. The authors acknowledge the families and patients for their participation and thank the numerous health care providers and community clinics for their support and participation in GALA II and SAGE. In particular, the authors thank the recruiters who obtained the data: Duanny Alva, MD; Gaby Ayala-Rodríguez; Lisa Caine, RT; Elizabeth Castellanos; Jaime Colón; Denise DeJesus; Blanca López; Brenda López, MD; Louis Martos; Vivian Medina; Juana Olivo; Mario Peralta; Esther Pomares, MD; Jihan Quraishi; Johanna Rodríguez; Shahdad Saeedi; Dean Soto; and Ana Taveras. The authors thank María Pino-Yanes for providing feedback on this study and Thomas W Blackwell for providing critical review on this manuscript. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. * Received February 20, 2020. * Revision received February 20, 2020. * Accepted February 23, 2020. 
* © 2020, Posted by Cold Spring Harbor Laboratory The copyright holder for this pre-print is the author. All rights reserved. The material may not be redistributed, re-used or adapted without the author's permission. ## LITERATURE CITED 1. 1000 Genomes Project Consortium, A. Auton, L. D. Brooks, R. M. Durbin, E. P. Garrison et al., 2015 A global reference for human genetic variation. Nature 526: 68–74. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/nature15393&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=26432245&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F02%2F23%2F2020.02.20.20019588.atom) 2. Akinbami, L. J., 2015 Asthma Prevalence, Health Care use and Mortality: United States, 2003-05. [Online] Available at: [http://www.cdc.gov/nchs/data/hestat/asthma03-05/asthma03-05.htm](http://www.cdc.gov/nchs/data/hestat/asthma03-05/asthma03-05.htm). [Accessed 2020 Jan 8]. 3. Akinbami, L. J., J. E. Moorman, A. E. Simon and K. C. Schoendorf, 2014 Trends in racial disparities for asthma outcomes among children 0 to 17 years, 2001-2010. J. Allergy Clin. Immunol. 134: 547-553.e5. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.jaci.2014.05.037&link_type=DOI) 4. Alexander, D. H., J. Novembre and K. Lange, 2009 Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19: 1655–1664. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NjoiZ2Vub21lIjtzOjU6InJlc2lkIjtzOjk6IjE5LzkvMTY1NSI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDIwLzAyLzIzLzIwMjAuMDIuMjAuMjAwMTk1ODguYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 5. Anders, S., P. T. Pyl and W. Huber, 2015 HTSeq--a python framework to work with high-throughput sequencing data. Bioinformatics 31: 166–169. 
[CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/bioinformatics/btu638&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25260700&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F02%2F23%2F2020.02.20.20019588.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000347832300003&link_type=ISI) 6. Ay, F., T. L. Bailey and W. S. Noble, 2014 Statistical confidence estimation for hi-C data reveals regulatory chromatin contacts. Genome Res. 24: 999–1011. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NjoiZ2Vub21lIjtzOjU6InJlc2lkIjtzOjg6IjI0LzYvOTk5IjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjAvMDIvMjMvMjAyMC4wMi4yMC4yMDAxOTU4OC5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 7. Barraza-Villarreal, A., J. Sunyer, L. Hernandez-Cadena, M. C. Escamilla-Nunez, J. J. Sienra-Monge et al., 2008 Air pollution, airway inflammation, and lung function in a cohort study of mexico city schoolchildren. Environ. Health Perspect. 116: 832–838. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1289/ehp.10926&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=18560490&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F02%2F23%2F2020.02.20.20019588.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000256254100039&link_type=ISI) 8. Bhattacharya, S., S. Srisuma, D. L. Demeo, S. D. Shapiro, R. Bueno et al., 2009 Molecular biomarkers for quantitative and discrete COPD phenotypes. Am. J. Respir. Cell Mol. Biol. 40: 359–367. 
[CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1165/rcmb.2008-0114OC&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=18849563&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F02%2F23%2F2020.02.20.20019588.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000263735700013&link_type=ISI) 9. Brunekreef, B., and S. T. Holgate, 2002 Air pollution and health. Lancet 360: 1233–1242. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/S0140-6736(02)11274-8&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=12401268&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F02%2F23%2F2020.02.20.20019588.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000178708100030&link_type=ISI) 10. Buniello, A., J. A. L. MacArthur, M. Cerezo, L. W. Harris, J. Hayhurst et al., 2019 The NHGRI-EBI GWAS catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 47: D1005–D1012. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/nar/gky1120&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=30445434&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F02%2F23%2F2020.02.20.20019588.atom) 11. Bycroft, C., C. Freeman, D. Petkova, G. Band, L. T. Elliott et al., 2018 The UK biobank resource with deep phenotyping and genomic data. Nature 562: 203–209. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41586-018-0579-z&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=30305743&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F02%2F23%2F2020.02.20.20019588.atom) 12. Canela-Xandri, O., K. Rawlik and A. Tenesa, 2018 An atlas of genetic associations in UK biobank. Nat. Genet. 50: 1593–1599. 
[CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41588-018-0248-z&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=30349118&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F02%2F23%2F2020.02.20.20019588.atom) 13. Carlson, C. S., T. C. Matise, K. E. North, C. A. Haiman, M. D. Fesinmeyer et al., 2013 Generalization and dilution of association results from european GWAS in populations of non-european ancestry: The PAGE study. PLoS Biol. 11: e1001661. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1371/journal.pbio.1001661&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=24068893&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F02%2F23%2F2020.02.20.20019588.atom) 14. Chang, C. C., C. C. Chow, L. C. Tellier, S. Vattikuti, S. M. Purcell et al., 2015 Second-generation PLINK: Rising to the challenge of larger and richer datasets. Gigascience 4: 7–8. eCollection 2015. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1186/s13742-015-0047-8&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25722852&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F02%2F23%2F2020.02.20.20019588.atom) 15. Chatterjee, S., and N. Das, 1995 Lung function in indian twin children: Comparison of genetic versus environmental influence. Ann. Hum. Biol. 22: 289–303. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1080/03014469500003962&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=8849207&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F02%2F23%2F2020.02.20.20019588.atom) 16. Ciesielski, T. H., S. A. Pendergrass, M. J. White, N. Kodaman, R. S. Sobota et al., 2014 Diverse convergent evidence in the genetic analysis of complex disease: Coordinating omic, informatic, and experimental evidence to better identify and validate risk factors. BioData Min. 7: 10-10. eCollection 2014. 
[CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1186/1756-0381-7-10&link_type=DOI) 17. Conomos, M. P., M. B. Miller and T. A. Thornton, 2015 Robust inference of population structure for ancestry prediction and correction of stratification in the presence of relatedness. Genet. Epidemiol. 39: 276–293. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/gepi.21896&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25810074&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F02%2F23%2F2020.02.20.20019588.atom) 18. Conomos, M. P., A. P. Reiner, B. S. Weir and T. A. Thornton, 2016 Model-free estimation of recent genetic relatedness. Am. J. Hum. Genet. 98: 127–148. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.ajhg.2015.11.022&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=26748516&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F02%2F23%2F2020.02.20.20019588.atom) 19. Das, S., L. Forer, S. Schonherr, C. Sidore, A. E. Locke et al., 2016 Next-generation genotype imputation service and methods. Nat. Genet. 48: 1284–1287. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng.3656&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=27571263&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F02%2F23%2F2020.02.20.20019588.atom) 20. Duggal, P., E. M. Gillanders, T. N. Holmes and J. E. Bailey-Wilson, 2008 Establishing an adjusted p-value threshold to control the family-wide type 1 error in genome wide association studies. BMC Genomics 9: 516–516. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1186/1471-2164-9-516&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=18976480&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F02%2F23%2F2020.02.20.20019588.atom) 21. ENCODE Project Consortium, 2012 An integrated encyclopedia of DNA elements in the human genome. Nature 489: 57–74. 
[CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/nature11247&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=22955616&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F02%2F23%2F2020.02.20.20019588.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000308347000039&link_type=ISI) 22. ENCODE Project Consortium, 2011 A user’s guide to the encyclopedia of DNA elements (ENCODE). PLoS Biol. 9: e1001046. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1371/journal.pbio.1001046&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=21526222&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F02%2F23%2F2020.02.20.20019588.atom) 23. GTEx Consortium, Laboratory, Data Analysis &Coordinating Center (LDACC)-Analysis Working Group, Statistical Methods groups-Analysis Working Group, Enhancing GTEx (eGTEx) groups, NIH Common Fund et al, 2017 Genetic effects on gene expression across human tissues. Nature 550: 204–213. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/nature24277&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=29022597&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F02%2F23%2F2020.02.20.20019588.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000412829500039&link_type=ISI) 24. Hankinson, J. L., J. R. Odencrantz and K. B. Fedan, 1999 Spirometric reference values from a sample of the general U.S. population. Am. J. Respir. Crit. Care Med. 159: 179–187. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1164/ajrccm.159.1.9712108&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=9872837&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F02%2F23%2F2020.02.20.20019588.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000077987600027&link_type=ISI) 25. Hinrichs, A. S., D. Karolchik, R. Baertsch, G. P. Barber, G. 
Bejerano et al., 2006 The UCSC genome browser database: Update 2006. Nucleic Acids Res. 34: 590. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/nar/gkj144&link_type=DOI) 26. Hukkinen, M., J. Kaprio, U. Broms, A. Viljanen, D. Kotz et al., 2011 Heritability of lung function: A twin study among never-smoking elderly women. Twin Res. Hum. Genet. 14: 401–407. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1375/twin.14.5.401&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=21962131&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F02%2F23%2F2020.02.20.20019588.atom) 27. Huusko, J. M., M. Mahlman, M. K. Karjalainen, T. Kaukola, R. Haataja et al., 2015 Polymorphisms of the gene encoding kit ligand are associated with bronchopulmonary dysplasia. Pediatr. Pulmonol. 50: 260–270. 28. Ierodiakonou, D., A. Zanobetti, B. A. Coull, S. Melly, D. S. Postma et al., 2016 Ambient air pollution, lung function, and airway responsiveness in asthmatic children. J. Allergy Clin. Immunol. 137: 390–399. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.jaci.2015.05.028&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=26187234&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F02%2F23%2F2020.02.20.20019588.atom) 29. Ionita-Laza, I., S. Lee, V. Makarov, J. D. Buxbaum and X. Lin, 2013 Sequence kernel association tests for the combined effect of rare and common variants. Am. J. Hum. Genet. 92: 841–853. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.ajhg.2013.04.015&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=23684009&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F02%2F23%2F2020.02.20.20019588.atom) 30. Jiang, H., R. Lei, S. W. Ding and S. Zhu, 2014 Skewer: A fast and accurate adapter trimmer for next-generation sequencing paired-end reads. BMC Bioinformatics 15: 182–182. 
[CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1186/1471-2105-15-182&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=24925680&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F02%2F23%2F2020.02.20.20019588.atom) 31. Johnson, J. D., and W. M. Theurer, 2014 A stepwise approach to the interpretation of pulmonary function tests. Am. Fam. Physician 89: 359–366. 32. Karolchik, D., A. S. Hinrichs, T. S. Furey, K. M. Roskin, C. W. Sugnet et al., 2004 The UCSC table browser data retrieval tool. Nucleic Acids Res. 32: 493. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/nar/gkh103&link_type=DOI) 33. Kim, D., B. Langmead and S. L. Salzberg, 2015 HISAT: A fast spliced aligner with low memory requirements. Nat. Methods 12: 357–360. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/nmeth.3317&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25751142&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F02%2F23%2F2020.02.20.20019588.atom) 34. Kim, M. S., K. P. Patel, A. K. Teng, A. J. Berens and J. Lachance, 2018 Genetic disease risks can be misestimated across global populations. Genome Biol. 19: 179–7. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1186/s13059-018-1561-7&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F02%2F23%2F2020.02.20.20019588.atom) 35. Kraft, P., E. Zeggini and J. P. Ioannidis, 2009 Replication in genome-wide association studies. Stat. Sci. 24: 561–573. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1214/09-STS290&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=20454541&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F02%2F23%2F2020.02.20.20019588.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000277257000013&link_type=ISI) 36. Kumar, R., M. A. Seibold, M. C. Aldrich, L. K. Williams, A. P. 
Reiner et al., 2010 Genetic ancestry in lung-function predictions. N. Engl. J. Med. 363: 321–330. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1056/NEJMoa0907897&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=20647190&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F02%2F23%2F2020.02.20.20019588.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000280139300005&link_type=ISI) 37. Lee, S., M. J. Emond, M. J. Bamshad, K. C. Barnes, M. J. Rieder et al., 2012 Optimal unified approach for rare-variant association testing with application to small-sample case-control whole-exome sequencing studies. Am. J. Hum. Genet. 91: 224–237. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.ajhg.2012.06.007&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=22863193&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F02%2F23%2F2020.02.20.20019588.atom) 38. Levin, A. M., Y. Wang, K. E. Wells, B. Padhukasahasram, J. J. Yang et al., 2014 Nocturnal asthma and the importance of race/ethnicity and genetic ancestry. Am. J. Respir. Crit. Care Med. 190: 266–273. [PubMed](http://medrxiv.org/lookup/external-ref?access_num=24937318&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F02%2F23%2F2020.02.20.20019588.atom) 39. Li, X., G. A. Hawkins, E. J. Ampleford, W. C. Moore, H. Li et al., 2013 Genome-wide association study identifies TH1 pathway genes associated with lung function in asthmatic patients. J. Allergy Clin. Immunol. 132: 313-20.e15. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.jaci.2013.01.051&link_type=DOI) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000322631700007&link_type=ISI) 40. Liao, S. Y., X. Lin and D. C. Christiani, 2014 Genome-wide association and network analysis of lung function in the Framingham Heart Study. Genet. Epidemiol. 38: 572–578.
[CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/gepi.21841&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25044411&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F02%2F23%2F2020.02.20.20019588.atom) 41. Lin, P. I., J. M. Vance, M. A. Pericak-Vance and E. R. Martin, 2007 No gene is an island: The flip-flop phenomenon. Am. J. Hum. Genet. 80: 531–538. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1086/512133&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=17273975&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F02%2F23%2F2020.02.20.20019588.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000244403300015&link_type=ISI) 42. Lindsey, J. Y., K. Ganguly, D. M. Brass, Z. Li, E. N. Potts et al., 2011 C-kit is essential for alveolar maintenance and protection from emphysema-like disease in mice. Am. J. Respir. Crit. Care Med. 183: 1644–1652. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1164/rccm.201007-1157OC&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=21471107&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F02%2F23%2F2020.02.20.20019588.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000292305600015&link_type=ISI) 43. Liu, X., S. White, B. Peng, A. D. Johnson, J. A. Brody et al., 2016 WGSA: An annotation pipeline for human genome sequencing studies. J. Med. Genet. 53: 111–112. [FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiRlVMTCI7czoxMToiam91cm5hbENvZGUiO3M6OToiam1lZGdlbmV0IjtzOjU6InJlc2lkIjtzOjg6IjUzLzIvMTExIjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjAvMDIvMjMvMjAyMC4wMi4yMC4yMDAxOTU4OC5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 44. Mak, A. C. Y., M. J. White, W. L. Eckalbar, Z. A. Szpiech, S. S. 
Oh et al., 2018 Whole-genome sequencing of pharmacogenetic drug response in racially diverse children with asthma. Am. J. Respir. Crit. Care Med. 197: 1552–1564. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1164/rccm.201712-2529OC&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=29509491&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F02%2F23%2F2020.02.20.20019588.atom) 45. Mak, A. C., M. J. White, C. Eng, D. Hu, S. Huntsman et al., 2016 Whole Genome Sequencing to Identify Genetic Variation Associated with Bronchodilator Response in Minority Children with Asthma. 46. Martin, A. R., C. R. Gignoux, R. K. Walters, G. L. Wojcik, B. M. Neale et al., 2017 Human demographic history impacts genetic risk prediction across diverse populations. Am. J. Hum. Genet. 100: 635–649. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.ajhg.2017.03.004&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=28366442&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F02%2F23%2F2020.02.20.20019588.atom) 47. Martin, J. S., Z. Xu, A. P. Reiner, K. L. Mohlke, P. Sullivan et al., 2017 HUGIn: Hi-C unifying genomic interrogator. Bioinformatics 33: 3793–3795. 48. Moore, J. H., 2005 A global view of epistasis. Nat. Genet. 37: 13–14. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng0105-13&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=15624016&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F02%2F23%2F2020.02.20.20019588.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000225997500009&link_type=ISI) 49. Moore, J. H., and S. M. Williams, 2009 Epistasis and its implications for personal genetics. Am. J. Hum. Genet. 85: 309–320. 
[CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.ajhg.2009.08.006&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=19733727&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F02%2F23%2F2020.02.20.20019588.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000270104500001&link_type=ISI) 50. Mott, L., 1995 The disproportionate impact of environmental health threats on children of color. Environ. Health Perspect. 103 Suppl 6: 33–35. 51. Neophytou, A. M., M. J. White, S. S. Oh, N. Thakur, J. M. Galanter et al., 2016 Air pollution and lung function in minority youth with asthma in the GALA II (Genes-environments and Admixture in Latino Americans) and SAGE II (Study of African Americans, Asthma, Genes, and Environments) studies. Am. J. Respir. Crit. Care Med. 193: 1271–1280. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1164/rccm.201508-1706OC&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=26734713&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F02%2F23%2F2020.02.20.20019588.atom) 52. Nishimura, K. K., J. M. Galanter, L. A. Roth, S. S. Oh, N. Thakur et al., 2013 Early-life air pollution and asthma risk in minority children. The GALA II and SAGE II studies. Am. J. Respir. Crit. Care Med. 188: 309–318. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1164/rccm.201302-0264OC&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=23750510&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F02%2F23%2F2020.02.20.20019588.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000322617800012&link_type=ISI) 53. Oh, S. S., M. J. White, C. R. Gignoux and E. G. Burchard, 2016 Making precision medicine socially precise. Take a deep breath. Am. J. Respir. Crit. Care Med. 193: 348–350. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1164/rccm.201510-2045ED&link_type=DOI) 54. Oh, S. S., H. Tcheurekdjian, L. A.
Roth, E. A. Nguyen, S. Sen et al., 2012 Effect of secondhand smoke on asthma control among Black and Latino children. J. Allergy Clin. Immunol. 129: 1478-83.e7. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.jaci.2012.03.017&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=22552109&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F02%2F23%2F2020.02.20.20019588.atom) 55. Ong, B. A., J. Li, J. M. McDonough, Z. Wei, C. Kim et al., 2013 Gene network analysis in a pediatric cohort identifies novel lung function genes. PLoS One 8: e72899. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1371/journal.pone.0072899&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=24023788&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F02%2F23%2F2020.02.20.20019588.atom) 56. Palmer, L. J., M. W. Knuiman, M. L. Divitini, P. R. Burton, A. L. James et al., 2001 Familial aggregation and heritability of adult lung function: Results from the Busselton Health Study. Eur. Respir. J. 17: 696–702. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6MzoiZXJqIjtzOjU6InJlc2lkIjtzOjg6IjE3LzQvNjk2IjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjAvMDIvMjMvMjAyMC4wMi4yMC4yMDAxOTU4OC5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 57. Pe’er, I., R. Yelensky, D. Altshuler and M. J. Daly, 2008 Estimation of the multiple testing burden for genomewide association studies of nearly all common variants. Genet. Epidemiol. 32: 381–385. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/gepi.20303&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=18348202&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F02%2F23%2F2020.02.20.20019588.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000255471100009&link_type=ISI) 58. Pino-Yanes, M., N.
Thakur, C. R. Gignoux, J. M. Galanter, L. A. Roth et al., 2015 Genetic ancestry influences asthma susceptibility and lung function among Latinos. J. Allergy Clin. Immunol. 135: 228–235. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.jaci.2014.07.053&link_type=DOI) 59. Plummer, M., N. Best, K. Cowles and K. Vines, 2006 CODA: Convergence diagnosis and output analysis for MCMC. R News 6: 7–11. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1159/000323281&link_type=DOI) 60. Poole, A., C. Urbanek, C. Eng, J. Schageman, S. Jacobson et al., 2014 Dissecting childhood asthma with nasal transcriptomics distinguishes subphenotypes of disease. J. Allergy Clin. Immunol. 133: 670-8.e12. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.jaci.2013.11.025&link_type=DOI) 61. Pruim, R. J., R. P. Welch, S. Sanna, T. M. Teslovich, P. S. Chines et al., 2010 LocusZoom: Regional visualization of genome-wide association scan results. Bioinformatics 26: 2336–2337. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/bioinformatics/btq419&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=20634204&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F02%2F23%2F2020.02.20.20019588.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000281714100054&link_type=ISI) 62. Purcell, S., and C. Chang, 2013 Plink 1.9. [Online] Available at: [www.cog-genomics.org/plink/1.9/](http://www.cog-genomics.org/plink/1.9/). [Accessed 2019 Mar]. 63. Quang, C. T., M. Pironin, M. von Lindern, H. Beug and J. Ghysdael, 1995 Spi-1 and mutant p53 regulate different aspects of the proliferation and differentiation control of primary erythroid progenitors. Oncogene 11: 1229–1239.
[PubMed](http://medrxiv.org/lookup/external-ref?access_num=7478542&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F02%2F23%2F2020.02.20.20019588.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1995RY96700002&link_type=ISI) 64. Quinlan, A. R., and I. M. Hall, 2010 BEDTools: A flexible suite of utilities for comparing genomic features. Bioinformatics 26: 841–842. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/bioinformatics/btq033&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=20110278&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F02%2F23%2F2020.02.20.20019588.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000275243500019&link_type=ISI) 65. Repapi, E., I. Sayers, L. V. Wain, P. R. Burton, T. Johnson et al., 2010 Genome-wide association study identifies five loci associated with lung function. Nat. Genet. 42: 36–44. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng.501&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=20010834&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F02%2F23%2F2020.02.20.20019588.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000273055100014&link_type=ISI) 66. Schmitt, A. D., M. Hu, I. Jung, Z. Xu, Y. Qiu et al., 2016 A compendium of chromatin contact maps reveals spatially active regions in the human genome. Cell. Rep. 17: 2042–2059. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.celrep.2016.10.061&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=27851967&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F02%2F23%2F2020.02.20.20019588.atom) 67. Sillanpaa, E., S. Sipila, T. Tormakangas, J. Kaprio and T. Rantanen, 2017 Genetic and environmental effects on telomere length and lung function: A twin study. J. Gerontol. A Biol. Sci. Med. Sci. 72: 1561–1568. 68. Sofer, T., X. Zheng, S. M. Gogarten, C. A. Laurie, K. 
Grinde et al., 2019 A fully adjusted two-stage procedure for rank-normalization in genetic association studies. Genet. Epidemiol. 43: 263–275. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/gepi.22188&link_type=DOI) 69. Soler Artigas, M., L. V. Wain, S. Miller, A. K. Kheirallah, J. E. Huffman et al., 2015 Sixteen new lung function signals identified through 1000 Genomes Project reference panel imputation. Nat. Commun. 6: 8658. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ncomms9658&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=26635082&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F02%2F23%2F2020.02.20.20019588.atom) 70. Soler Artigas, M., D. W. Loth, L. V. Wain, S. A. Gharib, M. Obeidat et al., 2011 Genome-wide association and large-scale follow up identifies 16 new loci influencing lung function. Nat. Genet. 43: 1082–1090. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng.941&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=21946350&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F02%2F23%2F2020.02.20.20019588.atom) 71. Summer Institute in Statistical Genetics, 2019 PC-Relate. [Online] Available at: [https://uw-gac.github.io/SISG\_2019/pc-relate.html](https://uw-gac.github.io/SISG_2019/pc-relate.html). [Accessed 2019 Jul 25]. 72. Taliun, D., D. N. Harris, M. D. Kessler, J. Carlson, Z. A. Szpiech et al., 2019 Sequencing of 53,831 diverse genomes from the NHLBI TOPMed program. bioRxiv 563866. 73. Tian, X., C. Xu, Y. Wu, J. Sun, H. Duan et al., 2017 Genetic and environmental influences on pulmonary function and muscle strength: The Chinese twin study of aging. Twin Res. Hum. Genet. 20: 53–59. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1017/thg.2016.97&link_type=DOI) 74. TOPMed, 2019 TOPMed Whole Genome Sequencing Methods: Freeze 8.
[Online] Available at: [https://www.nhlbiwgs.org/topmed-whole-genome-sequencing-methods-freeze-8](https://www.nhlbiwgs.org/topmed-whole-genome-sequencing-methods-freeze-8). [Accessed 2019 Dec 13]. 75. United States Environmental Protection Agency, 2011 National Emissions Inventory (NEI) 2011 Data. [Online] Available at: [https://www.epa.gov/air-emissions-inventories/2011-national-emissions-inventory-nei-data](https://www.epa.gov/air-emissions-inventories/2011-national-emissions-inventory-nei-data). [Accessed 2020 Jan 8]. 76. United States Environmental Protection Agency, 2008 National Emissions Inventory (NEI) 2008 Data. [Online] Available at: [https://www.epa.gov/air-emissions-inventories/2008-national-emissions-inventory-nei-data](https://www.epa.gov/air-emissions-inventories/2008-national-emissions-inventory-nei-data). [Accessed 2020 Jan 8]. 77. University of Michigan, and NHLBI TOPMed, 2018 BRAVO Variant Browser. [Online] Available at: [https://bravo.sph.umich.edu/freeze5/hg38/](https://bravo.sph.umich.edu/freeze5/hg38/). [Accessed 2019 Aug]. 78. Wain, L. V., N. Shrine, M. S. Artigas, A. M. Erzurumluoglu, B. Noyvert et al., 2017 Genome-wide association analyses for lung function and chronic obstructive pulmonary disease identify new loci and potential druggable targets. Nat. Genet. 49: 416–425. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng.3787&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=28166213&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F02%2F23%2F2020.02.20.20019588.atom) 79. Wan, H., S. Dingle, Y. Xu, V. Besnard, K. H. Kaestner et al., 2005 Compensatory roles of Foxa1 and Foxa2 during lung morphogenesis. J. Biol. Chem. 280: 13809–13816. 
[Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6MzoiamJjIjtzOjU6InJlc2lkIjtzOjEyOiIyODAvMTQvMTM4MDkiO3M6NDoiYXRvbSI7czo1MDoiL21lZHJ4aXYvZWFybHkvMjAyMC8wMi8yMy8yMDIwLjAyLjIwLjIwMDE5NTg4LmF0b20iO31zOjg6ImZyYWdtZW50IjtzOjA6IiI7fQ==) 80. White, M. J., O. Risse-Adams, P. Goddard, M. G. Contreras, J. Adams et al., 2016 Novel genetic risk factors for asthma in African American children: Precision medicine and the SAGE II study. Immunogenetics 68: 391–400. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1007/s00251-016-0914-1&link_type=DOI) 81. Wise, J., 2019 Air pollution is linked to infant deaths and reduced lung function in children. BMJ 366: 5772. 82. Wohlfahrt, T., S. Rauber, S. Uebe, M. Luber, A. Soare et al., 2019 PU.1 controls fibroblast polarization and tissue fibrosis. Nature 566: 344–349. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41586-019-0896-x&link_type=DOI) 83. Wojcik, G. L., M. Graff, K. K. Nishimura, R. Tao, J. Haessler et al., 2019 Genetic analyses of diverse populations improves discovery for complex traits. Nature 570: 514–518. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41586-019-1310-4&link_type=DOI) 84. World Health Organization, 2017 Asthma. [Online] Available at: [http://www.who.int/mediacentre/factsheets/fs307/en/](http://www.who.int/mediacentre/factsheets/fs307/en/). [Accessed 2020 Jan 8]. 85. Yamada, H., Y. Yatagai, H. Masuko, T. Sakamoto, H. Iijima et al., 2015 Heritability of pulmonary function estimated from genome-wide SNPs in healthy Japanese adults. Respir. Investig. 53: 60–67.
[CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.resinv.2014.10.004&link_type=DOI) 86. Zhang, F., and J. R. Lupski, 2015 Non-coding genetic variants in human disease. Hum. Mol. Genet. 24: 102.