GWAS meta-analysis and gene expression data link reproductive tract development, immune response and cellular proliferation/apoptosis with cervical cancer and clarify overlap with other cervical phenotypes

Background The uterine cervix has an important role in female reproductive health, but not much is known about the genetic determinants of cervical biology and pathology. Genome-wide association studies (GWAS) with increasing sample sizes have reported a few genetic associations for cervical cancer. However, GWAS is only the first step in mapping genetic susceptibility, and thus the underlying biology of cervical cancer and other cervical phenotypes is still not entirely understood. Here, we use data from large biobanks to characterise the genetics of cervical phenotypes (including cervical cancer) and leverage the latest computational methods and gene expression data to refine the association signals for cervical cancer.
Methods Using Estonian Biobank and FinnGen data, we characterise the genetic signals associated with cervical ectropion (10,162 cases/151,347 controls), cervicitis (19,285/185,708) and cervical dysplasia (14,694/150,563). We present the results from the largest trans-ethnic GWAS meta-analysis of cervical cancer, including up to 9,229 cases and 490,304 controls from Estonian Biobank, the FinnGen study, the UK Biobank and Biobank Japan. We combine GWAS results with gene expression data and chromatin regulatory annotations in HeLa cervical carcinoma cells to propose the most likely candidate genes and causal variants for every locus associated with cervical cancer. We further dissect the HLA association with cervical pathology using imputed data on alleles and amino acid polymorphisms.
Results We report a single associated locus on 2q13 for both cervical ectropion (rs3748916, p=5.1 × 10^-16) and cervicitis (rs1049137, p=3.9 × 10^-10), and five signals for cervical dysplasia: 6p21.32 (rs1053726, p=9.1 × 10^-9; rs36214159, p=1.6 × 10^-22), 2q24.1 (rs12611652, p=3.2 × 10^-9) near DAPL1, 2q13 (rs1049137, p=6.4 × 10^-9) near PAX8, and 5p15.33 (rs6866294, p=2.1 × 10^-9), downstream of CLPTM1L. We identify five loci associated with cervical cancer in the trans-ethnic meta-analysis: 1p36.12 (rs2268177, p=3.1 × 10^-8), 2q13 (rs4849177, p=9.4 × 10^-15), 5p15.33 (rs27069, p=1.3 × 10^-14), 17q12 (rs12603332, p=1.2 × 10^-9), and 6p21.32 (rs35508382, p=1.0 × 10^-39). Joint analysis of dysplasia and cancer datasets revealed an association on chromosome 19 (rs425787, p=3.5 × 10^-8), near CD70.
Conclusions Our results map PAX8/PAX8-AS1, LINC00339, CDC42, CLPTM1L, HLA-DRB1, HLA-B, and GSDMB as the most likely candidate genes for cervical cancer, which provides novel insight into cervical cancer pathogenesis and supports the role of genes involved in reproductive tract development, immune response and cellular proliferation/apoptosis. We further show that PAX8/PAX8-AS1 has a central role in cervical biology and pathology, as it was associated with all analysed phenotypes. The detailed characterisation of association signals, together with mapping of causal variants and genes, offers valuable leads for further functional studies.


Introduction
The uterine cervix has an important role in female reproductive health, as it separates the lower and upper parts of the genital tract and thus forms a barrier for pathogens (1) which can cause infection of the upper genital tract. The latter can lead to serious health problems, including pelvic inflammatory disease, infertility, and chronic pelvic pain (2).
The cervical epithelium is also the main infection site for human papillomavirus (HPV), one of the most common causes of sexually transmitted diseases, which can, in turn, cause cervical dysplasia and in some cases malignant neoplasm of the cervix. Cervical cancer is one of the most common cancer types in women, with more than 28,000 and 311,000 women dying from the disease every year in Europe and worldwide, respectively (3).
Although the development of cervical cancer is initiated by infection with a high-risk HPV subtype, it also involves a contribution from host genetics, which determine whether the infection is successfully cleared or persists and eventually develops into cervical cancer, as well as the rate at which the tumor progresses.
Previous family-based studies have estimated the heritability of cervical cancer to be 13-64% (4-6) (array-based heritability estimate 7% (7)), and recent genome-wide association studies (GWAS) with increasing sample sizes have also increased the number of loci reported for cervical cancer (7,8). However, GWAS findings are merely the first step in mapping genetic susceptibility and biology, and thus the underlying carcinogenic mechanisms and molecular changes in cervical cancer are still not entirely understood (9).
At the same time, not much is known about the genetic factors predisposing to other cervical phenotypes, such as cervical ectropion (a benign condition where the columnar epithelium of the cervical canal is turned outwards and exposed to the vaginal environment (10)), cervicitis (inflammation of the uterine cervical epithelium, most commonly caused by sexually transmitted pathogens, such as Chlamydia trachomatis, Neisseria gonorrhoeae and Mycoplasma genitalium (11)) and cervical dysplasia (a precancerous condition with varying severity, characterized by abnormal growth of the cervical epithelium), which all represent partially overlapping conditions related to the cervix. Without knowing the genetic determinants of cervical biology, it is difficult to put the findings from cervical cancer genetic studies into biological perspective.
medRxiv preprint, doi: https://doi.org/10.1101/2021.06.18.21259075; this version posted June 27, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Here we use data from Estonian Biobank and the FinnGen study to dissect the genetic architecture of cervical phenotypes in a sample set including cases of cervical ectropion (n=10,162), cervicitis (n=19,285), and cervical dysplasia (n=14,694). We then explore their genetic overlap with cervical cancer by combining all publicly available datasets in the largest trans-ethnic GWAS meta-analysis of cervical cancer to date, with 9,229 cervical cancer cases and 490,304 controls. Leveraging the latest computational methods and gene expression data, we refine the association signals for cervical cancer and propose potential causal variants and genes at each locus for functional follow-up.

Study design and participants
We used data from the Estonian Biobank (EstBB) and the FinnGen study (release 5) for GWAS of cervical phenotypes ectropion, cervicitis, and dysplasia. The EstBB is a population-based biobank with genotype data and health information for over 200,000 participants (12). Information on International Classification of Disease-10 (ICD10) codes is obtained via regular linking with the Health Insurance Fund and other relevant registries (13). The 150K data freeze was used for the analyses described in this paper (n=92,042 women). All biobank participants have signed a broad informed consent for using their data in research and the study was carried out under ethical approval 1.1-12/624 from the Estonian Committee on Bioethics and Human Research (Estonian Ministry of Social Affairs) and data release N05 from the EstBB.
The following ICD10 codes were used for extracting cases: N86 (Erosion and ectropion of cervix uteri), N72 (Inflammatory disease of cervix uteri), N87 (Dysplasia of cervix uteri), C53/D06 (Cervical cancer). Women who did not have the respective ICD codes were used as controls. The final numbers in analysis were as follows: cervical ectropion: 9,664 cases, 82,378 controls; cervicitis: 18,192
ORs to betas (beta=log(OR)), then derived z-scores from reported p-values (using the 'qnorm' function in R) and calculated SEs (SE=beta/z-score).
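The conversion from reported odds ratios and p-values to betas and standard errors can be sketched as follows. This is a minimal illustration in Python rather than the R code used in the study. The function name `or_p_to_beta_se` is hypothetical, and the sketch assumes that reported p-values are two-sided and that the z-score takes the sign of beta.

```python
import math
from statistics import NormalDist


def or_p_to_beta_se(odds_ratio, p_value):
    """Return (beta, z, se) from a reported odds ratio and p-value.

    Assumption: the p-value is two-sided and comes from a normal test statistic.
    """
    if odds_ratio <= 0:
        raise ValueError("odds ratio must be positive")
    if not 0.0 < p_value < 1.0:
        raise ValueError("p-value must lie strictly between 0 and 1")

    beta = math.log(odds_ratio)  # beta = log(OR)

    # Magnitude of the z-score; the analogue of abs(qnorm(p / 2)) in R.
    z_abs = -NormalDist().inv_cdf(p_value / 2.0)

    # Assumption: the z-score has the same sign as beta.
    z = math.copysign(z_abs, beta)

    se = beta / z  # SE = beta / z-score
    return beta, z, se


if __name__ == "__main__":
    beta, z, se = or_p_to_beta_se(1.2, 0.05)
    print(f"beta={beta:.4f} z={z:.4f} se={se:.4f}")
```

Note that an odds ratio of exactly 1 gives beta = 0 and therefore SE = 0 under this formula, so such records would need separate handling.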

GWAS meta-analysis
All European ancestry meta-analyses were conducted using the inverse-variance weighted fixed-effect meta-analysis method implemented in the GWAMA software (18). For the cervical cancer meta-analysis including data from Biobank Japan, we used MR-MEGA, a tool for trans-ethnic meta-regression (19). Genome-wide significance was set at p < 5 × 10^-8 in all analyses. We used MTAG (20) (Multi-Trait Analysis of GWAS) to jointly analyse the summary statistics from the dysplasia and cervical cancer European ancestry analyses and thus increase the power to detect additional associations.
For annotation and follow-up analyses, we used the GWAS summary statistics (European ancestry for cervical cancer).
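For readers unfamiliar with the method, the inverse-variance weighted fixed-effect estimate for a single variant can be sketched as below. This is an illustrative Python sketch of the textbook formula, not the GWAMA implementation, which handles additional steps such as allele alignment that are omitted here. The function name `ivw_fixed_effect` is hypothetical.

```python
import math
from statistics import NormalDist


def ivw_fixed_effect(betas, ses):
    """Combine per-study effect estimates for one variant.

    Each study is weighted by 1 / SE^2. Returns (beta, se, z, p),
    where p is a two-sided p-value from the normal distribution.
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and of equal length")

    weights = [1.0 / (se * se) for se in ses]
    total_weight = sum(weights)

    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    p = 2.0 * NormalDist().cdf(-abs(z))
    return beta, se, z, p


if __name__ == "__main__":
    # Two hypothetical studies with equal precision.
    print(ivw_fixed_effect([0.10, 0.20], [0.05, 0.05]))
```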

Annotation of GWAS signals
We used FUMA (21) for functional annotation of GWAS results and credible set variants.
For functional annotation, we used ANNOVAR (Annotate Variation) (22), CADD scores (a continuous score showing how deleterious the SNP is to protein structure/function; scores >12.37 indicate potential pathogenicity) (23), RegulomeDB scores (24) (ranging from 1 to 7, where a lower score indicates greater evidence for regulatory function), and 15 chromatin states from the Roadmap Epigenomics Project (25). FUMA also performs lookups in the GWAS Catalog, the results of which are shown in Supplementary Table 7.

HLA analysis
For the cervical dysplasia meta-analysis, we carried out HLA imputation of the EstBB genotype data with the SNP2HLA tool (26). As an imputation reference, we used a merged reference of EstBB WGS (15) and Type 1 Diabetes Genetics Consortium samples (26). We tested for association between the alleles, amino acid polymorphisms and cervical dysplasia in the EstBB using SAIGE with the LOCO option. We used imputed data on alleles (two- and four-digit) and amino acid polymorphisms present in the MHC

Colocalisation and fine-mapping analyses
We used HyPrColoc (27), a fast and efficient colocalisation method, to identify overlap between our GWAS meta-analysis signals and cis-QTL signals from different tissues and cell types (expression QTLs, transcript QTLs, exon QTLs and exon usage QTLs available in the eQTL Catalogue (28)). We lifted the GWAS summary statistics over to the hg38 build to match the eQTL Catalogue using the binary liftOver tool (https://genome.sph.umich.edu/wiki/LiftOver#Binary_liftOver_tool). For each genome-wide significant (p < 5 × 10^-8) GWAS locus, we extracted the region within ±500 kb of its top hit from the QTL datasets and ran the colocalisation analysis against eQTL Catalogue traits. For each eQTL Catalogue dataset, we included all QTL features that shared at least 80% of tested variants with the variants present in our GWAS region. We used the default settings for HyPrColoc analyses and did not specify any sample overlap argument, because the HyPrColoc paper (27) demonstrates that assuming trait independence gives reasonable results. HyPrColoc outputs the following results: a) a cluster of putatively colocalised traits (here our GWAS region of interest and the cis-QTL signal for any nearby feature for a given QTL dataset); b) the posterior probability that genetic association signals for those traits are colocalising (we considered two or more signals to colocalise if the posterior probability for a shared causal variant (PP4) was 0.8 or higher; all results with a PP4 > 0.8 can be found in Supplementary Table 4); c) the 'regional association' probability (a large regional association probability indicates that one or more SNPs in the region have shared association across evaluated traits); d) a candidate causal variant explaining the shared association; and e) the proportion of the posterior probability explained by this variant (which also represents the HyPrColoc multi-trait fine-mapping probability). For every colocalisation event, we also calculated the 95% credible set (CS) for the multi-trait fine-mapping results. To do so, we ranked all variants in decreasing order of posterior probability and extracted the top n variants with a cumulative posterior probability of ≥0.95.
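The credible set construction described above (rank variants by posterior probability, then keep the top n until the cumulative probability reaches 0.95) can be sketched as follows. This is an illustrative Python sketch, not the code used in the study. The function name `credible_set` and the variant identifiers in the example are hypothetical.

```python
def credible_set(posteriors, coverage=0.95):
    """Return the smallest top-ranked set of variants reaching the coverage.

    posteriors: dict mapping variant ID -> posterior probability.
    Variants are ranked in decreasing order of posterior probability and
    added until their cumulative probability is >= coverage.
    """
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)

    selected = []
    cumulative = 0.0
    for variant, probability in ranked:
        selected.append(variant)
        cumulative += probability
        if cumulative >= coverage:
            break
    return selected


if __name__ == "__main__":
    # Hypothetical posterior probabilities for four variants at one locus.
    example = {"v1": 0.60, "v2": 0.30, "v3": 0.06, "v4": 0.04}
    print(credible_set(example))
```

If the posterior probabilities sum to less than the requested coverage, this sketch simply returns all variants; whether that case needs special handling depends on the fine-mapping output being used.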
Since cervical samples were not present in analysed gene expression datasets, we prioritised colocalisation signals from tissues that cluster together with vagina/uterus in GTEx V8 data, either based on cell-type-composition or gene expression (Supplementary figures S41 and S48 of (29)) and that have sufficient sample size to detect QTL associations. These tissues include vagina, uterus, esophagus mucosa and gastro-esophageal junction, sigmoid colon, skin, salivary gland, and tibial nerve. Of these 'proxy' tissues, esophageal mucosa (stratified squamous epithelium) and gastroesophageal junction (transition zone between stratified and columnar epithelium) tissues are histologically most similar to the cervix.
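The region extraction and variant-sharing filter described earlier in this section can be sketched as below. This is a simplified Python illustration under stated assumptions, not the pipeline used in the study: variants are represented as (ID, chromosome, position) tuples, and the 80% sharing criterion is interpreted as the fraction of a QTL feature's tested variants that are also present in the GWAS region. All function names are hypothetical.

```python
def variants_in_window(variants, chrom, lead_pos, flank=500_000):
    """Return IDs of variants within +/- flank base pairs of the lead variant.

    variants: iterable of (variant_id, chromosome, position) tuples.
    """
    return {
        variant_id
        for variant_id, variant_chrom, pos in variants
        if variant_chrom == chrom and abs(pos - lead_pos) <= flank
    }


def shared_fraction(feature_variants, gwas_variants):
    """Fraction of a QTL feature's tested variants also present in the GWAS region.

    Assumption: the denominator is the number of variants tested for the feature.
    """
    feature_variants = set(feature_variants)
    if not feature_variants:
        return 0.0
    return len(feature_variants & set(gwas_variants)) / len(feature_variants)


def keep_feature(feature_variants, gwas_variants, min_shared=0.8):
    """Keep a QTL feature only if it shares at least min_shared of its variants."""
    return shared_fraction(feature_variants, gwas_variants) >= min_shared


if __name__ == "__main__":
    gwas = [("v1", "2", 1_000_000), ("v2", "2", 1_400_000), ("v3", "2", 1_600_001)]
    region = variants_in_window(gwas, "2", 1_000_000)
    print(sorted(region), keep_feature({"v1", "v2"}, region))
```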

Genetic correlations
We used the LD Score regression method (30) implemented in LD Hub (31) (http://ldsc.broadinstitute.org) to test genetic correlations between cervical cancer and traits spanning reproductive, aging, autoimmune, cancer, and smoking behaviour categories (33 traits in total), using the cervical cancer European-ancestry only GWAS meta-analysis summary statistics and data available within the LD Hub resource. After filtering the input to HapMap3 SNPs, removing SNPs within the MHC region, and merging with the built-in reference panel LD Scores (1000 Genomes EUR ancestry) (31), ~1.1M variants remained for analysis. False discovery rate (FDR) correction (calculated using the p.adjust function in R) was used to account for multiple testing. Results of the analysis are presented in Supplementary Table 9.
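As an illustration of the multiple-testing step, the sketch below implements the Benjamini-Hochberg adjustment in Python. The text does not state which method was passed to p.adjust, so this assumes the Benjamini-Hochberg procedure (method "BH", also available as "fdr", in R). The function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Each p-value is multiplied by m / rank (rank 1 = smallest p-value),
    and a running minimum from the largest rank downwards enforces
    monotonicity of the adjusted values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])

    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical p-values for four genetic correlation tests.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```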
The LDSC-estimated observed-scale heritability for cervical cancer (0.0059, SE=0.0013) was converted to the liability scale using the formula $h^2_{\mathrm{liability}} = h^2_{\mathrm{observed}} \times K^2 (1-K)^2 / \left( P (1-P) z^2 \right)$, where K is the population prevalence (here equal to the sample prevalence), P is the proportion of cases in the study (European ancestry analysis, 2.1%), and z is the height of the standard normal density at the liability threshold corresponding to K. This resulted in a liability-scale heritability estimate of 4.75% for non-HLA common variant heritability.
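The liability-scale conversion can be checked numerically with a short sketch. The Python function below is an illustration of the formula above, not the code used in the study. The name `observed_to_liability` is hypothetical, and z is taken to be the standard normal density at the threshold that leaves a fraction K of the population above it.

```python
from statistics import NormalDist


def observed_to_liability(h2_observed, K, P):
    """Convert observed-scale heritability to the liability scale.

    K: population prevalence; P: proportion of cases in the study.
    z: height of the standard normal density at the liability threshold,
       i.e. the point with upper-tail probability K.
    """
    if not (0.0 < K < 1.0 and 0.0 < P < 1.0):
        raise ValueError("K and P must lie strictly between 0 and 1")

    standard_normal = NormalDist()
    threshold = standard_normal.inv_cdf(1.0 - K)
    z = standard_normal.pdf(threshold)

    return h2_observed * K**2 * (1.0 - K) ** 2 / (P * (1.0 - P) * z**2)


if __name__ == "__main__":
    # Rounded inputs taken from the text: h2 = 0.0059, K = P = 0.021.
    print(observed_to_liability(0.0059, 0.021, 0.021))
```

With the rounded inputs reported in the text, the function returns a value of roughly 0.048, close to the reported 4.75%; small differences are expected because the published inputs are rounded.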

Data sharing
GWAS meta-analysis summary statistics will be made available upon publication.
Datasets used in this study can be downloaded from provided web pages.

Role of the funding source
The funders had no role in the study design; in the collection, analysis, or interpretation of data; in the writing of the report; or in the decision to submit the paper for publication.

Results
First, to determine the genetic factors associated with cervical phenotypes, we conducted GWAS using 92,042 Estonian female individuals from the EstBB for cervical ectropion, cervicitis, and dysplasia. Next, the results of these analyses were further meta-analysed together with the corresponding summary statistics from the FinnGen study (R5 release
Notably, all three analysed phenotypes showed significant association with a locus on chromosome 2 near the PAX8 gene, which encodes a transcription factor known to be relevant for genital tract development, and its antisense RNA PAX8-AS1. For cervical ectropion and cervicitis, the lead signal (genetic variant with the lowest p-value in a region) was non-significant in the FinnGen data alone. However, the effect size estimates (odds ratios and 95% confidence intervals) are concordant, and the lack of significance in this particular dataset can be attributed to the very small sample size: 498 and 1,093 cases for ectropion and cervicitis, respectively. Furthermore, we observed four additional genome-wide significant signals for cervical dysplasia: two in the HLA region on chromosome 6 (rs1053726, rs36214159), one on chromosome 2 (rs112611652, p=3.2 × 10^-9) near DAPL1 and one on chromosome 5 (rs6866294, p=2.1 × 10^-9), downstream of CLPTM1L.
cervical cancer cases together with 490,304 controls in transethnic meta-analysis.
As a result, we identified five loci robustly associated with cervical cancer (Table 2 Table 6).
Table 4). CDC42-AS1 and LINC00339 were also mapped by colocalisation signal in relevant tissue (Figure 1 Table 6, Supplementary Figure 3). LINC00339 has a known role in promoting the proliferation of several cancers (32)(33)(34), while there is also evidence to link CDC42 expression with cervical cancer invasion and migration (35). The region has been previously associated with uterine fibroids, endometriosis, endometrial cancer (36), epithelial ovarian cancer, gestational age, and bone mineral density (Supplementary Table 7).
As with other cervical phenotypes, we observed a significant association on   Table 4), two of them overlapping with enhancer elements.
We compared the signal in the 2q13 locus across the analysed cervical phenotypes (Supplementary Figure 5; Supplementary Table 5) and found that the lead signals for ectropion (rs3748916) and cervicitis/dysplasia (rs1049137) are weakly correlated (r²=0.27, EUR), indicating independent or partly independent signals in the same region. The cervical cancer lead signal was moderately correlated (r²=0.45-0.53, EUR) with cervicitis/dysplasia and ectropion signals, respectively. This is supported by the fact that although the sets of most likely causal variants mostly overlapped for cervicitis, dysplasia, and cancer, the credible set variants seem to be different for ectropion.
Table 3). CLPTM1L is a membrane protein and its overexpression in cisplatin-sensitive cells causes apoptosis.
Polymorphisms in this region have been reported to increase susceptibility to cancer, including lung, pancreatic, and breast cancers (Supplementary Table 7). Variants in the credible set overlap with active TSS, as well as with several enhancer and ZNF repeat marks in the CLPTM1L gene (Supplementary Figure 4).
Members of this family regulate apoptosis in epithelial cells and are linked to cancer, and GSDMB has been linked with invasion and metastasis in breast cancer cells (37) and in cervical cancer (38). Specifically, the splice variant rs11078928 deletes exon 6, which encodes 13 amino acids in the critical N-terminus, and abolishes the pyroptotic activity (pyroptosis is a type of cell death) of the GSDMB protein (39). This region has been previously associated with asthma, inflammatory bowel disease, ulcerative colitis, Crohn's disease, multiple sclerosis, primary biliary cholangitis, rheumatoid arthritis and other disorders with an immune etiology, but also with cervical cancer (40).
Given the similarity in signals identified for cervical dysplasia and cervical cancer (Tables 1 and 2), we jointly analysed the GWAS results for dysplasia and cancer and identified an additional signal on chromosome 19 (rs425787, p=3.5 × 10^-8, Supplementary Figure 6) that remained sub-significant in the cervical cancer analysis alone (p=2.1 × 10^-7). Since this locus was not significant in the cervical cancer meta-analysis, it was not included in the colocalisation and fine-mapping analyses. The association signal overlaps with enhancer histone marks in the HeLa cervical carcinoma cell line (Supplementary Figure 7) and is in the 3' region of CD70. CD70 is a cytokine with an important role in T-cell immunity during antiviral response, and its high expression has been associated with a favorable outcome in cervical cancer patients (41).

Dysplasia signals stratified by dysplasia severity and in cancer
We stratified the dysplasia phenotype to evaluate the meta-analysis effect sizes (odds ratios) in relation to pathology severity. Figure 2 shows the effect estimates in dysplasia subphenotypes and in the cervical cancer meta-analysis. In general, odds ratios correlated with the degree of pathology, although there was an overlap in confidence intervals (Figure 2). An interesting exception seems to be rs12611652 near DAPL1, which is associated with different cervical dysplasia subphenotypes, but not with cervical cancer.
DAPL1 is expressed in epithelium and may play a role in the early stages of epithelial differentiation or in apoptosis and is a suppressor of cell proliferation in retinal pigment
All cervical cancer lead signals were at least nominally significant (p<0.05) in the cervical dysplasia analysis; rs4849177 and rs35508382 were also genome-wide significant (Supplementary Table 2), confirming the similarity between genetic risk factors of cervical dysplasia and cancer.

HLA fine-mapping
Since both cervical dysplasia and cervical cancer show an association signal in the HLA region, we used the larger cervical dysplasia dataset in EstBB to further map the cervical dysplasia association signal in the HLA region. We found that the HLA-DRB1 amino acid polymorphism at position 37 (p=1.4 × 10^-18) drove the association in the HLA region (frequency in cases 15%, in controls 17%) (Figure 3). After conditioning on this, we
Several other traits reflective of smoking behavior (incl. lung cancer) were also nominally significant (Supplementary Table 9).

Discussion
Despite cervical cancer being a considerable global problem, it is relatively rare and thus previous GWAS have been underpowered to fully characterise the genetic architecture of this disease. Although several GWAS for cervical cancer have been conducted in non-European populations, there has not been a highly-powered trans-ethnic meta-analysis.
By bringing together all publicly available datasets, our analyses allow us to take a more comprehensive look into the genetic risk factors of cervical cancer, and furthermore, enable us to assess the genetic determinants of the cervical pathology spectrum.
Previous studies have also not taken the step further to map the most likely causal genes and variants at each locus, which is important to understand the underlying biology.
In this study, we present the results from the largest trans-ethnic GWAS meta-analysis of cervical cancer and other cervical phenotypes, including up to 9,229 cervical cancer cases and 490,304 controls. Compared to the latest cervical cancer GWAS meta-analysis (8), our study is larger and has a trans-ethnic aspect, includes a wider selection of phenotypes, and uses multiple post-GWAS analyses to fine-map the signals. We report four strong non-HLA signals (LINC00339/CDC42/CDC42-AS1, PAX8/PAX8-AS1, CLPTM1L, GSDMB), and leverage the latest computational methods and available genetical genomics datasets to pinpoint the most likely causal genes and variants for each associated locus. We further take advantage of the genetic similarity between cervical cancer and dysplasia to refine the association signal in the HLA locus to three independent association signals on amino acid polymorphism and two-digit allele level (HLA-DRB1 positions 30 and 37, and HLA-B*08), and propose a potential association on chromosome 19 near CD70, which can be followed up in further validation studies. By analysing the genetics of cervical ectropion and cervicitis in addition to dysplasia and cancer, we conclude that PAX8/PAX8-AS1 appears to have a dual role in cervical biology: PAX8 signaling is not only important for female genital system development, but PAX8 is also upregulated in reproductive cancers, enhancing the proliferation of tumor cells (43).
Previously, it has been reported that several novel PAX8 transcripts can be observed in cervical carcinoma, indicating differential regulation properties during carcinogenesis (44).
Cervical dysplasia is the first step towards cervical malignancy. Overall, the identified genetic associations were very similar in both dysplasia and cancer, and mirrored closely the results from a recent joint analysis of severe dysplasia and cervical cancer (8). This indicates that further studies could include both phenotypes to increase power to detect novel associations. One exception was rs12611652 near DAPL1/PKP4, which was associated with dysplasia but not with cancer. Given that DAPL1 has a role in epithelial differentiation, apoptosis, and is potentially a suppressor of cell proliferation, and PKP4, highlighted in colocalization analysis, is associated with invasion and metastasis of cancer (45), both genes are interesting candidates for further analysis because of their potential protective effect in cervical malignancy development.
Our study provides additional support for potential causal variants and genes at each locus. Although previous studies have reported relevant association signals, they have not mapped the most likely causal genes and variants at each locus, which is an important step in understanding the underlying biology. Evaluating the colocalisation of GWAS signals from different traits (including gene expression) gives valuable information on potential shared causal variants, providing the necessary link between genetics, gene expression and disease risk. We were able to detect colocalisation with gene expression or transcription events for all the evaluated non-HLA loci, which provides evidence that variants in our GWAS signal are involved in regulating the expression or transcription of these genes. At the same time, since reproductive tissues are underrepresented in widely used gene expression datasets, we had to rely on tissues that are similar to female reproductive tract tissues, based on cellular composition and gene expression patterns.
Therefore, more extensive characterisation of gene expression regulation in reproductive tract tissues is urgently needed to facilitate correct interpretation of GWAS signals. We also constructed the 95% credible sets of causal variants and compared them to
although the region is a known risk locus for other gynecological problems, such as endometriosis, uterine fibroids, and ovarian cancer. Our results support LINC00339 and CDC42/CDC42-AS1 as the most likely candidate genes in this locus, which is in line with evidence from other cancers (32)(33)(34)(35). In fact, previous studies have shown that knocking down LINC00339 expression leads to increased CDC42 expression (48), which is supported by data from eQTLs: variants associated with increased expression of LINC00339 have an opposite effect on CDC42 expression (49). Thus, it cannot be ruled out that several jointly regulated genes in this locus contribute to cancer pathogenesis.
Our genetic correlation analyses, which indicated significant genetic correlation between cervical cancer and both age at first birth and smoking, closely mirror the results from a recent Mendelian randomization analysis, which showed that smoking increases, and older age at first pregnancy decreases, the risk of cervical cancer (8). Several potentially interesting correlations (such as with parental age at death and lung cancer) were nominally significant in our analysis, and future studies with larger sample sizes and more power are needed to fully elucidate the shared genetics between these traits and cervical malignancy.
Our analyses are based on population-based biobank data, which offers access to large sample sizes, but at the same time it can hinder the accessibility to more detailed clinical information (such as HPV status), especially when using summary-level data.
Further studies evaluating the detected loci in relation to specific HPV strains or histopathological features will elucidate their more specific role in cervical pathology etiopathogenesis. We used relatively simple phenotype definitions based solely on ICD codes, which on one hand simplifies data analysis, but on the other hand may introduce unwanted heterogeneity, as the use of these codes might vary somewhat between healthcare systems. However, we replicate many previously reported associations with cervical cancer, suggesting our approach is suitable. Although our study is the first attempt at a transethnic GWAS meta-analysis, the number of non-European samples is small, and given the high prevalence of cervical malignancy in non-European populations, additional Black and Asian populations should be included in analyses.
Our study provides the first insight into the genetics of cervical ectropion and cervicitis, which is an important step towards a complete understanding of cervical biology. We further clarify the genetic background of cervical malignancy, supporting the involvement of genes important for reproductive tract development, immune response, and cellular proliferation/apoptosis. The detailed characterisation of association signals, together with mapping of causal variants and genes offers valuable leads for further functional studies which may eventually lead to better treatment and prevention of cervical neoplasia.