A genetic perspective on the relationship between non-cancerous gynecological diseases and endometrial cancer

The non-cancerous gynecological diseases, endometriosis, polycystic ovary syndrome (PCOS) and uterine fibroids, have been proposed as endometrial cancer risk factors; however, disentangling their relationships is complicated due to their shared risk factors and comorbidity. Using genome-wide association study (GWAS) summary data, we have explored the relationship between these non-cancerous gynecological diseases and endometrial cancer risk, assessing genetic correlations, causal relationships and shared genetic risk regions. Firstly, we found significant genetic correlation between endometrial cancer and PCOS (rG = 0.36, se = 0.12, P = 1.6 × 10⁻³), and uterine fibroids (rG = 0.24, se = 0.09, P = 5.4 × 10⁻³). Adjustment for genetically predicted body mass index (BMI; a risk factor for PCOS, uterine fibroids and endometrial cancer) substantially attenuated the genetic correlation between endometrial cancer and PCOS, but not uterine fibroids. Despite the observed genetic correlation, genetic causal inference tests (latent causal variable and Mendelian randomization analyses) did not support a causal relationship between any of the non-cancerous gynecological diseases and endometrial cancer. Gene-based association analysis revealed four shared endometriosis and endometrial cancer risk loci (9p21.3, 15q15.1, 17q21.32 and 3q21.3) and two shared uterine fibroid and endometrial cancer risk loci (5p15.33 and 11p13). In summary, we have shown that PCOS and uterine fibroids are genetically correlated with endometrial cancer, although the genetic architecture shared between endometrial cancer and PCOS likely relates to BMI. Furthermore, shared genetic risk regions, and thus potentially shared causal genes, were identified between the risk for endometrial cancer and endometriosis, and uterine fibroids.

This preprint (which was not certified by peer review) was posted November 12, 2020 and is made available under a CC-BY 4.0 International license. The copyright holder is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. https://doi.org/10.1101/2020.11.09.20228114
However, the prevalence of these three diseases is likely to be underestimated because of underdiagnosis (Agarwal et al. 2019;De La Cruz and Buchanan 2017). Although the three non-cancerous gynecological diseases primarily affect premenopausal women and endometrial cancer is largely a postmenopausal malignancy, many risk factors are shared (e.g. chronic estrogen exposure, inflammation, insulin resistance and obesity (Harris and Terry 2016;Li et al. 2019;Wise et al. 2016)), supporting a relationship between them.
A number of studies have used observational data to assess associations between the three non-cancerous gynecological diseases and endometrial cancer risk, the findings of which have been heterogeneous (Harris and Terry 2016;Johnatty et al. 2020;Li et al. 2019;Wise et al. 2016). Indeed, observational studies of these associations are vulnerable to several sources of bias: (i) failure to adequately account for potential confounders e.g. oral contraceptive use (a common treatment for endometriosis and PCOS, which is associated with reduced endometrial cancer risk); (ii) reliance on self-reported data for disease status classification, which is subject to misclassification bias from asymptomatic undiagnosed cases; (iii) misdiagnosis of early-stage endometrial cancer as uterine fibroids due to shared clinical presentation; (iv) detection bias in cohort studies as a result of increased surveillance for endometrial cancer among patients with non-cancerous gynecological diseases; and (v) the comorbidity of non-cancerous gynecological diseases e.g. patients with uterine fibroids can concurrently suffer from endometriosis (Choi et al. 2017;Johnatty et al. 2020;Matalliotaki et al. 2018;Nagai et al. 2015;Uimari et al. 2011) or PCOS (Wise et al. 2007). Thus, it remains difficult to determine from observational studies if there is a true causal association between these non-cancerous gynecological diseases and endometrial cancer.
Genome-wide association study (GWAS) data have demonstrated genetic overlap between endometrial cancer and endometriosis (Masuda et al. 2020;Painter et al. 2018), and uterine fibroids (Masuda et al. 2020), which may partly explain the comorbidities of these diseases.
However, whether these comorbidities are due to causal relationships or shared genetic etiology remains to be explained. In this study, we have further used GWAS data to disentangle associations between non-cancerous gynecological disease and endometrial cancer. As inherited genetic variants are non-modifiable, genetic approaches to explore relationships between phenotypes are less influenced by confounding inherent in observational studies. Firstly, we have performed genetic correlation analysis, using the largest currently available datasets to clarify the shared genetic risk between the non-cancerous gynecological diseases and endometrial cancer. To investigate causal relationships, we have assessed genetic causal inference through analyses that use genetic variants associated with the gynecological disease of interest as instruments. Lastly, it is possible that these diseases may not be causally related to endometrial cancer but rather develop through effects of disease-specific risk variants at shared risk loci. To assess this possibility, we have performed gene-based analysis of each GWAS to identify shared genetic risk loci.

GWAS data
GWAS summary data were publicly available for PCOS (Day et al. 2018) (https://doi.org/10.17863/CAM.27720) and uterine fibroids (Gallagher et al. 2019) (ftp://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GallagherCS_31649266_GCST009158). For endometriosis, we accessed data (Sapkota et al. 2017) via collaboration. For PCOS and uterine fibroids, publicly available summary data were based on analyses excluding the 23andMe, Inc., cohort in the original publication, due to data sharing agreements with the company (Day et al. 2018;Gallagher et al. 2019). For Mendelian randomization analyses, risk estimates and respective standard errors of genome-wide significant variants were accessed from the largest published GWAS for each disease. Details of studies and sample sizes used in each analysis are shown in Table 1. Detailed descriptions of the quality control procedures and GWAS analysis can be found in the corresponding publications.
GWAS summary data for endometrial cancer were derived from O'Mara et al. (2018). As the GWAS for endometrial cancer (O'Mara et al. 2018), endometriosis (Rahmioglu et al. 2018), and uterine fibroids (Gallagher et al. 2019) included participants from UK Biobank, we reanalyzed the endometrial cancer dataset, excluding these participants to avoid sample overlap bias in the two-sample Mendelian randomization analysis. This revised endometrial cancer GWAS meta-analysis consisted of 12,270 cases and 46,126 controls of European descent.
Genetic variants with minor allele frequency (MAF) < 1% and imputation information score < 0.4 were filtered out, leaving ~9 million genetic variants. This revised endometrial cancer GWAS was used only in Mendelian randomization analysis, while the published endometrial cancer GWAS (O'Mara et al. 2018) was used in all other analyses. Prior to genetic correlation and latent causal variable (LCV) analyses, genetic variants in the extended human major histocompatibility complex region (26-34 Mb on chromosome 6) were removed due to the complex linkage disequilibrium (LD) structure in this region.
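The variant filters described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the record layout, field names and toy variants are hypothetical, and the sketch assumes that a variant is dropped when it fails either the MAF or the imputation score threshold.

```python
# Minimal sketch of the variant filters; field names and toy data are hypothetical.
MAF_MIN = 0.01     # variants with MAF < 1% are dropped
INFO_MIN = 0.4     # variants with imputation information score < 0.4 are dropped
MHC_CHROM, MHC_START, MHC_END = "6", 26_000_000, 34_000_000  # extended MHC, 26-34 Mb


def passes_qc(variant):
    """Assumption: a variant is dropped if it fails either threshold."""
    return variant["maf"] >= MAF_MIN and variant["info"] >= INFO_MIN


def in_extended_mhc(variant):
    """True for variants inside the 26-34 Mb window on chromosome 6."""
    return (variant["chrom"] == MHC_CHROM
            and MHC_START <= variant["pos"] <= MHC_END)


def filter_variants(variants, drop_mhc=False):
    """Apply the QC filters; optionally also remove the extended MHC region."""
    kept = [v for v in variants if passes_qc(v)]
    if drop_mhc:
        kept = [v for v in kept if not in_extended_mhc(v)]
    return kept


# Toy variants for illustration only (not real data).
example_variants = [
    {"id": "v1", "chrom": "1", "pos": 1_000_000, "maf": 0.20, "info": 0.95},
    {"id": "v2", "chrom": "2", "pos": 5_000_000, "maf": 0.005, "info": 0.99},  # rare
    {"id": "v3", "chrom": "3", "pos": 7_000_000, "maf": 0.30, "info": 0.30},   # poorly imputed
    {"id": "v4", "chrom": "6", "pos": 30_000_000, "maf": 0.25, "info": 0.90},  # in MHC
]
```

Setting `drop_mhc=True` corresponds to the additional removal applied before the genetic correlation and LCV analyses.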

Genetic correlation between non-cancerous gynecological diseases and endometrial cancer
We used LD Score regression (Bulik-Sullivan et al. 2015) to estimate the genetic correlation between each non-cancerous gynecological disease and endometrial cancer. Genetic correlation analyses were restricted to common HapMap3 variants (MAF > 0.01). To reduce bias from potential residual confounding in genetic correlation analyses, including unknown sample overlap, we used the estimated genetic covariance intercept, obtained without constraint. Genetic correlation values range from -1 to 1; positive values indicate that shared genetic variants have concordant effects across the genome, whereas negative values indicate divergent effects.
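As a reminder of what the reported quantity represents, genetic correlation is conventionally defined as the genetic covariance between two traits scaled by the square root of the product of their SNP heritabilities. The sketch below shows only this final scaling step, with hypothetical input values; it does not reproduce the LD Score regression that estimates the covariance and heritabilities.

```python
import math


def genetic_correlation(genetic_covariance, h2_trait1, h2_trait2):
    """Scale a genetic covariance by the two SNP heritabilities.

    Inputs are assumed to have been estimated elsewhere (e.g. by LD Score
    regression); this function only applies the standard definition.
    """
    if h2_trait1 <= 0 or h2_trait2 <= 0:
        raise ValueError("heritability estimates must be positive")
    return genetic_covariance / math.sqrt(h2_trait1 * h2_trait2)


# Hypothetical values for illustration only (not estimates from this study).
rg_example = genetic_correlation(0.02, 0.10, 0.04)
```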
Evidence of genetic correlation may reflect a causal relationship or sharing of genetic pathways. Obesity is a major risk factor for endometrial cancer, and is prevalent amongst women with PCOS and uterine fibroids (Ilaria and Marci 2018;Sam 2007). For genetic correlation of PCOS or uterine fibroids with endometrial cancer, we thus additionally corrected for the effect of obesity, as measured by genetically predicted BMI. PCOS and uterine fibroids GWAS were conditioned using summary data from a large GWAS of BMI (Yengo et al. 2018) in GCTA-mtCOJO analysis (Zhu et al. 2018) before performing LD score regression analysis.

Genetic causal inference tests
We conducted LCV analysis using GWAS summary data to estimate the proportion of genetic components for non-cancerous gynecological diseases that also affect endometrial cancer risk, while correcting for heritability and genetic correlation between traits (O'Connor and Price 2018). This method is robust to confounding due to pleiotropy. The genetic causality proportion (GCP) derived from LCV analysis ranges from -1 to 1. GCP values close to 1 indicate that genetic components for non-cancerous gynecological disease may affect endometrial cancer risk, while GCP values close to -1 indicate that genetic components for endometrial cancer may affect non-cancerous gynecological disease risk.
To further explore potential causal relationships, we performed two-sample Mendelian randomization analysis using independent (LD r² < 0.01) genetic variants associated with the non-cancerous gynecological diseases at genome-wide significance (P < 5 × 10⁻⁸) as instruments. The list of genetic instruments and the respective risk association estimates were extracted from the largest GWAS of endometriosis (Rahmioglu et al. 2018), PCOS (Day et al. 2018) and uterine fibroids (Gallagher et al. 2019). We filtered out independent genetic variants with ambiguous alleles and intermediate frequencies (i.e., variants with A/T or C/G alleles and minor allele frequency of more than 0.42), leaving 26 variants as genetic instruments for endometriosis, 14 for PCOS and 25 for uterine fibroids.
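The exclusion of strand-ambiguous instruments can be sketched as follows. This is an illustrative Python sketch rather than the code used in the study; the function names are hypothetical, and MAF is derived here from an effect allele frequency as an assumption about how the input is coded.

```python
# Sketch of the strand-ambiguity filter; function names are hypothetical.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
MAF_AMBIGUOUS = 0.42  # palindromic variants with MAF above this are dropped


def is_palindromic(allele1, allele2):
    """True for A/T or C/G variants, whose strand cannot be inferred from alleles."""
    return COMPLEMENT.get(allele1.upper()) == allele2.upper()


def keep_instrument(effect_allele, other_allele, effect_allele_freq):
    """Drop a variant only if it is palindromic AND has intermediate frequency."""
    maf = min(effect_allele_freq, 1.0 - effect_allele_freq)
    return not (is_palindromic(effect_allele, other_allele) and maf > MAF_AMBIGUOUS)
```

Palindromic variants with a low MAF are retained because allele frequency can still be used to align their effect alleles between the exposure and outcome datasets.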
As non-cancerous gynecological diseases mostly affect premenopausal women and endometrial cancer primarily affects postmenopausal women, we performed a unidirectional analysis, assessing the effect of genetic predisposition to non-cancerous gynecological disease on endometrial cancer risk. We used inverse variance weighted (IVW) analysis as the primary analysis, regressing the genetic variant-endometrial cancer association on the genetic variant-non-cancerous gynecological disease association, weighted by the inverse of the variance. This method has the most power to detect associations; however, it has a strong assumption of no heterogeneity (potentially resulting from pleiotropy) amongst genetic variants (Hemani et al. 2018). Thus, this method assumes all genetic variants for the exposure of interest have a proportional effect on outcome risk.
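The IVW estimate described above can be written compactly as a weighted regression through the origin. The Python sketch below is a fixed-effect version under stated assumptions (weights equal to the inverse variance of the variant-outcome estimates, no intercept); it is not the "TwoSampleMR" implementation, whose standard error calculation may differ, and the input values are hypothetical.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate: weighted regression through the origin.

    Each variant is weighted by the inverse variance of its outcome association.
    Returns the causal effect estimate and its standard error.
    """
    weights = [1.0 / se ** 2 for se in se_outcome]
    numerator = sum(w * bx * by
                    for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    denominator = sum(w * bx ** 2 for w, bx in zip(weights, beta_exposure))
    return numerator / denominator, math.sqrt(1.0 / denominator)


# Hypothetical instrument effects in which the outcome effect is exactly half
# the exposure effect (illustration only, not data from this study).
ivw_beta, ivw_se = ivw_estimate([0.1, 0.2, 0.3], [0.05, 0.10, 0.15], [0.02, 0.02, 0.02])
```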
We also performed several sensitivity analyses which are more robust to heterogeneity amongst genetic variants: MR-Egger, weighted median, and weighted mode analysis. MR-Egger analysis regresses genetic variant-outcome association on genetic variant-exposure association, without a constraint on the regression intercept (Bowden et al. 2015). If the MR-Egger regression intercept is non-zero, it provides evidence of horizontal pleiotropy amongst genetic variants (where a genetic variant belongs to more than one independent pathway that influences outcome risk). The MR-Egger regression slope represents valid effect estimates after controlling for pleiotropic effects, provided the Instrument Strength Independent of Direct Effect (InSIDE) assumption (where a genetic variant's association with the exposure of interest is independent from its direct effect on outcome) is met (Bowden et al. 2015). We also performed weighted median (Bowden et al. 2016) and weighted mode (Hartwig et al. 2017) analyses, which are more robust to violation of the InSIDE assumption. Weighted median analysis relies on the assumption that more than 50% of the weights come from valid genetic instruments (Bowden et al. 2016), while weighted mode analysis relies on the assumption that most of the weights come from valid genetic instruments (Hartwig et al. 2017).
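To make the role of the unconstrained intercept concrete, the following Python sketch fits a weighted regression of variant-outcome effects on variant-exposure effects with an intercept term. It returns point estimates only and is a simplified illustration, not the "TwoSampleMR" implementation; orienting each variant so that its exposure effect is positive is a common convention that is assumed here, and the input values are hypothetical.

```python
def mr_egger(beta_exposure, beta_outcome, se_outcome):
    """Weighted regression with an intercept (MR-Egger point estimates).

    Returns (intercept, slope). A non-zero intercept suggests directional
    horizontal pleiotropy; the slope is the pleiotropy-adjusted effect estimate.
    """
    # Orient every variant so that its exposure effect is non-negative.
    bx, by = [], []
    for x, y in zip(beta_exposure, beta_outcome):
        sign = -1.0 if x < 0 else 1.0
        bx.append(sign * x)
        by.append(sign * y)

    weights = [1.0 / se ** 2 for se in se_outcome]
    total = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, bx)) / total
    mean_y = sum(w * y for w, y in zip(weights, by)) / total
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, bx))
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(weights, bx, by))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical effects generated as outcome = 0.01 + 0.5 * exposure after
# orientation, i.e. a constant pleiotropic offset of 0.01 (illustration only).
egger_intercept, egger_slope = mr_egger(
    [0.1, -0.2, 0.3, 0.4], [0.06, -0.11, 0.16, 0.21], [0.02, 0.03, 0.02, 0.025]
)
```

In this toy example the offset is absorbed by the intercept, so the slope recovers the underlying effect of 0.5, whereas a regression forced through the origin would not.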
LCV and two-sample Mendelian randomization analysis were performed using the CTG-VL platform (Cuéllar-Partida et al. 2019) and the "TwoSampleMR" (Hemani et al. 2018) package in R, respectively. Unless stated otherwise, results passing a Bonferroni-corrected P-value threshold for testing the three non-cancerous gynecological diseases (P < 0.05/3 = 0.017) were considered statistically significant.
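The Bonferroni threshold is simple arithmetic, shown here as a short Python sketch. The P-values used as an example are the genetic correlation P-values reported in this paper for the unadjusted analyses; the variable and function names are hypothetical.

```python
ALPHA = 0.05
N_DISEASES = 3                       # endometriosis, PCOS, uterine fibroids
THRESHOLD = ALPHA / N_DISEASES       # 0.05 / 3, approximately 0.017


def is_significant(p_value, threshold=THRESHOLD):
    """True when a P-value passes the Bonferroni-corrected threshold."""
    return p_value < threshold


# Genetic correlation P-values reported in this paper (unadjusted for BMI).
reported_p = {"PCOS": 1.6e-3, "uterine fibroids": 5.4e-3, "endometriosis": 0.83}
significant = {disease: is_significant(p) for disease, p in reported_p.items()}
```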

Gene-based analysis
While genetic correlation analysis assesses the average genetic concordance across the genome for two traits, it does not reveal common local genomic regions that harbor trait-associated variation. Further, a lack of evidence for genetic correlation may reflect opposing pleiotropic effects across the genome. To identify genetic risk regions shared between the non-cancerous gynecological diseases and endometrial cancer, we performed gene-based analysis using the fast and flexible set-based association test (fastBAT) (Bakshi et al. 2016).
fastBAT performs enrichment analysis on sets of GWAS variants located within 50 kb of gene regions. A random sample of 10,000 unrelated participants from the UK Biobank was used as the reference panel in these analyses. We applied a false discovery rate (FDR) threshold of < 0.05 for the gene-based analysis, and adjacent identified genes were combined into a single locus if within 1 Mb of each other.
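The step of combining adjacent genes into loci can be sketched in Python as below. This is an illustration under assumptions that the text does not specify: distance is measured from the end of the current locus to the start of the next gene, and the gene names and coordinates are hypothetical.

```python
MAX_GAP = 1_000_000  # genes within 1 Mb of each other are merged into one locus


def merge_into_loci(genes, max_gap=MAX_GAP):
    """Group significant genes into loci.

    `genes` is an iterable of (name, chrom, start, end) tuples. Assumption:
    the gap is measured from the end of the current locus to the next gene's start.
    """
    loci = []
    for name, chrom, start, end in sorted(genes, key=lambda g: (g[1], g[2])):
        last = loci[-1] if loci else None
        if last is not None and last["chrom"] == chrom and start - last["end"] <= max_gap:
            last["genes"].append(name)
            last["end"] = max(last["end"], end)
        else:
            loci.append({"chrom": chrom, "start": start, "end": end, "genes": [name]})
    return loci


# Hypothetical genes and coordinates for illustration only.
example_genes = [
    ("GENE_A", "1", 1_000_000, 1_050_000),
    ("GENE_B", "1", 1_500_000, 1_600_000),   # 450 kb after GENE_A: same locus
    ("GENE_C", "1", 5_000_000, 5_100_000),   # more than 1 Mb away: new locus
    ("GENE_D", "2", 1_200_000, 1_250_000),   # different chromosome: new locus
]
example_loci = merge_into_loci(example_genes)
```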

Results
We found endometrial cancer was significantly genetically correlated with PCOS (rG = 0.36, se = 0.12, P = 1.6 × 10⁻³) and uterine fibroids (rG = 0.24, se = 0.09, P = 5.4 × 10⁻³) but not with endometriosis (rG = -0.02, se = 0.09, P = 0.83) (Table 2). After adjusting for genetically predicted BMI, the genetic correlation between PCOS and endometrial cancer decreased and was no longer significant (rG = 0.19, se = 0.14, P = 0.17), indicating that the initial genetic correlation was, at least partly, mediated by genetically predicted BMI (Table 2). In contrast, there was no material difference in the genetic correlation between uterine fibroids and endometrial cancer after adjusting for genetically predicted BMI (Table 2).
The LCV analysis provided no evidence to suggest genetic components of the non-cancerous gynecological diseases affect endometrial cancer risk (Table 3). Mendelian randomization analyses also provided no evidence of causal relationships between genetic predisposition to the non-cancerous gynecological diseases and endometrial cancer risk (Table 3, Figure 1).
No MR-Egger intercepts differed from zero (Table 3), indicating a lack of evidence for confounding by horizontal pleiotropy amongst genetic instruments.
Gene-based analyses by fastBAT revealed 24 genetic regions associated with endometrial cancer risk, 28 regions with endometriosis risk and 41 regions with uterine fibroids, while no associations with PCOS passed FDR < 0.05 (Supplementary Table 1). We found four shared genetic risk regions (3q21.3, 9p21.3, 15q15.1 and 17q21.32) for endometriosis and endometrial cancer, containing seven shared candidate susceptibility genes (Table 4). Additionally, we found two shared genetic risk regions (5p15.33 and 11p13) for uterine fibroids and endometrial cancer, containing five shared candidate susceptibility genes (Table 5). Through GWAS, one of these regions (5p15.33) has been associated with uterine fibroids risk (Gallagher et al. 2019) and the other (11p13) has been independently associated with uterine fibroids and endometrial cancer risk (Gallagher et al. 2019;O'Mara et al. 2018). The LD between lead risk variants at each gene was compared, but the variants were not strongly correlated at either of these regions (r² ≤ 0.4; Table 5), suggesting that these genetic risk signals may be independent.

Discussion
Using large-scale genome-wide datasets, we observed evidence of positive genetic correlation between endometrial cancer and PCOS, and uterine fibroids, but not endometriosis. The observed genetic correlation between endometrial cancer and PCOS was at least partly mediated by genetically predicted BMI, consistent with the role of BMI as a common risk factor for PCOS and endometrial cancer. Further genetic analyses provided no evidence for a causal relationship between the non-cancerous gynecological diseases and endometrial cancer but revealed several genetic risk regions shared between endometrial cancer and endometriosis, and uterine fibroids.
Previous studies have demonstrated biological links between endometriosis and endometrial cancer (reviewed in Bulun (2009)), but we found no evidence for genetic correlation.
However, two previous studies have reported a positive genetic correlation (Masuda et al. 2020;Painter et al. 2018). These discrepancies with our study may be related to: i) the smaller sample sets used by these studies; ii) the ethnicity studied (Masuda et al. (2020) analyzed a Japanese population); or iii) the different genetic correlation analysis approaches used. For example, unlike Painter et al. (2018), we used an unconstrained LD score regression intercept to account for potential residual confounding, resulting in a conservative estimate of genetic correlation. Indeed, we found the estimated genetic covariance intercept to be significantly different from zero, suggesting the presence of bias from population stratification and/or sample overlap. The null results from the genetic causal inference analyses of endometriosis and endometrial cancer are concordant with observational studies that observed no associations after controlling for ascertainment bias by excluding recent endometriosis diagnoses (Melin et al. 2007;Olson et al. 2002;Rowlands et al. 2011).
Despite the limited evidence for genetic correlation and causal relationships between endometriosis and endometrial cancer, we identified four shared genetic risk regions, three of which (9p21.3, 15q15.1 and 17q21.32) (Kar et al. 2020). However, larger GWAS for endometriosis and endometrial cancer, with more statistical power, will be required to detect genome-wide significant associations at this region.
The null associations identified between PCOS and endometrial cancer risk from genetic causal inference analyses are concordant with observational studies that have found no association after accounting for the effect of obesity (Fearnley et al. 2010;Zucchetto et al. 2009). Indeed, these findings are consistent with our observation of substantial attenuation in genetic correlation between PCOS and endometrial cancer after adjusting for genetic components of BMI.
Although we detected a genetic correlation between uterine fibroids and endometrial cancer risk, consistent with observational studies (Fortuny et al. 2009;Rowlands et al. 2011;Wise et al. 2016), no evidence for a causal association was found. fastBAT analysis revealed two genetic risk regions (5p15.33 and 11p13) that may be shared by both diseases. The 5p15.33 region has been associated with uterine fibroids in GWAS (Gallagher et al. 2019) and with endometrial cancer in a candidate locus study (Carvajal-Carmona et al. 2015). Three
The most biologically relevant of these is TERT, which encodes telomerase reverse transcriptase and maintains chromosomal stability by elongating the telomere (Rubtsova et al. 2012). Relevantly, chromosomes in uterine fibroids (Bonatz et al. 1998;Rogalla et al. 1995) and in endometrial tumors (reviewed by Alnafakh et al. (2019) ). WT1 has also been identified through chromatin looping as a candidate target of uterine fibroids risk variants (Rafnar et al. 2018). WT1 encodes a transcription factor that is essential for urogenital development (reviewed by Roberts (2005)) and in the GTEx database is most highly expressed in the uterus (https://gtexportal.org/home/). These observations suggest that alteration of uterine WT1 expression by risk variation associated with endometrial cancer and uterine fibroids may affect susceptibility to these diseases.
Despite the genetic causal inference test analyses providing no evidence for causal relationships, we cannot discard the possibility that non-cancerous gynecological diseases have modest causal effects on endometrial cancer. Nevertheless, our findings indicate that the shared comorbidities are more likely to be related through shared genetic risk elements and thus women who have had non-cancerous gynecological disease may be at a greater risk of developing endometrial cancer. Future research will be needed to clarify such relationships.
Comorbidities may complicate our interpretation of the relationships between the non-cancerous gynecological diseases and endometrial cancer revealed in observational studies.
To reduce this confounding, prospective studies with long follow-up, large sample sizes and case identification using surgical confirmation would ideally be required. A strength of our study is that genetic causal inference analysis is a cost-effective alternative approach that allows assessment of potential causal relationships while reducing bias from unmeasured confounding. Another strength is that we have demonstrated the use of gene-based analysis to detect genetic risk regions below a genome-wide significance threshold. A limitation of the gene-based analysis is that many nearby genes are likely to share GWAS variants, so the role of the identified candidate susceptibility genes requires functional follow-up. Further, we have limited ability to identify candidate genetic risk regions and causal relationships for PCOS given the smaller dataset available. These analyses should be revisited when more genome-wide significant variants are revealed in future PCOS GWAS.
In conclusion, our study has revealed genetic risk variation shared between non-cancerous gynecological diseases and endometrial cancer, both across the genome and at local genomic regions, providing insights into the shared etiology of these diseases.

Table footnotes: TopSNP: lead variant for gene from fastBAT analysis; EC: endometrial cancer; UF: uterine fibroids; LD: linkage disequilibrium; FDR: false discovery rate.
*LD was estimated using the EUR 1000Genomes reference panel.

Figure legend: The boxes represent the odds ratio for endometrial cancer risk per standard deviation increment in genetic predisposition to non-cancerous gynecological disease. Error bars represent 95% confidence intervals.
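For readers reproducing a forest plot of this kind, an odds ratio and its 95% confidence interval are conventionally obtained by exponentiating a log odds ratio estimate and its normal-approximation bounds. The Python sketch below illustrates that conversion with hypothetical inputs; it is not taken from the study's analysis code.

```python
import math

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def odds_ratio_with_ci(log_odds_ratio, standard_error):
    """Convert a log odds ratio and its standard error to (OR, lower, upper)."""
    lower = math.exp(log_odds_ratio - Z_95 * standard_error)
    upper = math.exp(log_odds_ratio + Z_95 * standard_error)
    return math.exp(log_odds_ratio), lower, upper


# Hypothetical estimate for illustration only (not a result from this study).
example_or, example_lower, example_upper = odds_ratio_with_ci(math.log(1.2), 0.1)
```

An interval that spans 1 corresponds to an estimate that is not significant at the 5% level.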