The relationships between women's reproductive factors: a Mendelian randomization analysis.

Background: Women's reproductive factors include their age at menarche and menopause, the age at which they start and stop having children, and the number of children they have. Studies that have linked these factors with disease risk have largely investigated individual reproductive factors and have not considered the genetic correlation and total interplay that may occur between them. This study aimed to investigate the nature of the relationships between eight female reproductive factors. Methods: We used data from the UK Biobank and genetic consortia with data available for the following reproductive factors: age at menarche, age at menopause, age at first birth, age at last birth, number of births, being parous, age at first sex and lifetime number of sexual partners. Linkage disequilibrium score regression (LDSC) was performed to investigate the genetic correlation between reproductive factors. We then applied Mendelian randomization (MR) methods to estimate the causal relationships between these factors. Sensitivity analyses were used to investigate directionality of the effects, test for evidence of pleiotropy and account for sample overlap. Results: LDSC indicated that most reproductive factors are genetically correlated (rg range: |0.06 - 0.94|), though there was little evidence for genetic correlations between lifetime number of sexual partners and age at last birth, number of births and ever being parous (rg < 0.01). MR revealed potential causal relationships between many reproductive factors, including later age at menarche (1 SD increase) leading to a later age at first sexual intercourse (Beta (B)=0.09 SD, 95% confidence intervals (CI)=0.06,0.11), age at first birth (B=0.07 SD, CI=0.04,0.10), age at last birth (B=0.06 SD, CI=0.04,0.09) and age at menopause (B=0.06 SD, CI=0.03,0.10). 
Later age at first birth was found to lead to a later age at menopause (B=0.21 SD, CI=0.13,0.29), age at last birth (B=0.72 SD, CI=0.67,0.77) and a lower number of births (B=-0.38 SD, CI=-0.44,-0.32). Conclusion: This study presents evidence that women's reproductive factors are genetically correlated and causally related. Future studies examining the health sequelae of reproductive factors should consider a woman's entire reproductive history, including the causal interplay between reproductive factors.

Introduction
[...] The inverse-variance weighted (IVW) method was used in the primary analysis to assess the causal relationships between pairs of reproductive factors. This method combines Wald ratios, calculated by dividing the SNP-outcome association by the SNP-exposure association, in a multiplicative random effects meta-analysis where the weight of each ratio is the inverse of the variance of the SNP-outcome association. (41) We assessed earlier-occurring reproductive factors as the exposure in relation to later-occurring factors (the outcomes), e.g., AAM was investigated as a potential cause of age at menopause but not vice versa. In some cases where there was no clear temporal ordering, we carried out analyses in both possible directions, e.g., between ever parous status and age at menopause. These cases are shown in Table S1. All relationships tested by MR are shown in Table S2, and GWAS estimates were standardized (mean = 0 and standard deviation (SD) = 1) prior to performing MR.

The IVW method makes a number of assumptions: that the genetic instruments are strongly associated with the exposure; that they do not share common causes, either genetic or other confounders such as population stratification, with the outcome; and that they are not pleiotropic, i.e., do not have an effect on the outcome through a pathway other than via the exposure. (41) We therefore performed a series of sensitivity analyses to evaluate the robustness of our results to these assumptions (see "Evaluating MR assumptions").
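The IVW calculation described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: the function name, the toy summary statistics, and the use of the first-order ratio variance (se_out² / beta_exp², a common formulation) are assumptions made for the example.

```python
import math


def ivw_estimate(beta_exp, beta_out, se_out):
    """Return (estimate, standard_error, cochran_q) from per-SNP summary statistics."""
    n = len(beta_exp)
    if n < 2 or len(beta_out) != n or len(se_out) != n:
        raise ValueError("need at least two SNPs and equal-length inputs")

    # Wald ratio for each SNP: SNP-outcome effect divided by SNP-exposure effect.
    ratios = [bo / be for be, bo in zip(beta_exp, beta_out)]

    # Assumption: first-order variance of each ratio is se_out^2 / beta_exp^2,
    # so the inverse-variance weight is beta_exp^2 / se_out^2.
    weights = [(be / so) ** 2 for be, so in zip(beta_exp, se_out)]
    total_weight = sum(weights)

    estimate = sum(w * r for w, r in zip(weights, ratios)) / total_weight

    # Cochran's Q measures heterogeneity between the per-SNP ratios.
    q = sum(w * (r - estimate) ** 2 for w, r in zip(weights, ratios))

    # Multiplicative random effects: inflate the fixed-effect variance by the
    # overdispersion Q / (n - 1), never shrinking it below the fixed-effect value.
    phi = max(1.0, q / (n - 1))
    return estimate, math.sqrt(phi / total_weight), q


# Toy, made-up summary statistics for three hypothetical SNPs.
est, se, q = ivw_estimate([0.10, 0.20, 0.40], [0.05, 0.10, 0.20], [0.01, 0.01, 0.01])
```

Because both traits are standardized before MR, an estimate from a function like this would be read as the SD change in the outcome per 1 SD increase in the exposure.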
medRxiv preprint doi: https://doi.org/10.1101/2021.09.29.21264251; this version posted September 29, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

[...] which is advantageous over other methods due to the large sample size. In this analysis, the GWAS used for the exposure and outcome were both performed on women in the UK Biobank study, and therefore the exposure and outcome samples overlap entirely. Large overlap in the sample(s) used to generate genetic variant-exposure and genetic variant-outcome associations can introduce bias in estimates obtained using two-sample MR. (42) In particular, sample overlap between the exposure and outcome samples may bias estimates towards the observational (and potentially confounded) exposure-outcome association and may lead to an overestimation of effects. (42) While it has been proposed that this approach of applying two-sample MR methods in a single sample may be performed within large studies with minimal bias introduced to the causal estimates by sample overlap, (43) we performed a series of sensitivity analyses to evaluate the robustness of our results to this (see "Assessing the impact of sample overlap").

Evaluating MR assumptions
We evaluated the likelihood that MR assumptions were violated where we found evidence of effects in our primary analysis.

Instrument strength
The strength of the genetic instrument for each reproductive factor in the main IVW analysis was assessed using the mean F statistic, calculated based on the variance explained (r²) by the genetic instrument and sample size of the exposure. (30)
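As an illustration of how an F statistic can be derived from variance explained and sample size, the sketch below uses one commonly cited form, F = (r² / (1 − r²)) × ((N − k − 1) / k), where k is the number of SNPs in the instrument. Whether the authors used exactly this form is not stated here, and the function name and input values are made up for the example.

```python
def f_statistic(r2, n_samples, n_snps):
    """F statistic from variance explained (r2), sample size and number of SNPs."""
    if not 0 < r2 < 1:
        raise ValueError("r2 must lie strictly between 0 and 1")
    if n_samples <= n_snps + 1:
        raise ValueError("sample size must exceed the number of SNPs plus one")
    return (r2 / (1 - r2)) * ((n_samples - n_snps - 1) / n_snps)


# Made-up inputs: an instrument of 50 SNPs explaining 5% of the variance.
f_value = f_statistic(r2=0.05, n_samples=100_000, n_snps=50)

# The conventional rule of thumb treats F > 10 as a strong instrument.
is_strong = f_value > 10
```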

Negative controls
We repeated our primary analysis for five 'negative control' pairs of reproductive factors, for which we would not expect to see causal effects due to their temporal ordering (the outcome occurring before the exposure). These negative controls included the effect of AFB on AAM, AFS on AAM, AFB on AFS, age at menopause on AFS, and ALB on AAM. In these cases, any evidence of an effect would [...]
Only those reproductive factor associations for which there was evidence of an effect from the primary analysis were taken forward for this sensitivity analysis, as the causal effect would likely be overestimated when performing MR with overlapping exposure and outcome samples.

[...] Table S6, and a causal graph shows where we found evidence of an effect between reproductive factors (Figure 3).

Negative controls
We found little evidence for an effect of age at menopause on AFS (B=0.03 SD, CI=-1.32×10⁻³, 0.05), of ALB on AAM (B=0.11 SD, CI=-0.12, 0.34), of AFB on AAM (B=0.04 SD, CI=-0.07, 0.16), or of AFS on AAM (B=0.10 SD, CI=-7.03×10⁻⁴, 0.21) (Table S7). However, there was strong evidence for an effect of AFB on AFS (B=0.58 SD, CI=0.52, 0.65), suggestive of shared pleiotropy.

Heterogeneity
For the relationships identified in the primary analysis, evidence for heterogeneity in the individual SNP effects in the IVW was present across many of the investigated relationships, except for between AFB and ALB, and between ALB and number of births (Table S8). Evidence for heterogeneity could indicate the presence of SNP outliers, which were investigated using MR-PRESSO (see "Pleiotropy").

[...] and age at menopause, lifetime number of sexual partners, number of births and ever being parous. Furthermore, the effect of AFB on age at menopause, ALB on number of births and lifetime number of sexual partners and ever being parous appeared inconsistent across the different MR methods. In the primary analysis, the only instance where the MR-Egger intercept test revealed evidence for directional pleiotropy was in the relationship between AAM and lifetime number of sexual partners (Table S10). We also applied MR-PRESSO to the UK Biobank full overlap GWAS to additionally test for evidence of pleiotropy and correct for outliers (Table S11). MR-PRESSO revealed evidence for outliers in almost all tests, other than for the relationship between AFB and ALB. However, after outlier correction, there was little change in the strength of evidence.
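To make the MR-Egger intercept referred to above more concrete, the following sketch computes the MR-Egger point estimates: a weighted regression of SNP-outcome effects on SNP-exposure effects that, unlike IVW, includes an intercept. An intercept away from zero is read as a sign of directional pleiotropy. Standard errors and the formal intercept test are omitted, and the function name and toy data are illustrative assumptions only.

```python
def mr_egger(beta_exp, beta_out, se_out):
    """Return (intercept, slope) from a weighted regression with an intercept."""
    n = len(beta_exp)
    if n < 3 or len(beta_out) != n or len(se_out) != n:
        raise ValueError("need at least three SNPs and equal-length inputs")

    # Orient every SNP so that its exposure effect is positive.
    xs, ys = [], []
    for be, bo in zip(beta_exp, beta_out):
        sign = -1.0 if be < 0 else 1.0
        xs.append(sign * be)
        ys.append(sign * bo)

    # Weight each SNP by the inverse variance of its SNP-outcome association.
    ws = [1.0 / so ** 2 for so in se_out]

    sw = sum(ws)
    sx = sum(w * x for w, x in zip(ws, xs))
    sy = sum(w * y for w, y in zip(ws, ys))
    sxx = sum(w * x * x for w, x in zip(ws, xs))
    sxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))

    denom = sw * sxx - sx ** 2
    if denom == 0:
        raise ValueError("SNP-exposure effects must not all be identical")

    # Closed-form solution of the weighted least-squares normal equations.
    slope = (sw * sxy - sx * sy) / denom
    intercept = (sxx * sy - sx * sxy) / denom
    return intercept, slope


# Toy data built so that outcome = 0.02 + 0.5 * exposure after orientation.
intercept, slope = mr_egger([0.1, -0.2, 0.3], [0.07, -0.12, 0.17], [0.01, 0.01, 0.01])
```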
We applied an MR Steiger method to assess whether we had captured the intended causal direction between reproductive factors where the causal direction was unclear. Findings show aggregated instruments have successfully captured the intended causal direction in all cases (Table S12). Steiger filtering was also implemented to assess whether there were any individual SNPs that did not capture the intended causal direction, and results are displayed in Table S13. Where instruments contained SNPs that did not capture the intended causal direction, MR analysis was then performed excluding those SNPs and the strength of evidence for the causal estimate using the IVW method did not change (Table S14).
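The idea behind Steiger filtering can be sketched as follows: a SNP is retained only if it explains more variance in the exposure than in the outcome. This simplified version compares point estimates of r² only (the full method also tests whether the two correlations differ) and approximates r² from the association t statistic. The dictionary keys, SNP identifiers and sample sizes are hypothetical.

```python
def r2_from_summary(beta, se, n_samples):
    """Approximate variance explained by one SNP from its effect, SE and sample size."""
    t_squared = (beta / se) ** 2
    return t_squared / (t_squared + n_samples - 2)


def steiger_filter(snps, n_exp, n_out):
    """Return ids of SNPs that explain more variance in the exposure than the outcome."""
    kept = []
    for snp in snps:
        r2_exp = r2_from_summary(snp["beta_exp"], snp["se_exp"], n_exp)
        r2_out = r2_from_summary(snp["beta_out"], snp["se_out"], n_out)
        if r2_exp > r2_out:
            kept.append(snp["id"])
    return kept


# Made-up SNPs: the second one is more strongly associated with the outcome.
toy_snps = [
    {"id": "snp_a", "beta_exp": 0.10, "se_exp": 0.01, "beta_out": 0.02, "se_out": 0.01},
    {"id": "snp_b", "beta_exp": 0.02, "se_exp": 0.01, "beta_out": 0.10, "se_out": 0.01},
]
kept_ids = steiger_filter(toy_snps, n_exp=1000, n_out=1000)
```

An MR analysis would then be repeated using only the retained SNPs, as described in the text for Table S14.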
[...] genome-wide significance (p<5×10⁻⁸) after LD clumping (r² < 0.001 and a distance of 10 000 kb) (Table S15). No SNPs were identified at genome-wide significance in relation to ALB and parous status in the GWAS performed on one of the UK Biobank split-samples, therefore the split-sample MR was only conducted once when ALB or ever parous status was the exposure.

Where SNPs were identified in the split-sample analysis, F statistics were above the standard threshold of 10, indicative of strong genetic instruments (Table S16). However, there was little overlap in the SNPs which surpassed genome-wide significance between sample 1 and sample 2, with 9 SNPs overlapping between samples for AAM and age at menopause but none for the other traits (Table S16). A number of the SNPs identified in one of the samples of the split-sample GWAS were identified above the significance threshold but removed during LD clumping in the GWAS of the other sample, while other SNPs were just below the significance threshold or appeared not to be associated (Table S17).

We performed MR for each relationship twice, i.e., MR of exposure in sample 1 on outcome in sample 2, and MR of exposure in sample 2 on outcome in sample 1. This was with the exception of the MR analyses when ALB and parous status were the exposure, which were assessed only once (Table S18). We then meta-analysed findings between both samples, which showed limited evidence of heterogeneity between the causal estimates obtained from the split-sample MRs. Full results of the meta-analysis can be found in Table S19.
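Combining the two split-sample MR estimates can be illustrated with a fixed-effect inverse-variance meta-analysis, together with Cochran's Q as a heterogeneity measure. The text does not state which meta-analysis model was used, so the fixed-effect choice, the function name and the numbers below are assumptions for illustration.

```python
import math


def meta_analyse(estimates, standard_errors):
    """Return (pooled_estimate, pooled_se, cochran_q) for independent estimates."""
    if len(estimates) != len(standard_errors) or len(estimates) < 2:
        raise ValueError("need at least two estimates with matching standard errors")

    # Each estimate is weighted by the inverse of its variance.
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)

    pooled = sum(w * b for w, b in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)

    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, estimates))
    return pooled, pooled_se, q


# Made-up estimates: sample 1 on sample 2, and sample 2 on sample 1.
pooled, pooled_se, q_stat = meta_analyse([0.2, 0.4], [0.1, 0.1])
```

With two estimates, Q has one degree of freedom, so a small Q relative to a chi-squared distribution with one degree of freedom corresponds to limited evidence of heterogeneity.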

Replication consortia
Inter-relations between the reproductive factors were also investigated using GWAS summary statistics from consortia studies which excluded UK Biobank. 60 SNPs were identified at genome-wide significance (p<5×10⁻⁸) for AAM (ReproGen), and 5 for AFB (SSGAC) (Table S20). All F statistics were above the standard threshold of 10, indicative of strong genetic instruments (Table S20). Full results of this analysis can be found in Table S21. Estimates were consistent when using a larger [...] Biobank sample. This method identified slightly more variants at genome-wide significance (p<5×10⁻⁸) after LD pruning (10 000 kb, r²=0.001) compared to the main analysis. Between 11 (ALB and number of births) and 231 (AAM) variants were identified (Table S22). MR estimates were largely similar to the primary analysis, although in some cases effect sizes were slightly larger, including for the relationship between age at last birth and number of births. Full results of this analysis can be found in Table S23.

Assessing evidence of causal effects across sensitivity analyses
Figure 4 illustrates the effects which appear robust across multiple sensitivity analyses. In particular, a later AAM appears to have consistent effects on a later AFB, ALB and AFS. In addition, a later AFB leading to a later ALB, a later AFS leading to a later AFB, and a later AFS leading to a lower number of lifetime sexual partners were consistent across all sensitivity analyses. There was no consistent evidence for a causal relationship between AAM and lifetime number of sexual partners across sensitivity analyses and limited evidence between AFS and age at menopause (Figure 4).

Figure 4 Forest plot showing findings from the primary MR analysis and across the sensitivity MR analyses (IVW MR method). Panels to the right of the plots refer to the relationships investigated, and each analysis is shown on the y axis.
Furthermore, our findings support one study that found little evidence for an association between AAM and parity. (26) Additionally, we corroborated the findings of previous MR studies that identified a positive causal relationship between AAM and AFB, ALB and age at menopause, and between AFS and ALB. (59-61)

Many estimates identified in the primary analysis appear consistent across sensitivity analyses that aim to account for biases. However, some results did not persist in sensitivity analyses checking for robustness to sample overlap and winner's curse. The split-sample meta-analysed MR shows a weaker magnitude of effect compared to our primary analysis, which may be due to sample size reduction in this sensitivity analysis or bias introduced by sample overlap in the primary analysis. Overall, using replication GWAS studies as the exposure or outcome showed weaker strength of evidence and/or magnitude of effects, although evidence for a causal effect for many relationships assessed was maintained. This may be due to bias introduced by winner's curse in the primary analysis or smaller sample sizes available for the replication studies. In particular, age at menopause from the ReproGen consortium has a sample size of 69,360, compared to 143,791 in our primary
analysis, and where this is used as the outcome, we found little evidence of an effect of reproductive factors on age at menopause. A more recent GWAS of age at menopause conducted by the ReproGen consortium has a much larger sample size (n=201,323), (62) although more than half of the sample comprises UK Biobank women, meaning a large sample overlap in the MR analysis. Nonetheless, MR estimates using this more recent GWAS revealed similar results compared to the previous smaller GWAS. [...]

Mechanisms underlying causal links
We show that an earlier AAM may lead to an earlier AFS and AFB, as well as an earlier AFS leading to an earlier AFB. It is likely that earlier maturation may lead to earlier sexual activity, logically increasing the chance of an earlier pregnancy. In UK Biobank, a proportion of women may have first had sexual intercourse prior to the introduction of the NHS (Family Planning) Act 1967, which made contraception readily available through the NHS. This may have strengthened the effect of AFS on AFB in this cohort and findings may not be generalisable to more contemporary studies. We also show that an earlier AFS may lead to a higher number of sexual partners, which may occur due to a longer amount of time to acquire partners if sexual activity commences earlier.
Furthermore, we identify that having a higher lifetime number of sexual partners may lead to a lower chance of having children. This may be due to increased prevalence of short-term relationships and regularly changing sexual partners, (63) which, as a result, might lead to less chance of starting a family. However, it is worth noting that after excluding outlying variants, the effect between lifetime number of sexual partners and ever parous status attenuated. We present strong evidence for a positive relationship between AFB and ALB. One explanation for this link could be that parents tend to have children in a relatively short period of time, as shown in UK Biobank where the average AFB is 26 years and the average ALB is 30 years for women.

The life history theory is another explanation as to why earlier AAM leads to earlier subsequent reproductive events and a likelihood of an increased number of children. This theory distinguishes the allocation of resources into growth and reproductive efforts and categorises "fast" or "slow" life history strategies. (64,65) A "fast" life history strategy exerts more effort towards reproduction: earlier puberty and sexual activity leading to an early AFB, and an increased number of births. (64,65) This is corroborated by our finding that women who experience an earlier AFS have children earlier and have more children. If a woman starts having children earlier, she has more opportunity to conceive again before menopause, which may explain the effect we identify between an earlier AFB and a higher number of children. A "fast" life history may lead to an earlier age at menopause, as allocating resources towards reproductive efforts earlier in life and towards a higher number of children may result in completing reproduction at a younger age.
[...] analysis. Of note, we did not find a causal effect of AAM on number of births and ever parous status. Considering the life history theory, we might have expected to find an inverse effect, suggesting an earlier AAM leads to a higher number of births.

Furthermore, we did not find evidence of an effect of ever parous status on lifetime number of sexual partners, or of number of births on age at last birth. We investigated bidirectional effects between reproductive factors where there wasn't a clear temporal order and identified no bidirectional effects. Specifically, we found no effects between age at menopause and age at last birth, lifetime number of sexual partners, number of births and ever parous status; between age at last birth and lifetime number of sexual partners; and finally between number of births and lifetime number of sexual partners.

Several relationships between reproductive factors separated by many years could be mediated by other intervening reproductive events. For example, we identify effects between AAM and AFS, AFS and AFB, and AAM and AFB; therefore, the effect we find between AAM and AFB may be mediated by AFS. Similarly, we found effects between AFS and AFB, AFB and ALB, and AFS and ALB, which could suggest that an earlier AFS leading to an earlier ALB may be mediated through an earlier AFB. Future investigations could use mediation analyses to further elucidate these relationships.
[...] may not be generalisable to women in other ancestry groups. Future work is required to replicate our findings in independent studies and translate the results to women in other ancestry groups.

While the majority of the reproductive factors are likely to be accurately captured through questionnaire (such as AFB, number of births and ALB), other factors such as AAM may not be as reliably recalled. (74) Self-report of lifetime number of sexual partners is also known to be overestimated by some, which could explain the positively skewed distribution we identified. (75) To account for this, we performed rank-based inverse normal transformation of this variable.

The split-sample GWAS revealed little overlap between genome-wide significant SNPs identified in each sample. While some of these SNPs were identified slightly below the significance threshold between samples, others appeared not to be associated. This suggests that some SNPs may have been identified through spurious associations and may suggest evidence of winner's curse.
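The rank-based inverse normal transformation mentioned above for the skewed lifetime number of sexual partners can be sketched as below. Each value is replaced by the standard normal quantile of its rank, which removes skew while preserving order. The Blom offset (c = 3/8) and the averaging of tied ranks are assumptions for this example, since the text does not specify them, and the input values are made up.

```python
from statistics import NormalDist


def rank_inverse_normal(values, c=3 / 8):
    """Map values to normal quantiles of their ranks; tied values share a mean rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n

    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Mean of the 1-based ranks i + 1 .. j + 1.
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1

    normal = NormalDist()
    return [normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]


# Made-up, positively skewed counts; the extreme value no longer dominates.
transformed = rank_inverse_normal([10, 1, 100])
```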