Prioritization of putatively detrimental variants in euploid miscarriages
=========================================================================

* Silvia Buonaiuto
* Imma Di Biase
* Valentina Aleotti
* Amin Ravaei
* Adriano De Marino
* Gianluca Damaggio
* Marco Chierici
* Madhuri Pulijala
* Palmira D’Ambrosio
* Gabriella Esposito
* Qasim Ayub
* Cesare Furlanello
* Pantaleo Greco
* Antonio Capalbo
* Michele Rubini
* Sebastiano Di Biase
* Vincenza Colonna

## Abstract

**Study question** Can small genetic variants detected by whole-genome sequencing of spontaneously aborted euploid embryos give insight into possible causes of pregnancy loss?

**Summary answer** By filtering and prioritizing genetic variants, it is possible to identify genomic variants putatively responsible for miscarriage.

**What is known already** Miscarriage is often caused by chromosomal aneuploidies of the gametes, but it can also have other genetic causes, such as small mutations, either *de novo* or inherited from the parents. The analysis of genomic sequences of miscarried embryos has mostly focused on rare variation and has been carried out using criteria and methods that are difficult to reproduce. The role of small mutations has been scantily investigated so far.

**Study design, size, duration** This is a monocentric observational study. It includes the analysis of data from 46 embryos obtained from women experiencing pregnancy loss, recruited by the University of Ferrara from 2017 to 2018. The study was approved by the Ethical Committee of Emilia-Romagna (CE/FE 170475).

**Participants/materials, setting, methods** The participants are forty-six women, mostly of European origin (87%), diagnosed with first (n=25, average age 32.7 years) or recurrent (n=21, average age 36.5 years) miscarriage. Embryonic DNA was prepared from chorionic villi and used to select euploid embryos by quantitative PCR, comparative genomic hybridization, and shallow sequencing of random genomic regions.
Euploid embryos were whole-genome sequenced at 30X using Illumina short-read technology, and the genomic sequences were used to identify genetic variants. Variants were annotated by integrating information from Ensembl release 100 with literature knowledge on genes associated with embryonic development, miscarriage, lethality, and the cell cycle. Following annotation, variants were filtered with a pipeline that we developed to prioritize putatively detrimental variants in genes relevant for embryonic development. The code is available on GitHub (ezcn/grep).

**Main results and the role of chance** Our pipeline prioritized 439 putatively causative variants among the 11M single nucleotide polymorphisms and 2M small indels discovered in ten embryos. By systematically investigating all coding regions, we selected on average 47 genes per embryo. These include *STAG2*, known in the literature for its role in congenital and developmental disorders as well as in cancer; *TLE4*, a key gene in embryonic development that is expressed in both embryonic and extraembryonic tissues and participates in the Wnt and Notch signalling pathways; and *FMNL2*, which is involved in cell motility and plays a major role in driving cell migration. Our analysis is fully reproducible (our code is open source), and we increase its robustness to false positives by excluding genes with a >5% chance of being selected in a control population.

**Limitations, reasons for caution** This pilot study has major limitations: a small sample size and the lack of integration of parental genomic information. Although encouraging, the results need to be interpreted with caution, as functional analyses are required to validate the hypotheses that have been generated. Although we have developed a robust and scalable methodology for prioritizing genetic variants, we have not yet extended it beyond the coding regions of the genome.
**Wider implications of the findings** This pilot study demonstrates that the analysis of genome sequences can help clarify the causes of idiopathic miscarriages, and it provides initial results from the analysis of ten euploid embryos, revealing plausible candidate genes and variants. This study provides guidance for a larger study. The results of this and subsequent larger studies could be used to test genetic predisposition to miscarriage in parents who are planning to conceive or are undergoing preimplantation genetic testing. In a wider context, the results of this study might be relevant for genetic counseling and risk management in miscarriages.

**Study funding/competing interest(s)** A.C. is a full-time employee of Igenomix. A.D.M. was an employee of Igenomix while working on this project. I.D.B., P.D.A., G.E., and S.D.B. are full-time employees of MeriGen Research. All other authors declare that they have no conflicts of interest.

Key words

* miscarriage
* embryos
* whole-genome sequencing
* variant prioritization
* genetics of infertility
* embryo aneuploidies
* infertility diagnosis
* *TLE4*
* *STAG2*
* *FMNL2*

## Introduction

Miscarriage, the spontaneous termination of a pregnancy before 24 weeks of gestation, occurs in 10-15% of all pregnancies [Larsen et al., 2013, Ammon Avalos et al., 2012, Andersen et al., 2000] and has both environmental and genetic causes [Larsen et al., 2013]. Miscarriages are often the result of chromosomal aneuploidies of the gametes, but they can also have other genetic causes, such as small mutations (SNPs and indels), either *de novo* or inherited from the parents. Miscarriages are mostly studied using parental genetic information [Pereza et al., 2017, Quintero-Ronderos et al., 2017] and at a resolution that leaves the vast majority of the genome unexplored.
Comparative genomic hybridization detects variants spanning several thousand base pairs [Robberecht et al., 2009, Kudesia et al., 2014, Mathur et al., 2014], while targeted resequencing investigates point mutations. Both are currently the most accurate methods for the genetic analysis of parental DNA in miscarriages, but the former is not sensitive to small variants and the latter targets only a few coding regions. Using a different approach, the only study so far that tests for genome-wide genetic association in a large cohort of miscarriages is also based solely on maternal information [Laisk et al., 2020]. Depending on the mode of inheritance, the study of parental genomes might be ineffective, because it is uncertain which parts of the parental genomes are actually inherited by the embryo, and it provides no way to identify *de novo* mutations. Therefore, extending the analysis to fetal genomes is the necessary next step to fully understand the genetics of miscarriages.

DNA sequence information from miscarried fetuses has already been used to determine the genetic component of miscarriages [Rajcan-Separovic, 2020, Filges and Friedman, 2015]. Most studies adopt a family-based approach integrating pedigree and parental genomic data, often focusing on a narrow range of fetal phenotypes [Bondeson et al., 2017, Dohrn et al., 2015, Wilbe et al., 2015, Cristofoli et al., 2017]. Very often the focus is on candidate genes, as in the identification of a mutation in the X-linked gene *FOXP3* in miscarried male siblings [Rae et al., 2015] and of a truncating *TCTN3* mutation in unrelated embryos [Thomas et al., 2012]. A number of studies focus instead on exome sequences [Shamseldin et al., 2015, Qiao et al., 2016, Fu et al., 2018, Meier et al., 2019, Yates et al., 2017].
Among them, one study selects only variants transmitted to both miscarried siblings [Qiao et al., 2016], others are limited to autozygous variants [Thomas et al., 2012, Shamseldin et al., 2015], and some focus on delivering an accurate diagnosis [Meier et al., 2019]. All these studies include only tens of cases, and most are motivated by phenotypic information derived mainly from ultrasound scans. Two other studies adopt a cohort-based approach, analyzing up to thousands of embryonic genomes with a range of phenotypes [Chen et al., 2017, Zhao et al., 2020a]. One of them focuses on finding causative variants, demonstrating that exome sequencing effectively informs genetic diagnosis in about one-third of the 102 cases considered [Zhao et al., 2020a]. The other focuses on conserved genes in copy number variable (CNV) regions in 1,810 cases and identifies 275 genes, often in clusters, located in the CNVs and potentially implicated in essential embryonic developmental processes [Chen et al., 2017]. Because the number of embryos they analyze is too small for genetic association analysis to be effective, all the studies mentioned so far perform sequencing followed by variant annotation and prioritization. All investigate apparently euploid embryos and focus on rare variation, but they use different criteria to select variants and never release code to fully reproduce the variant prioritization.

In this study, we analyzed whole-genome sequences of euploid embryos from idiopathic spontaneous pregnancy losses (both first and recurrent) and developed GP, a pipeline to prioritize putatively causative variants in coding regions. GP filters high-quality genomic variants based on the predicted functional effect of the variants, using a set of parameters that can be specified by the user. This first selection is complemented by filters for technical artifacts (e.g.
mapping errors, read depth) and for false positives through resampling in a control cohort. Our pipeline can incorporate prior information on candidate genes, but it is also robust to the discovery of novel genes. We prioritize on average 49 variants per embryo with high and moderate impact in genes relevant for embryonic development and mitochondrial metabolism, some of which were previously identified as having a role in miscarriages. We demonstrate that variant prioritization can be effective even with a limited number of samples, and we developed an approach that can be applied to a larger-scale project. Results from this study can be used to inform the molecular diagnosis of pregnancy loss.

## Materials and methods

### Embryo data and sample collection

The study protocol was examined by the Comitato Etico di Area Vasta Emilia Centro (CE-AVEC) of the Azienda Ospedaliero Universitaria di Bologna Policlinico S. Orsola-Malpighi. The committee gave ethical approval for the study (reference CE/FE 170475). All participants provided written informed consent before entering the study. Cases were recruited at the Unit of Obstetrics and Gynecology of the Sant’Anna University Hospital in Ferrara, Italy, from 2017 to 2018. The inclusion criteria were age between 18 and 42 years and gestational age up to 12 weeks. The exclusion criterion was any clinical condition that could prevent full-term pregnancy. Before inclusion in the study, known causes of pregnancy loss were excluded by a standard diagnostic protocol including hysteroscopy, laparoscopy, ultrasound, karyotype analysis, detection of immunological risk factors (anticardiolipin, lupus anticoagulant, antinuclear antibodies), and hormonal status (gonadotrophins, FSH, LH, prolactin, thyroid hormones, thyroperoxidase). Gestational weeks were calculated from the last menstrual period.
Demographic, anthropometric, and clinical data of cases, including obstetric history, family history of malformations, and periconceptional supplementation with folic acid, were anonymized and linked to biological samples by coding.

### DNA preparation and sequencing

Retained products of conception were removed from the uterus using a suction curette, and chorionic villi (CV) were carefully dissected from decidual tissue. We used dry homogenization, chosen after exploring a range of alternatives (Figure S1A). Genomic DNA was extracted from CV samples using the QIAamp DNA Mini Kit (Ref: 51304, Qiagen) according to the manufacturer’s protocol. This kit was chosen after comparing the yields of two types of resin and one membrane (Figure S1B). DNA was quantified using a Qubit 2.0 Fluorometer (Life Technologies). Whole-genome sequencing of the genomic DNA extracted from chorionic villi was performed by a service provider (Macrogen). Libraries were prepared using the Illumina TruSeq DNA PCR-free Library kit (insert size 350 bp), and samples were sequenced as 150 bp paired-end reads on a HiSeq X to a mapped coverage of 30X (110 Gb).

### Detection of chromosomal aneuploidies in embryos

A rapid screening of sex and of aneuploidies of chromosomes 13, 15, 16, 18, 21, 22, and X was carried out on genomic DNA extracted from the chorionic villi by performing five multiplex Quantitative Fluorescent PCR (QFPCR) assays. QFPCR assays were performed in a total volume of 25 µl containing 40–100 ng of genomic DNA, 10 mM dNTPs (Roche), 6–30 pmol final concentration of each primer, 1× Fast Taq polymerase buffer (15 mmol/l MgCl₂) (Roche), and 2.5 U of Fast Taq polymerase (Roche). QFPCR conditions were as follows: denaturation at 95°C for 10 min, followed by 10 touchdown cycles of melting at 95°C for 1 min, annealing at 65°C (decreasing by 1°C per cycle) for 1 min, and extension at 72°C for 40 s; then 23 cycles of 95°C for 1 min, 55°C for 1 min, and 72°C for 1 min. Final extension was 10 min at 72°C, followed by 60 min at 60°C.
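The annealing program above follows a touchdown scheme: annealing starts at 65°C and drops by 1°C per cycle for ten cycles, so the last touchdown cycle anneals at 56°C, just above the 55°C used for the remaining 23 cycles (33 cycles in total). The short Python sketch below, with a hypothetical function name, only enumerates the per-cycle annealing temperatures implied by these parameters; it is an illustration, not a thermocycler program used in the study.

```python
def annealing_schedule(start=65.0, step=1.0, touchdown_cycles=10,
                       plateau=55.0, plateau_cycles=23):
    """Return the annealing temperature (°C) of each PCR cycle, in order.

    The first `touchdown_cycles` cycles start at `start` and decrease by
    `step` every cycle; the remaining cycles anneal at `plateau`.
    """
    touchdown = [start - step * i for i in range(touchdown_cycles)]
    return touchdown + [plateau] * plateau_cycles


temps = annealing_schedule()
print(len(temps), temps[0], temps[9], temps[10])  # 33 65.0 56.0 55.0
```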
Fluorescence-labelled QFPCR products were electrophoresed on a Beckman CEQ 8000 by combining 40 µl of Hi-Di Formamide and 0.5 µl of DNA size standard 400 (Beckman); QFPCR products were visualized and quantified as the peak areas of the respective repeat lengths. In normal heterozygous subjects, the QFPCR product of each STR should show two peaks with similar fluorescence and thus a peak area ratio close to 1:1 (ranging from 0.8:1 to 1.4:1). A trisomy is suspected when the ratio falls outside this range (peak area ratios ≤0.6 or ≥1.8, trisomic diallelic pattern) or when three alleles of equal peak area are present with a ratio of 1:1:1 (trisomic triallelic pattern). The presence of trisomic triallelic or diallelic patterns for at least two different STRs on the same chromosome is considered evidence of trisomy. Trisomic patterns observed for all chromosome-specific STRs are indicative of triploidy. X chromosome dosage, needed to diagnose X monosomy, can be accurately assessed with the TAF9L marker. This gene has a high degree of sequence identity between chromosome 3 and chromosome X; primers on this gene amplify a region containing a 3 bp deletion, generating a chromosome X-specific product of 141 bp and a chromosome 3-specific product of 144 bp. Maternal contamination was also checked by QFPCR, comparing the alleles found in miscarriages with those found in maternal blood.

Comparative Genomic Hybridization was carried out using the Agilent SurePrint G3 Human CGH Microarray. Samples underwent DNA quantification and quality analysis prior to labelling and hybridization on the microarray. Following hybridization, samples were washed and the chip was scanned at 3 µm resolution using the Agilent SureScan Microarray Scanner.
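The STR decision rules described above for QFPCR can be expressed as a small function. The sketch below is a hypothetical illustration, not the software used for the analysis: it reads the diallelic thresholds 0.6 and 1.8 as lower and upper cutoffs, assumes peaks are listed shorter allele first, treats any three-peak marker as triallelic, and leaves out triploidy and the TAF9L-based X dosage.

```python
def classify_str(peaks):
    """Classify one STR marker from its allele peak areas.

    `peaks` lists peak areas ordered by allele length (shorter allele first).
    """
    if len(peaks) == 3:
        # Assumes the three peaks have roughly equal areas (1:1:1).
        return "trisomic_triallelic"
    if len(peaks) == 2:
        ratio = peaks[0] / peaks[1]
        if 0.8 <= ratio <= 1.4:
            return "normal"
        if ratio <= 0.6 or ratio >= 1.8:
            return "trisomic_diallelic"
        return "inconclusive"  # between the normal and trisomic ranges
    return "uninformative"  # a single peak (homozygous marker)


def call_chromosome(markers):
    """Flag a trisomy when at least two STRs on the chromosome look trisomic."""
    calls = [classify_str(peaks) for peaks in markers]
    n_trisomic = sum(call.startswith("trisomic") for call in calls)
    return "trisomy" if n_trisomic >= 2 else "no evidence of trisomy"


# Hypothetical peak areas for three STRs on one chromosome.
print(call_chromosome([[2000, 1000], [900, 880, 910], [1000, 1050]]))  # trisomy
```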
The log ratios from the arrays were segmented into regions of estimated equal copy number using both the method implemented in the Agilent CytoGenomics v3.0.4 software and the penalized least squares method implemented in the R package copynumber (PLS, [Nilsen et al., 2012]). Segments were classified as copy number gains or losses (copy number variants) if they spanned at least five probes and had a Z-score <0.0016 (SD × 4) [Vermeesch et al., 2005].

### Statistical and sequence analyses

Data cleaning, refining, and analysis (summary statistics, hypothesis testing) were performed using R [R Core Team, 2019]. Sequencing reads in FASTQ format were aligned against the reference genome GRCh38.p12 using BWA [Li, 2013] and SAMTOOLS [Li, 2011]. Variant calling was done using FREEBAYES [Garrison and Marth, 2012]. The resulting VCF files were refined in further steps: VCFFILTER [Garrison, 2020] was used to retain variants with quality score >20, i.e. only variants with an estimated probability above 99% of a polymorphic genotype call; VT [Tan et al., 2015] was used to normalize variants and decompose multiallelic variants. Refined VCF files were compressed and indexed using SAMTOOLS [Li, 2011]. Variants were annotated for functional effects and allele frequency in other populations using the Variant Effect Predictor [McLaren et al., 2016]. Phasing was done using Beagle 5.1 [Browning et al., 2018] with default parameters. Principal component analysis was done with PLINK [Chang et al., 2015] using 1.2M autosomal SNPs. The GP pipeline for variant prioritization is written in Python and R, and the code is publicly available ([https://github.com/ezcn/grep](https://github.com/ezcn/grep)). The manually curated list of genes associated with miscarriages (recurrent and spontaneous) was obtained through a comprehensive search of the published literature.
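The quality threshold used with VCFFILTER follows the Phred scale of the VCF QUAL field, QUAL = -10·log10(P), where P is the probability that the variant call is wrong. A cutoff of 20 therefore corresponds to P = 0.01, i.e. a 99% probability that the site is polymorphic. With vcflib such a filter is typically written as `vcffilter -f "QUAL > 20"`, although the exact command line used in the study is not reported here. The Python sketch below, with illustrative function names, shows the conversion in both directions.

```python
import math


def phred_to_prob(qual):
    """Probability that the ALT call is wrong, given a Phred-scaled QUAL."""
    return 10 ** (-qual / 10)


def prob_to_phred(p_wrong):
    """Inverse transformation: Phred-scaled quality from an error probability."""
    return -10 * math.log10(p_wrong)


for qual in (10, 20, 30):
    p = phred_to_prob(qual)
    print(f"QUAL={qual}: P(wrong)={p:.3f}, P(polymorphic)={1 - p:.3f}")
# QUAL=10: P(wrong)=0.100, P(polymorphic)=0.900
# QUAL=20: P(wrong)=0.010, P(polymorphic)=0.990
# QUAL=30: P(wrong)=0.001, P(polymorphic)=0.999
```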
We considered seven studies highlighting the association of genes with miscarriages [Colley et al., 2019, Fu et al., 2018, Laisk et al., 2020, Pereza et al., 2017, Qiao et al., 2016, Quintero-Ronderos et al., 2017, Rull et al., 2012]. This compendium was further supplemented with genes from curated repositories such as the Human Phenotype Ontology (HPO) [Robinson et al., 2008; URL: [https://hpo.jax.org/app/browse/term/HP:0200067](https://hpo.jax.org/app/browse/term/HP:0200067) last accessed: 1/12/2020 11:01:00 PM] and DisGeNET [Piñero et al., 2015; URL: [http://www.disgenet.org/search](http://www.disgenet.org/search) last accessed: 1/12/2020 11:12:00 PM]. The search terms used were “recurrent miscarriages”, “abortion”, “spontaneous abortion”, and “recurrent spontaneous abortion”. After removing duplicates, the combined gene sets from the literature and databases yielded a total of 608 unique genes (Supplementary Table 1). Additional information on the genes, such as HGNC symbol, HGNC ID, Gene Stable ID, chromosomal coordinates (GRCh38), karyotype band, transcript count, and protein stable ID, was extracted from Ensembl BioMart [Kinsella et al., 2011]. Overrepresentation tests and protein classification were performed using the R package ReactomePA [Yu and He, 2016].

## Results

To understand genetic susceptibility to miscarriage, we studied the genomes of forty-six spontaneously miscarried embryos. The embryos’ gestational age at pregnancy termination, calculated as the interval between the date of pregnancy termination and the date of the last menstrual period, ranges from 7.14 to 19.43 weeks (median 10.3 weeks). Twenty-one embryos are classified as products of recurrent miscarriage [Christiansen et al., 2018]. The mothers of the embryos are mostly of European origin (87%), and their median age at the date of collection was 36.7 ± 5.9 years, with a slightly but significantly higher age in recurrent than in first cases (Figure 1, Mann–Whitney p-value=0.02).
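Gestational age in decimal weeks is the number of days between the last menstrual period and pregnancy termination divided by seven; the reported minimum of 7.14 weeks is consistent with 50 days and the maximum of 19.43 weeks with 136 days. A minimal Python sketch, with a hypothetical function name and made-up dates:

```python
from datetime import date


def gestational_weeks(lmp, termination):
    """Gestational age in decimal weeks: days since the LMP divided by 7."""
    return (termination - lmp).days / 7


# Hypothetical dates, for illustration only (50 days apart).
print(round(gestational_weeks(date(2018, 1, 1), date(2018, 2, 20)), 2))  # 7.14
```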
The medical records of the mothers report no major comorbidities. Folic acid was taken by 71% of the mothers, with no difference between first and recurrent cases (Figure 1, Chi-square p-value=0.96). Median body mass index and age at menarche are comparable between first and recurrent cases, as well as to a group of control women (Figure 1). Altogether, based on the available medical records, we assume that the recruited mothers were within the range of healthy adult individuals.

![Figure 1](https://www.medrxiv.org/content/medrxiv/early/2021/01/08/2021.01.02.20248961/F1.medium.gif)

Figure 1. Features of the mothers of the embryos. **(A)** Median age of the mother at the event. There is no significant difference between first and recurrent miscarriages. **(B)** Gestational age at the time of pregnancy termination. There is no significant difference between first and recurrent cases. **(C)** Folic acid intake. **(D)** Age at menarche and **(E)** body mass index of the mothers of the embryos, which are not significantly different from those of a control set of mothers undergoing voluntary termination of pregnancy.

It is known from the literature that about half of the miscarriages in the first trimester are due to large chromosomal aneuploidies, such as trisomies or deletions of large chromosomal segments [van den Berg et al., 2012]. In this study we focus on cases in which the genome is euploid; therefore, the forty-six embryos were screened for chromosomal aneuploidies prior to whole-genome sequencing. We found that 32.6% of the samples were euploid and could be sequenced, while 56.6% of the embryos presented aneuploidies (Figure 2). The most common aneuploidy in our data set is trisomy of chromosome 22 (26.9%), followed by trisomy of chromosome 16 (15.4%).
In particular, a first round of detection of aneuploidies on chromosomes 13, 15, 16, 18, 21, 22, X, and Y through short tandem repeat analysis discarded 45.7% of the samples, and a subsequent analysis through comparative genomic hybridization and copy number variation detection from low-coverage sequencing discarded another 10.9%. Finally, a number of embryos (10.9%) were excluded from the analysis due to low-quality DNA or maternal contamination.

![Figure 2](https://www.medrxiv.org/content/medrxiv/early/2021/01/08/2021.01.02.20248961/F2.medium.gif)

Figure 2. Outcome of the screening for aneuploidies in the embryos. Forty-six embryos were screened by quantitative PCR to detect aneuploidies of chromosomes 13, 15, 16, 18, 21, 22, X, and Y, as well as maternal contamination. DNA from embryos with no aneuploidies in these chromosomes was further analyzed by comparative genomic hybridization and shallow sequencing. Overall, we found aneuploidies in 56.6% of the embryos, the most common being trisomy of chromosome 22. The fraction of euploid embryos is shown in yellow.

After ascertaining euploidy, the whole genomes of ten embryos were sequenced using Illumina short reads at 30X coverage. In this set of embryo genomes, we identified 11M single-nucleotide polymorphisms (SNPs) and 2M small insertions or deletions (indels).

### Prioritization of genetic variants in coding genomic regions

We developed the GP pipeline to prioritize putatively damaging genetic variants from sequencing data. GP takes as input genomic variant information from cases and controls (including the *per*-individual allelic counts) in the form of a VCF file and outputs a table of variants prioritized according to user-defined parameters.
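GP reads per-individual allele counts from a VCF file. As a hypothetical illustration of what this means (not code from the GP repository), the number of ALT alleles carried by each sample can be derived from the GT field of a biallelic record, such as those produced after decomposing multiallelic sites with VT:

```python
def alt_allele_counts(vcf_line):
    """Return one ALT allele count per sample for a biallelic VCF data line.

    Genotypes may be phased (|) or unphased (/) and haploid (e.g. male X);
    missing genotypes such as './.' are returned as None.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    gt_index = fields[8].split(":").index("GT")  # position of GT in FORMAT
    counts = []
    for sample in fields[9:]:
        gt = sample.split(":")[gt_index]
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            counts.append(None)
        else:
            counts.append(sum(allele != "0" for allele in alleles))
    return counts


# Hypothetical record with four samples.
record = "chr1\t1000\t.\tA\tG\t50\t.\t.\tGT:DP\t0/1:30\t1|1:25\t0/0:40\t./.:0"
print(alt_allele_counts(record))  # [1, 2, 0, None]
```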
GP uses functional annotations of genomic variants, information from publicly available sequence data of presumably healthy individuals, and, if available, knowledge of genes involved in the trait under study. GP currently analyzes coding regions and performs four filtering steps (Figure 3A). The first filter (Filter I, Figure 3B) retains variants based on: (i) an overall impact on the gene product classified as moderate or high [McLaren et al., 2016]; (ii) a user-defined allele frequency threshold in control populations; and (iii) the combined property of being putatively damaging (quantified by the CADD score [Rentzsch et al., 2019]) and located in genes intolerant to loss of function (determined by the pLI score [Karczewski et al., 2020]). In addition, it is possible to incorporate one or more user-defined lists of genes relevant to the trait under study. Variants retained by Filter I (hits) are further filtered to control for false positives with Filters II and III. In particular, Filter II removes variants in genes with too many hits, while Filter III estimates the chance for genes to be selected in a control population using the criteria specified in Filter I. In practice, a set number of control individuals is resampled repeatedly, and their genetic data are filtered using Filter I to obtain a list of genes selected by chance (Figure 3C). Finally, Filter IV excludes private variants with read depth outside the range found in non-private ones.

![Figure 3](https://www.medrxiv.org/content/medrxiv/early/2021/01/08/2021.01.02.20248961/F3.medium.gif)

Figure 3. Overview of the pipeline for prioritization of the genetic variants. **(A)** GP takes as input information about genomic variants found in cases and controls and outputs a subset of variants prioritized according to user-defined parameters. GP currently analyzes coding regions and performs four filtering steps.
**(B)** Filter I retains variants based on three criteria: overall impact on the gene product (moderate or high), allele frequency in control populations, and the combined property of being putatively damaging (quantified by the CADD score) and located in genes intolerant to loss of function (determined by the pLI score). It is also possible to incorporate one or more user-defined lists of genes relevant to the trait under study. **(C)** Filter III estimates the chance for genes to be selected in a control population using the criteria specified in Filter I. In practice, a set number of control individuals is resampled repeatedly, and their genetic data are filtered using Filter I to obtain a list of genes selected by chance.

We applied the GP pipeline to the high-coverage whole-genome sequences of the embryos. For Filter I we set the allele frequency threshold to <0.05% in the 1000 Genomes [Consortium et al., 2015] and gnomAD [Karczewski et al., 2020] reference populations, while the functional effect of the variant within the gene context was taken into account in two ways: either by selecting putatively deleterious variants (CADD score >90th percentile) in genes highly intolerant to loss of function (pLI score >0.9), or by selecting variants in genes known to be involved in early embryonic development. For the latter option we included five lists of genes: genes involved in embryo development (Gene Ontology GO:0009790), genes lethal during embryonic stages [Dawes et al., 2019], genes essential for embryo development [Dawes et al., 2019], genes discovered through the Deciphering Developmental Disorders project [Study et al., 2015], and a manually curated list of candidate genes known to be involved in miscarriages. We required each variant to satisfy one or both of these criteria: (i) be in a gene present in at least two of the five lists, or (ii) have a CADD score above the 90th percentile and be in a gene with pLI >0.9.
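The Filter I rule just described can be written as a single predicate. The sketch below is hypothetical and not taken from the GP code base: the `Variant` fields, treating a variant absent from a reference panel as frequency 0, requiring a frequency below 0.05% in both panels, and using a precomputed CADD percentile are all assumptions made for illustration. The impact labels are the VEP IMPACT classes.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    impact: str             # VEP IMPACT class: HIGH, MODERATE, LOW or MODIFIER
    af_1kg: float           # allele frequency in 1000 Genomes (0.0 if absent)
    af_gnomad: float        # allele frequency in gnomAD (0.0 if absent)
    cadd_percentile: float  # percentile of the variant's CADD score (assumption)
    pli: float              # pLI of the gene


def passes_filter_one(v, gene_lists, max_af=0.0005, min_cadd_pct=90.0,
                      min_pli=0.9, min_lists=2):
    """Hypothetical Filter I: impact, rarity, then candidate gene OR damaging."""
    if v.impact not in {"HIGH", "MODERATE"}:
        return False
    # Rare in both reference panels: allele frequency below 0.05%.
    if max(v.af_1kg, v.af_gnomad) >= max_af:
        return False
    in_candidate_lists = sum(v.gene in genes for genes in gene_lists) >= min_lists
    damaging_in_intolerant_gene = (v.cadd_percentile > min_cadd_pct
                                   and v.pli > min_pli)
    return in_candidate_lists or damaging_in_intolerant_gene
```

In this sketch `gene_lists` would hold the five gene sets (for example as Python sets of gene symbols), so the "at least two of the five lists" rule becomes a simple membership count.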
Overall, Filter I retained 1,038 variants (hits) in the embryos. Filter II removed variants in genes with >5 hits, under the assumption that variants found in these genes are likely to be sequencing and alignment artifacts. We observed that, with few exceptions, genes carried at most five hits (the 99th percentile of hits per gene); that the number of hits is not significantly correlated with gene length (Spearman r²=0.05, p-value=0.124), so a high number of hits is not explained by long genes; and that hits in genes with >5 hits are enriched for private variants (Figure 4A).

![Figure 4](https://www.medrxiv.org/content/medrxiv/early/2021/01/08/2021.01.02.20248961/F4.medium.gif)

Figure 4. Features of the filtering steps. **(A)** Number of hits per gene after Filter II. The majority of genes have fewer than five hits, and there is no significant correlation between the number of hits and gene length (Spearman r²=0.05, p-value=0.124). Inset: genes with >5 hits are enriched for private variants. **(B)** Frequency across 100 replicates of genes that pass Filter I when resampling from a control population. Most genes are retained in <5% of replicates (yellow) and are therefore kept if found in the embryos. Instead, 1,531 genes are retained in >5% of replicates (blue) and are therefore discarded if found in the embryos, under the assumption that they can pass the filter by chance in healthy controls. **(C)** Read depth is comparable between private and non-private variants after Filter III; nevertheless, to control for possible artifacts due to low coverage, we further remove hits that are private and have read depth outside the range found in non-private ones.

For Filter III we used as control population 929 individuals from the Human Genome Diversity Project [Bergström et al., 2020], from which we resampled ten individuals 100 times after checking for population stratification (Figure S2).
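Filters II to IV, as described in the text and in the caption of Figure 4, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the GP implementation (available at github.com/ezcn/grep): hits are dictionaries with hypothetical keys `gene`, `depth` and `private`, and the Filter I step applied to each control resample is passed in as a user-supplied function `filter_one_genes`.

```python
import random
from collections import Counter


def filter_two(hits, max_hits=5):
    """Filter II: drop hits in genes that carry more than `max_hits` hits."""
    per_gene = Counter(hit["gene"] for hit in hits)
    return [hit for hit in hits if per_gene[hit["gene"]] <= max_hits]


def chance_genes(controls, filter_one_genes, n_sample=10, n_rep=100,
                 max_freq=0.05, seed=0):
    """Filter III: genes that Filter I retains in > `max_freq` of control resamples.

    `filter_one_genes(sample)` must return the genes retained by Filter I
    for a list of control individuals.
    """
    rng = random.Random(seed)
    times_seen = Counter()
    for _ in range(n_rep):
        sample = rng.sample(controls, n_sample)  # drawn without replacement
        times_seen.update(set(filter_one_genes(sample)))
    return {gene for gene, n in times_seen.items() if n / n_rep > max_freq}


def filter_four(hits):
    """Filter IV: drop private hits whose depth is outside the non-private range."""
    depths = [hit["depth"] for hit in hits if not hit["private"]]
    low, high = min(depths), max(depths)
    return [hit for hit in hits
            if not hit["private"] or low <= hit["depth"] <= high]
```

In this sketch the steps chain as `filter_four([h for h in filter_two(hits) if h["gene"] not in excluded])`, where `excluded = chance_genes(controls, filter_one_genes)`; `filter_four` assumes at least one non-private hit is present.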
On each resampled set we applied Filter I and recorded the genes that were retained. Overall, 5,488 unique genes were retained in controls, with different frequencies across replicates (Figure 4B). Using the 95th percentile as cutoff, 1,531 genes are found in >5% of replicates, and hits within these genes were therefore removed by Filter III. Filters II and III retained 447 hits, of which 21% are private with respect to the 1000 Genomes and gnomAD data sets. Despite comparable read depth between private and non-private variants (Figure 4C, KS test p-value=0.99, F-test p-value=0.06), to control for possible artifacts due to low coverage we applied a further filter that removes hits that are private and have read depth outside the range found in non-private ones.

### Properties and biological significance of the prioritized variants and genes

After all filters, GP prioritizes 439 unique variants in 399 genes that code for 980 transcripts (Supplementary Table 2). Almost all of the prioritized genes (n=378) have an OMIM accession number, and 18.8% of them were not in the lists of candidate genes given to GP as input during prioritization, showing that GP can detect genes never investigated before in relation to the phenotype under study. Nine genes are involved in the pathway of mitochondrial translation (Reactome identifier R-HSA-5368287), a significant 4.9-fold enrichment over random expectation (Supplementary Table 3, p-value=1.45E-04, FDR=0.03). Similarly, we observe overrepresentation of genes involved in cell cycle checkpoints (R-HSA-69620) and signaling by Rho GTPases (R-HSA-194315).
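The enrichment figures above were computed with ReactomePA. The sketch below only illustrates the usual definitions of the two quantities: fold enrichment as (k/n)/(K/N), where k of the n selected genes fall in a pathway of K genes out of a background of N, and a one-sided hypergeometric p-value. The background sizes in the example are hypothetical, so it does not reproduce the study's numbers, and multiple-testing (FDR) correction is omitted.

```python
from math import comb


def fold_enrichment(k, n, K, N):
    """Observed fraction of pathway genes (k/n) over the background fraction (K/N)."""
    return (k / n) / (K / N)


def hypergeom_pvalue(k, n, K, N):
    """One-sided P(X >= k) for X ~ Hypergeometric(N genes, K in pathway, n drawn)."""
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical numbers: 3 of 40 selected genes in a 50-gene pathway,
# against a background of 10,000 genes.
print(round(fold_enrichment(3, 40, 50, 10_000), 1))  # 15.0
```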
With reference to the cellular compartments where the gene products are expressed, we observe a significant 7.7-fold enrichment (p-value 7.82E-04, FDR=0.04) of proteins expressed in the mitotic spindle pole or in associated complexes (Supplementary Table 4), including the product of *STAG2*, in which we observe a high-impact mutation in one embryo from this study. Finally, seven genes (*BHLHE40, DBN1, FOXA1, HSPD1, PLXNA3, SLC35A2, SRF*) were previously identified as essential genes in copy-number variable regions from the analysis of hundreds of miscarried fetuses [Chen et al., 2017].

In the embryos, 4.1% of the prioritized variants are stop gains/losses, frameshift indels, and variants that disrupt splice sites, all classified as having a high impact on the gene products, while missense mutations prevail among the variants with moderate effect (Figure 5A, Table S1). Averages per embryo are 48.9 ± 8.0 genomic variants in 47.8 ± 7.7 genes coding for 113.5 ± 24.6 transcripts (Figure 5B). In almost all prioritized genes, GP retains only one variant per embryo, with few exceptions (five cases with two and one with three variants per gene), as shown in Figure 5B, where allele dosage and impact are also shown.

![Figure 5](https://www.medrxiv.org/content/medrxiv/early/2021/01/08/2021.01.02.20248961/F5.medium.gif)

Figure 5. Results of the prioritization pipeline. **(A)** Number of variants per embryo stratified by impact. Overall, 4.1% of the prioritized variants are classified as having a high impact on the gene products. **(B)** Results per embryo. The y-axis shows the prioritized genes and the x-axis the number of mutations per gene. Colors indicate the allele count and the class of severity. **(C)** Selection of prioritized variants/genes shared by embryos.
### Mutations in *STAG2, FLAD1, TLE4, FRMPD3*, and *FMNL2* in the embryos

The male embryo FE130 carries two high-impact mutations in a single copy. The first is an extremely rare T>G transversion (rs913664484; G frequency 4.7e-05 in 42.7k individuals from gnomAD) at the 5’ end of the first intron of the Stromal antigen 2 (*STAG2*) gene. The mutation disrupts a splice site and therefore has a high impact on the gene product. *STAG2* is located on the X chromosome, and its inactivation causes severe congenital and developmental defects in embryos and infants [Mullegama et al., 2017, Mullegama et al., 2019, Aoi et al., 2020, Study et al., 2015] as well as chromosomal aneuploidies in several types of human cancer [Solomon et al., 2011]. Interestingly, only mildly deleterious mutations have been found in living human males, while females can carry highly deleterious mutations in the heterozygous state [Mullegama et al., 2019]. *STAG2* codes for the cohesin subunit SA-2 [Cuadrado and Losada, 2020]. Cohesins are ring-shaped protein complexes that bring into close proximity two different DNA molecules or two distant parts of the same DNA molecule and are responsible for the cohesion of sister chromatids [McNicoll et al., 2013]. In mouse, inactivation of *Stag2* causes early embryonic lethality [De Koninck et al., 2020]. The second high-impact mutation of FE130 is a stop gain in the Flavin Adenine Dinucleotide Synthetase 1 (*FLAD1*) gene, whose product localizes to mitochondria, where it catalyzes the adenylation of flavin mononucleotide (FMN) to form the flavin adenine dinucleotide (FAD) coenzyme [Brizio et al., 2006]. FAD synthase is an essential protein because the products of its activity, the flavocoenzymes, play a vital role in many metabolic processes; indeed, FAD synthase deficiency (OMIM 255100) associated with severe homozygous mutations causes death in the first months of life [Balasubramaniam et al., 2019].
In FE130, the stop mutation p.Q159* affects only one of the five isoforms (UniProt identifier Q8NFF5-5), at its second-to-last residue. We can therefore speculate that it might not seriously compromise the function of the protein.

The embryo FE136 carries a heterozygous missense mutation (rs41307447) in the Transducin-like enhancer protein 4 gene (*TLE4*, synonym *GRG-4*). The mutation substitutes a polar amino acid with another polar amino acid (Ser>Thr) in the seventh exon of the gene, within a low-complexity domain of the protein. The rs41307447 polymorphism is predicted to be tolerated (SIFT score 0.18) and benign (PolyPhen score 0.003). Nevertheless, the *TLE4* gene is classified as highly intolerant to loss of function (pLI score 0.999), and the CADD score of rs41307447 is in the 99.8th percentile. *TLE4* is a transcriptional repressor of the Groucho family, expressed in embryonic stem cells, where it represses naive pluripotency genes [Laing et al., 2015], and it is a direct transcriptional target of Notch [Menchero et al., 2019]. *TLE4* is also expressed in extravillous trophoblasts [Meinhardt et al., 2014], where it is part of the Wnt signaling pathway that promotes implantation, trophoblast invasion, and endometrial function [Sonderegger et al., 2010]. Finally, a study of a cohort of 750 women found a significant association between the A allele of rs7859844 on chromosome 9 and recurrent miscarriage, and further showed that the region containing rs7859844 physically interacts with *TLE4* [Laisk et al., 2020]. In our study, FE106 is the only embryo carrying the intergenic variant rs7859844.

Among prioritized variants shared by more than one embryo, the male FE165 and female FE106 embryos share a stop-gain mutation (p.Q1758*) in the X-linked FERM and PDZ domain containing 3 (*FRMPD3*) gene, which is highly intolerant to loss of function (pLI = 0.91). The mutation falls in a 27-residue polyQ stretch at the C-terminus of the protein.
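The SIFT, PolyPhen, pLI and CADD values quoted for rs41307447, rs189416564, *TLE4* and *FRMPD3* are easier to read against the cut-offs conventionally used to label them. The following is a minimal sketch built on several assumptions:

* the Ensembl VEP conventions for SIFT (deleterious below 0.05) and PolyPhen (probably damaging above 0.908, possibly damaging above 0.446);
* the gnomAD convention that pLI ≥ 0.9 marks loss-of-function-intolerant genes;
* CADD's PHRED scaling, under which a variant ranked in the top fraction f of all possible substitutions scores -10 log10(f).

The text does not say which cut-offs GP applies internally, and the function names are hypothetical.

```python
import math


def sift_label(score: float) -> str:
    # SIFT: scores near 0 indicate a damaging substitution.
    return "deleterious" if score < 0.05 else "tolerated"


def polyphen_label(score: float) -> str:
    # PolyPhen-2 categories as labelled by Ensembl VEP.
    if score > 0.908:
        return "probably damaging"
    if score > 0.446:
        return "possibly damaging"
    return "benign"


def lof_intolerant(pli: float, cutoff: float = 0.9) -> bool:
    # gnomAD convention: pLI >= 0.9 flags genes intolerant to loss of function.
    return pli >= cutoff


def cadd_phred_from_percentile(percentile: float) -> float:
    """PHRED-scaled CADD score of a variant at a genome-wide percentile."""
    top_fraction = 1 - percentile / 100
    return -10 * math.log10(top_fraction)


# Values quoted in the text
print(sift_label(0.18), polyphen_label(0.003))      # tolerated benign (rs41307447)
print(sift_label(0.0), polyphen_label(0.969))       # deleterious probably damaging (rs189416564)
print(lof_intolerant(0.999), lof_intolerant(0.91))  # True True (TLE4, FRMPD3)
print(round(cadd_phred_from_percentile(99.8), 1))   # 27.0
```

The last line makes the apparent contradiction explicit. rs41307447 is benign by SIFT and PolyPhen, yet it sits at a CADD PHRED of about 27, that is, among the top 0.2% of scored substitutions.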
Little is known about *FRMPD3* in humans, but a study in the lion head goose found a significant association between high expression of *FRMPD3* and low egg production [Zhao et al., 2020b].

Five embryos, including the carrier of the missense mutation in *TLE4*, share one copy of a haplotype composed of two T alleles 4 bp apart. These alleles cause a stop-gain (rs750755379) and a missense (rs866373641) substitution in the Formin-like protein 2 gene (*FMNL2*, Figure 5C). The two alleles exist at moderate-to-high frequency in human populations (Figure 6A) and are in perfect linkage disequilibrium (r² = 1) in the embryos. Embryo FE165 also carries a third missense mutation in phase with the other two (rs189416564), predicted to be deleterious by SIFT (score 0) and probably damaging by PolyPhen (score 0.969). *FMNL2* codes for a formin-related protein expressed in multiple human tissues, in particular gastrointestinal and mammary epithelia, lymphatic tissues, placenta, and the reproductive tract [Gardberg et al., 2010]. In the fetus, *FMNL2* is expressed in the cytoplasm of brain, spinal cord, and rectum [Lizio et al., 2015]. *FMNL2* is an elongation factor of actin filaments that drives cell migration by increasing the efficiency of lamellipodia protrusion [Block et al., 2012, Kühn et al., 2015], and its overexpression is associated with cancer [Zhu et al., 2011]. The stop-gain mutation found in the five embryos is located in the first domain of the protein, a Rho GTPase-binding/formin homology 3 (GBD/FH3) domain involved in subcellular localization and regulation of activation (Figure 6B). The premature stop codon produces a truncated protein that lacks the Formin Homology-2 (FH2) domain, which binds directly to actin filaments and catalyzes their nucleation and elongation.
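The perfect linkage disequilibrium (r² = 1) between rs750755379 and rs866373641 reflects the fact that, in the embryos, the two alternate alleles only ever occur together on the same haplotype. Below is a minimal sketch of the standard r² computation from phased two-site haplotypes (0 = reference, 1 = alternate). The toy data assume, for illustration only, that the ten embryos contribute 20 haplotypes, and that the five carriers' haplotypes are the only ones with alternate alleles. The text does not specify how r² was computed (for example, from phased haplotypes or from genotype correlation).

```python
def ld_r2(haplotypes: list[tuple[int, int]]) -> float:
    """Squared correlation r^2 between two biallelic sites.

    Each haplotype is (allele at site A, allele at site B), coded 0/1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a site is monomorphic")
    return d * d / denom


# Hypothetical: 5 haplotypes carry both alternate alleles, 15 carry neither.
toy = [(1, 1)] * 5 + [(0, 0)] * 15
print(ld_r2(toy))  # 1.0
```

A single haplotype carrying only one of the two alternate alleles would bring r² below 1. This is why r² = 1 in the embryos indicates that the stop-gain and missense alleles were always observed together.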
![Figure 6](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/01/08/2021.01.02.20248961/F6.medium.gif)

Figure 6 Two-SNP haplotype in *FMNL2* prioritized in five embryos. **(A)** Allele frequencies of the alternate alleles at rs750755379 and rs866373641 from the NCBI ALFA database. The two alleles exist at moderate-to-high frequency in human populations. **(B)** Position in the protein of rs750755379 (stop gain) and rs866373641 (missense). The stop-gain mutation is located in the Rho GTPase-binding/formin homology 3 (GBD/FH3) domain involved in subcellular localization and regulation of activation. The resulting truncated protein lacks the Formin Homology-2 (FH2) domain, which binds directly to actin filaments and catalyzes their nucleation and elongation.

## Discussion

Miscarriages are frequent events with a complex aetiology whose genetic components are not yet fully understood. We developed a scalable pipeline that investigates small genetic variants, which have rarely been considered in the context of miscarriage. We used our pipeline to analyze the coding regions of the genomes of ten miscarried euploid embryos and to prioritize putatively detrimental variants in genes relevant for embryonic development. Our pipeline prioritized 439 putatively causative single nucleotide polymorphisms among 11M variants discovered in the ten embryos. Through systematic investigation of all coding regions, GP selected about 47 genes per embryo, and manual curation of the selected genes highlighted a few cases. Among them, we find three examples relevant to embryonic development. The first is a hemizygous splice-site mutation in *STAG2* in one male embryo; *STAG2* is known in the literature for its role in congenital and developmental disorders as well as in cancer.
The second is a missense mutation in *TLE4*, a gene that interacts with the genomic region on chromosome 9 genetically associated with miscarriage in a genome-wide study of mothers. *TLE4* appears to be a key gene in embryonic development, as it is expressed in both embryonic and extraembryonic tissues, where it participates in the Wnt and Notch signalling pathways. The third is a haplotype spanning 4 bp, shared by five embryos, that contains a stop-gain and a missense mutation in *FMNL2*, a gene involved in cell motility with a major role in driving cell migration. The stop-gain mutation truncates the protein well before the main functional domain of FMNL2, i.e. the domain that binds actin filaments, and is therefore expected to cause a complete loss of function of the protein product.

In this study we focus on single nucleotide variants. GP combines functional information on variants and genes with population genomics and literature information to sift through millions of variants in search of the relevant ones. This approach fills a gap, as genetic analyses of miscarriage have mostly focused on detecting chromosomal aneuploidies and large chromosomal aberrations (which explain less than half of the cases). As a result, small genetic variants, the most abundant type of genomic variation, have been left unexplored. Small genetic variants have been considered to some extent in a number of studies that performed targeted resequencing of candidate genes. This is a valid but not systematic approach because, unlike GP, it does not fully exploit genomic information. GP's units of analysis are transcripts and genes, with no prior hypothesis on genes; at the same time, GP incorporates literature knowledge on miscarriage through the gene lists used in Filter I. As a result, our approach supports both the discovery of novel associations and the investigation of genes with known associations to miscarriage, overcoming the major limitation of candidate-gene studies. Variant prioritization is performed at the individual level.
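The individual-level prioritization described above can be illustrated with a minimal sketch. The sketch makes two assumptions:

* each annotated variant carries a sample ID, a gene symbol, a VEP `IMPACT` class, a population allele frequency and an alternate-allele count;
* Filter I reduces to membership in a curated gene list.

The field names, the gene list and the optional frequency cap are all hypothetical. GP's actual filters are those in the ezcn/grep repository and are not reproduced here. The frequency cap is off by default, because a strict rare-variant cut-off would have discarded the *FMNL2* haplotype, whose alleles exist at moderate-to-high frequency in human populations.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variant:
    sample: str       # embryo identifier
    gene: str         # gene symbol
    impact: str       # VEP IMPACT: HIGH, MODERATE, LOW or MODIFIER
    pop_af: float     # population allele frequency (hypothetical field)
    alt_copies: int   # alternate-allele dosage in this sample


def prioritize(variants, gene_list, impacts=("HIGH", "MODERATE"),
               max_af: Optional[float] = None) -> dict[str, list[Variant]]:
    """Group, per sample, the variants that pass every filter."""
    selected: dict[str, list[Variant]] = {}
    for v in variants:
        if v.alt_copies == 0:
            continue  # the sample does not carry the alternate allele
        if v.impact not in impacts:
            continue  # keep only high/moderate predicted impact
        if v.gene not in gene_list:
            continue  # stand-in for the literature-based gene lists
        if max_af is not None and v.pop_af > max_af:
            continue  # optional rare-variant cut-off (off by default)
        selected.setdefault(v.sample, []).append(v)
    return selected


genes = {"GENE_A", "GENE_B"}  # hypothetical curated list
calls = [
    Variant("E1", "GENE_A", "HIGH", 0.0001, 1),
    Variant("E1", "GENE_C", "HIGH", 0.0001, 1),    # gene not in list
    Variant("E2", "GENE_B", "MODERATE", 0.30, 1),  # common allele
    Variant("E2", "GENE_B", "LOW", 0.0001, 2),     # low impact
]
print({s: [v.gene for v in vs] for s, vs in prioritize(calls, genes).items()})
# {'E1': ['GENE_A'], 'E2': ['GENE_B']}
```

Passing `max_af=0.01` would drop the common `GENE_B` call in E2. This shows how a rare-variant assumption changes which embryos retain candidates.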
We expect that the same gene might cause multiple miscarriages. Given our limited sample size, however, we do not expect the exact same mutation to cause the gene's loss of function in more than one embryo. Therefore, by filtering at the individual level, GP accounts for inter-individual variation, i.e. the larger fraction of genomic variability, as well as for the high chance of occurrence of *de novo* mutations. Nevertheless, in five embryos GP selected the exact same combination of two linked alleles in *FMNL2*. This shows that, although it is individual-based, GP can still find variants shared by more than one case. Our pipeline is reproducible and easy to scale to larger studies and different phenotypes. To improve robustness, it includes a control population to filter out genes that can be prioritized by chance. GP is suitable when the number of available samples is too small to perform association analysis. The future integration of genomic information on parents (not available in this collection) will allow us to infer inheritance mechanisms and to distinguish between *de novo* and recessive mutations. This has implications for clinical applications when causative recessive mutations are carried by the parents. Collecting genomic information from larger families, with several miscarriages or live births from the same couple, will further increase the power of Mendelian segregation analysis and the true discovery rate. In conclusion, this exploratory study shows that filtering and prioritizing variants is an effective way to identify genomic variants putatively responsible for miscarriage, and it provides indications and tools for developing a larger study.
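The control-population safeguard mentioned above excludes genes that have a greater than 5% chance of being selected in a control population. It can be sketched as follows, assuming the same prioritization has already been run on every control individual. The data layout, function names and gene names are hypothetical. The text does not describe the control cohort or how GP stores its output.

```python
from collections import Counter


def control_selection_rate(control_hits: dict[str, list[str]]) -> dict[str, float]:
    """Fraction of control individuals in which each gene is prioritized.

    `control_hits` maps every control ID (including those with no hits)
    to the genes prioritized for that individual.
    """
    n_controls = len(control_hits)
    counts = Counter(g for genes in control_hits.values() for g in set(genes))
    return {g: c / n_controls for g, c in counts.items()}


def drop_background_genes(case_hits: dict[str, list[str]],
                          control_hits: dict[str, list[str]],
                          max_rate: float = 0.05) -> dict[str, list[str]]:
    """Remove genes that are selected in more than `max_rate` of controls."""
    rate = control_selection_rate(control_hits)
    return {sample: [g for g in genes if rate.get(g, 0.0) <= max_rate]
            for sample, genes in case_hits.items()}


# 40 hypothetical controls: GENE_X is hit in 10 (25%), GENE_Y in 1 (2.5%).
controls = {f"C{i}": [] for i in range(40)}
for i in range(10):
    controls[f"C{i}"].append("GENE_X")
controls["C20"].append("GENE_Y")
cases = {"E1": ["GENE_X", "GENE_Y", "GENE_Z"]}
print(drop_background_genes(cases, controls))  # {'E1': ['GENE_Y', 'GENE_Z']}
```

Counting each gene at most once per control (via `set(genes)`) makes the rate a per-individual probability of selection rather than a per-variant count.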
Compared with previous similar studies, our work focuses on a systematic exploration of the genome that combines previous knowledge with hypothesis-free prioritization. This makes it suited not only to the discovery of mutations in genes known to be associated with miscarriage, but also to the identification of novel genes. Our findings have potentially wide clinical implications. Although this is only a proof-of-concept study, it has already produced information about genes that could be used to test genetic predisposition to miscarriage in parents who are planning to conceive, particularly in recurrent miscarriage patients. In a wider context, the results of this study might be relevant for genetic counseling and risk management in miscarriage. Future developments will include the extension of the analysis to non-coding regions and to structural variants, as well as the enrollment of trios to fully exploit parental information.

## Supporting information

* Supplementary Table 1 [supplements/248961_file11.xlsx]
* Supplementary Table 2 [supplements/248961_file12.xlsx]
* Supplementary Table 3 [supplements/248961_file13.xlsx]
* Supplementary Table 4 [supplements/248961_file14.xlsx]

## Data Availability

Data are available through collaboration.

## Authors’ roles

Q.A., A.C., S.D.B., and V.C. conceived and designed the study. S.B., A.D.M., G.D.M., M.P., and V.C. wrote the code and performed the bioinformatics analyses. M.C. and C.F. provided support to the bioinformatics analyses. V.A., A.R., I.D.B., P.D.A., and G.E. performed the experiments. P.G. and M.R. contributed to the clinical samples and collected clinical data. S.B. and V.C. wrote the manuscript. All authors critically reviewed the manuscript and approved the final version.

## Funding

P.O.R. Campania FSE 2014-2020 and EMBO STF 7919 to V.C.
The computational work was executed on the IT resources of the ReCaS-Bari data center, which were made available by two projects financed by the MIUR (Italian Ministry for Education, University and Research) in the “PON Ricerca e Competitività 2007-2013” Program: ReCaS (Azione I Interventi di rafforzamento strutturale, PONa3 00052, Avviso 254/Ric) and PRISMA (Asse II Sostegno all’innovazione, PON04a2 A).

## Conflict of interest

A.C. is a full-time employee of Igenomix. A.D.M. was an employee of Igenomix while working on this project. I.D.B., P.D.A., G.E., and S.D.B. are full-time employees of MeriGen Research. All other authors declare that they have no conflicts of interest.

## Figure legends

![Supplementary Figure 1](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/01/08/2021.01.02.20248961/F7.medium.gif)

Supplementary Figure 1 Optimization of tissue homogenization and DNA extraction. We observe no significant difference between two methods of tissue homogenization (**A**) or among three methods of DNA isolation (**B**), apart from a slightly higher yield range for one type of resin. VTP: voluntary pregnancy termination; FPL: first pregnancy loss; RPL: recurrent pregnancy loss; MAT: maternal blood; PoC: product of conception.

![Supplementary Figure 2](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/01/08/2021.01.02.20248961/F8.medium.gif)

Supplementary Figure 2 Principal Component Analysis.
Plot of the first and second components obtained using 1,2M autosomal SNPs and considering genomic data of the ten embryos sequenced in this study and publicly available data of individuals form the Human Genome Diversity Project ## Acknowledgements We are thankful to all volunteers that were enrolled in the study, as well as all the medical personnel that contributed to the collection of samples. We are also thankful to Prof. Nicole Soranzo and Prof. Erik Garrison for help in the early stage of the project. * Received January 2, 2021. * Revision received January 2, 2021. * Accepted January 8, 2021. * © 2021, Posted by Cold Spring Harbor Laboratory This pre-print is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), CC BY-NC 4.0, as described at [http://creativecommons.org/licenses/by-nc/4.0/](http://creativecommons.org/licenses/by-nc/4.0/) ## References 1. Ammon Avalos L, Galindo C, and Li DK. A systematic review to calculate background miscarriage rates using life table analysis. Birth Defects Research Part A: Clinical and Molecular Teratology 2012; 94:417–423. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/bdra.23014&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=22511535&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F01%2F08%2F2021.01.02.20248961.atom) 2. Andersen AMN, Wohlfahrt J, Christens P, Olsen J, and Melbye M. Maternal age and fetal loss: population based register linkage study. Bmj 2000; 320:1708–1712. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6MzoiYm1qIjtzOjU6InJlc2lkIjtzOjEzOiIzMjAvNzI1MS8xNzA4IjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjEvMDEvMDgvMjAyMS4wMS4wMi4yMDI0ODk2MS5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 3. 
Aoi H, Lei M, Mizuguchi T, Nishioka N, Goto T, Miyama S, Suzuki T, Iwama K, Uchiyama Y, Mitsuhashi S et al. Nonsense variants of stag2 result in distinct congenital anomalies. Human genome variation 2020; 7:1–7. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41439-020-0102-6&link_type=DOI) 4. Balasubramaniam S, Christodoulou J, and Rahman S. Disorders of riboflavin metabolism. Journal of inherited metabolic disease 2019; 42:608–619. 5. Bergström A, McCarthy SA, Hui R, Almarri MA, Ayub Q, Danecek P, Chen Y, Felkel S, Hallast P, Kamm J et al. Insights into human genetic variation and population history from 929 diverse genomes. Science 2020; 367. 6. Block J, Breitsprecher D, Kühn S, Winterhoff M, Kage F, Geffers R, Duwe P, Rohn JL, Baum B, Brakebusch C et al. Fmnl2 drives actin-based protrusion and migration downstream of cdc42. Current Biology 2012; 22:1005–1012. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.cub.2012.03.064&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=22608513&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F01%2F08%2F2021.01.02.20248961.atom) 7. Bondeson ML, Ericson K, Gudmundsson S, Ameur A, Pontén F, Wesström J, Frykholm C, and Wilbe M. A nonsense mutation in cep55 defines a new locus for a meckel-like syndrome, an autosomal recessive lethal fetal ciliopathy. Clinical genetics 2017; 92:510–516. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1111/cge.13012&link_type=DOI) 8. Brizio C, Galluccio M, Wait R, Torchetti EM, Bafunno V, Accardi R, Gianazza E, Indiveri C, and Barile M. Over-expression in escherichia coli and characterization of two recombinant isoforms of human fad synthetase. Biochemical and biophysical research communications 2006; 344:1008–1016. 
[PubMed](http://medrxiv.org/lookup/external-ref?access_num=16643857&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F01%2F08%2F2021.01.02.20248961.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000237585000044&link_type=ISI) 9. Browning BL, Zhou Y, and Browning SR. A one-penny imputed genome from next-generation reference panels. The American Journal of Human Genetics 2018; 103:338–348. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.ajhg.2018.07.015&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=30100085&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F01%2F08%2F2021.01.02.20248961.atom) 10. Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, and Lee JJ. Second-generation plink: rising to the challenge of larger and richer datasets. Gigascience 2015; 4:s13742–015. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1186/s13742-015-0047-8&link_type=DOI) 11. Chen Y, Bartanus J, Liang D, Zhu H, Breman AM, Smith JL, Wang H, Ren Z, Patel A, Stankiewicz P et al. Characterization of chromosomal abnormalities in pregnancy losses reveals critical genes and loci for human early development. Human mutation 2017; 38:669–677. 12. Christiansen OB, Elson J, Kolte AM, Lewis S, Middeldorp S, Nelen W, Peramo B, Quenby S, Vermeulen N, Goddijn M et al. Eshre guideline: recurrent pregnancy loss. Human Reproduction Open 2018; 2018:hoy004–hoy004. 13. Colley E, Hamilton S, Smith P, Morgan NV, Coomarasamy A, and Allen S. Potential genetic causes of miscarriage in euploid pregnancies: a systematic review. Human Reproduction Update 2019; 25:452–472. 14. Consortium GP et al. A global reference for human genetic variation. Nature 2015; 526:68–74. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/nature15393&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=26432245&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F01%2F08%2F2021.01.02.20248961.atom) 15. 
Cristofoli F, De Keersmaecker B, De Catte L, Vermeesch JR, and Van Esch H. Novel stil compound heterozygous mutations cause severe fetal microcephaly and centriolar lengthening. Molecular syndromology 2017; 8:282–293. 16. Cuadrado A and Losada A. Specialized functions of cohesins stag1 and stag2 in 3d genome architecture. Current Opinion in Genetics & Development 2020; 61:9–16. 17. Dawes R, Lek M, and Cooper ST. Gene discovery informatics toolkit defines candidate genes for unexplained infertility and prenatal or infantile mortality. NPJ genomic medicine 2019; 4:1–11. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41525-019-0098-3&link_type=DOI) 18. De Koninck M, Lapi E, Badía-Careaga C, Cossío I, Giménez-Llorente D, Rodríguez-Corsino M, Andrada E, Hidalgo A, Manzanares M, Real FX et al. Essential roles of cohesin stag2 in mouse embryonic development and adult tissue homeostasis. Cell Reports 2020; 32:108014. 19. Dohrn N, Le V, Petersen A, Skovbo P, Pedersen I, Ernst A, Krarup H, and Petersen MB. Ecel1 mutation causes fetal arthrogryposis multiplex congenita. American Journal of Medical Genetics Part A 2015; 167:731–743. 20. Filges I and Friedman JM. Exome sequencing for gene discovery in lethal fetal disorders–harnessing the value of extreme phenotypes. Prenatal diagnosis 2015; 35:1005–1009. [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F01%2F08%2F2021.01.02.20248961.atom) 21. Fu M, Mu S, Wen C, Jiang S, Li L, Meng Y, and Peng H. Whole-exome sequencing analysis of products of conception identifies novel mutations associated with missed abortion. Molecular medicine reports 2018; 18:2027–2032. 22. Gardberg M, Talvinen K, Kaipio K, Iljin K, Kampf C, Uhlen M, and Carpén O. Characterization of diaphanous-related formin fmnl2 in human tissues. BMC Cell Biology 2010; 11:55. 
[CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1186/1471-2121-11-55&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=20633255&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F01%2F08%2F2021.01.02.20248961.atom) 23. Garrison E. vcflib. [https://github.com/vcflib/vcflib](https://github.com/vcflib/vcflib) 2020. 24. Garrison E and Marth G. Haplotype-based variant detection from short-read sequencing. M arXiv preprint arxiv:12073907 2012;. 25. Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alfö ldi J, Wang Q, Collins RL, Laricchia KM, Ganna A, Birnbaum DP et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 2020; 581:434–443. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41586-020-2308-7&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=32461654&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F01%2F08%2F2021.01.02.20248961.atom) 26. Kinsella RJ, Kähäri A, Haider S, Zamora J, Proctor G, Spudich G, Almeida-King J, Staines D, Derwent P, Kerhornou A et al. Ensembl biomarts: a hub for data retrieval across taxonomic space. Database 2011; 2011. 27. Kudesia R, Li M, Smith J, Patel A, and Williams Z. Rescue karyotyping: a case series of array-based comparative genomic hybridization evaluation of archival conceptual tissue. Reproductive Biology and Endocrinology 2014; 12:19. 28. Kühn S, Erdmann C, Kage F, Block J, Schwenkmezger L, Steffen A, Rottner K, and Geyer M. The structure of fmnl2–cdc42 yields insights into the mechanism of lamellipodia and filopodia formation. Nature communications 2015; 6:1–14. 29. Laing AF, Lowell S, and Brickman JM. Gro/tle enables embryonic stem cell differentiation by repressing pluripotent gene expression. Developmental biology 2015; 397:56–66. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.ydbio.2014.10.007&link_type=DOI) 30. 
Laisk T, Soares ALG, Ferreira T, Painter JN, Censin JC, Laber S, Bacelis J, Chen CY, Lepamets M, Lin K et al. The genetic architecture of sporadic and multiple consecutive miscarriage. Nature communications 2020; 11:1–12. 31. Larsen EC, Christiansen OB, Kolte AM, and Macklon N. New insights into mechanisms behind miscarriage. BMC medicine 2013; 11:154. 32. Li H. A statistical framework for snp calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 2011; 27:2987–2993. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/bioinformatics/btr509&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=21903627&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F01%2F08%2F2021.01.02.20248961.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000296099300009&link_type=ISI) 33. Li H. Aligning sequence reads, clone sequences and assembly contigs with bwa-mem. arXiv preprint arxiv:13033997 2013;. 34. Lizio M, Harshbarger J, Shimoji H, Severin J, Kasukawa T, Sahin S, Abugessaisa I, Fukuda S, Hori F, Ishikawa-Kato S et al. Gateways to the fantom5 promoter level mammalian expression atlas. Genome biology 2015; 16:22. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1186/s13059-014-0560-6&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25723102&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F01%2F08%2F2021.01.02.20248961.atom) 35. Mathur N, Triplett L, and Stephenson MD. Miscarriage chromosome testing: utility of comparative genomic hybridization with reflex microsatellite analysis in preserved miscarriage tissue. Fertility and sterility 2014; 101:1349–1352. [PubMed](http://medrxiv.org/lookup/external-ref?access_num=24636399&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F01%2F08%2F2021.01.02.20248961.atom) 36. McLaren W, Gil L, Hunt SE, Riat HS, Ritchie GR, Thormann A, Flicek P, and Cunningham F. 
The ensembl variant effect predictor. Genome biology 2016; 17:122. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1186/s13059-016-0974-4&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=27268795&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F01%2F08%2F2021.01.02.20248961.atom) 37. McNicoll F, Stevense M, and Jessberger R. Cohesin in gametogenesis. In Current topics in developmental biology, Elsevier, volume 102 2013; pp. 1–34. 38. Meier N, Bruder E, Lapaire O, Hoesli I, Kang A, Hench J, Hoeller S, De Geyter J, Miny P, Heinimann K et al. Exome sequencing of fetal anomaly syndromes: novel phenotype–genotype discoveries. European Journal of Human Genetics 2019; 27:730–737. [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F01%2F08%2F2021.01.02.20248961.atom) 39. Meinhardt G, Haider S, Haslinger P, Proestling K, Fiala C, Pollheimer J, and Knöfler M. Wntdependent t-cell factor-4 controls human etravillous trophoblast motility. Endocrinology 2014; 155:1908–1920. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1210/en.2013-2042&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=24605829&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F01%2F08%2F2021.01.02.20248961.atom) 40. Menchero S, Rollan I, Lopez-Izquierdo A, Andreu MJ, De Aja JS, Kang M, Adan J, Benedito R, Rayon T, Hadjantonakis AK et al. Transitions in cell potency during early mouse development are driven by notch. Elife 2019; 8:e42930. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.7554/eLife.42930&link_type=DOI) 41. Mullegama SV, Klein SD, Mulatinho MV, Senaratne TN, Singh K, Center UCG, Nguyen DC, Gallant NM, Strom SP, Ghahremani S et al. De novo loss-of-function variants in stag2 are associated with developmental delay, microcephaly, and congenital anomalies. American journal of medical genetics Part A 2017; 173:1319–1327. 
[CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/ajmg.a.38207&link_type=DOI) 42. Mullegama SV, Klein SD, Signer RH, Center UCG, Vilain E, and Martinez-Agosto JA. Mutations in stag2 cause an x-linked cohesinopathy associated with undergrowth, developmental delay, and dysmorphia: Expanding the phenotype in males. Molecular genetics & genomic medicine 2019; 7:e00501. 43. Nilsen G, Liestøl K, Van Loo P, Vollan HKM, Eide MB, Rueda OM, Chin SF, Russell R, Baumbusch LO, Caldas C et al. Copynumber: efficient algorithms for single-and multi-track copy number segmentation. BMC genomics 2012; 13:591. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1186/1471-2164-13-591&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=23442169&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F01%2F08%2F2021.01.02.20248961.atom) 44. Pereza N, Ostojić S, Kapović M, and Peterlin B. Systematic review and meta-analysis of genetic association studies in idiopathic recurrent spontaneous abortion. Fertility and sterility 2017; 107:150–159. 45. Qiao Y, Wen J, Tang F, Martell S, Shomer N, Leung PC, Stephenson MD, and Rajcan-Separovic E. Whole exome sequencing in recurrent early pregnancy loss. Mhr: Basic science of reproductive medicine 2016; 22:364–372. 46. Quintero-Ronderos P, Mercier E, Fukuda M, González R, Suárez CF, Patarroyo MA, Vaiman D, Gris JC, and Laissue P. Novel genes and mutations in patients affected by recurrent pregnancy loss. PLoS One 2017; 12:e0186149. 47. R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria 2019. 48. Rae W, Gao Y, Bunyan D, Holden S, Gilmour K, Patel S, Wellesley D, and Williams A. A novel foxp3 mutation causing fetal akinesia and recurrent male miscarriages. Clinical Immunology 2015; 161:284–285. 49. Rajcan-Separovic E. Next generation sequencing in recurrent pregnancy loss-approaches and outcomes. 
European Journal of Medical Genetics 2020; 63:103644. 50. Rentzsch P, Witten D, Cooper GM, Shendure J, and Kircher M. Cadd: predicting the deleteriousness of variants throughout the human genome. Nucleic acids research 2019; 47:D886–D894. [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F01%2F08%2F2021.01.02.20248961.atom) 51. Robberecht C, Schuddinck V, Fryns JP, and Vermeesch JR. Diagnosis of miscarriages by molecular karyotyping: benefits and pitfalls. Genetics in Medicine 2009; 11:646. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1097/GIM.0b013e3181abc92a&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=19617844&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F01%2F08%2F2021.01.02.20248961.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000270526200009&link_type=ISI) 52. Rull K, Nagirnaja L, and Laan M. Genetics of recurrent miscarriage: challenges, current knowledge, future directions. Frontiers in genetics 2012; 3:34. 53. Shamseldin HE, Tulbah M, Kurdi W, Nemer M, Alsahan N, Al Mardawi E, Khalifa O, Hashem A, Kurdi A, Babay Z et al. Identification of embryonic lethal genes in humans by autozygosity mapping and exome sequencing in consanguineous families. Genome biology 2015; 16:116. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1186/s13059-015-0681-6&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=26036949&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F01%2F08%2F2021.01.02.20248961.atom) 54. Solomon DA, Kim T, Diaz-Martinez LA, Fair J, Elkahloun AG, Harris BT, Toretsky JA, Rosenberg SA, Shukla N, Ladanyi M et al. Mutational inactivation of stag2 causes aneuploidy in human cancer. Science 2011; 333:1039–1043. 
[Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6Mzoic2NpIjtzOjU6InJlc2lkIjtzOjEzOiIzMzMvNjA0NS8xMDM5IjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjEvMDEvMDgvMjAyMS4wMS4wMi4yMDI0ODk2MS5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 55. Sonderegger S, Pollheimer J, and Knöfler M. Wnt signalling in implantation, decidualisation and placental differentiation–review. Placenta 2010; 31:839–847. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.placenta.2010.07.011&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=20716463&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F01%2F08%2F2021.01.02.20248961.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000283408300001&link_type=ISI) 56. Study TDDD, Fitzgerald T, Gerety S, Jones W, van Kogelenberg M, King D, McRae J, Morley K, Parthiban V, Al-Turki S et al. Large-scale discovery of novel genetic causes of developmental disorders. Nature 2015; 519:223–228. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/nature14135&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25533962&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F01%2F08%2F2021.01.02.20248961.atom) 57. Tan A, Abecasis GR, and Kang HM. Unified representation of genetic variants. Bioinformatics 2015; 31:2202–2204. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/bioinformatics/btv112&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25701572&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F01%2F08%2F2021.01.02.20248961.atom) 58. Thomas S, Legendre M, Saunier S, Bessiéres B, Alby C, Bonniére M, Toutain A, Loeuillet L, Szymanska K, Jossic F et al. Tctn3 mutations cause mohr-majewski syndrome. The American Journal of Human Genetics 2012; 91:372–378. 
59. van den Berg MM, van Maarle MC, van Wely M, and Goddijn M. Genetics of early miscarriage. Biochim Biophys Acta 2012; 1822:1951–1959.
60. Vermeesch JR, Melotte C, Froyen G, Van Vooren S, Dutta B, Maas N, Vermeulen S, Menten B, Speleman F, De Moor B et al. Molecular karyotyping: array CGH quality criteria for constitutional genetic diagnosis. Journal of Histochemistry & Cytochemistry 2005; 53:413–422.
61. Wilbe M, Ekvall S, Eurenius K, Ericson K, Casar-Borota O, Klar J, Dahl N, Ameur A, Annerén G, and Bondeson ML. MuSK: a new target for lethal fetal akinesia deformation sequence (FADS). Journal of Medical Genetics 2015; 52:195–202.
62. Yates CL, Monaghan KG, Copenheaver D, Retterer K, Scuffins J, Kucera CR, Friedman B, Richard G, and Juusola J. Whole-exome sequencing on deceased fetuses with ultrasound anomalies: expanding our knowledge of genetic disease during fetal development. Genetics in Medicine 2017; 19:1171–1178.
63. Yu G and He QY. ReactomePA: an R/Bioconductor package for Reactome pathway analysis and visualization. Molecular BioSystems 2016; 12:477–479.
64. Zhao C, Chai H, Zhou Q, Wen J, Reddy UM, Kastury R, Jiang Y, Mak W, Bale AE, Zhang H et al. Exome sequencing analysis on products of conception: a cohort study to evaluate clinical utility and genetic etiology for pregnancy loss. Genetics in Medicine 2020a; 1–8.
65. Zhao Q, Chen J, Zhang X, Xu Z, Lin Z, Li H, Lin W, and Xie Q. Genome-wide association analysis reveals key genes responsible for egg production of lion head goose. Frontiers in Genetics 2020b; 10:1391.
66. Zhu XL, Zeng YF, Guan J, Li YF, Deng YJ, Bian XW, Ding YQ, and Liang L. FMNL2 is a positive regulator of cell motility and metastasis in colorectal carcinoma. The Journal of Pathology 2011; 224:377–388.