Abstract
Background Disturbances in the intricate processes that control craniofacial morphogenesis can result in birth defects, most common of which are orofacial clefts (OFCs). Nonsyndromic cleft lip (nsCL), one of the phenotypic forms amongst OFCs, has a non-random laterality presentation with the left side being affected twice as often compared to the right side. This study investigates the etiology of nsCL and the factors contributing to its laterality using a pair of monozygotic twins with mirror-image cleft lip.
Methods We conducted whole-genome sequencing (WGS) analyses in a female twin pair with mirror image nsCL, their affected mother and unaffected father to identify etiopathogenic variants. Additionally, to identify possible cleft lip laterality modifiers, DNA-methylome analysis was conducted to test for differential methylation patterns between the mirror twins. Lastly, DNA methylation patterns were also analyzed on an independent cohort of female cases with unilateral cleft lip (left=22; right=17) for replication purposes.
Results We identified a protein-altering variant in FGF20 (p.Ile79Val) within the fibroblast growth factor interacting family domain segregating with the nsCL in this family. Concurrently, DNA-methylome analysis identified differential methylation regions (DMRs) upstream of Zinc-finger transcription factor ZFP57 (Δβ > 5%). Replication of these results on an independent cohort, confirmed these DMRs, emphasizing their biological significance (p<0.05). Enrichment analysis indicated that these DMRs are involved in DNA methylation during early embryo development (FDR adjusted p-value = 1.3241E-13). Further bioinformatics analyses showed one of these DMRs acting as a binding site for transcription factor AP2A (TFAP2A), a key player in craniofacial development. Interactome analysis also suggested a potential role for ZFP57 in left/right axis specification, thus emphasizing its significance in cleft laterality.
Conclusion This study provides novel insights into the etiology of nsCL and its laterality, suggesting an interplay between etiopathogenic variants and DNA methylation in cleft laterality. Our findings elucidate the intricate mechanisms underlying OFCs development. Understanding these factors may offer new tools for prevention and management of OFCs, alleviating the burden on affected individuals, their families and global health.
Introduction
OFCs are birth defects that occur due to disturbances during embryonic development which results in the failure of the fusion of the upper lip (cleft lip), palate (cleft palate) or both (cleft lip and palate)(1). In addition to these structural birth defect, some cases may have additional anomalies, thus called syndromic OFCs, which account for about 30% of all OFCs(2, 3). Though variable based on ethnicity, the global incidence of OFCs averages 1 in 700 live births(1). The accompanying feeding problems, malocclusion, speech defects, esthetics problems, psychosocial impact on those affected and their families, as well as the financial burden due to the complex rehabilitation needed, contribute to the huge burden on the public health(1). Thus, efforts to understand the causes to successfully prevent new cases reducing the incidence of the clefts are warranted.
Several environmental and genetic risk factors have been implicated in OFCs developmental pathogenesis(1). Environmental risk factors impact the molecular signaling involved in craniofacial development, thus resulting in clefting via epigenetic mechanisms(4-6). One of the epigenetic factors that have been implicated in the etiology of cleft is abnormal DNA methylation(5, 7-17). DNA methylation can affect a phenotypic outcome via regulation of gene expression(5, 18, 19) by the addition of a methyl group to cytosine residues located within the CpG sites, typically found in regulatory regions of the genome, including promoters (18-20). Promoters serve as the binding sites for transcription factors and this binding activates the expression of a gene, a critical step in gene function. Methylation of CpG sites within promoter regions affect the transcription of their target genes. Thus, DNA methylation is regarded as a gene repression mechanism(21-24). This mechanism of gene repression has been reported in the developmental pathogenesis of clefting and its subtypes(7).
Monozygotic twins are invaluable to study the contribution of genetic and environmental factors to the etiology of birth defects as they share almost identical genetic compositions(25). Therefore, differences in their traits, which are regarded as discordant traits, are highly suggestive of an environmental etiology(25). Monozygotic twins arise from a single fertilized ovum that undergoes a twinning event post-fertilization. Epigenetic changes such as DNA methylation may preferentially occur in one of the twins thus contributing to the discordant phenotype(26). Differential methylation in monozygotic twin pairs has been investigated in the etiology of discordant phenotypes(26-31). A common type of monozygotic twins with nonconforming traits is the Mirror twins. Mirror twins are described as those monozygotic twins with discernable traits on contralateral sides of the body plane. About 1 in every 4 monozygotic twins show traits on contralateral sides such that one of the pair shows a trait on the right side while the other on the left side(32). Some of these traits include hair whirls, birth marks and more severely birth defects. Unilateral clefting with laterality discordance is one the birth defects that have been reported in mirror twins where one of the monozygotic pair has the cleft on the right side of the lip and the other on the left(33). The mirror cleft twins are significant in understanding the mechanisms determining cleft laterality. Previous studies that examined genetic factors that contribute to nonconforming laterality in mirror twins with OFC found no discordant genetic variants that may explain the side difference(34); instead, the role of epigenetic factors has been suggested(35). Thus, we explored the genomic and epigenomic landscape in a mirror cleft twin female pair to identify the genetic etiology and the factors contributing to the laterality difference of this common birth defect.
Materials and methods
Subjects, Sample collections and DNA extraction
In this study, we recruited a case-parent quartet consisting of twin pair and their parents. The twin pair were females with mirror-image nonsyndromic cleft lip only (nsCLO) phenotype and born to Filipino parents with only the mother diagnosed with the microform cleft lip. The father did not present with any structural abnormalities. Blood samples were collected from the probands and parents. The study was approved by the respective institutional review boards and the participants signed informed consents prior to collection of clinical data and biological samples.
Next-generation sequencing analyses
We conducted whole-genome sequencing (WGS) analyses of the quartet to identify pathogenic variants that contribute to the risk of the unilateral nsCLO subtypes (right-sided vs left-sided) in the mirror-twin family pedigree. Although, a previous report found no discordant genetic variants in mirror twins with different OFC subtypes (36), we investigated this within our cohort as well. Here, our research question was designed to identify the genetic variants that contribute to the right-sided nsCL and left-sided nsCL specifically.
Based on deep phenotyping of the pedigree (Figure 1A), we first analyzed the entire genome of this pedigree for discordant pathogenic protein-altering genetic variants. This analytical pipeline (Figure 1B) is based on the hypothesis that pathogenic variant(s) unique to each of these MZ-twin contribute to the specific and discordant nsCL subtypes (the laterality differences). For the MZ-twin with left-sided cleft lip (LCL), we screened for high confidence rare/novel protein-altering variants that are unique to this twin (absent in the other MZ-twin pair) and shared with the mother (with microform LCL). This analysis to identify the unique high confidence rare/novel protein-altering variants was repeated for the MZ-twin with RCL. These unique variants in the MZ-twin with RCL were then screened to eliminate maternally inherited genetic variants.
Pedigree showing the inheritance pattern of clefting in the family. RCL-Right cleft lip; LCL-Left cleft lip
Genomic analytical pipeline used to investigate discordant and concordant pathogenic protein-altering variants in the mirror twins contributing to the risk of cleft
As an alternate hypothesis, we investigated those maternally inherited shared protein-altering variants shared amongst all 3 cases but not present in the unaffected father. This is in accordance with previous studies that found no genetic basis for the laterality difference in mirror-twins with nsCL (34, 37). For each of the hypotheses, we used in silico tools to prioritize the variants based on their pathogenicity and on their location on genes with established roles in craniofacial development.
Genomic Analysis workflow
For both hypotheses, the entire genomic region of the parents and mirror twins were sequenced at an average coverage depth of 30x. The sequence data were aligned to the human genome assembly GRCh38 (Hg38) and alternate alleles at each genomic loci were called using the Dynamic Read Analysis for GENomics (DRAGEN). These alternate alleles include single nucleotide variants (SNVs), insertions and deletions (InDels).
We then analyzed the genomes for novel/rare protein-altering genetic variants that contribute to the risk of clefting in the twin pair. Prior to analyzing the genome for protein-altering variants, we selected the high confidence variants which are those variants with a genotype quality (GQ) of at least 20 and a read depth (RD) of at least 10. Using the genomic population database, gnomAD (https://gnomad.broadinstitute.org/), we selected those variants with minor allele frequency (MAF) less than 1% (0.01).
Next, we screened for those variants that alter the protein (missense and LOF variants), those that segregate with the phenotype within the pedigree and subsequently used in silico tools including Sorting Intolerant From Tolerant, SIFT (http://sift.jcvi.org/), Polymorphism Phenotyping, PolyPhen2 (http://genetics.bwh.harvard.edu/pph2/) and Combined Annotation Dependent Depletion, CADD (https://cadd.gs.washington.edu/) to predict the pathogenicity of these variants. Combinatorially, we used the American College of Medical Genetics and Genomics (ACMG) classification system to categorize these variants based on a benign or pathogenic scale. In addition, we used the web-based machine-learning algorithm, DOMINO (https://domino.iob.ch/) to assess the probability that the variants identified in these genes are dominant. Thereafter, we prioritized those variants in genes that contribute to craniofacial development. The prioritization of these genetic variants is based on the knowledge that pathogenic variants in craniofacial genes contribute to clefting.
DNA-methylome analyses
To identify factors that modify the effect of the etiopathogenic variants, thus resulting in the differential phenotypic expression in the MZ-twin (laterality difference of nsCLO), we investigated the DNA methylation patterns in each of the twins. The genome-wide DNA methylation profiles were generated using the EPIC BeadChip assay (EPIC array, Illumina, San Diego, CA, United States) which contains over 850,000 probes. Thus, this EPIC array was used to evaluate the methylation profiles of over 850K CpG sites in the mirror twins. Importantly, these sites cover annotated regions of the human genome designated as CpG islands, RefSeq genes, ENCODE open chromatin, ENCODE transcription factor-binding sites and FANTOM5 enhancers. Details of the data generation, controls and cleaning have been previously published (38).
Following data cleaning and preprocessing as previously described (38), the methylation profiles were estimated as Beta (β) values (ranging from 0 to 1), which is the ratio of methylated signals to the total sum of unmethylated and methylated signal within a CpG site. Percent composition for 12 major cell types in the blood was determined using the EpiDISH package (version 2.16.0) (39), which was used to normalize the beta-values to remove variance due to differences in cell type composition among the samples. The normalized beta-values were used for the downstream analysis. Additionally, we evaluated the methylation profiles of gender-matched controls using the same array to identify and remove highly variable CpG sites with coefficient of variation greater than 20% which are unlikely to be related to the OFC phenotypes. These nuisance CpG sites were not included in our downstream analysis.
Thereafter, we estimated the absolute methylation differences between the mirror twins (Δβ = |Left CL β – Right CL β|) for each CpG sites remaining. CpG sites with Δβ ≥5% encompassed the identified differentially methylated positions (DMPs) for subsequent analyses below.
Subsequently, we performed gene ontology and enrichment analysis to identify the cellular processes as well as those enriched among the CpG sites within the identified DMPs.
To demonstrate the reproducibility of these DMPs, we conducted a replication study using the CpG sites that showed a Δβ ≥ 5%. Here, we selected an independent female cohort with methylation profiles available for such analysis. This independent cohort consisted of female individuals (n=39) with unilateral clefting phenotype (nsCL/P). Of these, 22 individuals had left-sided nsCL and 17 individuals had right-sided nsCL.
Results
WGS analyses identify shared protein-altering variants in Craniofacial genes
The analysis of discordant protein-altering variants amongst the MZ-twins that would explain the etiology of the difference in laterality affection identified a high confidence heterozygous variant in KRT6B present in the MZ-twin with LCL only. This variant was a paternally inherited missense variant predicted to be among the top 1% most deleterious mutations in the human genome (CADD score = 22.7) and consistently deleterious by other in silico tools except for ClinVar. Although KRT6B has not been associated with orofacial clefts, the knockout mouse displayed an abnormal hard palate morphology. Despite the identification of the damaging protein-altering variant in this gene, we found that the phenotypic effect of this gene occurs in homozygous state as predicted by the machine learning tool, DOMINO. Thus, based on the occurrence of this damaging KRT6B variant in a heterozygous state, the presence of this variant in an unaffected father and a ClinVar prediction of a benign effect, we concluded that this variant is unlikely to be causal for the nonconforming laterality of the cleft lip phenotype. We also found a pathogenic splice site de novo variant in ACLY gene as supported by ClinVar classification. Although, no evidence suggests a role for this gene in craniofacial development, the knockout mouse showed an early embryonic stage lethality as the mice did not survive beyond E7.
Results of this analysis in the MZ-twin with LCL did not find any variants of clinical significance in genes with established roles in craniofacial development. In fact, we identified a protein-altering variant of uncertain significance in MUC3A which was maternally-inherited (heterozygous). Based on the in silico prediction of the MUC3A variant and the prediction that the gene itself is likely to contribute to a recessive phenotype, we concluded that this heterozygous maternally-transmitted variant was highly unlikely to be the risk variant. Thus, similarly to previous reports we did not find any discordant pathogenic variants that could explain the laterality affection difference amongst the members of this twin pair.
Investigation of maternally transmitted shared variants for those potentially pathogenic variants (detailed result shown in Table 1) that would explain the developmental pathogenesis of the cleft lip phenotype identified 16 variants in potential gene candidates (Table 2).
These variants are rare, or novel based on the MAF in population control databases, alter the protein sequences and ranked among the top 1% most deleterious mutations in the human genome (based on CADD scores ≥ 15). Additionally, other in silico tools (SIFT and Polyphen2) were used to predict the deleteriousness and damaging effects of the amino acid changes on the protein (Table 1B).
These 16 protein-altering variants are in genes that contribute to craniofacial development based on evidence from the facial gene scan, mouse genome informatics and cleft genes databases. The facial genes scan consisted of list of genes that previously detected through association studies of 3D soft tissue facial morphogenesis(40). Some of these variants are in genes whose genetic manipulation in murines resulted in craniofacial disorders. Indeed, FANCI, SALL4, EXT1, CPLANE1, and PNN resulted in cleft phenotypes (Table 1B). Additionally, EXT1, CPLANE1, PNN and CYP1A1 are among those genes within the cleft gene database (CleftGeneDB; https://bioinfo.uth.edu/CleftGeneDB)(41) supporting their roles in clefting.
The machine learning tool DOMINO makes robust and reliable inference of the inheritance patterns of different genes(42). Through these inferences of the genes’ inheritance patterns, we assessed the likelihood of the genetic variants to result in dominant traits(42). Among those potentially pathogenic protein-altering variants in craniofacial genes, only FGF20 was predicted to follow a dominant inheritance pattern, all others that had machine learning predictions follow a recessive mode of inheritance. Sanger validation confirmed the presence of this FGF20 pathogenic mutation in the twins and the affected mother, but not the unaffected father.
Protein-function analysis showed that the isoleucine amino acid residue resides within the fibroblast growth factor family domain (IPR002209). This domain is critical for the binding of FGF20 with other molecules and in contact with another part of the protein which contribute to FGF20 receptor binding activity(43, 44). The wild-type residue is highly conserved at this position. Notably, the mutant residue, valine is not found in homologs, suggesting that the mutation is damaging. These analyses suggest that the damaging variants affects the interaction of FGF20 with other molecules thereby disturbing the signal transduction function of the FGF20 (44).
DNA-methylome Analyses Identify DMR in Zinc-finger transcription factor expressed in early embryonic development
In the analyses of the contribution of DNA methylation to the discordant laterality affectation of cleft lip in this MZ-twin pair, we conducted a genome-wide methylation analysis using the 850k EPIC array (Figure 2). We investigated the genome-wide methylation profile of the CpG sites from the MZ-twin with nonsyndromic cleft lip as well as gender-matched unrelated controls. Methylation profiles were estimated as beta values and those CpG sites with coefficient of variation (CoV) in the controls > 20% were filtered out of this analysis as they are less likely to contribute to clefting.
DNA Methylome analytical pipeline.
For remaining CpG sites, we evaluated the absolute difference of the beta values (Delta beta). This delta beta value is estimated as the difference between the absolute beta value between the mirror-twin with LCL and that with RCL (Δβ = |LCL β – RCL β|). We thereafter selected those sites with absolute delta beta (|Δβ|) > 5%. We identified 408 CpG site with absolute delta beta (|Δβ|) > 5%. Of these, 100 presented higher methylation in the MZ-twin with LCL while 308 CpG sites had higher methylation values in the RCL affected twin.
To identify those CpG sites with biological significance, we selected those with consistency in direction that are within 5kb of the transcription start site of a gene. Enrichment analysis showed a significant enrichment of the biological process with the GO term “DNA methylation involved in embryo development” (adjusted FDR p-value = 1.3241E-13) (Figure 3).
Enrichment analysis of DMRs.
Further evaluation of these DMRs led to the identification of a cluster of CpG sites upstream of the transcription start site (TSS) of the ZFP57 gene (Figure 4). Notably, these CpG sites upstream ZFP57, herein after termed as differentially methylated regions (DMRs), contributed significantly to the most enriched biological process. These DMRs showed a higher methylation value in the MZ-twin with LCL than that with RCL with Δβ ranging from 7% to 12%.
CpG sites upstream ZFP57 predicted to be binding site for TFAP2A, an interaction that contributes to craniofacial development. Protein interactome analysis also confirms the interaction with TFAP2A. Interactome analysis also showed significant contribution of this network to methylation program involved in embryonic development and laterality (left/right axis) determination.
CpG site upstream ZFP57 replicated in independent cohort and predicted to act as TFAP2A binding site
Among the CpG sites upstream ZFP57, the cg06032337 site showed a significant difference (p-value = 0.04) in methylation between the left-sided nsCL and right-sided nsCL independent female cohort (Figure 4).Following this finding, we investigated the likelihood of this promoter region acting as a transcription faction binding site to genes involved in craniofacial development. We used the Jaspar bioinformatic analysis tool to investigate possible genes that interact with this promoter region. This CpG site is predicted to act as binding site for transcription factor AP2A, an interaction that is involved in craniofacial development (Figure 4).
Furthermore, the interaction between the ZFP57 and TFAP2A was investigated at the protein level with in silico tools. We used String-db (https://string-db.org/) to investigate possible protein-protein interaction networks involving ZFP57 and TFAP2A gene-products as well as the functional enrichment of the resulting networks. We found significant interactions between ZFP57 and TFAP2A as well as other proteins (Figure 4). Functional enrichment analysis showed that the interaction network is significantly associated with DNA methylation involved in embryonic development and left/right axis specification processes. This finding suggests that the interaction between ZFP57 and TFAP2A through the CpG site (cg06032337) plays a significant role in the laterality of nonsyndromic cleft lip.
Discussion
This study was conducted in twins which are highly valuable in the genomics as well as epigenomics studies of phenotypes. Monozygotic twins share similar genetic makeup thus, phenotypic differences may be explained by epigenetic dissimilarities amongst them. Our current analysis focuses on the genomic etiology of nonsyndromic cleft lip (nsCL) and the epigenomic factors that contribute to the side on which the nsCL appeared. We therefore conducted whole-genome sequencing analyses (WGS) and DNA-methylome analyses to explore the pathogenic variants and epigenetic patterns of this unique cohort.
Whole-genome sequencing analysis have been used to identify pathogenic variants playing roles in the etiology of OFCs(45, 46). These next-generation sequencing analyses have successfully identified novel risk loci and variants associated with this complex trait(45-48). We identified protein-altering potentially pathogenic mutation in fibroblast growth factor 20 (FGF20) gene which is on chromosome 8p22, one of the loci associated with OFC(49, 50). To the best of our knowledge, the discovery of dominant FGF20 mutation in these mirror-image twins’ pedigree with nsCL provides for the first time, evidence for the role of rare coding FGF20 variants in the etiopathogenesis of clefting in humans.
FGF20 is one of the members of the fibroblast growth factor family and belongs to the factor 9 (FGF9) subfamily which binds to the fibroblast growth factor receptors 2 and 4 (FGFR2 and FGFR4) activating cellular processes that drive embryonic development(51). Expression studies have reported co-expression of FGF20 and FGFR2 in the developing mice palatal epithelium and the contribution of the FGFR2 to palatogenesis has been reported(52, 53) suggesting a possible role of the gene in craniofacial development.
Due to the similar genetic makeup in monozygotic twins, we hypothesize that given that the FGF20 pathogenic variant is present in both twins and is inherited from the affected mom, epigenetic changes are more likely to contribute to the laterality difference in the phenotypic expression. Thus, we investigated DNA methylation patterns in the mirror twins to elucidate their role in the laterality of nsCL. The DNA-methylome results identified differentially methylated regions (DMRs), with a notable enrichment in the GO term “DNA methylation involved in embryo development.” Specifically, a cluster of CpG sites within 5kb upstream of the transcription start site (TSS) of the ZFP57 gene exhibited significant methylation differences between the twins. These DMRs are associated with the DNA methylation processes controlling gene expression as well as embryonic development. Although this study is the first to report the role of ZFP57 in the developmental pathogenesis of nsCL, a previous study had reported an association of the gene with neurodevelopmental disorder(54). There is a strong correlation between these disorders suggesting pleiotropic roles of genes involved in both processes. Additionally, these suggest the early expression of ZFP57 during embryonic development.
Further exploration of these DMRs showed that one of them: cg06032337, predictably acts as binding site for the transcription factor AP2A: a member of the transcription factor family that controls the neural crest gene regulatory network(55). Mutations in this gene have been implicated in the etiopathogenesis of clefting(56-58). Protein interactome analysis showed that the ZFP57 protein is in a network with other proteins such that one of the sub-networks is associated with left-right axis specification during embryogenesis. One limitation in our analysis is the sample size of monozygotic twins with nsOFCs.
Albeit we detected a nominal significant association between one of the CpGs and cleft laterality, the p-value would not have withstood multiple testing. Thus, the need to replicate this study in a larger cohort to increase our power to detect significant associations after correction for multiple testing. In conclusion, this integrative OMICs analysis identified a pathogenic mutation in FGF20 whose phenotypic expression of unilateral nsCL (left vs right) is modified by differential methylation of CpG sites upstream ZFP57 whose gene-product is predicted to interact with other proteins in a network important for embryonic development and left-right axis specification.
Data Availability
All data produced in the present study are available upon reasonable request to the authors.
Footnotes
Funding: NIH/NIDCR K01DE027995