Summary
Identifying genes linked to extreme phenotypes in humans has the potential to highlight new biological processes fundamental for human development. Here we report the identification of homozygous loss of function variants in the primate-specific gene ZNF808 as a cause of pancreatic agenesis. ZNF808 is a member of the KRAB zinc finger protein (KZFPs) family, a large and rapidly evolving group of epigenetic silencers that target transposable elements. We show that loss of ZNF808 in vitro results in aberrant activation of many transposable elements it normally represses during early pancreas development. This results in inappropriate specification of cell fate with induction of genes associated with liver endoderm and a loss of pancreatic identity. We show that ZNF808 and its transposable element targets play a critical role in cell fate specification during human pancreatic development. This is the first report of loss of a primate-specific gene causing a congenital developmental disease and highlights the essential role of ZNF808 for pancreatic development in humans.
Introduction
Human genetic studies have provided key insights into differences in pancreatic development between human and mouse, highlighting multiple evolutionary conserved transcription factors with different dosage-dependent effects between the two species1. However, there are unexplained yet fundamental differences in the genetics, development, cell structure, and physiology of insulin-producing beta cells between human and rodent pancreas2,3. Identifying the genetic causes of rare congenital diseases, such as pancreatic agenesis, can highlight genes which are essential for organ development in humans. Here, we report the identification of the primate-specific gene ZNF808 as a novel genetic cause of pancreatic agenesis, a condition resulting from defective fetal pancreatic development.
ZNF808 is a member of the KRAB zinc finger protein (KZFPs) family, the largest group of DNA binding factors in the human genome with around 350 genes. KZFPs primarily act as epigenetic silencers of transposable elements (TE) through establishment of heterochromatin-associated H3K9me3. The KZFP family is rapidly evolving, with new members found at most phylogenetic branches since its emergence at the dawn of tetrapods. KZFPs co-evolve with TEs in an arms race aimed at suppressing their replication4. A large survey of KZFPs and their binding sites has revealed that many are evolutionary conserved at various levels (for example within primates or within mammals) along with remnants of their TE targets. These conserved KZFPs are hypothesized to control chromatin accessibility of regulatory platforms derived from conserved TEs, playing a role in the evolutionary rewiring of gene regulatory networks5.
Results
Identification of ZNF808 homozygous variants as a cause of pancreatic agenesis
To identify novel genetic causes of pancreatic agenesis, we performed exome sequencing in two unrelated affected individuals and their parents. Both patients were born from a consanguineous union and were diagnosed with pancreatic agenesis, defined as insulin-dependent diabetes diagnosed in the first 6 months of life (neonatal diabetes) and exocrine pancreatic insufficiency6. The two patients were found to be homozygous for 11 and 12 rare coding non-synonymous variants respectively (Supplementary tables 1 and 2). The only gene found to harbour homozygous variants in both patients was ZNF808, with a deletion of exons 4 and 5 identified in proband 1 and a nonsense variant (p.Leu213*) detected in proband 2.
To replicate this finding, we investigated the presence of rare ZNF808 biallelic variants in 232 additional patients diagnosed with neonatal diabetes before the age of 6 months without an identified pathogenic variant in the 30 known aetiological genes7. Homozygous loss of function ZNF808 variants (6 nonsense, 3 frameshifts, and 1 whole-gene deletion) were identified in an additional 10 unrelated individuals (Figure 1A, Supplementary table 3). Family testing confirmed that all unaffected parents were heterozygous carriers for the identified ZNF808 loss of function variants. In one family a sibling diagnosed with diabetes in age-range 39-52 weeks, had the same homozygous ZNF808 mutation as the proband (Sup. table 3). All frameshift and nonsense variants affect residues within the last exon of the gene containing all 23 zinc finger domains (which comprises 93% of the protein) and are predicted to result in a truncated protein lacking from 3 to 23 of the 23 functional zinc finger domains. Two patients (probands 2 and 3) were homozygous for large deletions predicted to result in no ZNF808 mRNA.
Despite ZNF808 being widely expressed in human tissues, the patients’ phenotype is restricted to the pancreas, with diabetes onset in the neonatal period (median age at diagnosis 12 weeks, IQR 3-21) and low birth weight (median −2.98 SD, IQR −3.56 to −2.47) which results from markedly reduced insulin-mediated growth in utero (Supplementary table 3). Pancreatic agenesis was diagnosed in 5/5 patients for whom exocrine pancreatic function through fecal elastase measurement was available and was suspected in two further cases who presented clinical symptoms of malabsorption, suggesting a likely developmental defect affecting both the exocrine and endocrine pancreatic functions. No additional extra-pancreatic features were consistently observed in the cohort. The extremely low birth weight, early age at diagnosis of diabetes, and exocrine pancreatic insufficiency all suggest a role of ZNF808 in early pancreatic development. This is supported by previous human embryo transcriptomic studies showing that ZNF808 is expressed in the embryonic pancreas8.
ZNF808 and its transposable element targets are primate-specific
The evolutionary origin of ZNF808 can be traced to a common ancestor of Old World monkeys and Apes, with a very high level of conservation in Apes (Figure 1B). Between Old World monkeys and humans, the absence of the first two zinc fingers, two additional zinc fingers in the middle of the array, and a number of amino acid differences in DNA binding residues of most zinc fingers can be observed. Importantly, no KZFP with a similar array of zinc fingers could be found in New World monkeys nor any other mammals. Genome-wide patterns of ZNF808 show that it primarily targets the long terminal repeat (LTR) of endogenous retroviruses classified as MER11 (A/B/C) elements (Figures 2A). Analysis of mammalian genomes shows that these transposons also originate in Old World monkeys and Apes (Figure 2B). Last, we found that MER11 elements are also targeted by other primate-specific KZFPs (ZNF433, ZNF440, ZNF468 and ZNF525) (Figure 2C). Therefore, ZNF808 and its targeted MER11 elements are exclusively found in primates, making it the first primate-specific genetic cause identified for a congenital developmental disease (Supplementary Figure 1).
MER11 elements are enriched in transcription factor binding sites involved in early embryonic development
Correlative studies have previously suggested that MER11 elements might be a source of regulatory potential in multiple tissues9–11. Our analysis of publicly available ChIP-seq data shows that various subfamilies of MER11 elements bound by ZNF808 show enriched occupancy for a set of 26 DNA binding factors (Figure 2D), including some transcription factors involved in early embryonic development. Importantly, three of these (GATA4, GATA6, and HNF4A) have been previously described as monogenic causes of pancreatic agenesis and diabetes6,13–15. This evidence suggests that the transposable elements silenced by ZNF808 might be domesticated regulatory platforms and could play a functional role in primate pancreas development.
Loss of ZNF808 reveals active epigenetic potential of MER11 elements during pancreatic differentiation
To characterize the molecular events triggered by ZNF808 loss in an in vitro model of pancreas development, we functionally inactivated ZNF808 in H1 human embryonic stem cells using CRISPR, obtaining a clone with in-frame deletions resulting in the loss of 8 zinc fingers (ZNF808del/del) (Supplementary figure 2A). Using this clone, we assayed epigenetic changes resulting from the functional impairment of ZNF808, both in stem cells and at key steps of differentiation towards beta cells (Figure 3A and Supplementary figure 2C)16. Using ChIP-seq, we quantified the genome-wide presence of H3K9me3, a hallmark of heterochromatin that is induced by KZFPs. H3K9me3 was detected in around half the known binding sites of ZNF808, with the vast majority being transposons (mostly MER11 elements, data not shown). We observed loss of H3K9me3 at a subset of these TEs in the ZNF808del/del mutant at early stages of differentiation, ranging from 64% of sites in ES cells to 31% in primitive gut tube (Figure 3B). However, the loss was not complete even when considering subfamilies of elements. MER11A were most affected (497 sites), followed by MER11B (254 sites) and MER11C (123 sites), approximately following the prevalence of ZNF808 binding in these families (Figure 3B). The presence of other primate-specific KZFPs redundantly binding MER11 elements as mentioned above could explain the incomplete loss of H3K9me3.
We next surveyed if the loss of silencing revealed regulatory potential at ZNF808 binding sites. In embryonic stem cells (ES), a number of sites were already positive for H3K27ac, but no significant change could be detected in the ZNF808del/del clone in terms of total positive H3K27ac sites. However, during differentiation, we observed the emergence of H3K27ac epigenetic marks in 149 ZNF808 binding sites at the definitive endoderm stage and 100 sites at the primitive gut tube stage (Figure 3B). We could identify a subset of 67 MER11 elements where loss of H3K9me3 was followed by gain of H3K27ac during differentiation in the ZNF808del/del clone (Figure 3C). Altogether, these results show that ZNF808 is normally silencing MER11 elements during differentiation toward pancreatic cells and that the loss of ZNF808 triggers epigenetic activity associated with transcription from a subset of them, suggesting that these could function as alternative promoters or enhancers.
Upon ZNF808 loss, activated MER11 elements lead to upregulation of nearby genes at early and late stages of pancreatic differentiation
We next quantified transcriptomic changes induced in the ZNF808del/del during our differentiation protocol. We found that the number of genes significantly dysregulated (FDR < 0.05, absolute fold change > 1.25) varies from 473 at the ES stage to 565 at definitive endoderm (DE), followed by a marked peak of 3858 and 3670 differentially regulated genes at the primitive gut tube (PG) and posterior foregut (PF) stages, respectively, reducing to 683 at the pancreatic progenitor (PP) stage (Figure 4A). We investigated to what extent the dysregulation of MER11 elements might be responsible for these gene expression changes by assessing the enrichment of genes changing expression in proximity to ZNF808 binding sites. Genes activated at the ES and DE stages were enriched in proximity to MER11 elements at enhancer but not promoter distances, and we find no particular proximity enrichment for genes repressed (Figure 4B). We find clear examples of robustly upregulated genes in close proximity to the set of MER11 elements that lose H3K9me3 and gain H3K27ac at early stages (Figure 4C). The proximity enrichments diminish at the PG and PF stages but do partially recover at the PP stage suggestive of roles for MER11 elements throughout pancreatic cell differentiation. The loss of proximity enrichment at the PG stage concomitant with large increase in dysregulated genes, suggests that these genes are involved in expression programmes downstream of MER11 activation.
ZNF808 inactivation leads to dysregulation of genes defining pancreatic and liver identity during pancreatic differentiation
To understand how this gene expression dysregulation impacts pancreatic development we performed gene set enrichment analysis using Enrichr17. We first associated genes activated or repressed in ZNF808del/del at each stage of the in vitro pancreatic differentiation protocol to genes with specific tissue expression in the Human Gene Atlas. We observed a large enrichment of genes known to be expressed in liver at the PG stage, that precedes a downregulation of genes specifically expressed in the pancreas PF and PP stages (Figure 4D - left). Assessing the trajectory of pancreatic and liver programmes, we see the downregulation of pancreatic genes and the upregulation of liver genes beginning from the PG stage and continuing until the PP stage (Figure 4D – right). Indeed, transcription factors necessary for pancreatic development, such as PTF1A and PDX1, show reduced expression at PG and PF stages, which is followed by reduced expression of the endocrine markers insulin and glucagon at the PP stage (Figure 4F). In parallel with these events, we see activation of early markers of liver differentiation (including AFP, ALB and HNF4A, the latter of which has roles in specifying pancreas and liver) at the PG and PF stages (Figure 4E), as confirmed by immunostaining for AFP (Figure 4F). In further support of this hypothesis, we examined the relationship between our data and a survey of the transcriptome of human embryo dorsal pancreas buds and hepatic cords obtained by laser dissection8. At the PG stage, we found a comprehensive enrichment with 266 of the 2068 genes activated in ZNF808del/del having increased expression in hepatic cords and 181 of 1790 genes repressed in in ZNF808del/del having decreased expression in dorsal pancreas buds (Fishers Exact p < 10−56, Odds Ratio 4.02 for genes activated; p < 10−25, OR 2.82 for repressed). Taken together this data suggests that ZNF808 acts to balance the differentiation of endoderm progenitors to liver or pancreas lineages.
Discussion
In this manuscript we report the identification of ZNF808 as critical for human pancreatic development, the first example of a primate-specific gene implicated in the aetiology of a congenital developmental disease in humans. This finding offers novel insights into the marked differences between human and rodent pancreas and further confirms the crucial importance of gene discovery efforts in individuals with extreme phenotypes, as the role of ZNF808 could not have been identified through mouse studies.
This discovery also provides one clear example where a KZFP and its TE target have been co-domesticated to rewire gene regulatory networks. Although the mammalian-conserved ZFP568 was found to play a critical role in regulating IGF2 in early embryogenesis, no transposable element was found at the causal binding sites18. While the biological function of most KZFPs remains to be elucidated, the implication of primate-specific ZNF808 and MER11 transposable elements in such an important evolutionary process as pancreas development suggests that KZFPs have the potential to be collectively involved in other important aspects of human biology. Our work confirms that the regulatory potential contained in domesticated transposable elements is only starting to be revealed. There are 150 KZFPs shared between human and most mammals, the vast majority of which with no known biological function, highlighting the need to further investigate KZFPs targeting ancient remnants of transposons.
Our results reveal that loss of ZNF808 results in the aberrant induction of a gene expression programme normally found in liver endoderm during in vitro differentiation toward pancreatic lineages. This suggests that a cell fate diversion from pancreas to liver lineages may be the pathogenic mechanism that results in neonatal diabetes and exocrine insufficiency in patients with biallelic loss of function mutations in ZNF808, with ZNF808 acting to balance these two lineages. Our results imply that both KZFPs and their MER11 elements targets have been co-domesticated to regulate multiple stages of endoderm derived organogenesis, with genes dysregulated found in proximity to MER11 elements at early and late stages of differentiation. This also suggests that MER11 element are active later or elsewhere in development and that ZNF808 is necessary to ensure they are repressed at early stages of pancreas development. To fully untangle the network of regulation established by ZNF808 will require accounting for the many KZFPs that bind subsets of MER11 elements., These together could form a complex combinatorial system of regulation, with select subsets of MER11 elements allowed to be active in particular cellular contexts depending on the combination of KZFPs and transcription factors present. Indeed, all five KZFPs identified to bind MER11 elements have dynamic patterns of expression, including during pancreatic development (data not shown).
Finally, this work has highlighted the role of a primate-specific KZFP, ZNF808, and its TE targets in the regulation of human beta cell development, providing important insights into human development and diabetes pathogenesis, which will support efforts aimed at providing cell-based therapies for people with diabetes.
Materials and methods in Supplementary Appendix.
Data Availability
All data will be made available on publication in a peer reviewed publication. All disease-causing variants identified by next generation sequencing will be uploaded onto the DECIPHER database (https://decipher.sanger.ac.uk/). All variants identified through genome sequencing will be submitted as a .vcf file with allele frequencies to the EMBL-EBI European Genome-Phenome Archive (EGA) (https://www.ebi.ac.uk/ega/home).
All epigenomic and transcriptomic data from in vitro pancreatic differentiation will be deposited in NCBI Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/).
Funding
EDF is a Diabetes UK RD Lawrence Fellow (19/005971) and has been the recipient of an EASD Rising Star fellowship during this study. ATH and SE were the recipients of a Wellcome Trust Senior Investigator award (grant WT098395/Z/12/Z) during this study, and ATH is employed as a core member of staff within the National Institute for Health Research–funded Exeter Clinical Research Facility and is an NIHR Emeritus Senior Investigator. SEF and MI have Sir Henry Dale Fellowships jointly funded by the Wellcome Trust and the Royal Society (grant 105636/Z/14/Z and 206688/Z/17/Z, respectively). NO has a lectureship funded by a Research England’s Expanding Excellence in England (E3) award. Most of the experimental studies were funded by the Academy of Finland Center of Excellence MetaStem (grant 312437). The Novo Nordisk Foundation and the Sigrid Juselius Foundation. HM is a member of the Doctoral Program in Integrative Life Science at University of Helsinki. AT Doctoral studies were funded by the Foundation for Education and European Culture (IPEP) in Greece and by the Cambridge Trust. DB received funding from a European Molecular Biology Organization long-term fellowship (ALTF 295-2019).
Conflict of interest
Nothing to declare
Acknowledgements
We are grateful to the patients and their families for taking part in our gene discovery study. We also thank Solja Eurola and Jarkko Ustinov for expert technical assistance.