Abstract
Congenital Adrenal Hyperplasia (CAH), one of the most common inherited disorders, is caused by defects in adrenal steroidogenesis. It is potentially lethal if untreated and is associated with multiple comorbidities, including fertility issues, obesity, insulin resistance, and dyslipidemia. CAH can result from variants in multiple genes, but the most frequent cause is deletions and gene conversions in the segmentally duplicated RCCX module, which harbors the CYP21A2 gene and its pseudogene, CYP21A1P.
The molecular genetic test used to identify pathogenic alleles is cumbersome, incomplete, and offered by a limited number of laboratories. Accurate interpretation also requires testing the parents, which contributes to healthcare inequity. Less severe forms are frequently misdiagnosed, and genotype-phenotype correlations remain incompletely understood. We explored whether emerging technologies could identify all pathogenic CAH alleles, including phasing them in proband-only cases, and focused on long-read sequencing outputs that would be practical in a clinical laboratory setting.
Both HiFi-based and nanopore-based whole-genome long-read sequencing datasets could be mined to accurately identify pathogenic single-nucleotide variants, full gene deletions, and fusions that create non-functional hybrids between the gene and pseudogene (the “30-kb deletion”), and to count RCCX modules and phase the resulting multimodular haplotypes. On the HiFi dataset of six samples, the PacBio Paraphase tool distinguished nine different mono-, bi-, and tri-modular haplotypes, as well as the 30-kb and whole-gene deletions. To do the same on the ONT nanopore dataset, we designed a tool, Parakit, which creates an enriched local pangenome representing known haplotype assemblies, with ClinVar pathogenic variants and fusions mapped onto them. Because the region contains few labels, optical genome mapping could not reliably resolve module counts or fusions, although a tool designed to mine these data specifically for this region may make this possible in the future.
Both sequencing techniques yielded congruent results that matched clinically identified variants, and they provided information beyond the clinical test, including phasing, RCCX module counts, and the status of the other genes in the module, all of which may be clinically relevant. Thus, long-read sequencing could identify variants causing multiple forms of CAH in a single test.
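To illustrate why phasing and RCCX module counts matter when parental samples are unavailable, the following minimal Python sketch classifies a proband's predicted status from two already-phased haplotypes. It is not Parakit or Paraphase: the data model (`Cyp21Copy`, `predicted_affected`, and the `kind` labels) is hypothetical, the variant names are placeholders rather than ClinVar identifiers, and it assumes a simplified autosomal recessive model in which a haplotype is functional if any of its modules carries a CYP21A2 copy without a pathogenic variant.

```python
from dataclasses import dataclass, field


@dataclass
class Cyp21Copy:
    """One CYP21 gene copy; each RCCX module on a haplotype carries one."""
    kind: str  # hypothetical labels: "CYP21A2", "CYP21A1P", or "chimera"
    pathogenic_variants: list[str] = field(default_factory=list)

    def is_functional(self) -> bool:
        # Simplifying assumption: only an intact CYP21A2 copy with no
        # pathogenic variant encodes working 21-hydroxylase; the pseudogene
        # and gene/pseudogene chimeras (e.g. the "30-kb deletion") do not.
        return self.kind == "CYP21A2" and not self.pathogenic_variants


def module_count(haplotype: list[Cyp21Copy]) -> int:
    # One CYP21 copy per module, so the module count equals the copy count.
    return len(haplotype)


def haplotype_is_functional(haplotype: list[Cyp21Copy]) -> bool:
    # A haplotype is functional if ANY of its modules has a working CYP21A2,
    # so a duplicated allele can carry a pathogenic variant and still work.
    return any(gene_copy.is_functional() for gene_copy in haplotype)


def predicted_affected(hap1: list[Cyp21Copy], hap2: list[Cyp21Copy]) -> bool:
    # Recessive model: affected only if neither phased haplotype is functional.
    return not haplotype_is_functional(hap1) and not haplotype_is_functional(hap2)


# Placeholder variant labels, not real ClinVar identifiers.
hap_a = [Cyp21Copy("CYP21A1P"), Cyp21Copy("CYP21A2", ["variant_A"])]
hap_b = [Cyp21Copy("CYP21A1P"), Cyp21Copy("CYP21A2", ["variant_B"])]
hap_both = [Cyp21Copy("CYP21A1P"), Cyp21Copy("CYP21A2", ["variant_A", "variant_B"])]
hap_intact = [Cyp21Copy("CYP21A1P"), Cyp21Copy("CYP21A2")]
hap_30kb = [Cyp21Copy("chimera")]  # mono-modular fusion haplotype
hap_dup = [Cyp21Copy("CYP21A1P"), Cyp21Copy("CYP21A2", ["variant_A"]), Cyp21Copy("CYP21A2")]

print("in trans:", predicted_affected(hap_a, hap_b))  # in trans: True
print("in cis:", predicted_affected(hap_both, hap_intact))  # in cis: False
print("30-kb deletion + variant_B:", predicted_affected(hap_30kb, hap_b))  # True
print("tri-modular:", module_count(hap_dup), haplotype_is_functional(hap_dup))  # 3 True
```

With phased haplotypes, the same two variants lead to different predictions depending on whether they lie in trans or in cis, which is the distinction that parental testing otherwise has to supply. The tri-modular case shows why module counts matter: a haplotype with a pathogenic variant on one CYP21A2 copy may still carry a second, intact copy.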
Competing Interest Statement
JM, ECD, BP, CEK, SN, MA, EV, ID, and SIB declare no conflicts of interest. AR, WJR, and XC are employees and shareholders of Pacific Biosciences. NJN is a consultant for Neurocrine Biosciences, Inc. and serves on an expert panel for World Athletics. CF is a consultant for Neurocrine Biosciences and Eton Pharmaceuticals. HB owns shares of Illumina, Inc., Bionano Genomics, Inc., and Pacific Biosciences of California, Inc.
Funding Statement
We acknowledge the support of the Chan Zuckerberg Initiative (CZI), which funded the sequencing and analysis costs of the nanopore project to BP, KM, and EV. ECD, EV, and the DSD-TRN biobank are supported in part by grant R01 HD093450. ECD, EV, SIB, ID, and MA are supported in part by the UCI-GREGoR grant U01 HG011745.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
Informed consent was obtained by the referring clinical teams for families to provide deidentified clinical data and blood samples to the DSD-TRN. Ethical approval was granted by the following: the Institutional Review Board of the Ann & Robert H. Lurie Children's Hospital of Chicago (Protocol #2015-536), the Colorado Multiple Institutional Review Board at the University of Colorado (Protocol #19-3084), the Northwell Health Institutional Review Board (Protocol #15-001), and the Institutional Review Boards of the University of Michigan Medical School (IRBMED, Protocol HUM00050916). Use of the deidentified DSD-TRN biobank samples for genetic research was approved by the Institutional Review Board at Children's National Hospital, Washington, D.C., USA, under protocol P000010217.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
The RCCX pangenome and the pipeline used to perform the described analysis are available in the Parakit repository on GitHub. HiFi Paraphase output and other data are available from the authors upon reasonable request.