Abstract
The genetic aetiologies of more than half of rare diseases remain unknown [1]. Standardised genome sequencing (GS) and phenotyping of large patient cohorts provide an opportunity to discover these aetiologies [2], but doing so depends on efficient and powerful analytical methods [3]. We have developed a portable computational and statistical framework for inferring genetic associations with rare diseases. At its core lies the ‘Rareservoir’, a compact database of rare variant genotypes and phenotypes. We built a Rareservoir of 77,539 genomes sequenced by the 100,000 Genomes Project (100KGP) [4]. We then applied the Bayesian association method BeviMed [3] across 269 rare diseases assigned to participants in the project, identifying 238 known [5] and 21 novel associations. We selected three of the novel associations for validation. We provide compelling evidence that (1) loss-of-function variants in ERG, which encodes an ETS-family transcription factor, lead to primary lymphoedema, (2) truncating variants in the last exon of the TGFβ regulator PMEPA1 result in Loeys-Dietz syndrome [6], and (3) loss-of-function variants in GPR156 give rise to recessive congenital hearing impairment. These findings demonstrate the power of our analytical approach for discovering the aetiologies of rare diseases.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This study was funded by: Cambridge BHF Centre of Research Excellence [RE/18/1/34212]; Wellcome Collaborative Award 219506/Z/19/Z; MRC/NIHR Clinical Academic Research Partnership MR/V037617/1; PG/17/33/32990; PG/20/16/35047; Swiss Federal National Fund for Scientific Research no. CRSII5_177191/1; KU Leuven BOF grant C14/19/096; NIDCD/NIH grant R01DC016295.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The 100,000 Genomes Project was approved by the East of England-Cambridge Central REC (reference 20/EE/0035). The study at the University of Maryland was approved by the institutional review board (RAC#2100001). The study of the Japanese ancestry pedigrees bearing PMEPA1 truncating alleles was approved by the Institutional Review Boards of the National Cerebral and Cardiovascular Centre (M14-020) and the Sakakibara Heart Institute (16-035).
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
Genetic and phenotypic data for the 100KGP study participants are available through the Genomics England research environment, by application at https://www.genomicsengland.co.uk/join-a-gecip-domain. PanelApp gene panels and evidence of associations were obtained using the PanelApp application programming interface (https://panelapp.genomicsengland.co.uk/api/docs/) on 20 October 2021.
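For readers who wish to retrieve gene panels from PanelApp themselves, the following is a minimal Python sketch of how a paginated listing might be traversed. It is not the code used in this study. The endpoint path in PANELS_URL and the "results"/"next" response keys are assumptions that should be checked against the API documentation linked above, and fetch_json and iter_results are hypothetical helper names introduced here for illustration only.

```python
import json
from typing import Callable, Iterator
from urllib.request import urlopen

# Assumption: the panel listing is served at this path and returns
# paginated JSON with "results" and "next" keys. Verify against the
# PanelApp API documentation before relying on it.
PANELS_URL = "https://panelapp.genomicsengland.co.uk/api/v1/panels/"


def fetch_json(url: str) -> dict:
    """Download and decode one page of JSON (hypothetical helper)."""
    with urlopen(url) as response:
        return json.load(response)


def iter_results(
    start_url: str,
    fetch: Callable[[str], dict] = fetch_json,
) -> Iterator[dict]:
    """Yield every record of a paginated listing by following "next" links."""
    url = start_url
    while url:
        page = fetch(url)
        # Each page is assumed to carry its records under "results".
        yield from page.get("results", [])
        # A missing or null "next" link ends the traversal.
        url = page.get("next")
```

Calling iter_results(PANELS_URL) would issue live network requests, and panel content changes over time, so results retrieved today may differ from those obtained on the access date stated above. The fetch argument is injectable so that the traversal logic can be exercised offline with stored pages.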