Abstract
Whole genome sequencing was first offered clinically in the UK through the 100,000 Genomes Project (100KGP); however, data analysis was time- and resource-intensive, with around 3 million variants found per patient. Consequently, analysis was restricted to predefined gene panels associated with the patient’s phenotype. Yet panels rely on clearly characterised phenotypes and risk missing diagnostic variants outside the panel(s) applied. We propose a complementary method to rapidly identify diagnostic variants, including those missed by 100KGP methods.
The Loss-of-function Observed/Expected Upper-bound Fraction (LOEUF) score quantifies gene constraint, with low scores correlated with haploinsufficiency. We applied DeNovoLOEUF, a filtering strategy, to sequencing data from 13,949 rare disease trios in the 100KGP, filtering for rare, de novo, single nucleotide loss-of-function variants in OMIM disease genes with a LOEUF score <0.2. We conducted our analysis prospectively in 2019 and compared our findings with the corresponding diagnostic reports returned in 2019 and again in 2021.
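The filtering criteria above can be sketched as a single pass over annotated variant calls. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `Variant` record and its field names, the `de_novo_loeuf_filter` and `is_snv` functions, the set of SNV-compatible loss-of-function consequence terms (Sequence Ontology names as emitted by annotators such as VEP), and the rarity threshold `max_af` are all hypothetical, because the text does not specify them. Only the LOEUF cutoff (<0.2), the restriction to OMIM disease genes, and the de novo, single nucleotide, loss-of-function requirements come from the text.

```python
from __future__ import annotations

from collections.abc import Collection, Iterable, Mapping
from dataclasses import dataclass


# Hypothetical annotated variant record; field names are illustrative only
# and do not reflect the Genomics England data model.
@dataclass(frozen=True)
class Variant:
    gene: str             # gene symbol from annotation
    ref: str              # reference allele
    alt: str              # alternate allele
    consequence: str      # Sequence Ontology consequence term
    is_de_novo: bool      # assumed to be set by an upstream trio de novo caller
    population_af: float  # population allele frequency (assumed source)


# Assumption: loss-of-function consequences reachable by a single nucleotide
# change; frameshifts require indels and are therefore not included.
SNV_LOF_CONSEQUENCES = frozenset(
    {"stop_gained", "splice_donor_variant", "splice_acceptor_variant"}
)


def is_snv(v: Variant) -> bool:
    """True if exactly one base is replaced by one different base."""
    return len(v.ref) == 1 and len(v.alt) == 1 and v.ref != v.alt


def de_novo_loeuf_filter(
    variants: Iterable[Variant],
    loeuf: Mapping[str, float],
    omim_genes: Collection[str],
    loeuf_cutoff: float = 0.2,
    max_af: float = 1e-4,  # hypothetical rarity threshold, not from the source
) -> list[Variant]:
    """Keep rare, de novo, SNV loss-of-function variants in OMIM disease
    genes whose LOEUF score is strictly below loeuf_cutoff."""
    kept = []
    for v in variants:
        if not v.is_de_novo or v.population_af > max_af:
            continue
        if not is_snv(v) or v.consequence not in SNV_LOF_CONSEQUENCES:
            continue
        if v.gene not in omim_genes:
            continue
        score = loeuf.get(v.gene)
        # Genes without a LOEUF score cannot pass the constraint filter.
        if score is None or score >= loeuf_cutoff:
            continue
        kept.append(v)
    return kept


if __name__ == "__main__":
    # Toy inputs: gene names and LOEUF values are made up for illustration.
    loeuf = {"GENE_A": 0.10, "GENE_B": 0.50}
    omim = {"GENE_A", "GENE_B"}
    calls = [
        Variant("GENE_A", "C", "T", "stop_gained", True, 0.0),
        Variant("GENE_B", "C", "T", "stop_gained", True, 0.0),       # LOEUF too high
        Variant("GENE_A", "CT", "C", "frameshift_variant", True, 0.0),  # not an SNV
    ]
    for v in de_novo_loeuf_filter(calls, loeuf, omim):
        print(v.gene, v.consequence)  # prints: GENE_A stop_gained
```

Because each criterion is an independent per-variant check, rerunning this sketch after a gene list or constraint table changes only requires swapping the `omim_genes` or `loeuf` inputs.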
Of the 336 variants identified through DeNovoLOEUF, 324 (96%) were classified as diagnostic or partially diagnostic. We identified 39 diagnoses that had been “missed” by the standard 100KGP analyses; these are now being returned to patients. We have demonstrated a highly specific, rapid method with a 96% positive predictive value, good concordance with standard analysis, a low false-positive rate, and the ability to identify additional diagnoses. Globally, as more patients are offered genome sequencing, we anticipate that DeNovoLOEUF will rapidly identify new diagnoses and facilitate iterative analyses as new disease genes are discovered.
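The 96% positive predictive value follows from the counts above, assuming that every variant classified as diagnostic or partially diagnostic is counted as a true positive (TP) and the remaining 336 − 324 = 12 variants as false positives (FP):

$$
\mathrm{PPV} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}} = \frac{324}{324 + 12} = \frac{324}{336} \approx 0.964
$$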
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
EGS was supported by the Kerkut Charitable Trust and the University of Southampton's Presidential Scholarship Award; HLR by the National Human Genome Research Institute (NHGRI) grants U24 HG011450 and U41 HG006834; and AOD-L by the National Institute of Mental Health grant U01 MH119689 and a Manton Center for Orphan Disease Research Scholar Award. EGS, HLR and AOD-L were also supported by the NHGRI, the National Eye Institute, and the National Heart, Lung and Blood Institute grant UM1 HG008900. DB was generously supported by a National Institute for Health Research (NIHR) Research Professorship RP-2016-07-011.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The Health Research Authority (NRES Committee East of England) gave ethical approval for this work (REC: 14/EE/1112; IRAS: 166046).
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Footnotes
Ethics approval and consent to participate: All patients included in this study consented to participate in the 100,000 Genomes Project, which received ethics approval from the Health Research Authority (NRES Committee East of England; REC: 14/EE/1112; IRAS: 166046). The ethical approval letter is available upon request.
Supplementary file updated
Data Availability
The 100KGP dataset analysed in this study is available only to registered GeCIP members within the Genomics England Research Environment; owing to data protection restrictions, these data are not publicly available. Information on how to apply for data access is available at: https://www.genomicsengland.co.uk/about-gecip/for-gecip-members/data-and-data-access/. All data shared in this manuscript were approved for export by Genomics England. The datasets and code supporting the current study have not been deposited in a public repository because the data are not public. Code showing data analysis on Genomics England data can be shared upon request within the Genomics England Research Environment.