ABSTRACT
Globally, breast cancer is the leading cancer among women in terms of both incidence and mortality. Variants in the BRCA1 and BRCA2 genes have long been linked to the disease and studied in its context. Advances in next-generation sequencing (NGS) have made rapid variant discovery widely accessible, which makes the accurate interpretation of these variants for clinical and research applications a demanding task. To establish the nature of such variants, the American College of Medical Genetics and Genomics and the Association for Molecular Pathology (ACMG-AMP) have issued a set of guidelines for variant classification. However, given the very large number of variants reported in these two large and well-studied genes, characterising each variant through functional studies or ACMG-AMP classification is a formidable challenge. Here we describe brca-NOVUS, a machine learning approach trained on a gold-standard ACMG-qualified dataset for the accurate interpretation of variants at large scale. Using two independent test and validation datasets of ACMG-qualified variants, we show that brca-NOVUS can be used for the classification of variants in clinical as well as research settings.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
The authors acknowledge funding from the Council of Scientific and Industrial Research (CSIR) through the CNP-007 project. The funders had no role in the preparation of the manuscript or the decision to publish.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
The source code for both of our models is available at https://github.com/aastha-v/brca-NOVUS. The models have been standardised on Ubuntu 18.04 LTS. The code and instructions for the preprocessing pipeline are also included.