Abstract
The majority of publicly available genomics data originates from populations of European ancestry. This limits understanding and detection of inherited genetic risk factors for breast cancer in other populations. To assess the extent of knowledge deficits in the genetics of breast cancer risk for populations of non-European ancestry, we compared the data available in the ClinVar database on putative breast cancer risk variants across populations of different ancestry.
Protein-coding insertions and deletions (indels) and single-nucleotide polymorphisms (SNPs) private to populations of Non-Finnish European (NFE), African (AFR), Admixed American (AMR), East Asian (EAS) and South Asian (SAS) ancestry in the Genome Aggregation Database (gnomAD v4) were identified for nine established breast cancer risk genes. The percentages of private protein-coding variants listed by gnomAD as ‘Unreported’ in ClinVar were compared between populations.
The SAS population had the largest knowledge deficit: 43.4% of private SAS variants were not reported in ClinVar, compared to 20-30% for other populations. Proportionally fewer SAS variants were reported for all nine genes, with the difference from NFE reaching significance (adjusted p < 0.05) for PALB2, ATM and BRCA2. In contrast, few genes had significantly lower ClinVar reporting rates for AFR, AMR and EAS than for NFE.
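The kind of comparison summarised above (percentage of unreported variants per population, tested against a reference population and adjusted for multiple comparisons) can be sketched as follows. This is a minimal illustration only: the abstract does not state which statistical test or adjustment method was used, so Fisher's exact test and the Bonferroni correction are assumptions, the function names are hypothetical, and the counts in the demo are placeholders rather than study data.

```python
from math import comb


def percent_unreported(unreported: int, total: int) -> float:
    """Percentage of private variants with no ClinVar record."""
    return 100.0 * unreported / total


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumed test; the source does not name the method it used.
    Rows: populations. Columns: unreported / reported counts.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x unreported variants in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def bonferroni(p_values: list[float]) -> list[float]:
    """Bonferroni adjustment (assumed; other corrections are possible)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    # Placeholder (unreported, total) counts, NOT taken from the study.
    reference = (20, 100)
    groups = {"group_a": (45, 100), "group_b": (25, 100)}

    raw = []
    for name, (unrep, total) in groups.items():
        p = fisher_exact_two_sided(unrep, total - unrep,
                                   reference[0], reference[1] - reference[0])
        raw.append(p)
        print(name, f"{percent_unreported(unrep, total):.1f}% unreported")
    for name, p_adj in zip(groups, bonferroni(raw)):
        print(name, "adjusted p =", p_adj)
```

In a per-gene analysis the same test would be repeated for each gene and population pair, and the adjustment applied across that whole family of tests.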
ClinVar reporting deficits in the SAS population were observed for both missense and protein-truncating variants. Unreported variants were usually very rare and largely absent from other public repositories. A substantial fraction of unreported variants were protein-truncating (17.2%) or missense with high predicted pathogenicity scores, and thus represent novel candidate breast cancer risk alleles.
Our work demonstrates that putative breast cancer risk variants from populations of South Asian ancestry are less likely to be reported in ClinVar. Barriers to reporting potential breast cancer risk variants from South Asian populations need to be defined and removed to reduce this knowledge deficit.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This study was funded by the Victorian Cancer Agency, the Peter MacCallum Cancer Foundation and the Laby Foundation.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
Source data were openly available before the initiation of the study from the gnomAD database (https://gnomad.broadinstitute.org/)
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
Code and data files used in these analyses are available in a Git repository at https://github.com/RaveenRony/ClinVar-annotation-rates





