A rare splice-site variant in cardiac troponin-T (TNNT2): The need for ancestral diversity in genomic reference datasets

The underrepresentation of different ancestry groups in large genomic datasets creates difficulties in interpreting the pathogenicity of monogenic variants. Genetic testing for individuals with non-European ancestry results in higher rates of uncertain variants and a greater risk of misclassification. We report a rare variant in the cardiac troponin T gene, TNNT2; NM_001001430.3: c.571-1G>A (rs483352835) identified via research-based whole exome sequencing in two unrelated probands of Oceanian ancestry with cardiac phenotypes. The variant disrupts the canonical splice acceptor site, activating a cryptic acceptor and resulting in an in-frame deletion (p.Gln191del). The variant is rare in gnomAD v4.0.0 (13/780,762; 0.002%), with the highest frequency in South Asians (5/74,486; 0.007%) and has 16 ClinVar assertions (13 diagnostic clinical laboratories classify as variant of uncertain significance). There are at least 28 reported cases, many with Oceanian ancestry and diverse cardiac phenotypes. Indeed, among Oceanian-ancestry-matched datasets, the allele frequency ranges from 2.9-8.8%, and is present in 2/4 (50%) Indigenous Australian alleles in Genome Asia 100K, with one participant being homozygous. With Oceanians deriving greater than 3% of their DNA from archaic genomes, we found c.571-1G>A in Vindija and Altai Neanderthal, but not the Altai Denisovan, suggesting an origin post Neanderthal divergence from modern humans 130-145 thousand years ago. Based on these data, we classify this variant as benign, and conclude it is not a monogenic cause of disease. Even with ongoing efforts to increase representation in genomics, we highlight the need for caution in assuming rarity of genetic variants in largely European datasets. Efforts to enhance diversity in genomic databases remain crucial.



INTRODUCTION
Inherited cardiomyopathies such as hypertrophic cardiomyopathy (HCM), dilated cardiomyopathy (DCM) and restrictive cardiomyopathy (RCM) cumulatively affect ~1 in 200-500 people in the general population. 1 These conditions often have an autosomal dominant inheritance pattern with marked clinical and genetic heterogeneity. Some patients report only mild symptoms, while others experience more severe outcomes such as heart failure or sudden cardiac death (SCD). Genetic testing of genes with established clinical validity can identify the causal variant in approximately half of these patients and is part of mainstream clinical management. 2 Variants in genes encoding the cardiac sarcomere proteins, such as TNNT2 (troponin T), are an important cause. 2,3 TNNT2 encodes cardiac troponin T, an integral protein in the cardiac sarcomere. The troponin complex and calcium regulate the binding of actin and myosin. TNNT2 is definitively associated with HCM and DCM. 4,5 While causative variants in TNNT2 have been reported in cases of RCM, the RCM gene-disease association has not been formally curated.
The underrepresentation of individuals from diverse ancestry groups in population genomic datasets creates challenges in interpreting potentially causal monogenic variants. 6 As a result, genetic testing has a lower diagnostic yield for people not of European ancestry because more variants are considered to be of uncertain significance. 7,8 In the context of rare monogenic diseases, population frequency is important when considering whether a variant is causal, with disease prevalence, penetrance, and known genetic heterogeneity informing an acceptable allele frequency for pathogenicity in a specific disease. 9,10 However, while a genetic variant may appear to be rare in large population reference databases such as the Genome Aggregation Database (gnomAD), 11 if the person being tested has a genetic ancestry that is not well represented, then there should be caution in considering whether the rarity criterion is met. 12 There is a concerted international effort to increase ancestral diversity in genomics, with gnomAD v4.0.0 now including an additional 138,000 individuals from previously underrepresented backgrounds and refined recognition of genetic ancestry groups. 13 However, individuals with genetic ancestry derived from Oceania remain virtually absent from genomic databases. Oceania is a geographical region on the Pacific Ocean comprising Australasia, Melanesia, Micronesia and Polynesia. 13 The history of modern humans in Oceania comprises two vastly divergent ancestral populations. 5][16] The second closely relates to mainland East Asians, specifically Indigenous Taiwanese, who populated remote Oceania ~3,000 years ago. 16,17
We report a variant, TNNT2; NM_001001430.3:c.571-1G>A (rs483352835), in two unrelated probands with cardiac phenotypes and Oceanian ancestry. Given its relative rarity in available population databases but the prevalence of Oceanian ancestry in reported cases, we aimed to evaluate it as a cause of monogenic cardiomyopathy.
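The dependence of an acceptable allele frequency on prevalence, penetrance and heterogeneity can be illustrated with a simple calculation. The sketch below is not the method used in this study. It assumes a rare dominant allele (carrier frequency roughly twice the allele frequency) and that carriers multiplied by penetrance account for the fraction of cases attributable to the allele. The function name and the penetrance and allelic contribution values are hypothetical; only the 1 in 500 prevalence is taken from the range quoted above.

```python
def max_credible_allele_frequency(prevalence, allelic_contribution, penetrance):
    """Upper bound on population allele frequency for a dominant disease allele.

    Simplified model (an assumption of this sketch):
      carriers ~= 2 * AF
      carriers * penetrance ~= prevalence * allelic_contribution
    """
    for name, value in (("prevalence", prevalence),
                        ("allelic_contribution", allelic_contribution),
                        ("penetrance", penetrance)):
        if not 0 < value <= 1:
            raise ValueError(f"{name} must be in (0, 1], got {value}")
    return prevalence * allelic_contribution / (2 * penetrance)


# Prevalence from the text (lower end of ~1 in 200-500).
prevalence = 1 / 500
# Hypothetical values, chosen only for illustration.
allelic_contribution = 0.02  # a single allele explains at most 2% of cases
penetrance = 0.5

threshold = max_credible_allele_frequency(prevalence, allelic_contribution, penetrance)
print(f"Maximum credible allele frequency: {threshold:.1e}")
```

Under these illustrative inputs, an allele observed at a frequency of several percent in an ancestry-matched population would exceed the bound by orders of magnitude, whereas the same allele could fall below it in a database where that ancestry is barely sampled.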
METHODS
TNNT2; NM_001001430.3:c.571-1G>A (rs483352835) was identified in two unrelated probands seen at a specialised genetic heart disease clinic, Sydney, Australia, who consented to research-based whole genome sequencing (Royal Prince Alfred Hospital Sydney Local Health District Human Research Ethics Committee X15-0089). The variant classification was performed by research genetic counsellors using the MYH7-modified ACMG/AMP criteria and discussed at a dedicated cardiac genetic multidisciplinary team meeting. 9 To identify probands with this variant, literature and ClinVar reports for the c.571-1G>A (rs483352835) variant were sought, and clinical laboratories and research groups were contacted for additional information about patients' cardiac phenotypes and ancestry.
The frequency of the variant in open-access genomic databases, including gnomAD v3.12 and v4.0.0, Genome Asia 100K (Oceania), and the Regeneron Genetics Center Mexico City Prospective Study Browser (n=140,000), was assessed. We then contacted research groups with ancestry-matched participants with non-cardiac diseases and available genomic data. This included the Taiwan Biobank (n=1517), which consists primarily of Han Chinese and does not include any Indigenous Taiwanese individuals; 18 the Pacific Islands Rheumatic Heart Disease Genetics Network (n=3234) which consists of low-coverage genomes and SNP arrays with imputation 19

Patient summary
Proband 1 was a male of Oceanian ancestry who suffered an SCD in his early 20s with no cause identified following a comprehensive post-mortem investigation. Proband 2 was a 0-3 year-old child of Oceanian ancestry with RCM who died from rapidly progressive heart failure; in addition to the TNNT2 splice variant, a variant in TNNI3 classified as pathogenic per ACMG guidelines was also identified.
We queried ClinVar and the literature to establish if the variant had been reported previously. There were 16 ClinVar entries for this variant (ClinVar variation ID: 132940), updated between December 2014 and January 2024. Fourteen entries classify this variant as being of uncertain significance due to haploinsufficiency not being established as a disease mechanism for TNNT2. One proposed this variant as likely pathogenic in 2010, primarily due to the absence of TNNT2 splice variants among available controls. The final entry gives no classification. Of the 16 submitters, 13 are diagnostic genetic laboratories.

Case data
Case data were collated from ClinVar entries, biomedical literature, and specialized cardiovascular genetic disease centers, giving 26 unrelated probands with this variant (excluding our two probands).
Of these, 24 had a known cardiac phenotype, including 15 with HCM (including two diagnosed on autopsy following SCD), five with DCM (two presented at less than three years of age), 5 three with sudden unexplained death, and one individual with high-burden premature ventricular contractions, syncope, and a family history of SCD. One individual lacked a specific diagnosis but had a family history of SCD, and in another, the diagnosis was not available. Four probands reported a family history of SCD in a close relative. Evidence of co-segregation with disease in families was reported for one family with HCM, with the variant identified in two affected relatives. At least four probands (15%) had an additional pathogenic variant that explained their cardiac phenotype. Ancestry was available for 22 probands, of which 21 probands were reported as having Oceanian ancestry, including Aotearoa New Zealand Māori, Samoan, Tongan, Polynesian, Pacific Islander, Australian Aboriginal and Torres Strait Islander, and Hawaiian (Figure 2).

Variant summary
TNNT2 exon numbering is in reference to the NM_001001430.3 transcript. The c.571-1G>A (rs483352835) variant disrupts the canonical acceptor splice site of exon 12, an alternatively spliced exon that contains just three amino acids and is expressed in 7/13 of NCBI RefSeq transcripts 21 (Figure 1A). SpliceAI 22 predicts the c.571-1G>A (rs483352835) variant abolishes the canonical acceptor and strengthens a cryptic acceptor three nucleotides downstream (Figure 1B). RNA studies have shown that the variant results in activation of the cryptic acceptor, leading to an in-frame deletion of one amino acid, p.Gln191del. 4 The activated cryptic acceptor is an annotated acceptor splice site in NM_001001432.3 (Figure 1C), and use of this acceptor has been observed in 25% of RNA sequencing samples in SpliceVault, a dataset comprising Genotype-Tissue Expression (GTEx) data and Sequence Read Archive (SRA) data. 23 Proportion expressed across transcripts (pext) scores 24 suggest that exon 12 expression is reduced relative to other exons in TNNT2, supporting alternative splicing of this exon. The three amino acids encoded by this exon are weakly conserved, with no pathogenic/likely pathogenic missense variants reported in ClinVar. These data suggest that the p.Gln191del is a common, tolerated event. The c.571-1G>A (rs483352835) variant is likely to increase the proportion of transcripts with p.Gln191del, an event likely to have minimal impact on TNNT2 expression and/or function.
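Why a three-nucleotide shift of the acceptor removes exactly one amino acid without disturbing the reading frame can be shown with a toy model. The sequences below are hypothetical and are not the TNNT2 sequence, and the rule "use the first AG at or after the canonical acceptor position" is a deliberate simplification of splice site selection, assumed here only for illustration.

```python
# Toy codon table covering only the codons used in this example.
CODON_TABLE = {"ATG": "Met", "AAG": "Lys", "CAG": "Gln", "GAA": "Glu", "GCT": "Ala"}

# Hypothetical sequences, not taken from TNNT2.
UPSTREAM_EXON = "ATGAAG"   # coding sequence before the intron (Met-Lys)
INTRON_TAIL = "TTTTCAG"    # end of the intron; the final AG is the canonical acceptor
EXON = "CAGGAAGCT"         # three-codon exon (Gln-Glu-Ala)


def splice(upstream_exon, intron_tail, exon):
    """Join the upstream exon to the sequence following the chosen acceptor AG.

    Simplification: the first AG at or after the canonical acceptor position is used.
    """
    region = intron_tail + exon
    canonical_ag = len(intron_tail) - 2
    ag_index = region.find("AG", canonical_ag)
    if ag_index == -1:
        raise ValueError("no acceptor AG found")
    return upstream_exon + region[ag_index + 2:]


def translate(cds):
    """Translate an in-frame coding sequence with the toy codon table."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence is not in frame")
    return [CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3)]


# Reference allele: canonical acceptor intact.
reference_mrna = splice(UPSTREAM_EXON, INTRON_TAIL, EXON)

# Variant allele: last intronic base G>A destroys the canonical AG.
variant_intron_tail = INTRON_TAIL[:-1] + "A"
variant_mrna = splice(UPSTREAM_EXON, variant_intron_tail, EXON)

print("reference:", "-".join(translate(reference_mrna)))
print("variant:  ", "-".join(translate(variant_mrna)))
print("nucleotides lost:", len(reference_mrna) - len(variant_mrna))
```

In this toy case the AG inside the first exonic codon (CAG, glutamine) serves as the replacement acceptor, so the transcript loses three nucleotides and the protein loses a single glutamine while every downstream codon is unchanged.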

Frequency in population databases
Allele frequencies are shown in Table 1. Among publicly available databases, TNNT2:c.571-1G>A (rs483352835) was present in gnomAD v4.0.0 in 13/780,762 (0.002%) alleles, with the highest sub-population frequencies among South Asian (5/74,486; 0.007%) and East Asian (1/41,260; 0.002%) alleles. 11 Within the Genome Asia 100K Project, the variant was present in 6/148 (4.1%) chromosomes in the Oceanian population, with a frequency of 2/4 (50%) chromosomes in the 2 Australian participants (one homozygote), and 4/140 (2.9%) in Papuan alleles (Figure 3). 25 The variant was not present in the Regeneron Genetics Center Mexico City Prospective Study Browser (0/301,992; 0%). 26 We determined the presence of the variant within large biomedical research studies, including All of Us and the UK Biobank. In All of Us (n=245,388), the variant was found in fewer than 20 individuals, none of whom reported a diagnosis of cardiomyopathy, noting that occurrences <20 must be reported as such. Among these carriers, 59% were identified as Native Hawaiian or Other Pacific Islander. 27 In the UK Biobank, out of 469,835 individuals who underwent exome sequencing, only one participant, who identified as belonging to the "Other ethnic group" and was born in South-East Asia, had the variant. Notably, this participant did not have cardiomyopathy. Among ancestry-relevant population sequencing data unavailable in the public domain, the variant was absent in the Taiwanese Biobank (0/3032; 0%). Within the Pacific Islands Rheumatic Heart Disease Genetics Network, the variant was present at a frequency of 0.088 (8.8%) within the Polynesian sub-group and 0.035 (3.5%) across the wider group, including Melanesians and South Asians. In a genome sequencing dataset (low-pass sequencing followed by imputation), 20 of participants of Aotearoa New Zealand Māori and Pacific Islander ancestry recruited to a Health and Disability study, the variant was present at an allele frequency of 4.0% in East Polynesian individuals (n=139), and 3.6% among West Polynesian individuals (n=55) (Figure 3).
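The percentages quoted above follow directly from allele count divided by allele number. The short sketch below recomputes them from the counts given in the text; the dictionary labels and the function name are chosen here for illustration and do not correspond to any database field names.

```python
# (allele count, allele number) pairs as quoted in the text above.
ALLELE_COUNTS = {
    "gnomAD v4.0.0 (all)": (13, 780_762),
    "gnomAD v4.0.0 (South Asian)": (5, 74_486),
    "gnomAD v4.0.0 (East Asian)": (1, 41_260),
    "Genome Asia 100K (Oceanian)": (6, 148),
    "Genome Asia 100K (Papuan)": (4, 140),
    "Genome Asia 100K (Australian)": (2, 4),
    "Taiwan Biobank": (0, 3_032),
}


def allele_frequency(allele_count, allele_number):
    """Return allele count divided by the number of alleles genotyped."""
    if allele_number <= 0:
        raise ValueError("allele_number must be positive")
    if not 0 <= allele_count <= allele_number:
        raise ValueError("allele_count must be between 0 and allele_number")
    return allele_count / allele_number


frequencies = {name: allele_frequency(ac, an) for name, (ac, an) in ALLELE_COUNTS.items()}

# Print from most to least frequent.
for name, af in sorted(frequencies.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:32s} {af:.4%}")

# How much more frequent the allele is in the Oceanian sample than in gnomAD overall.
fold_difference = (frequencies["Genome Asia 100K (Oceanian)"]
                   / frequencies["gnomAD v4.0.0 (all)"])
print(f"Oceanian vs gnomAD overall: about {fold_difference:.0f}-fold")
```

The small allele numbers in the Oceanian samples mean these estimates carry wide uncertainty, but the gap relative to the aggregate gnomAD frequency is several orders of magnitude.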

Presence in archaic genomes
Of note, indigenous Papuan and Australian people derive greater than 3% of their DNA from Neanderthal ancestry, 28 which is a higher percentage than for Eurasian populations. 28,29 TNNT2:c.571-1G>A (rs483352835) was shown to be present in two archaic genomes (Vindija and Altai Neanderthal), but not the Altai Denisovan, [28][29][30][31] suggesting it might have arisen ≥130-145 thousand years ago after the Neanderthal populations diverged from modern humans. 20 This likely explains the increased frequency of TNNT2:c.571-1G>A (rs483352835) in Oceanian populations.

Benign variant classification
The ACMG/AMP guidelines state that a minor allele frequency of greater than 0.05 for an autosomal dominant gene can be considered stand-alone evidence of benign impact (BA1 criterion). 32 Taking into consideration the allele frequencies among individuals with Oceanian ancestry, we applied the BA1 criterion and classify TNNT2:c.571-1G>A (rs483352835) as a benign variant, which is predicted to lead to a tolerated in-frame deletion of a single amino acid.
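The threshold comparison behind the BA1 criterion can be written out explicitly. This is a minimal sketch under stated assumptions: the 0.05 cut-off and the frequencies are those quoted in the text, while the labels, the function name and the rule "BA1 is met if any listed population exceeds the cut-off" are simplifications introduced here, not a reproduction of the classification workflow used in the study.

```python
BA1_THRESHOLD = 0.05  # minor allele frequency cut-off quoted in the text

# Allele frequencies quoted in the text; labels are illustrative.
ANCESTRY_MATCHED = {
    "PI RHD Genetics Network (Polynesian sub-group)": 0.088,
    "PI RHD Genetics Network (wider group)": 0.035,
    "East Polynesian (low-pass sequencing)": 0.040,
    "West Polynesian (low-pass sequencing)": 0.036,
    "Genome Asia 100K (Oceanian)": 6 / 148,
}
GNOMAD_OVERALL = 13 / 780_762


def exceeds_ba1(allele_frequency, threshold=BA1_THRESHOLD):
    """True if the allele frequency is strictly greater than the threshold."""
    return allele_frequency > threshold


populations_over_threshold = [
    name for name, af in ANCESTRY_MATCHED.items() if exceeds_ba1(af)
]
ba1_met = len(populations_over_threshold) > 0

print("gnomAD overall exceeds BA1:", exceeds_ba1(GNOMAD_OVERALL))
print("ancestry-matched populations exceeding BA1:", populations_over_threshold)
print("BA1 met in at least one population:", ba1_met)
```

The comparison makes the central point concrete: the aggregate gnomAD frequency alone would never trigger the criterion, and only the ancestry-matched data bring the variant above the cut-off.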

DISCUSSION
The limited representation of individuals with non-European ancestry in genetic studies and genomics reference datasets poses challenges in accurately classifying and interpreting genetic variants identified in underrepresented and Indigenous populations. This disparity results in a 2-3 times higher risk of uncertain variant identification following genetic testing for non-European individuals, 33,34 compromising the utility of such testing and increasing the likelihood of variant misclassification. 35 We demonstrate the importance of publicly accessible ancestry-matched populations in enabling accurate variant interpretation, thereby avoiding potential harms that would disproportionately impact individuals from historically underrepresented ancestries. In this example, the TNNT2:c.571-1G>A (rs483352835) genetic variant has been frequently identified by testing laboratories and, in combination with numerous case reports, could be misinterpreted as causative of disease. Indeed, many diagnostic laboratory summaries in ClinVar note the lack of evidence of TNNT2 haploinsufficiency as a disease mechanism as the barrier to considering this variant as causative.
Contemporary variant classification evaluates all reported cases of the variant under assessment and considers the individuals' clinical phenotype. Our experience highlights the need for patient ancestry to be a key data field in these assessments. Due to the scarcity of ancestry-matched reference data for Oceanians, our only course of action was to identify and personally contact laboratories and research groups with sequencing data from Oceanian peoples, ultimately revealing that the variant is too common to be considered disease-causing. As a result, despite some evidence that might support pathogenicity, the ancestry-matched allele frequency indicates this to be a benign variant. Given that 14 ClinVar assertions from clinical diagnostic laboratories report this as a VUS, this classification downgrade will ensure it is not reported and that any future time or resources aimed at reclassification are carefully considered. While we cannot exclude a potential modifier role in disease, this variant is not a highly penetrant monogenic cause of cardiomyopathy but is a common single nucleotide variant within Oceanian populations.
Efforts to increase ancestral diversity in genomics have produced larger and more inclusive publicly available reference databases, yet there is a considerable way to go in achieving global representation. 13,20,36 Efforts to improve representation in genomic databases rely heavily on effective community engagement, and partnerships with underrepresented groups are essential. 37 Open sharing of these data in ways that comply with participant consent, ethical standards and community governance will enable a more accurate understanding of the genetic basis of health and disease. 38 While there are important considerations when requesting patients to disclose their ancestry, in the setting of genetic testing and variant interpretation, these data are critical for avoiding misclassification. Moreover, in cases where this information is unavailable, inferring ancestry from genetic tests becomes essential, as it can prevent undue anxiety in patients and help mitigate the risk of unnecessary medical interventions. 39

CONCLUSION
We highlight the challenges of interpreting 'rare' monogenic variants among individuals from poorly represented ancestry groups. Our analysis of TNNT2:c.571-1G>A (rs483352835), a common single nucleotide variant within Oceanian populations, emphasizes the critical need for openly accessible, large and diverse genomic reference databases to ensure accurate interpretation of variants and the value of genetic testing. While achieving ancestral diversity will require significant time, commitment, and resources, including deep community engagement and addressing existing barriers to research participation and mistrust, 40 it is a necessary step if we are to ensure the entire population can benefit equitably from genomic medicine.

The Pacific Islands Rheumatic Heart Disease Genetics Network includes participants from Melanesia (Vanuatu, New Caledonia, Fiji), West Polynesia (Tonga, Samoa), and East Polynesia (Cook Islands, French Polynesia). A group of 194 people of self-reported Aotearoa New Zealand Māori or Pacific Islander ethnicity, who had previously been recruited and consented according to the Health and Disability Ethics approval (MEC/05/10/130), were genotyped to determine the allele frequency of the TNNT2 c.571-1G>A (rs483352835) variant in these populations. Genotyping was carried out by sequencing as described in Emde et al. 2021. 20

Figure 1: A: NCBI RefSeq transcripts for the TNNT2 gene. Exons are numbered as per NM_001001430.3. B: SpliceAI predicts weakening of the canonical acceptor splice site in exon 12 and strengthening of a cryptic acceptor splice site 3 nucleotides downstream, resulting in deletion of Gln191. There is no significant change in SpliceAI score for the exon 12 canonical donor splice site. C: The cryptic acceptor is the annotated acceptor splice site for NM_001001432.3.

TABLE 1: Frequency of Variant TNNT2; NM_001001430.2: c.571-1G>A (rs483352835) in genomic reference databases

participants was granted by Melbourne Health Human Research Ethics Committee and New Zealand Ethics Committee. Ethical approval with a waiver of consent was granted by Stanford School of Medicine, USA; Pennsylvania University Medical Center, USA. All other data were in the public domain. The UK Biobank analysis (National Research Ethics Service, 11/NW/0382) was conducted under the terms of access of project 47602. The All of Us analysis (v7) was conducted through

JI receives research grant support from Bristol Myers Squibb unrelated to this work. CC is an employee of and has stock options in Genome Medical. VP receives research support from BioMarin Inc. and serves as consultant and/or scientific advisor for BioMarin, Inc. and Lexeo Therapeutics. JW has consulted for MyoKardia, Inc., Pfizer, Foresite Labs, and Health Lumen, and receives research support from Bristol Myers-Squibb. None of these activities are directly related to the work presented here. All other authors report no conflicts of interest.

38. Hudson, M. et al. Rights, interests and expectations: Indigenous perspectives on unrestricted access to genomic data. Nat. Rev. Genet. 21, 377-384 (2020).
39. Appelbaum, P. S. et al. Is there a way to reduce the inequity in variant interpretation on the basis . .