Large scale genome-wide association analyses identify novel genetic loci and mechanisms in hypertrophic cardiomyopathy

Hypertrophic cardiomyopathy (HCM) is an important cause of morbidity and mortality with both monogenic and polygenic components. We here report results from the largest HCM genome-wide association study (GWAS) and multi-trait analysis (MTAG) including 5,900 HCM cases, 68,359 controls, and 36,083 UK Biobank (UKB) participants with cardiac magnetic resonance (CMR) imaging. We identified a total of 70 loci (50 novel) associated with HCM, and 62 loci (32 novel) associated with relevant left ventricular (LV) structural or functional traits. Amongst the common variant HCM loci, we identify a novel HCM disease gene, SVIL, which encodes the actin-binding protein supervillin, showing that rare truncating SVIL variants cause HCM. Mendelian randomization analyses support a causal role of increased LV contractility in both obstructive and non-obstructive forms of HCM, suggesting common disease mechanisms and anticipating shared response to therapy. Taken together, the findings significantly increase our understanding of the genetic basis and molecular mechanisms of HCM, with potential implications for disease management.


Supplementary Figures and Note
Mendelian randomization (MR) analysis of systolic (SBP, panels A-C) and diastolic (DBP, panels D-F) blood pressure (exposure) on risk of hypertrophic cardiomyopathy (HCM, top panels A and D) and its obstructive (oHCM, middle panels B and E) and non-obstructive (nHCM, bottom panels C and F) forms (outcomes). Genetic instruments for SBP and DBP were selected from a published GWAS of blood pressure including up to 801,644 individuals. 6 The outcome HCM GWAS included 5,927 HCM cases vs.

HCM GWAS study cohorts
The HCM meta-analysis included 7 cohorts, described below and in Supplementary Table 1. this study were assessed centrally for pathogenicity by the Oxford laboratory using the classification framework previously reported in Neubauer et al. 8 Controls without a diagnosis of HCM were included from the MHI Biobank and the London Health Sciences Centre. All cases and controls were genotyped on the Illumina Global Screening Array (GSA). QC was performed as previously described 12 , followed by imputation with Minimac4 on the Michigan Imputation server using the Haplotype Reference Consortium r1.1. 13 A total of 1,035 cases and 13,889 controls were then included in the association analysis using SNPTEST v2.5 with the score method adjusting for the first 20 genotypic PCs. In addition to the primary all-comer analysis, a case-control analysis of sarcomere-positive (N=261) and sarcomere-negative (N=582) was performed using random allocation of controls.
Netherlands HCM cohort. The Netherlands cohort was previously described. 12  OmniExpress Beadchip. All downstream QC and data analyses were performed centrally. Data quality control was performed using Plink v1.9 and in-house scripts. We first mapped all SNPs to the positive strand of GRCh37 build. Pre-imputation SNP-level QC excluded SNPs with MAF<0.01, and Hardy-Weinberg equilibrium test P<10 -7 . Sample QC excluded samples with sex mismatch, heterozygosity rate >3 standard deviations (SD) from the mean, and missingness rate >0.05. We aligned study genotypes with those from the HapMap3 cohort, and used PC analysis to identify genotypically Caucasian samples to take forward for imputation, based on cut-offs defined by the mean and SD of each HapMap ethnicity cohort. Batches were merged pre-imputation, excluding SNPs which failed QC checks in any batch. We also checked for differential missingness and excluded any SNPs with differential missingness test P<1x10 -7 . We used the combined UK10K + 1000 Genomes Project dataset as the imputation reference panel. This has a proven record of accuracy for UK study cohorts, across a range of allele frequencies. 18 We pre-phased study genotypes using SHAPEIT 19 (v2.r790) and imputed the genotype data using IMPUTE2 10 (v2.3.2). We finally performed several post-imputation QC steps per-SNP and per-sample, excluding SNPs with an INFO criterion score <0.4, MAF<1%, Hardy-Weinberg equilibrium P<10 -7 . We then removed related individuals from the dataset, using a PI-HAT cut-off of 0.187. After the QC steps, we performed another PC analysis to generate the eigenvectors used as covariates in the regression analysis. Association test was performed by SNPTEST v2.5.4 with the Newton-Raphson method adjusted by sex, age, and top 3 PCs. Stratified analyses for sarcomerepositive (N=118) and sarcomere-negative (N=330) HCM cases using randomly allocated nonoverlapping controls were also performed.
Italian HCM cohort. Individuals with HCM were diagnosed or referred to the Cardiomyopathy Unit of the Careggi University Hospital (Florence, Italy). HCM diagnosis was defined as unexplained LV hypertrophy with a maximal LV wall thickness >13mm on echocardiography or CMR, complemented by family history and genotype to enable an informed diagnosis. Unrelated healthy Italian controls of European ancestry were obtained from the multicentre international HYPERGENES study. 20 A total of 1,293 control samples were selected, comprising 561 hypertensives (defined as diastolic blood pressure >=90mmHg, systolic blood pressure >=140mmHg, or on antihypertensive treatment before the age of 50) and 732 normotensives. As expected by the inclusion criteria, the mean age was significantly higher in normotensives than in hypertensives. Since aging is associated with an increased prevalence of hypertension, we selected exclusively hyper-normal controls older than 50year-old allowing for the exclusion of subjects that developed hypertension at a later age. Cases were genotyped at King's College London by Illumina OmniExpress Beadchip and controls were genotyped by Human1M-Duo v3.0 array. All downstream QC and data analyses were performed centrally. Data quality control was performed using Plink v1.9 and in-house scripts. We first mapped all SNPs to the positive strand of GRCh37 build. Pre-imputation SNP-level QC excluded SNPs with MAF<0.01, and Hardy-Weinberg equilibrium test P<10 -7 . Sample QC excluded samples with sex mismatch, heterozygosity rate >3 standard deviations (SD) from the mean, and missingness rate >0.05. We aligned study genotypes with those from the HapMap3 cohort, and used PC analysis to identify genotypically Caucasian samples to take forward for imputation, based on cut-offs defined by the mean and SD of each HapMap ethnicity cohort. Case and control batches were combined after preimputation steps. We also checked for differential missingness and excluded any SNPs with differential missingness test P<1x10 -7 . Genome wide imputation was performed using eagle v2.4 phasing, Minimac4 1.5.7 and HRC r1.1 reference panel implemented on the Michigan Imputation Server v1.2.4. 13 We finally performed a number of post-imputation QC steps per-SNP and per-sample, excluding SNPs with a Minimac R 2 <0.5, MAF<1%, Hardy-Weinberg equilibrium P<10 -7 . We then removed related individuals from the dataset, using a PI-HAT cut-off of 0.187. After the QC steps, we performed another PC analysis to generate the eigenvectors used as covariates in the regression analysis. Association test was performed by SNPTEST version 2.5.4 with the expected method adjusted by age, sex and top 10 PCs. Stratified analyses for sarcomere-positive (N=111) and sarcomere-negative (N=183) HCM cases using randomly allocated non-overlapping controls were also performed. Note that the total number of HCM cases included in the sarcomere status stratified analyses was higher than the number of cases included in the all-comer analysis which was completed at an earlier phase.
The BioResource for Rare Disease (BRRD). The BRRD cohort, a pilot study of the Genomics England 100,000 Genomes Project (GEL), has been described elsewhere. 7,21 In brief, following QC, 239 sarcomere negative HCM cases and 7,203 controls were available for analyses. Logistic regression analysis was performed using SAIGE (v0. 29.4.2) 16 with correction for the first three genotypic PCs.
Automated cardiac magnetic resonance (CMR) image analysis The LV traits GWAS was performed using CMR data from 39,559 UK Biobank participants (36,083 following genotypic and imaging QC and exclusions for co-morbidities). LV trait data were computed using a machine learning analysis of CMR images as described below.
We segmented short and long axis cine images using a fully convolutional network 22 , trained on manual annotations of 3,975 subjects, as previously reported. 12 All image segmentations were manually quality controlled by an experienced cardiologist. The segmentation screenshots for shortaxis and long-axis end-diastolic and end-systolic frames were visually inspected. Bad segmentations, images with insufficient coverage of the LV or missing anatomical structures were discarded. For motion tracking, subjects with failed image registration or outlier peak global strain values (positive circumferential or longitudinal strain, or negative radial strain) were discarded.
LV end-diastolic (LVEDV) and end-systolic (LVESV) volumes and ejection fraction (LVEF, defined as [LVEDV-LVESV]/LVEDV) were derived from these segmentations. The LV myocardial mass (LVM) was calculated from the myocardial volume using a density of 1.05 g/mL. LV concentricity (LVconc) was defined as LVM/LVEDV. The LV wall thickness (WT) was measured at end-diastole. The myocardium was divided into 16 segments, according to the American Heart Association nomenclature. 23 Maximum and mean wall thickness was derived for each AHA segment. Overall mean wall thickness (meanWT) per subject was calculated from the mean of each of the 16 mean wall thickness measurements. Overall maximum wall thickness (maxWT) was the maximum wall thickness measurement from any segment.
Motion tracking was performed using non-rigid image registration between successive timeframes, using the MIRTK toolkit. 24 Inter-frame displacement fields were composed to obtain the displacement with respect to a reference frame (the end-diastolic frame, or frame 0). To avoid drift effect due to accumulation of registration errors 25 , motion tracking is performed twice -along the forward direction (tracking starting from frame 0 to frames 1, 2, 3,…) and backward direction (tracking from frame 0 to frames T-1, T-2, T-3, …) where T denotes the total number of frames per cardiac cycle). The average displacement field is calculated by weighted averaging of the forward and backward displacement field such that, for a frame at the start of the cardiac cycle, the forward displacement field will have a higher weight, whereas for a frame at towards the end of the cardiac cycle, the backward displacement field will have a higher weight. Four image slices were used for this: in the short axis, a basal slice at 75% LV location, a mid-cavity location and an apical slice at 25% LV location, and in the long axis, a 4-chamber view as previously described. 26,27 The myocardial contours on the three slices were divided into the 16 segments. Based on the displacement field from motion tracking, myocardial contours at the end-diastolic frame were warped onto each time frame of the cardiac cycle. Circumferential, radial and longitudinal strains were calculated for each time frame based on the change of length for each line segment 28 , using the equation = ∆ , where E is the strain, ∆ is the change in length of the line segment, and L is the starting (end-diastolic) length of the line segment. Peak strain for each AHA segment, and global peak strain were calculated in radial (strain rad ), longitudinal (strain long ) and circumferential (strain circ ) directions. 28 Note that in contrast to strain rad , both strain long and strain circ are negative values where more negative values reflect higher contractility. As such, we will sometimes present results for -strain long and -strain circ to facilitate interpretation of directionality.

LV traits GWAS: genotyping and imputation
Details of the genotyping and QC strategy employed by UKB were previously published. 29 Genotypes were called from 2 purpose-built arrays: the UK Biobank Axiom Array (825,927 markers) and UK BiLEVE Axiom Array (807,411 markers). These directly called genotypes were imputed to more than 90 million variants using the haplotype reference consortium (HRC) and the UK10K reference panels.
See UKB website for details of imputation and genotype quality control performed by a collaborative group headed by the Wellcome Trust Centre for Human Genetics. We excluded samples with outlying heterozygosity or missingness rates, as defined by UKB and those with mismatches between the genotypic and recorded sex, as well as those with aneuploidy or excess kinship, as defined by UKB.