Differences in Cardiac Mechanics among Genetically At-Risk First-Degree Relatives: The DCM Precision Medicine Study

Aims: Among genetically at-risk first-degree relatives (FDRs) of probands with dilated cardiomyopathy (DCM), the ability to detect changes in left ventricular (LV) mechanics with normal LV size and ejection fraction (LVEF) remains incompletely explored. We sought to define a pre-DCM phenotype among at-risk FDRs, including those with variants of uncertain significance (VUSs), using echocardiographic measures of cardiac mechanics. Methods and Results: LV structure and function, including speckle-tracking analysis for LV global longitudinal strain (GLS), were evaluated in 124 FDRs (65% female; median age 44.9 [IQR: 30.6–60.3] years) of 66 DCM probands of European ancestry sequenced for rare variants in 35 DCM genes. FDRs had normal LV size and LVEF. Negative FDRs of probands with pathogenic or likely pathogenic (P/LP) variants (n=28) were a reference group to which negative FDRs of probands without P/LP variants (n=30), FDRs with only VUSs (n=27), and FDRs with P/LP variants (n=39) were compared. In an analysis accounting for age-dependent penetrance, FDRs below the median age showed minimal differences in LV GLS across groups while those above it with P/LP variants or VUSs had lower absolute values than the reference group (−3.9 [95% CI: −5.7, −2.1] or −3.1 [−4.8, −1.4] %-units) and negative FDRs of probands without P/LP variants (−2.6 [−4.0, −1.2] or −1.8 [−3.1, −0.6]). Conclusions: Older FDRs with normal LV size and LVEF who harbored P/LP variants or VUSs had lower absolute LV GLS values, indicating that some DCM-related VUSs are clinically relevant. LV GLS may have utility for defining a pre-DCM phenotype. Clinical Trial Registration: clinicaltrials.gov, NCT03037632

versions were removed. No subjects with genotype call rates <97% among remaining SNPs had to be removed. One SNP having Mendel errors in more than 5% of nuclear families was excluded; no nuclear families had to be removed due to Mendel errors in more than 1% of SNPs.
Genotypes for any remaining Mendel errors were set to missing.

Principal Components Analysis
Subsequent quality control steps followed published recommendations to use ancestryadjusted quality control metrics in diverse cohorts. 3 The 1000 Genomes Phase 3 integrated callset 4 available at http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ was used to obtain ancestry principal components onto which DCM Research Project samples could be projected for subsequent steps. All analyses using R or Bioconductor packages were performed in R version 4.1.1 (R Foundation).

Joint Fileset Assembly
High-quality, common autosomal SNPs passing previous QC steps with call rates of at least 99.5% in probands and minor allele frequencies (MAFs) ≥5% in probands were selected The resulting compressed BCFs were then concatenated with bcftools into a single compressed BCF. The compressed BCF was then converted to a PLINK binary fileset and merged with the GSA ancestry SNP fileset using PLINK to create a joint fileset with 4,033 individuals and 267,947 SNPs.

Identification of Unrelated Subjects
The KING-robust algorithm 5 implemented in the SNPRelate Bioconductor package 6 (version 1. 28.0) was used to estimate kinship coefficients between pairs of individuals in the joint fileset. KING-robust does not assume linkage equilibrium between markers and is robust to departures from Hardy-Weinberg equilibrium (HWE) in the direction of excess heterozygosity; 5 the KING manual specifically recommends against performing any type of linkage disequilibrium (LD) pruning.
Only 1000 Genomes samples that were unrelated to every DCM sample in this joint fileset (kinship coefficient <0.025) 7 were used for ancestry analyses. This served to ensure that none of the DCM Research Project samples were related to the 1000 Genomes samples used as a reference so that the online augmentation, decomposition, and Procrustes approach could be used for projection of our study samples onto the 1000 Genomes principal components. 8,9 Among the remaining 1000 Genomes samples, the PC-AiR algorithm 7 implemented in the pcairPartition function of the GENESIS Bioconductor package (version 2.24.2) was used to select an unrelated subset with the most ancestral divergence. We used the kinship cutoff given above and a divergence cutoff of -0.025 as recommended. 7 We used the same PC-AiR approach to identify an unrelated subset of DCM Research Project probands for a subsequent structured HWE analysis.
After filtering, there were 2,422 subjects from 1000 Genomes unrelated to each other and any of the eligible DCM Research Project subjects who could be used for structured HWE and ancestry analysis and 1,266 unrelated DCM Research Project probands who could be used for structured HWE analysis. Note that these probands include individuals enrolled in other DCM Research Project studies who underwent exome sequencing.

Structured Hardy-Weinberg Equilibrium Testing
The structured HWE method of Hao and Storey (2019) 10 implemented the lfa Bioconductor package tests for deviation from HWE after removing the effects of population structure and was used to test SNPs that could potentially be used in ancestry analysis for deviation from HWE. A recent version on GitHub (2.0.8; d2ced17), rather than the one on Bioconductor, was used to enable BEDMatrix (version 2.0.3) support. These tests were performed separately in unrelated probands and unrelated 1000 Genomes reference samples to identify poor-quality SNPs for each genotyping technology (GSA or 1000 Genomes integrated calling). The sHWE function from lfa was not used directly but was reimplemented based on the source code in a way that improved memory efficiency and parallelism.
The number of logistic factors, , was selected adaptively by the p-value entropy approach in Algorithm 2 of Hao and Storey (2019) with 150 bins. 10 In particular, was incremented and the p-value entropy among all but the smallest bin calculated. Once the entropy remained at or below the value with logistic factors for + 1 and + 2 logistic factors, analysis proceeded with logistic factors. P-values were calculated by simulating data for each SNP under the null model given the estimated individual-specific allele frequencies and calculating the test statistic, which effectively gave a bootstrap sample of size equal to the number of markers under the null hypothesis because the test statistic is a pivotal quantity. 10 Empirical p-values were calculated using the formula that counts the observed test statistic in both the numerator and denominator. 11 The structured HWE p-values obtained with this were used to estimate SNP q-values as well as the proportion of SNPs in HWE, 0 , using the bootstrap method implemented in the Bioconductor qvalue package (version 2.26.0). 10,12 Strong evidence of deviation from HWE was defined as having a q-value ≤0.05 because no more than 5% of SNPs meeting this threshold would be expected to actually be in HWE, including in the presence of local dependence induced by LD. 12 While no SNPs met the q-value threshold in unrelated DCM Research Project probands, a total of 4,261 SNPs with relatively strong evidence of deviation from HWE in unrelated 1000 Genomes reference samples were eliminated for subsequent ancestry analyses, leaving 263,686 SNPs.

Principal Components Analysis of 1000 Genomes Reference Samples
Principal components analysis (PCA) was performed on the 2,422 1000 Genomes reference samples found to be unrelated to each other and all DCM Research Project samples.
The bigsnpr R package (version 1.10.8) and its dependencies were used to perform the analysis. 9 SNPs located in long-range LD regions identified in Table S13  Outlier samples in the PC score space were identified using Probabilistic Local Outlier Factor approach. 9 We used PC scores scaled to have the same variance ( from the SVD) 16 rather than the PC scores themselves ( from the SVD), although the latter are the coordinates actually used when plotting individuals in PC space. This scaling ensures that each PC has equal weight in sample outlier calculations rather than giving large weights to the first few PCs.
Cutpoints for the statistic were chosen based upon visual inspection of the histogram and pairwise PC score plots in each iteration.

Projection of DCM Research Project Samples onto 1000 Genomes Principal Components
The Online Augmentation, Decomposition, and Procrustes (OADP) approach 8 as implemented in the bed_projectSelfPCA function of the bigsnpr package was used to obtain predicted PC scores for DCM Research Project samples on the 1000 Genomes PCs. Advantages of using projected scores based on OADP include: 1) Predicted PC scores for an individual are the same regardless of who is included in the sample for prediction, which obviates the need to perform a second PCA after sample quality control and allows for easy addition of additional subjects without changing prior results.
2) Projection can be used to obtain PC scores for related individuals. 7 With OADP, relatedness in the study samples should not impact prediction accuracy as long as reference samples are unrelated to each other 8 and the study samples. 9 3) OADP eliminates centroid bias in predicted PC scores that can occur with standard techniques. 8,9 Ancestry-Adjusted Quality Control On the basis of the 1000 Genomes PCA, adjustment for 20 PCs in DCM Research Project samples, all of which should be from populations included in the 1000 Genomes reference sample or mixtures of them, was deemed sufficient to remove the effects of population structure for ancestry-adjusted quality control measures.

Autosomal Heterozygosity
In order to identify and remove potentially poor quality samples in terms of heterozygosity, an approach to account for ancestry similar to the one detailed in Bycroft et al. A total of 8 projected DCM Research Project samples were considered outliers due to high autosomal heterozygosity. Six samples had evidence of recent ancestral admixture that would explain the high heterozygosity, but the remaining 2 were removed from further processing. The expected negative relationship between PC-corrected heterozygosity and the total length of all runs of homozygosity was confirmed; thus, no samples were excluded for having unusually low heterozygosity.

Sex Discordance
To check for potential sample swaps based on discordance from reported sex, X chromosome heterozygosity and the Y chromosome genotype missing rate were computed for the remaining projected samples. Raw X chromosome heterozygosity was plotted against the Y chromosome genotype missing rate with samples faceted by self-reported biological sex, and the plot was visually inspected to identify any potential sex-related discordance. There appeared to be no sex discordance among the samples considered.

Expected Relatedness
As the DCM Research Project studies are family-based, many families had genotype data on multiple members included in the sample set. To verify that the pedigree-based relatedness between samples was correct, identify potential sample swaps, and identify cryptically related  [19][20][21] Note that the ADMIXTURE likelihood, like PCA, assumes unrelated reference samples, 19 and inclusion of related samples can lead to lower-quality ancestral allele frequency estimates. 21 As ADMIXTURE also assumes linkage equilibrium between markers, the LD-pruned SNP set from the final PCA was used, as recommended. 19 ADMIXTURE relies on the calling user to provide a value, , which represents the number of ancestral populations believed to be among the input samples.
For a given , the ADMIXTURE likelihood has ! equivalent global maxima in which the ancestry proportions and allele frequencies are identical but appear in different columns of and rows of . 19 As ADMIXTURE uses a greedy, hill-climbing search to maximize the likelihood, 30 runs from different starting values were performed for each by varying the random seed 22 to have a greater chance of achieving a well-supported global maximum. The maximal runs with a maximized loglikelihood numerically equivalent to the maximum achieved across all runs (i.e., absolute value of relative difference <3e-8) for a given are likely to represent these equivalent global maxima as opposed to a local maximum attained due to poor initial values. To verify that the maximal runs represented the same solution up to a permutation of the columns of , pong (version 1.5) 23 was used to find the maximal weight alignments of columns in for these runs and calculate their similarity for a given . If the matrices for the maximal runs in a given clustered into a single mode with high similarity, then the maximal runs were equivalent, and the putative global maximum was well-identified.
To determine the optimal , ADMIXTURE's v-fold cross-validation procedure 20 with

Variant Analysis and Interpretation
Research exome sequencing was conducted and data processed as described previously 24 with the following modifications. The pipeline used by the NWGC was updated to use newer tailored to DCM. 24 As part of this process, a computational algorithm first attempted to classify variant as benign, likely benign, unlikely to impact protein function, or low quality. 24 While the original algorithm used both QDFilter and SNPCluster VCF FILTER flags to classify variants as low quality, the SNPCluster filter was later eliminated due to false positives.
Variants that could not be computationally adjudicated were manually reviewed and certified for a final classification and confirmed with Sanger sequencing. 24  As some of these changes were fully implemented after generation of the sampling frame, a total of 3 first-degree relatives (FDRs) who harbored only variants of uncertain significance (VUSs) in the sampling frame were reclassified as harboring pathogenic or likely pathogenic (P/LP) variants for the final analysis.

Supplemental Figures
Supplemental Figure 1. Echocardiographic measurements by genetic risk group in first-degree relatives below the median age in the sample A violin plot with a superimposed box-and-whisker plot shows the distribution of the measurements among first-degree relatives (FDRs) in each genetic risk group, with a black diamond at the mean. Next to this, an interval plot shows the estimated marginal mean from the linear mixed model analysis (point) as well as its 95% confidence interval (interval) obtained using Morel-Bokossa-Neerchal bias-corrected empirical standard errors and the standard normal distribution. For each genetic risk group, the estimated marginal mean is a covariate-adjusted estimate of the mean in a population of FDRs below the median age in the sample (44.9 years) that is half female. For LV global longitudinal strain, these populations also have the same mean height and weight and half image quality >2. Table 3 in the main text presents covariate-adjusted estimated mean differences between each genetic risk group and the reference group for FDRs below the median age from the same model; these are identical to the differences between the Supplemental Figure 3. Estimated global ancestry proportions for 1000 Genomes reference samples from the maximal run for the optimal number of assumed ancestral populations

Supplemental Tables
Supplemental Table 1. Demographic characteristics and comorbidities of first-degree relatives, by genetic risk group and age group    Posterior wall thickness at end diastole, mm The mean of each echocardiographic measurement was modeled as a function of genetic risk group within two age groups (below and above the median age in the sample) using a single linear mixed model with an interaction. This model specification was chosen a priori to reflect expected growth in the differences between genetic risk groups with age under a threshold model for DCM development that accounts for age-dependent penetrance. For all measurements other than sex-specific internal diameter z-scores, sex and its interaction with age group were included as covariates. For LV global longitudinal strain and LV longitudinal strain (A4C view), height, weight, and image quality rating (≤2 vs. >2) were also included. Heterogeneity between clinical sites and intrafamilial correlation were modeled by including independent normal random effects for proband enrollment site and family within site. Covariate-adjusted estimated differences in means between each positive FDR group (LP/P FDRs and VUS FDRs) and negative FDRs of probands without P/LP variants, their 95% confidence intervals, and two-sided Wald p-values for the null of no difference were obtained from this model using Morel-Bokossa-Neerchal bias-corrected empirical standard errors and the standard normal distribution. Estimated marginal means for each genetic risk group derived from this model are shown in Figure 3 (FDRs above the median age) and Supplemental Figure 1 (FDRs below the median age). Table 2 shows the number of FDRs contributing to each measurement's model by genetic risk group. c The final model excluded the site random effect because convergence occurred on the boundary constraint with a zero variance component when it was included. Morel-Bokossa-Neerchal bias-corrected empirical standard errors with sites as independent units were still used. d The final model excluded the site and family-within-site random effects because convergence occurred on the boundary constraint with zero variance components when they were included. Morel-Bokossa-Neerchal biascorrected empirical standard errors with sites as independent units were still used.