Machine learning suggests polygenic contribution to cognitive dysfunction in amyotrophic lateral sclerosis
==========================================================================================================

* Katerina Placek
* Michael Benatar
* Joanne Wuu
* Evadnie Rampersaud
* Laura Hennessy
* Vivianna M. Van Deerlin
* Murray Grossman
* David J. Irwin
* Lauren Elman
* Leo McCluskey
* Colin Quinn
* Volkan Granit
* Jeffrey M. Statland
* Ted M. Burns
* John Ravits
* Andrea Swenson
* Jon Katz
* Erik Pioro
* Carlayne Jackson
* James Caress
* Yuen So
* Samuel Maiser
* David Walk
* Edward B. Lee
* John Q. Trojanowski
* Philip Cook
* James Gee
* Jin Sha
* Adam C. Naj
* Rosa Rademakers
* The CReATe Consortium
* Wenan Chen
* Gang Wu
* J. Paul Taylor
* Corey T. McMillan

## Abstract

Amyotrophic lateral sclerosis (ALS) is a multi-system disease characterized primarily by progressive muscle weakness. Cognitive dysfunction is commonly observed in patients; however, factors influencing risk for cognitive dysfunction remain elusive. Using sparse canonical correlation analysis (sCCA), an unsupervised machine-learning technique, we observed that single nucleotide polymorphisms collectively associate with baseline cognitive performance in a large ALS patient cohort (N=327) from the multicenter Clinical Research in ALS and Related Disorders for Therapeutic Development (CReATe) Consortium. We demonstrate that a polygenic risk score derived using sCCA relates to longitudinal cognitive decline in the same cohort, and also to *in vivo* cortical thinning in the orbital frontal cortex, anterior cingulate cortex, lateral temporal cortex, premotor cortex, and hippocampus (N=90) as well as *post mortem* motor cortical neuronal loss (N=87) in independent ALS cohorts from the University of Pennsylvania Integrated Neurodegenerative Disease Biobank.
Our findings suggest that common genetic polymorphisms may exert a polygenic contribution to the risk of cortical disease vulnerability and cognitive dysfunction in ALS.

## Introduction

A significant proportion of patients with amyotrophic lateral sclerosis (ALS) manifest cognitive impairment consistent with extra-motor frontal and temporal lobe neurodegeneration, including 14% who are also diagnosed with frontotemporal dementia (FTD) [1,2]. Comorbid cognitive dysfunction is a marker of poorer prognosis in this fatal disease and confers risk for more rapid functional decline, shorter survival, and greater caregiver burden [3-6]. While linkage analysis and genome-wide association studies (GWAS) have identified rare causal mutations [7-10] and common risk loci [11-15] suggesting shared genetic architecture between ALS and FTD, whether and how identified variants relate to phenotypic heterogeneity, including in cognition, remains largely unexplored.

The genetic landscape of ALS is largely characterized by ‘apparently sporadic’ disease: approximately 90% of patients have no known family history of ALS, and only approximately 10% have a family history of ALS [16]. Known pathogenic mutations (e.g. *C9ORF72* [7,8], *TARDBP* [17], *FUS* [18], *NEK1* [19], *SOD1* [20]) have been identified in many familial cases and in 5-7% of non-familial cases [21]; in addition, GWAS have revealed many loci of common genetic variation that confer risk for ALS and FTD. Indeed, recent evidence supports a polygenic contribution to disease risk from common genetic variants [22,23]. This evidence includes the largest ALS GWAS to date, which newly identified risk variants in the *KIF5A* gene [12], and genome-wide conjunction and conditional false discovery rate (FDR) analyses demonstrating shared genetic contributions between ALS and FTD from common single nucleotide polymorphisms (SNPs) at known and novel loci [15].
An accumulating body of research suggests that SNPs associated with risk of ALS and FTD also act as quantitative trait modifiers of patient phenotype. For example, a SNP identified as a risk locus for ALS and FTD was found to contribute to cognitive decline, *in vivo* cortical degeneration in the prefrontal and temporal cortices, and *post mortem* pathologic burden of hyperphosphorylated TAR DNA-binding protein 43 kDa (TDP-43) in the middle frontal, temporal, and motor cortices [24]. Another study found that a SNP identified as a risk locus for FTD with underlying TDP-43 pathology was additionally associated with cognition in patients with ALS [25]. Others have recently demonstrated shared polygenic risk between ALS and other traits (e.g. smoking, education) and diseases (e.g. schizophrenia) [22,23,26], suggesting that a single variant is unlikely to fully account for observed disease phenotype modification. However, there are presently no published studies evaluating polygenic contribution to cognitive dysfunction in ALS.

Here we employed an unsupervised machine-learning approach, sparse canonical correlation analysis (sCCA) [27], to identify and evaluate a potential polygenic contribution to cognitive dysfunction in ALS. sCCA has previously been implemented in many contexts such as genetics [28,29], neuroimaging-behavior studies [30,31], and neuroimaging-genetic studies [32-34], including the association of cortical thickness and white matter diffusion with FTD risk SNPs [35]. For the first time, we leverage sCCA as a data-driven tool to facilitate generation of a polygenic risk score. Specifically, sCCA uses sparsity to select the maximally contributing variants and assigns each a weight based on its model contribution, with minimal *a priori* assumptions.
This contrasts with traditional approaches to constructing polygenic scores, which rely on existing GWAS statistics to select variants and assign weights; this can be challenging when the original GWAS statistics are based on case-control associations rather than on the neuropsychological outcome of interest.

We used sCCA to derive a polygenic risk score for cognitive dysfunction in a large longitudinal cohort of cognitively well-characterized patients with ALS or a related disorder participating in the Phenotype-Genotype-Biomarker (PGB) study of the Clinical Research in ALS and Related Disorders for Therapeutic Development (CReATe) Consortium. We then examined independent neuroimaging and autopsy ALS patient cohorts from the University of Pennsylvania Integrated Neurodegenerative Disease Biobank (UPenn Biobank) [36] to evaluate whether polygenic risk for cognitive dysfunction also relates to *in vivo* cortical neurodegeneration and *ex vivo* cortical neuronal loss and TDP-43 pathology. We focused our investigation on SNPs achieving genome-wide significance in the largest published ALS GWAS [12] and SNPs identified as shared risk loci for both ALS and FTD [15]. We hypothesized that a sparse multivariate approach would reveal a subset of genetic loci associated with cognitive dysfunction profiles in ALS in a polygenic manner, and that follow-up analyses in independent neuroimaging and autopsy cohorts would converge to characterize quantitative traits associated with polygenic risk from identified loci.

## Results

### Heterogeneity of baseline cognitive and motor phenotype in ALS patients

Smaller-scale studies have shown that ALS patients have impairments in executive function, verbal fluency, and language domains, with relative sparing of memory and visuospatial function [4].
The Edinburgh Cognitive and Behavioral ALS Screen (ECAS) was developed to measure cognitive function minimally confounded by motor disability; in addition to overall performance (ECAS Total score), it includes an “ALS-Specific” score capturing impairments in language, executive function, and verbal fluency domains that are frequently observed in ALS patients, and an “ALS-Non-Specific” score capturing less frequently observed impairments in memory and visuospatial function [37]. To quantify heterogeneity in cognitive dysfunction, we evaluated 327 patients with ALS, ALS with cognitive impairment (ALSci), or a related disorder (ALS-FTD, primary lateral sclerosis (PLS), progressive muscular atrophy (PMA)) participating in the PGB study of the CReATe Consortium ([NCT02327845](http://medrxiv.org/lookup/external-ref?link_type=CLINTRIALGOV&access_num=NCT02327845&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F16%2F2019.12.23.19014407.atom)) (*Table 1*). We included a spectrum of ALS and related disorder cases to account for the possibility that a subset of PLS or PMA cases may evolve into ALS [38] and that such cases can have profiles of cognitive dysfunction similar to ALS [39]. We used linear mixed-effects (LME) models to estimate between-individual variability in baseline performance and rate of decline on the ECAS (Total, ALS-Specific, and ALS-Non-Specific scores, and scores for each individual cognitive domain), on the ALS Functional Rating Scale – Revised (ALSFRS-R), and on clinician ratings of upper motor neuron (UMN) and lower motor neuron (LMN) signs (UMN and LMN burden scores); each model included adjustment for potential confounders including age, education, bulbar onset, and disease duration.
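The LME models described above are not specified in full in the text. As a sketch only, assuming a random-intercept, random-slope model for each clinical measure (our assumption, including the notation), with $y_{ij}$ the score of participant $i$ at visit $j$, $t_{ij}$ the time since baseline, and $\mathbf{z}_i$ the covariates (age, education, bulbar onset, disease duration):

$$
y_{ij} = \beta_0 + b_{0i} + (\beta_1 + b_{1i})\,t_{ij} + \boldsymbol{\gamma}^{\top}\mathbf{z}_i + \varepsilon_{ij}, \qquad
\begin{pmatrix} b_{0i} \\ b_{1i} \end{pmatrix} \sim \mathcal{N}(\mathbf{0}, \Sigma_b), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2).
$$

Under this form, a participant's covariate-adjusted baseline and rate of decline would be $\hat\beta_0 + \hat b_{0i}$ and $\hat\beta_1 + \hat b_{1i}$, where $\hat b_{0i}$ and $\hat b_{1i}$ are the predicted random effects; whether covariate-by-time interaction terms were also included is not stated.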
We confirmed that cognitive and motor performance at baseline are heterogeneous across individuals (*Figure 1A*), and correlation analyses of both baseline performance and longitudinal rates of change suggest that heterogeneity in cognition is largely independent of disability in physical function or clinical burden of UMN/LMN signs (all *R*<0.2; *Figure 1B*). Together, this establishes the heterogeneity of baseline and longitudinal cognitive and motor phenotypes within the PGB cohort.

![Figure 1.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/09/16/2019.12.23.19014407/F1.medium.gif)

Figure 1. Clinical and genetic heterogeneity in the CReATe PGB cohort. A) Differences in baseline performance and rate of decline on each clinical measure for each participant; the heatmap indicates each participant’s standard deviation (SD) from the group mean. B) Spearman’s correlations between baseline performance and rate of decline for all clinical measures. C) Allele dosage or binary status for each genetic variable for each participant.

Table 1: Baseline demographic characteristics of the CReATe PGB cohort.

### Multivariate analyses indicate polygenic contributions to baseline cognitive performance

To identify potential polygenic contributions to cognitive impairment in ALS, we employed sCCA [27], an unsupervised machine-learning approach that identifies multivariate relationships between a dataset of one modality (e.g. genetic variables including allele dosage of SNPs) and a dataset of another modality (e.g. clinical measures of cognitive and motor function). Traditional CCA identifies, for each dataset, a linear combination of all of its variables such that the correlation between the two resulting composite scores is maximized, yielding an association between variables from one dataset (e.g. SNPs) and variables from the other (e.g. clinical scores) [27].
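For reference, in notation chosen here for illustration (not taken from the source), let $X$ ($n \times p$) be the column-centered genetic matrix and $Y$ ($n \times q$) the column-centered clinical matrix. Standard CCA finds the weight vectors $u$ and $v$ that maximize the correlation between the composite scores $Xu$ and $Yv$:

$$
(\hat{u}, \hat{v}) = \operatorname*{arg\,max}_{u,\,v} \operatorname{corr}(Xu, Yv) = \operatorname*{arg\,max}_{u,\,v} \frac{u^{\top} X^{\top} Y v}{\sqrt{u^{\top} X^{\top} X u}\,\sqrt{v^{\top} Y^{\top} Y v}}.
$$

Because every entry of $u$ and $v$ is generally non-zero in this unpenalized form, all variables contribute to the composite scores.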
The “sparse” component of sCCA additionally incorporates an L1 penalty that shrinks the magnitude of coefficients to yield sparse models (i.e. models with fewer variables) in which some coefficients are exactly zero, so that the variables associated with them are effectively eliminated from the model. As a result, variables that contribute little to the model are dropped, yielding a data-driven subset of variables from one dataset that relates to a data-driven subset of variables from the other. The unstandardized coefficients resulting from sCCA serve as canonical weights indicating the direction and strength of the relationships between selected variables. We evaluated an allele-dosage dataset comprising 33 SNPs identified as shared risk loci for both ALS and FTD [15] and 12 SNPs identified as risk loci for ALS in the largest published case-control GWAS [12], with the latter chosen to include loci associated with ALS but not specifically with FTD (*Figure 1C*). To account for inter-individual differences in population structure, sex, and mutation status, we also included in this dataset the first two principal components from a PCA conducted in the PGB cohort (*Supplementary Figure 1*) and binary variables for sex, *C9ORF72* repeat expansion status, and other mutation status (e.g. *SOD1*). We then used sCCA to examine the association between this genetic dataset and a dataset comprising adjusted baseline performance on clinical measures of cognitive and motor performance extracted from the LME models. After optimizing model sparsity parameters (*Supplementary Figure 2*), we ran sCCA 10,000 times, each time using a random bootstrapped subsample of 75% of participants (*Supplementary Figure 3*).
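The L1 step described above can be sketched in Python. This is a minimal illustration of one common alternating soft-thresholding formulation of sparse CCA (which treats each dataset's within-dataset covariance as the identity), not necessarily the implementation of [27]. The function names, the standardization step, and the mapping from a penalty in (0, 1] to an L1 bound of penalty × √(number of variables) are assumptions made for illustration.

```python
import math


def soft_threshold(a, delta):
    """Elementwise soft-thresholding: sign(x) * max(|x| - delta, 0)."""
    return [math.copysign(max(abs(x) - delta, 0.0), x) for x in a]


def l2_normalize(a):
    norm = math.sqrt(sum(x * x for x in a))
    return [x / norm for x in a] if norm > 0 else list(a)


def l1_bounded_unit_vector(a, bound, n_search=60):
    """Unit-L2 vector S(a, delta)/||S(a, delta)|| whose L1 norm is <= bound,
    with delta found by binary search. If bound < 1 (infeasible for a unit
    vector with several entries), only the largest entry of `a` survives."""
    u = l2_normalize(a)
    if sum(abs(x) for x in u) <= bound:
        return u  # no shrinkage needed
    lo, hi = 0.0, max(abs(x) for x in a) * (1 - 1e-9)
    for _ in range(n_search):
        mid = (lo + hi) / 2
        if sum(abs(x) for x in l2_normalize(soft_threshold(a, mid))) > bound:
            lo = mid  # still too dense: shrink harder
        else:
            hi = mid  # feasible: try less shrinkage
    return l2_normalize(soft_threshold(a, hi))


def standardize(M):
    """Center and scale each column of a row-major matrix (list of rows)."""
    n, p = len(M), len(M[0])
    cols = []
    for j in range(p):
        col = [row[j] for row in M]
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (n - 1)) or 1.0
        cols.append([(x - mean) / sd for x in col])
    return [list(r) for r in zip(*cols)]


def mat_vec(M, v):
    """M (n x p) times v (length p) -> length n."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def mat_t_vec(M, w):
    """M^T (p x n) times w (length n) -> length p."""
    return [sum(row[j] * wi for row, wi in zip(M, w)) for j in range(len(M[0]))]


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def sparse_cca(X, Y, pen_x, pen_y, n_iter=50):
    """First sparse canonical pair (u, v) and corr(Xu, Yv) for a genetic
    matrix X and clinical matrix Y (rows = participants)."""
    X, Y = standardize(X), standardize(Y)
    p, q = len(X[0]), len(Y[0])
    c1, c2 = pen_x * math.sqrt(p), pen_y * math.sqrt(q)  # assumed mapping
    v = l2_normalize([1.0] * q)  # simple start; SVD-based starts are common
    u = [0.0] * p
    for _ in range(n_iter):
        u = l1_bounded_unit_vector(mat_t_vec(X, mat_vec(Y, v)), c1)  # X^T Y v
        v = l1_bounded_unit_vector(mat_t_vec(Y, mat_vec(X, u)), c2)  # Y^T X u
    return pearson(mat_vec(X, u), mat_vec(Y, v)), u, v
```

Under the assumed mapping, a clinical penalty of 0.1 across 11 clinical measures gives an L1 bound of about 0.33, which is below 1 (the smallest L1 norm a unit vector can have), so the search keeps only the single largest entry of `v`; this is consistent with one clinical variable being chosen per iteration, as reported in the text.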
We then calculated the median canonical correlation between the clinical and genetic datasets, the median canonical weight for each variable in the genetic dataset, and the percentage of the 10,000 iterations in which each variable from the clinical dataset was chosen. We report percentages rather than median canonical weights for clinical features because the optimized L1 parameter for the clinical dataset was the most stringent (i.e. 0.1), so only one variable from the clinical dataset was chosen in each of the 10,000 iterations. This differs from other regularization techniques (e.g. LASSO) in that the clinical variable was selected by sCCA modeling in each iteration rather than being selected by the experimenter prior to analysis. Importantly, because all variables can be tested in a single model, sCCA also minimizes the need for multiple comparisons correction and therefore reduces the risk of a Type II (false-negative) error, which is common in genomics studies when overly stringent correction for multiple comparisons leads to failure to detect a true effect. To assess model performance under the null hypothesis (no association between genetic factors and clinical phenotypes), we similarly ran 10,000 bootstrapped sCCAs using the same L1 and subsampling parameters and randomly permuted each dataset 100 times in each model iteration. We examined the proportion of times each variable in the clinical and genetic datasets was selected by this null model (i.e. achieved a non-zero canonical weight). We used the null model to define a *p* value for the true, unpermuted model as the probability, under the null hypothesis, of observing a canonical correlation greater than or equal to the median canonical correlation obtained from sCCA modeling of the true data.
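The resampling and null-model procedures described above can be sketched as follows. The sketch is hedged in several ways: `toy_fit` is a hypothetical placeholder for a real sCCA fit (it keeps only the single most correlated column pair), the null model here shuffles the clinical rows once per subsample rather than permuting each dataset 100 times, and the *p* value is the plain proportion of null correlations at or above the observed median, as described in the text.

```python
import random
import statistics


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0


def toy_fit(X, Y):
    """Hypothetical stand-in for one sparse CCA fit: keeps the single (X, Y)
    column pair with the largest |r| and returns (|r|, u, v) with one-hot weights."""
    p, q = len(X[0]), len(Y[0])
    best_r, best_j, best_k = 0.0, 0, 0
    for j in range(p):
        xj = [row[j] for row in X]
        for k in range(q):
            r = pearson(xj, [row[k] for row in Y])
            if abs(r) > abs(best_r):
                best_r, best_j, best_k = r, j, k
    u, v = [0.0] * p, [0.0] * q
    u[best_j] = 1.0
    v[best_k] = 1.0 if best_r >= 0 else -1.0
    return abs(best_r), u, v


def subsample_fits(X, Y, fit, n_iter, frac, rng, permute=False):
    """Yield fit() results on random subsamples (without replacement) of
    round(frac * n) participants; optionally shuffle Y rows to break pairing."""
    n = len(X)
    m = round(frac * n)
    for _ in range(n_iter):
        idx = rng.sample(range(n), m)
        Xs, Ys = [X[i] for i in idx], [Y[i] for i in idx]
        if permute:
            rng.shuffle(Ys)
        yield fit(Xs, Ys)


def bootstrap_summary(X, Y, fit, n_iter=10_000, frac=0.75, seed=0):
    """Median canonical correlation, median X weights, and the % of fits in
    which each Y variable received a non-zero weight."""
    rng = random.Random(seed)
    corrs, u_all, counts = [], [], [0] * len(Y[0])
    for r, u, v in subsample_fits(X, Y, fit, n_iter, frac, rng):
        corrs.append(r)
        u_all.append(u)
        for k, w in enumerate(v):
            if w != 0:
                counts[k] += 1
    median_u = [statistics.median(u[j] for u in u_all) for j in range(len(X[0]))]
    return statistics.median(corrs), median_u, [100 * c / n_iter for c in counts]


def permutation_p_value(X, Y, fit, observed, n_iter=10_000, frac=0.75, seed=1):
    """Share of permuted-data fits whose correlation is >= the observed median."""
    rng = random.Random(seed)
    null = [r for r, _, _ in subsample_fits(X, Y, fit, n_iter, frac, rng, permute=True)]
    return sum(r >= observed for r in null) / n_iter
```

Adding one to both the numerator and the denominator of the permutation *p* value is a common variant that avoids reporting exactly zero.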
We observed that a subset of 29 genetic variables was correlated with a single clinical variable, with a median canonical correlation between the two datasets of R=0.35 (95% confidence interval: 0.23, 0.42; *p*=0.019) (*Figure 2, Supplementary Figure 4*).

![Figure 2.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/09/16/2019.12.23.19014407/F2.medium.gif)

Figure 2. Sparse, polygenic relationship between clinical and genetic variation in ALS. Variable selection and median canonical weight strength from bootstrap sCCA modeling in the CReATe PGB cohort. See *Supplementary Table 1* for additional detail on genetic variants.

Over the 10,000 iterations, the most frequently selected clinical variable was the ECAS ALS-Specific score (percentage of times selected: 37%), followed by the ECAS Total (29%), Executive Function (17%), Language (9.5%), Verbal Fluency (2.3%), ALS-Non-Specific (2.2%), Memory (2%), and Visuospatial (0.34%) scores. The ALSFRS-R and UMN and LMN burden scores were each selected in less than 0.05% of the model iterations. By contrast, under the null hypothesis each clinical variable was selected in a roughly equal percentage of iterations (range 5.9% to 9.4%), indicating that the true sCCA modeling selected cognitive, and not motor, features more often than would be expected by chance (*Supplementary Figure 5A*). Of the 29 selected genetic variables, the 12 most highly weighted were rs1768208 and rs9820623 (*MOBP*), rs7224296 (*NSF*), rs538622 (*ERGIC1*), rs10143310 (*ATXN3*), rs6603044 (*BTBD1*), rs4239633 (*UNC13A*), rs2068667 (*NFASC*), rs10488631 (*TNPO3*), rs11185393 (*AMY1A*), rs3828599 (*GPX3*), and sex. Twenty-seven of the 29 selected genetic variables were SNPs, and 85% of model-selected SNPs (23/27) were shared risk loci for ALS and FTD [15].
Modeling under the null hypothesis revealed that each genetic variable achieved a roughly equal median weight, with no subset of genetic variables contributing more strongly to the model (*Supplementary Figure 5B*). The association of genetic variables most frequently with the ECAS ALS-Specific score suggests a polygenic contribution to impairment in cognitive domains frequently impaired in patients with ALS (e.g. language, verbal fluency, and executive function), which are also the most impaired cognitive domains in FTD.

To evaluate whether our sCCA model was influenced by the inclusion of patients with disorders related to ALS (i.e. PLS, PMA), we compared the median weights for genetic features and the percentage of times selected for clinical features from sCCA modeling of the entire CReATe PGB cohort (i.e. with PLS and PMA included) with those from sCCA modeling of a subset of the cohort that excluded patients with PLS and PMA. sCCA modeling that excluded patients with PLS and PMA resulted in the most frequent selection of the ECAS Total, ALS-Specific, Executive Function, and Language scores, similar to results obtained in the entire cohort (*Supplementary Figure 6A*). Furthermore, it resulted in the same selection of genetic variables as sCCA modeling of the entire cohort, with similar direction and strength of weights (*Supplementary Figure 6B*). This suggests that the inclusion of disorders related to ALS is unlikely to confound our observations.

### Polygenic score captures baseline cognition as well as longitudinal rate of cognitive decline, but not motor decline

Next, we investigated potential polygenic contributions to rate of decline in cognitive and motor performance in the PGB cohort.
Baseline performance captures differences only at a single (somewhat arbitrary) point in time, not differences in the trajectory of performance over time. To evaluate association with longitudinal performance, we first calculated a weighted polygenic risk score (wPRS) for each participant as the sum, across the selected genetic variables, of each variable's allele dosage multiplied by its median canonical weight from sCCA modeling. We also calculated an unweighted polygenic risk score (uPRS) as the unweighted sum of allele dosages across the genetic variables selected by sCCA modeling. Spearman rank-order correlations, with family-wise error (FWE) correction, between the wPRS and adjusted baseline estimates of the four most frequently selected clinical features (selected in approximately 10% or more of the 10,000 iterations; i.e. the ECAS ALS-Specific, Total, Executive Function, and Language scores) yielded correlation values similar to the median canonical correlation from sCCA modeling (e.g. for ECAS ALS-Specific: *rs*(*329*)=-0.34, *p*=5.0×10⁻⁹) (*Figure 3A*), supporting construct validity. We observed no statistically significant relationship between the uPRS and adjusted baseline estimates of performance on the ALS-Specific, Executive Function, Language, and ECAS Total scores (all *p* values > .2).

![Figure 3.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/09/16/2019.12.23.19014407/F3.medium.gif)

Figure 3. wPRS correlates with cognitive performance on the ECAS in the CReATe PGB cohort. Scatterplots showing that the wPRS correlates with A) adjusted baseline performance on the ECAS ALS-Specific, Total, Executive Function, and Language scores, and B) rate of decline on the ALS-Specific, ALS-Non-Specific, and Total scores.
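The two scores defined above reduce to a few lines. This sketch assumes dosages are coded as effect-allele counts (0, 1, 2), with binary variables such as sex coded 0/1, and treats a variable as "selected" when its median canonical weight is non-zero; the weights in the example are hypothetical, not values from the study.

```python
def weighted_prs(dosages, weights):
    """wPRS per participant: sum over variables of median weight * dosage.
    dosages: one row per participant, one column per genetic variable."""
    return [sum(w * g for w, g in zip(weights, row)) for row in dosages]


def unweighted_prs(dosages, weights):
    """uPRS per participant: plain sum of dosages over the variables with a
    non-zero median weight (i.e. those selected by sCCA)."""
    return [sum(g for w, g in zip(weights, row) if w != 0) for row in dosages]


dosages = [[0, 1, 2], [2, 0, 1]]      # two participants, three variables
median_weights = [0.5, 0.0, -0.25]    # hypothetical median canonical weights
print(weighted_prs(dosages, median_weights))    # [-0.5, 0.75]
print(unweighted_prs(dosages, median_weights))  # [2, 3]
```

The contrast illustrates why the two scores can behave differently: the uPRS ignores both the magnitude and the sign of each weight, so variables with opposite-signed weights add together rather than offsetting each other.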
We then conducted Spearman’s rank-order correlations, with FWE correction, between the wPRS and adjusted rate of decline on each clinical measure of cognitive and motor performance. To obtain adjusted rates of decline, we extracted individual slope estimates from the LME models described above for the 277 individuals (85%) from the PGB cohort with 2 or more observations on the ECAS, ALSFRS-R, and UMN and LMN burden scores. We observed significant negative relationships between the wPRS and adjusted rate of decline on the ECAS ALS-Specific (*rs*(*277*)=-0.21, *p*=5.3×10⁻³), ALS-Non-Specific (*rs*(*277*)=-0.19, *p*=0.016), and Total scores (*rs*(*277*)=-0.26, *p*=8.1×10⁻⁵; *Figure 3B*), but not on the ALSFRS-R or UMN and LMN burden scores (all *p* >0.9, *not shown*). We observed no statistically significant relationship between the uPRS and adjusted rate of decline on any clinical measure (all *p* values > .9; *not shown*). These findings suggest that, when combined using sCCA-derived weights, the SNPs included in this analysis (associated with risk of ALS or joint risk of ALS and FTD) contribute polygenically to the rate of cognitive decline, but not of motor decline.

In *post hoc* analyses, we investigated whether SNPs also contribute individually to rate of decline on clinical measures. We conducted LME modeling of the original longitudinal data to investigate the fixed effect of each of the 45 SNPs on each of the 11 clinical measures (i.e. all ECAS scores, ALSFRS-R, and UMN and LMN burden scores) independently. No effects survived correction for multiple comparisons. However, the SNPs with the five largest median weights from bootstrapped sCCA modeling (rs1768208, rs538622, rs10143310, rs7224296, rs9820623) also independently related to performance on the ECAS ALS-Specific and Total scores (all uncorrected *p* <.05).
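The Spearman correlations with FWE correction used above can be sketched as follows. The text does not name the FWE procedure, so this sketch uses the Holm step-down method, one standard FWE-controlling procedure (our choice, not necessarily the authors'). For the Spearman *p* value it refers the usual t statistic to a standard normal, a large-sample approximation; an exact or t-based *p* value from a statistics library would be preferable in practice.

```python
import math
from statistics import NormalDist


def average_ranks(xs):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1  # extend the block of tied values
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def spearman(x, y):
    """Spearman's rho (Pearson r of average ranks) and an approximate
    two-sided p value: t = rho * sqrt((n - 2) / (1 - rho^2)) vs N(0, 1)."""
    n = len(x)
    rho = pearson(average_ranks(x), average_ranks(y))
    if abs(rho) >= 1.0:
        return rho, 0.0
    t = rho * math.sqrt((n - 2) / (1 - rho ** 2))
    return rho, 2 * (1 - NormalDist().cdf(abs(t)))


def holm_adjust(pvals):
    """Holm step-down adjusted p values (controls the family-wise error rate)."""
    m = len(pvals)
    adjusted, running = [0.0] * m, 0.0
    for rank, i in enumerate(sorted(range(m), key=pvals.__getitem__)):
        running = max(running, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running
    return adjusted
```

A measure would then be reported as significant when its Holm-adjusted *p* value falls below 0.05.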
We also conducted *post hoc* analyses to investigate whether the inclusion of SNPs in high linkage disequilibrium (LD) influenced the magnitude and direction of the wPRS. To do so, we re-ran bootstrapped sCCA with 10,000 iterations excluding the 5 SNPs in high LD (i.e. based on a cutoff of *r*² > 0.5) and recalculated the wPRS in the PGB cohort. The two wPRS models were strongly linearly related (Pearson’s *R* = 0.90 (95% CI: 0.87, 0.91), *p* < 2.2×10⁻¹⁶; *Supplementary Figure 7*), so LD among a subset of SNPs is unlikely to drive our observed polygenic associations.

### Polygenic score associates with cortical thinning in the UPenn Biobank

Cognitive dysfunction in ALS, including performance on the ECAS, has previously been attributed to sequential disease progression rostrally and caudally from the motor cortex [40-42] and to advancing disease stage [4]. To evaluate the neuroanatomic basis for polygenic contribution to cognitive performance in patients with ALS, we applied the wPRS derived in the CReATe PGB cohort to an independent cohort of patients with ALS from the UPenn Biobank. We used voxel-wise *in vivo* measures of reduced cortical thickness (in mm) to quantify cortical neurodegeneration. Cross-sectional measurements of cortical thickness were derived from T1-weighted magnetic resonance imaging (MRI) in 90 patients with ALS and 90 age-, sex-, and education-matched healthy controls who were recruited for research at UPenn (*Table 2*). Nonparametric modeling using 10,000 random permutations revealed extensive reduction of cortical thickness bilaterally in the frontal and temporal cortices of patients relative to controls (threshold-free cluster enhancement, FWE-corrected *p*<0.05) (*Supplementary Table 3, Supplementary Figure 9*).
After identifying regions of reduced cortical thickness in patients with ALS, we investigated whether the wPRS derived from sCCA modeling in the CReATe PGB cohort related to the magnitude of reduced cortical thickness in the independent UPenn Biobank neuroimaging cohort. Nonparametric modeling using 10,000 random permutations, with adjustment for potential confounds of age, disease duration, and scanning acquisition, revealed that a higher wPRS (i.e. greater risk) was associated with greater reduction of cortical thickness in regions including the orbital prefrontal cortex, anterior cingulate cortex, premotor cortex, lateral temporal cortex, and hippocampus, at an uncorrected threshold of *p*<0.01 with a cluster extent threshold of 10 voxels (*Figure 4A; Supplementary Table 3*). The frontal and temporal cortical regions identified in this analysis are known to support the cognitive domains assessed by the ECAS [40]. We observed no statistically significant relationship between the uPRS and cortical thickness in any region (not shown). These findings provide a potential neuroanatomical basis for the observed polygenic relationships between the wPRS and baseline cognitive performance and rate of decline, and are consistent with prior associations of cortical neurodegeneration with cognitive dysfunction in patients with ALS [41].

![Figure 4.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/09/16/2019.12.23.19014407/F4.medium.gif)

Figure 4. Reduced cortical thickness and greater cortical neuronal loss relate to higher wPRS in independent validation cohorts. A) ALS patients from the UPenn Biobank neuroimaging cohort with higher wPRS exhibited greater reduction of cortical thickness in the orbital prefrontal cortex, anterior cingulate cortex, premotor cortex, lateral temporal cortex, and hippocampus.
The heatmap indicates the associated T-statistic for each voxel, with light blue representing the highest value. B) Greater motor cortex neuronal loss in ALS cases from the UPenn Biobank is associated with higher wPRS.

### Polygenic score associates with neocortical neuronal loss in the UPenn Biobank

To complement these *in vivo* neuroanatomical data, we also explored whether polygenic risk for cognitive dysfunction is associated with the *post-mortem* anatomical distribution of neuronal loss and TDP-43 pathology. We rated the magnitude of neuronal loss and TDP-43 pathological inclusions on an ordinal scale in tissue sampled from the middle frontal, cingulate, motor, and superior/middle temporal cortices and from the cornu ammonis 1 (CA1)/subiculum of the hippocampus in 87 autopsy cases from the UPenn Biobank with confirmed ALS due to underlying TDP-43 pathology (*Table 2; Supplementary Table 4*). Ordinal logistic regression with covariate adjustment for age at death and disease duration showed that as patients’ wPRS increases, their odds of greater neuronal loss in the motor cortex also increase (OR = 1.98; 95% CI: 1.01, 3.96; uncorrected *p*=0.049; *Figure 4B*); age at death and disease duration did not significantly influence these odds (uncorrected *p*>0.05). We observed no statistically significant associations between the wPRS and neuronal loss in any other region, or between the wPRS and TDP-43 pathology in any region (all *p* values>0.1; *Supplementary Figures 10 and 11*). We also observed no statistically significant associations between the uPRS and neuronal loss or TDP-43 pathology in any region (all uncorrected *p* values>0.19; not shown). These findings suggest that polygenic risk for cognitive dysfunction is associated with the neuroanatomic distribution of neuronal loss in ALS cases at end-stage disease.
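The odds ratio reported above can be read through a proportional-odds (cumulative logit) model. Assuming that parameterization (the text names only "ordinal logistic regression"), with $Y_i$ the ordinal neuronal-loss rating for case $i$ and $K$ rating categories:

$$
\operatorname{logit} P(Y_i \le k) = \theta_k - \left(\beta\,\mathrm{wPRS}_i + \gamma_1\,\mathrm{age}_i + \gamma_2\,\mathrm{duration}_i\right), \qquad k = 1, \dots, K-1,
$$

$$
\mathrm{OR} = \frac{\operatorname{odds}\left(Y_i > k \mid \mathrm{wPRS}_i + 1\right)}{\operatorname{odds}\left(Y_i > k \mid \mathrm{wPRS}_i\right)} = \frac{e^{\beta(\mathrm{wPRS}_i + 1) + \gamma_1 \mathrm{age}_i + \gamma_2 \mathrm{duration}_i - \theta_k}}{e^{\beta\,\mathrm{wPRS}_i + \gamma_1 \mathrm{age}_i + \gamma_2 \mathrm{duration}_i - \theta_k}} = e^{\beta}.
$$

The same OR therefore applies at every cut point $k$, holding age at death and disease duration fixed. An OR of 1.98 corresponds to $\beta = \ln 1.98 \approx 0.68$ per one-unit increase in the wPRS, on whatever scale the score was entered into the model.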
Table 2. Demographics for independent neuroimaging (A) and autopsy (B) amyotrophic lateral sclerosis (ALS) and healthy control cohorts from UPenn Biobank.

## Discussion

In this study, we used machine learning to evaluate polygenic contributions to cognitive dysfunction in patients with ALS. We identified polygenic risk for cognitive dysfunction from genetic variables associated with risk of ALS and FTD, which we further investigated through quantitative-trait evaluations of two independent ALS cohorts with *in vivo* neuroimaging and *post-mortem* neuropathology data. Our results indicate a polygenic contribution to the presence and rate of decline of cognitive dysfunction in domains specifically impaired in ALS. Converging evidence from these independent cohorts further demonstrates that this polygenic contribution generalizes to biologically plausible associations, including reduced *in vivo* cortical thickness and *post-mortem* cortical neurodegeneration in the prefrontal, motor, and temporal cortices. These findings contribute novel evidence in support of a polygenic contribution to cognitive dysfunction and cortical disease burden in ALS and provide further detailed phenotypic evidence for genetic overlap between ALS and FTD. Below, we highlight clinical, biological, and methodological implications of our observations.

Our findings add to an increasing body of evidence for a genetic contribution to phenotypic variability in ALS and support the idea that polygenic variation accounts for a portion of variability in cognitive dysfunction and cortical disease burden in ALS.
While cognitive dysfunction has been more frequently linked to genetic mutations causally associated with ALS, such as *C9ORF72* repeat expansions [43], studies examining individual SNPs have demonstrated quantitative-trait modification of cognitive performance and cortical disease burden [24,25]. However, mounting evidence suggests that there are polygenic, rather than single-allele, modifiers of disease risk and phenotype in ALS and related neurodegenerative diseases [22,23,26]. Our observation of a polygenic association between 27 SNPs and the ECAS ALS-Specific score, a combined measure of the executive, language, and verbal fluency domains most commonly affected in ALS, is consistent with the idea of polygenic contribution to phenotypic variability in ALS. Notably, our observed polygenic association in the CReATe PGB cohort appears specific to cognitive variability: we demonstrate relative independence of cognitive performance and motor disease severity (i.e. UMN or LMN burden scores, functional performance on the ALSFRS-R) and observe no evidence for polygenic association with motor disease severity. This suggests that, in this study, polygenic risk for cognitive dysfunction does not appear to be confounded by motor disease severity.

The majority (85%) of the 27 SNPs selected by our machine learning model for association with cognitive dysfunction are shared risk loci for ALS and FTD [15]. The selection frequency of these ALS and FTD risk variants outweighed that of ALS-only risk variants, emphasizing the contribution of genetic overlap between ALS and FTD to polygenic risk associated with cognitive dysfunction in ALS. SNPs in or near the *MOBP, NSF, ATXN3, ERGIC1*, and *UNC13A* genes were among those with the strongest model contributions (i.e. with the largest canonical weights).
Our group has previously shown that SNPs mapped to *MOBP*, including rs1768208, relate to regional neurodegeneration in sporadic FTD and to shorter survival in FTD with underlying tau or TDP-43 pathology [35,44]. Our group has also demonstrated that rs12608932 in *UNC13A* relates to *in vivo* prefrontal cortical thinning, *post mortem* frontal cortical burden of TDP-43 pathology, and executive dysfunction [24]. rs538622 near *ERGIC1*, originally identified as a shared risk locus for ALS and FTD, has also been shown to act as a quantitative trait modifier in ALS, relating to reduced expression of the protein BNIP1 in ALS patient motor neurons [15]. Other top-weighted variants near *NSF* and *ATXN3* also have biologically plausible links to neurodegeneration: rs10143310 lies near *ATXN3*, which encodes a deubiquitinating enzyme and in which polyglutamine expansions cause spinocerebellar ataxia type 3 [45]; rs7224296 near *NSF* tags the *MAPT* H1 haplotype [46] and is associated with increased risk for FTD syndromes including progressive supranuclear palsy and corticobasal degeneration [47], as well as Alzheimer’s and Parkinson’s diseases [48]. While the mechanism of polygenic contribution to cognitive dysfunction in ALS requires further investigation, our findings lead us to speculate that the identified SNPs may contribute to neuroanatomic disease burden. The wPRS derived from the multivariate genotype-phenotype correlation observed in the CReATe PGB cohort was related, in independent UPenn Biobank cohorts, to both *in vivo* cortical thinning and *post-mortem* cortical neuronal loss. Higher polygenic risk related to *in vivo* cortical thinning in the orbital prefrontal cortex, anterior cingulate cortex, premotor cortex, lateral temporal cortex, and hippocampus in a neuroimaging cohort, and to *post-mortem* neuronal loss in sampled tissue from the motor cortex in an autopsy cohort.
We speculate that the relationship to motor cortex alone in the neuropathology cohort may reflect two sources of sampling differences. First, clinical characteristics differed across cohorts: 9% of the autopsy cohort had premorbid diagnoses of ALS-FTD or ALSci, compared with 29% of the neuroimaging cohort. Thus, the autopsy cohort likely had less frontal and temporal cortex neuronal loss relative to motor cortex neuronal loss. Second, the differences across analyses may reflect different scales of resolution: neuroimaging data are analyzed at 2 mm isotropic resolution across the entire cortex, whereas neuropathological data are sampled at approximately 6 μm. We are aware of these issues and have recently begun to expand tissue sampling to include both hemispheres [49,50] and more extensive brain regions [51], to perform digital immunohistochemistry analyses [50,52], and to conduct whole-hemisphere post-mortem neuroimaging using 7T MRI. Future studies will therefore be able to address these sampling differences as our autopsy cohort continues to grow and our technical methods continue to improve. Anatomically, these findings are largely consistent with prior *in vivo* structural imaging studies of neurodegeneration associated with cognitive dysfunction and with *post mortem* investigations of cortical thinning in ALS [40,41,53]. Thus, in addition to indicating a polygenic contribution to cognitive dysfunction in ALS, our findings suggest disease pathophysiology as a possible mechanism for the observed associations. Beyond these potential biological implications for ALS disease heterogeneity, we additionally suggest that sCCA may provide a tool for defining polygenic factors of disease risk.
While sCCA has been widely applied to genotype-phenotype studies [28,29], including neuroimaging-genetic studies [30-35], we are unaware of prior applications using sCCA to define a polygenic score based on rich clinical phenotypic and biomarker data. Traditional approaches to generating polygenic scores use data from established, typically case-control, GWAS, but practical considerations include how many variants to include in a model and how to define the weights of an appropriate statistical model [54]. Critically, rather than relying on an arbitrary selection of variants and weights, the sparsity parameter of sCCA provides an unsupervised, data-driven method to select the number of variants to include, and sCCA also provides data-driven canonical weights to define the statistical model. The positive or negative direction of model-derived weights is potentially biologically informative and could reflect ‘risk’ (i.e. a positive weight) or ‘protective’ (i.e. a negative weight) effects. We evaluated the wPRS using model-derived weights relative to a uPRS created by computing an unweighted sum of allele dosages for each genetic variable. Our observation that the uPRS did not relate to cognitive or clinical performance in the CReATe PGB cohort or to neuroimaging or neuropathology in the UPenn Biobank cohorts suggests that the weights derived from sCCA meaningfully define the relationship between genetic variation and quantitative phenotypic differences in cognitive performance and disease neuroanatomy in the CReATe PGB and UPenn cohorts. Further investigation is needed to clarify the relationships between model-selected SNPs and model-derived canonical weights from both biological (e.g. some SNPs and/or genes may contribute more strongly to risk) and mathematical (e.g. weights may be constrained by minor allele frequency) perspectives.
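The difference between the two scores can be sketched in a few lines of Python. This is a minimal illustration, not the authors' R code: the function names, the three-variant weight vector, and the two participants are hypothetical, and we assume the uPRS sums dosages over the same genetic variables that enter the wPRS.

```python
from typing import Sequence


def weighted_prs(codes: Sequence[float], weights: Sequence[float]) -> float:
    """wPRS: multiply each allele dosage (0/1/2) or binary code by its median
    canonical weight and sum. Variables whose weight is exactly zero (not selected
    under the sparsity penalty) contribute nothing."""
    if len(codes) != len(weights):
        raise ValueError("expected one weight per genetic variable")
    return sum(code * weight for code, weight in zip(codes, weights))


def unweighted_prs(dosages: Sequence[int]) -> int:
    """uPRS: plain sum of allele dosages, ignoring both weight size and sign."""
    return sum(dosages)


# Hypothetical median canonical weights for three variants: one 'risk' (+),
# one 'protective' (-), and one not selected by the model (0).
weights = [0.5, -0.5, 0.0]
participant_a = [2, 0, 1]
participant_b = [0, 2, 1]

print(unweighted_prs(participant_a), unweighted_prs(participant_b))  # 3 3
print(weighted_prs(participant_a, weights), weighted_prs(participant_b, weights))  # 1.0 -1.0
```

Both hypothetical participants carry three minor alleles and so share the same uPRS, while the signed weights place them at opposite ends of the wPRS; that directional information is exactly what an unweighted sum discards.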
While our sCCA modeling selected 27 SNPs in addition to sex and *C9ORF72* mutation status, and we used model-derived weights to calculate a wPRS, we are unable to determine in the current study the nature of the collective contribution of these SNPs to modifying cognitive phenotypes. For example, their effects could be additive, such that increased risk allele dosage increases risk for impaired cognition, or the selected SNPs could act independently in disease modification. *Post hoc* investigation of independent SNP effects on longitudinal cognitive performance revealed that the SNPs with the five largest median weights from bootstrapped sCCA modeling also relate to longitudinal cognitive performance; however, these effects did not survive correction for multiple comparisons. By its nature, this *post hoc* investigation treated each SNP as independent of other SNPs and each clinical measure as independent of other clinical measures, and thus did not account for more complex collective contributions of SNPs to cognitive phenotypes. As is often the case, future functional studies are required to identify the mechanistic relationship between SNP associations and cognitive phenotype. Nonetheless, our results support sCCA as a promising method to identify collective combinations of SNPs and cognitive phenotypes and to direct research efforts towards model-selected variants. Several limitations should be considered in the present study. Here, we focus our analysis on a relatively small set of SNPs selected *a priori* from previous large-scale GWAS based on genome-wide association with ALS [12] or shared risk between ALS and FTD [15].
Other genetic variants not included in the present study may also contribute to cognitive dysfunction in ALS and related disorders, and future genome-wide analyses or broad genotype selection strategies (e.g. targeted pathways) are necessary to discover novel genetic contributions to cognition that have not been identified through prior case-control studies. While we focus on ALS and FTD risk variants and demonstrate that the inclusion of related disorders (i.e. PLS, PMA) does not confound our observed cognitive and genetic associations, future work should also incorporate variants associated with risk for disorders related to ALS and specifically test polygenic associations within PLS and PMA. However, such larger-scale studies will require validation in independent cohorts, many of which lack the rich phenotype data needed to identify cognitive dysfunction. We derived a wPRS from sCCA modeling to further investigate polygenic associations with longitudinal cognitive and motor performance, and with *in vivo* and *post-mortem* cortical disease burden in independent ALS cohorts from the UPenn Biobank. While we define our polygenic score from sCCA using adjusted estimates of baseline cognitive and motor performance, future work using longitudinal data as the starting point to define polygenic associations may further elucidate genetic risk for cognitive dysfunction in ALS. However, our finding that polygenic risk associated with baseline cognitive dysfunction also relates to longitudinal cognitive decline in the CReATe PGB cohort, as well as to relevant cortical disease anatomy in independent cohorts from the UPenn Biobank, suggests its relevance to longitudinal cognitive phenotypes in ALS.
Previous critiques of polygenic scores argue that three factors limit their use in clinical and prognostic settings: (1) calculation based on GWAS-defined odds ratios for univariate risk loci; (2) undue influence by population variance; and (3) predominant use of samples of European ancestry [55,56]. In an attempt to mitigate these potential confounds, we based our computation of a wPRS on model-selected parameters derived from an analysis including all genetic variants as well as covariates for genetic mutation status and sex, in an effort to account for multivariate genetic relationships. We also included in our model the first two principal components from a PCA conducted in the CReATe PGB cohort in an effort to account for differences in population substructure [57]. Nonetheless, population substructure is a complex issue to resolve, and future studies with more diverse cohorts are necessary to investigate potential substructure bias. The current investigation utilized existing data from natural history studies that were predominantly composed of individuals of European ancestry; increased representation of diverse racial and ethnic groups in future investigations of polygenic risk for cognitive impairment in ALS is necessary to ensure generalizability to diverse populations. Our analyses focused on the genetic contribution to cognitive dysfunction in ALS, yet it is well established that behavioral impairment is also part of the ALS disease spectrum [58]. We assessed patient performance on specific domains of cognition using the ECAS, which includes a measure of social cognition counted towards the domain of executive function. Behavioral impairment on the ECAS is assessed through caregiver report [37], and the vast majority of neuropsychological assessments of behavior in neurodegenerative disease are based on physician or caregiver report [59].
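The principal-component covariates mentioned here can be sketched with NumPy. This is an illustration under stated assumptions, not the Eigenstrat software used in the study: the per-SNP normalization follows our understanding of the EIGENSTRAT approach, and the function name and simulated two-group cohort are hypothetical.

```python
import numpy as np


def eigenstrat_style_pcs(dosages, n_components=2):
    """Leading principal components of an individuals x SNPs matrix of 0/1/2 dosages.

    Each SNP is mean-centred and divided by sqrt(p * (1 - p)), where p is a
    smoothed allele frequency (1 + allele count) / (2 + 2n). This follows the
    EIGENSTRAT normalization as we understand it; the software itself adds
    further steps (e.g. outlier removal) that are not reproduced here.
    """
    g = np.asarray(dosages, dtype=float)
    n = g.shape[0]
    p = (1.0 + g.sum(axis=0)) / (2.0 + 2.0 * n)
    normalized = (g - g.mean(axis=0)) / np.sqrt(p * (1.0 - p))
    # Left singular vectors of the normalized matrix are the eigenvectors of
    # the individual-by-individual covariance, i.e. per-person PC scores.
    u, _, _ = np.linalg.svd(normalized, full_matrices=False)
    return u[:, :n_components]


# Simulated (hypothetical) cohort: two groups whose allele frequencies differ
# at the first 100 of 200 SNPs.
rng = np.random.default_rng(0)
freq_a = np.r_[np.full(100, 0.1), np.full(100, 0.5)]
freq_b = np.r_[np.full(100, 0.9), np.full(100, 0.5)]
cohort = np.vstack([rng.binomial(2, freq_a, size=(20, 200)),
                    rng.binomial(2, freq_b, size=(20, 200))])
pcs = eigenstrat_style_pcs(cohort)  # shape (40, 2): candidate model covariates
```

In the simulation the first component separates the two groups, which is the kind of structure two PC covariates are meant to absorb; subtler substructure may need more components, consistent with the caution above.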
With this in mind, we chose to focus our investigation on patient-completed assessments of cognition and motor function. Future research incorporating assessments of behavior is necessary to investigate polygenic risk for behavioral dysfunction in ALS and related disorders and to determine whether loci included in our polygenic risk score additionally confer risk for behavioral dysfunction. Although the current study demonstrates converging, multimodal evidence for polygenic risk, replication in additional cohorts with sample sizes large enough to allow robust cross-validation is warranted. Notably, machine-learning methods tend to over-fit data and produce estimates that do not generalize to different data sets. However, alternative ALS datasets that contain detailed genotyping and cognitive phenotyping are currently lacking, and the CReATe PGB cohort represents the largest of its kind. In the absence of an alternative dataset, and in an effort to minimize over-fitting, we employed a bootstrapping procedure and generated a final sCCA model based on median weights across bootstrap iterations rather than selecting a single “top model”. We additionally demonstrate converging, multimodal evidence for polygenic risk in independent neuroimaging and neuropathology biomarker cohorts in an effort to corroborate that we are detecting a true biological signal. However, future research is necessary to determine the predictive potential and generalizability of our proposed polygenic risk score in ALS patients. We also hope that this demonstration motivates the collection of additional genotyping data and longitudinal cognitive evaluation using the ECAS in large-scale patient cohorts. With these limitations in mind, our research demonstrates converging clinical, neuroimaging, and pathologic evidence for polygenic contribution to cognitive dysfunction and cortical neurodegeneration in ALS.
These findings should stimulate further investigation into polygenic risk for cognitive disease vulnerability in ALS and suggest that such risk merits consideration in prognosis and treatment trials. More broadly, this work provides insight into the genetic contribution to heterogeneous phenotypes in neurodegenerative disease and supports evidence for polygenic architecture in these conditions.

## Materials and Methods

### Participants: CReATe consortium

Participants consisted of 339 individuals clinically diagnosed by a board-certified neurologist with a sporadic or familial form of amyotrophic lateral sclerosis (ALS), amyotrophic lateral sclerosis with frontotemporal dementia (ALS-FTD), progressive muscular atrophy (PMA), or primary lateral sclerosis (PLS) who were enrolled and evaluated through the CReATe Consortium’s Phenotype-Genotype-Biomarker (PGB) study. All participants provided written informed consent. The PGB study is registered on clinicaltrials.gov ([NCT02327845](http://medrxiv.org/lookup/external-ref?link_type=CLINTRIALGOV&access_num=NCT02327845&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F16%2F2019.12.23.19014407.atom)) and was approved by the University of Miami Institutional Review Board (IRB), the central IRB for the CReATe Consortium. The study collects participant blood DNA samples for genetic screening and includes longitudinal evaluation at regularly scheduled visits (ALS, ALS-FTD, and PMA: 0 (baseline), 3, 6, 12, and 18 months; PLS: 0 (baseline), 6, 12, 18, and 24 months). A subset of 155 CReATe PGB cases were previously included in the replication cohort of the ALS case-control GWAS [12]. Participants were evaluated at each visit using the ALSFRS-R [60] and alternate versions of the Edinburgh Cognitive and Behavioural ALS Screen (ECAS) [37] designed for longitudinal use.
The presence of ALS with cognitive impairment (ALSci) was assessed at baseline using the ECAS according to established criteria [61], operationalized as baseline performance on the Executive Function, Verbal Fluency, or Language subscores at or below normative cutoff scores [37]. UMN and LMN burden scores were calculated from a detailed elemental neuromuscular examination by summing within and across each spinal region, resulting in a score ranging from 0 (none) to 10 (worst). Site (e.g. limb, bulbar) and date of motor symptom onset were recorded for each participant. We excluded nine individuals with missing or incomplete data that precluded subsequent analysis and, in an effort to avoid confounds associated with clear outliers, three individuals with extreme values at baseline on the ECAS Visuospatial score (i.e. >5 standard deviations from the group mean), resulting in a total of 327 participants. Of the nine individuals excluded for missing or incomplete data, one had no genotyping data available, one had no information for UMN burden score, and seven had no information for date of motor symptom onset.

### Genotyping: CReATe consortium

Peripheral blood mononuclear cell DNA was extracted using the QIAamp DNA Blood Mini Kit (Qiagen cat#51106) and quantified using the Quant-iT dsDNA Assay Kit (Life Technologies cat#Q33130). DNA integrity was verified by agarose gel electrophoresis (E-Gel, Life Technologies, cat#G8008-01). Unique samples were barcoded, and whole genome sequencing (WGS) was performed at the HudsonAlpha Institute for Biotechnology Genomic Services Laboratory (Huntsville, Alabama) (HA) using Illumina HiSeq X10 sequencers to generate approximately 360 million paired-end reads, each 150 base pairs (bp) in length. Peripheral DNA was extracted from participant blood samples and screened for known pathogenic mutations associated with ALS and related diseases.
Screening included repeat-primed polymerase chain reaction (PCR) for *C9ORF72* repeat expansions and WGS, curated and validated via Sanger sequencing, for pathogenic mutations associated with ALS and/or FTD in *ANG, CHCHD10, CHMP2B, FUS, GRN, hnRNPA1, hnRNPA2B1, MAPT, MATR3, OPTN, PFN1, SETX, SOD1, SPG11, SQSTM1, TARDBP, TBK1, TUBA4A, UBQLN2, VCP* (see *Table 1* for participant mutation status). The PGB study also includes patients with hereditary spastic paraplegia (HSP), who were excluded from the current analysis; we additionally screened individuals for pathogenic mutations in 67 additional genes associated with HSP and 7 genes associated with distal hereditary motor neuropathy, and all cases were negative for pathogenic mutations in these genes. WGS data were generated using paired-end 150 bp reads aligned to the GRCh38 human reference using the Burrows-Wheeler Aligner (BWA-ALN v0.7.12) [62] and processed using the Genome Analysis Toolkit (GATK) best-practices workflow implemented in GATK v3.4.0 [63]. Variants for individual samples were called with HaplotypeCaller, producing individual genomic variant call format (gVCF) files that we combined in a joint genotyping step to produce a multi-sample VCF (pVCF). Variant filtration was performed using Variant Quality Score Recalibration (VQSR), which assigns each variant a score and a pass/fail label; these labels were evaluated in the context of hard filtering thresholds (minimum genotype quality (GQ) ≥ 20, minimum mean depth (DP) ≥ 10). Variant annotation was performed using Variant Effect Predictor (VEP) [64] and in-house pipelines, including non-coding variant allele frequencies from the Genome Aggregation Database (gnomAD) [65]. In-house scripts were used to identify false positives resulting from paralogous mapping and/or gaps in the current human genome assembly. VCFs were further decomposed prior to analyses using the Decompose function of Vt [66].
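The VQSR/hard-filter step above and the additive recoding with PLINK described next can be sketched together in Python. This is an illustrative sketch, not the GATK or PLINK implementation: the `Call` and `Site` records are hypothetical stand-ins for parsed VCF fields, the genotypes are made up, and we assume GQ ≥ 20 applies to each genotype call while DP ≥ 10 applies to depth averaged across samples, since the text does not say which.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class Call:
    gt: str   # genotype string, e.g. "A/G", "C|C", or "./." if missing
    gq: int   # genotype quality
    dp: int   # read depth for this sample at this site


@dataclass
class Site:
    rsid: str
    vqsr_filter: str   # "PASS" or a failing VQSR label
    minor_allele: str
    calls: List[Call]


def additive_dosage(gt: str, minor_allele: str) -> Optional[int]:
    """Minor-allele count (0, 1 or 2) for a biallelic genotype; None if missing."""
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) != 2 or "." in alleles:
        return None
    return sum(allele == minor_allele for allele in alleles)


def filtered_dosages(site: Site, min_gq: int = 20, min_mean_dp: float = 10.0):
    """Apply the VQSR label and hard thresholds, then recode additively.

    Calls failing the GQ threshold become missing; the whole site is dropped
    (None) if it fails VQSR or its mean depth is below the threshold.
    """
    if site.vqsr_filter != "PASS":
        return None
    if mean(call.dp for call in site.calls) < min_mean_dp:
        return None
    return [additive_dosage(call.gt, site.minor_allele) if call.gq >= min_gq else None
            for call in site.calls]


site = Site("rs_hypothetical", "PASS", "C",
            [Call("A/C", 35, 14), Call("C|C", 12, 11), Call("A/A", 40, 9)])
print(filtered_dosages(site))  # [1, None, 0]
```

Setting low-quality calls to missing, rather than dropping the sample, keeps the 0/1/2 coding aligned across participants, which the downstream sCCA and PRS steps rely on.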
In an attempt to account for population substructure, we additionally derived the first two principal component scores for each participant in the CReATe PGB cohort using principal components analysis (PCA) implemented in Eigenstrat [57]. From the WGS data, we extracted 45 hypothesized variants that previously achieved genome-wide significance for association with ALS [12] or joint association with ALS and FTD [15]. Proxy loci (LD r² > 0.80) were genotyped when genetic data were not available for previously published loci (see *Supplementary Table 1* for a complete list). One locus, rs12973192, was common to both references, and another locus (rs2425220 [15]) was excluded from analysis due to a high level of missingness across samples; no LD proxy was identified. We then used PLINK software [67] to recode participant genotypes according to an additive genetic model (i.e. 0 = no minor allele copies, 1 = one minor allele copy, 2 = two minor allele copies), since the dominant or recessive nature of the loci included in this study remains unknown. An assessment of LD revealed that 5 of our 45 hypothesized SNPs were in high LD with one another (D′ > 0.8; *Supplementary Table 2*), but we included these high-LD SNPs in our investigation since sCCA is able to accommodate highly correlated features [27].

### Linear mixed-effects modeling of the ECAS and clinical measures

We conducted linear mixed-effects modeling of performance on the ECAS, ALSFRS-R, and UMN and LMN burden scores using the *nlme* package in R. Each model was fit using maximum likelihood. In addition to the ECAS Total Score, we analyzed the Executive Function, Language, Verbal Fluency, Memory, and Visuospatial sub-scores and the ALS-Specific and ALS-Non-Specific summary scores, each as a dependent variable, to analyze patient performance in separate cognitive domains and in clinically grouped cognitive domains.
Fixed effects included age at baseline visit (in years), lag between age of symptom onset and age at baseline visit (in years), college education (yes / no), bulbar onset (yes / no), and visit time-point (in months), and we included individual-by-visit time-point as a random effect. This allowed us to obtain adjusted estimates of baseline performance (i.e. intercept) and rate of decline (i.e. slope) per individual, having regressed out potential confounding variables as fixed effects. We conducted Spearman’s rank-order correlations between baseline performance and rate of decline using FWE correction for multiple comparisons (see *Figure 1B*). In addition to the linear mixed-effects models described above, we conducted a second series of linear mixed-effects models to investigate fixed effects of each of the 45 SNPs on each of the 11 clinical measures (i.e. all ECAS scores, ALSFRS-R, and UMN and LMN burden scores) independently, resulting in a total of 495 models (45 × 11). We again used the *nlme* package in R, and each model was fit using maximum likelihood. In addition to each SNP, we included age at baseline visit (in years), lag between age of symptom onset and age at baseline visit (in years), college education (yes / no), bulbar onset (yes / no), and visit time-point (in months) as fixed effects, and we included individual-by-visit time-point as a random effect.

### Sparse Canonical Correlation Analysis

We conducted sparse canonical correlation analysis (sCCA) to select a parsimonious linear combination of variables that maximizes the correlation between two multivariate datasets using the *PMA* package in R [27]. The first dataset comprised scaled intercepts from each clinical variable per participant (i.e. adjusted baseline performance on the ALSFRS-R, UMN and LMN assessments, and ECAS). The second comprised minor allele counts per individual for each of the 45 SNPs (e.g.
0 = no minor allele copies, 1 = one minor allele copy, 2 = two minor allele copies), binary variables for sex (0 = Female, 1 = Male), *C9ORF72* repeat expansion status (0 = noncarrier, 1 = carrier), and other mutation status (0 = noncarrier, 1 = carrier); in an effort to account for potential differences in population substructure, we also included the raw estimates for the first two principal components per participant derived from a PCA conducted in the CReATe PGB cohort, a method previously demonstrated to account for the majority of population structure [57]. We assumed standard (i.e. unordered) organization of each dataset and selected regularization parameters for the sCCA analysis using a grid search of 100 combinations of L1 values ranging from 0.1 (most sparse) to 1 (least sparse) in increments of 0.1. We selected the combination of L1 values yielding the largest canonical correlation of the first variate for subsequent analysis, as similarly reported [68]. Using these L1 parameters, we ran 10,000 bootstrap sCCAs, each employing a randomly generated subsample comprising 75% of the PGB cohort. We calculated the median canonical correlation and the median canonical weight for each variable across all iterations. We used the median rather than the maximum or mean in an effort to avoid bias from outliers and to increase the reliability and reproducibility of model estimates. We next investigated model performance under the null hypothesis (i.e. no association between clinical and genetic datasets) using randomly permuted data. Using the same L1 parameters, we again ran 10,000 bootstrap sCCAs, each on a randomly generated subsample of 75% of participants; however, in each iteration we randomly permuted each dataset 100 times using the *randomizeMatrix* function from the *picante* package in R.
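A compact Python/NumPy sketch of this procedure follows. It is not the PMA package: it implements a rank-one, L1-penalized CCA with alternating soft-thresholded updates in the spirit of penalized matrix decomposition, and we assume each penalty maps to an L1 bound of penalty × √(number of variables). The grid starts at 0.1, which is what yields 100 pairs. Shuffling rows of one dataset stands in for picante's *randomizeMatrix*, whose null models we do not reproduce, and aligning signs against the full-sample solution is our own addition. `x` stands for the clinical intercepts and `y` for the genetic variables; all function names are hypothetical.

```python
import numpy as np


def _standardize(a):
    sd = a.std(axis=0)
    sd[sd == 0] = 1.0  # constant columns (e.g. a binary code in a subsample) stay at zero
    return (a - a.mean(axis=0)) / sd


def _l1_bounded_unit(a, bound, n_search=40):
    """Soft-threshold a so the unit-L2 result has L1 norm <= bound (binary search)."""
    def shrink(delta):
        s = np.sign(a) * np.maximum(np.abs(a) - delta, 0.0)
        norm = np.linalg.norm(s)
        return s / norm if norm > 0 else s

    w = shrink(0.0)
    if np.abs(w).sum() <= bound:
        return w
    lo, hi = 0.0, np.abs(a).max()
    for _ in range(n_search):
        mid = (lo + hi) / 2
        if np.abs(shrink(mid)).sum() > bound:
            lo = mid
        else:
            hi = mid
    return shrink(hi)


def sparse_cca(x, y, pen_x, pen_y, n_iter=25):
    """First canonical variate of an L1-penalized CCA (PMD-style alternating updates)."""
    x, y = _standardize(x), _standardize(y)
    cross = x.T @ y
    v = np.linalg.svd(cross)[2][0]  # start from the leading right singular vector
    # Assumption: L1 bound = penalty * sqrt(#columns), floored at 1 so a variable survives.
    bound_u = max(1.0, pen_x * np.sqrt(x.shape[1]))
    bound_v = max(1.0, pen_y * np.sqrt(y.shape[1]))
    for _ in range(n_iter):
        u = _l1_bounded_unit(cross @ v, bound_u)
        v = _l1_bounded_unit(cross.T @ u, bound_v)
    return u, v, float(np.corrcoef(x @ u, y @ v)[0, 1])


def penalty_grid(step=0.1):
    levels = np.round(np.arange(step, 1.0 + step / 2, step), 2)  # 0.1, 0.2, ..., 1.0
    return [(px, py) for px in levels for py in levels]           # 10 x 10 = 100 pairs


def choose_penalties(x, y):
    """Pick the penalty pair giving the largest first canonical correlation."""
    return max(penalty_grid(), key=lambda pair: sparse_cca(x, y, *pair)[2])


def subsample_medians(x, y, pen_x, pen_y, n_iter=10_000, frac=0.75, seed=0):
    """Median weights and correlation over repeated 75% subsamples (no replacement)."""
    rng = np.random.default_rng(seed)
    ref_u = sparse_cca(x, y, pen_x, pen_y)[0]
    size = int(round(frac * x.shape[0]))
    us, vs, rs = [], [], []
    for _ in range(n_iter):
        idx = rng.choice(x.shape[0], size=size, replace=False)
        u, v, r = sparse_cca(x[idx], y[idx], pen_x, pen_y)
        if u @ ref_u < 0:  # weights are only defined up to a joint sign flip
            u, v = -u, -v
        us.append(u)
        vs.append(v)
        rs.append(r)
    return np.median(us, axis=0), np.median(vs, axis=0), float(np.median(rs))


def null_correlations(x, y, pen_x, pen_y, n_iter=10_000, frac=0.75, seed=1):
    """Correlations after shuffling y rows within each subsample (breaks the pairing)."""
    rng = np.random.default_rng(seed)
    size = int(round(frac * x.shape[0]))
    out = []
    for _ in range(n_iter):
        idx = rng.choice(x.shape[0], size=size, replace=False)
        out.append(sparse_cca(x[idx], y[rng.permutation(idx)], pen_x, pen_y)[2])
    return out


def permutation_p(null, observed_median):
    """Share of null correlations greater than or equal to the observed median."""
    return float(np.mean(np.asarray(null) >= observed_median))
```

In use, one would call `choose_penalties(x, y)` once, pass the chosen pair to `subsample_medians` and `null_correlations`, and compare the median correlation against the null with `permutation_p`. Variables whose median weight is exactly zero are the ones the sparsity penalty dropped, so selection frequency can be read off the per-iteration weights if they are also stored.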
We calculated a *p* value as the probability under the null of observing a canonical correlation greater than or equal to the median canonical correlation from sCCA modeling of the true data. We also examined the proportion of iterations in which each variable was selected by the model (i.e. achieved a non-zero canonical weight).

### Polygenic Risk Score

We used the output of sCCA modeling to calculate a wPRS for each individual in the PGB cohort and in the neuroimaging and autopsy UPenn Biobank cohorts. The wPRS was constructed by multiplying the allele dosage or binary coding at each genetic variable by its median canonical weight from sCCA modeling and summing across all variables. To investigate construct validity, we first conducted Spearman’s rank-order correlations between the wPRS and adjusted estimates of baseline performance (i.e. LME-derived intercepts) on the clinical measure(s) most frequently selected by sCCA. Then, to investigate longitudinal performance associated with the wPRS, we conducted Spearman’s rank-order correlations between the wPRS and adjusted rates of decline (i.e. LME-derived slopes) on all clinical measures using FWE correction. We restricted this analysis to participants in the CReATe PGB cohort with data at two or more timepoints (N=277 of 327 participants; 84.7% of the cohort).

### Participants: UPenn Biobank neuroimaging cohort

We retrospectively evaluated 90 patients with ALS and 90 healthy controls matched for age, sex, and education from the UPenn Biobank who were recruited for research between 2006 and 2019 from the Penn Comprehensive ALS Clinic and Penn Frontotemporal Degeneration Center (*Table 2*) [36]. Inclusion criteria for ALS patients consisted of the following: lack of participation in the CReATe PGB cohort, complete genotyping at the 45 analyzed SNPs, screening for genetic mutations (e.g.
*C9ORF72, SOD1*), white non-Latino racial and ethnic background (population diversity is known to influence allele frequencies across individuals), disease duration from symptom onset < 2.5 standard deviations from respective group means (to avoid confounds associated with clear outliers), and T1-weighted MRI. All patients were diagnosed with ALS by a board-certified neurologist (L.E., L.M., M.G., D.I.) using revised El Escorial criteria [69] and assessed for ALS frontotemporal spectrum disorder using established criteria [61]; those patients enrolled in research prior to 2017 were retrospectively evaluated through chart review. All ALS patients and controls participated in an informed consent procedure approved by an IRB convened at UPenn.

### Participants: UPenn Biobank autopsy cohort

We evaluated brain tissue samples from 87 ALS autopsy cases identified from the UPenn Biobank [36] who were diagnosed by a board-certified neuropathologist (J.Q.T., E.B.L.) with ALS due to TDP-43 pathology using immunohistochemistry [70] and published criteria [71]; this cohort included 20 patients from the ALS neuroimaging cohort. During life, all patients were diagnosed with ALS by a board-certified neurologist (L.E., L.M., M.G., D.I.) using revised El Escorial criteria [69] and assessed for ALS frontotemporal spectrum disorder using established criteria [61]; those patients enrolled in research prior to 2017 were retrospectively evaluated through chart review. Inclusion criteria consisted of the following: lack of participation in the CReATe PGB cohort, complete genotyping at the 45 analyzed SNPs, screening for genetic mutations (e.g.
*C9ORF72, SOD1*), white non-Latino racial and ethnic background (population diversity is known to influence allele frequencies across individuals), disease duration from symptom onset < 2.5 standard deviations from respective group means (to avoid confounds associated with clear outliers), and brain tissue samples from the middle frontal, motor, cingulate, and superior/middle temporal cortices and the cornu ammonis 1 (CA1)/subiculum of the hippocampus for analysis of neuronal loss and TDP-43 pathology. Nine individuals were missing neuronal loss or TDP-43 pathology data for at least one sampled region (*Supplementary Table 3*).

### Genetic Screening and SNP Genotyping: UPenn Biobank

DNA was extracted from peripheral blood or frozen brain tissue following the manufacturer’s protocols (Flexigene (Qiagen) or QuickGene DNA whole blood kit (Autogen) for blood, and QIAsymphony DNA Mini Kit (Qiagen) for brain tissue). All patients were screened for *C9ORF72* hexanucleotide repeat expansions using a modified repeat-primed PCR as previously described [72]. For the remaining individuals, we evaluated family history using a three-generation pedigree history, as previously reported [73]. For cases with a family history of the same disease, we sequenced 45 genes previously associated with neurodegenerative disease, including genes known to be associated with ALS (e.g. *SOD1* [20], *TBK1* [10]). Sequencing was performed using a custom-targeted next-generation sequencing panel (MiND-Seq) [36] and analyzed using Mutation Surveyor software (Soft Genetics, State College, PA). DNA extracted from peripheral blood or cerebellar tissue samples was genotyped for each case using the Illumina Infinium Global Screening Array through the Children’s Hospital of Philadelphia (CHOP) Center for Applied Genomics Core according to the manufacturer’s specifications.
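The three quality-control thresholds applied with PLINK in the next step can be sketched in plain Python. This is an illustration, not PLINK: the function names are hypothetical, the Hardy-Weinberg test is a 1-df chi-square approximation swapped in for PLINK's exact test, and filtering variants before measuring per-sample missingness is our own choice of order.

```python
import math


def hwe_chisq_p(n_hom_ref, n_het, n_hom_alt):
    """Hardy-Weinberg p-value from a 1-df chi-square test.

    An approximation for illustration; PLINK's --hwe uses an exact test,
    which behaves better at low minor-allele counts.
    """
    n = n_hom_ref + n_het + n_hom_alt
    p = (2 * n_hom_ref + n_het) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic site: nothing to test
    chi2 = sum((o - e) ** 2 / e for o, e in zip((n_hom_ref, n_het, n_hom_alt), expected))
    return math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df


def qc_filter(genotypes, min_call_rate=0.95, hwe_cutoff=1e-6, max_sample_missing=0.05):
    """genotypes[i][j] is a 0/1/2 dosage or None for sample i at variant j.

    Returns (kept sample indices, kept variant indices). Sample missingness is
    measured on the variants that survive the variant filters.
    """
    n_samples, n_variants = len(genotypes), len(genotypes[0])
    kept_variants = []
    for j in range(n_variants):
        calls = [row[j] for row in genotypes if row[j] is not None]
        if len(calls) / n_samples < min_call_rate:
            continue  # call rate below 95%
        if hwe_chisq_p(calls.count(0), calls.count(1), calls.count(2)) < hwe_cutoff:
            continue  # departs from Hardy-Weinberg equilibrium
        kept_variants.append(j)
    kept_samples = [
        i for i, row in enumerate(genotypes)
        if sum(row[j] is None for j in kept_variants) / max(len(kept_variants), 1)
        <= max_sample_missing
    ]
    return kept_samples, kept_variants
```

A site with only homozygotes of both kinds (e.g. 50 / 0 / 50) gives a chi-square of 100 and is removed, while a site at 25 / 50 / 25 matches HWE exactly and is kept.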
PLINK [67] was then used to remove variants with <95% call rate or a Hardy-Weinberg equilibrium (HWE) *p*-value < 10⁻⁶, and individuals with >5% missing genotypes. Using the remaining genotypes from samples passing quality control, we performed genome-wide imputation of allele dosages with the Haplotype Reference Consortium reference panel r1.1 [74] on the Michigan Imputation Server [75] to predict genotypes at ungenotyped genomic positions, applying strict pre-phasing, pre-imputation filtering, and variant position and strand alignment control.

### Neuroimaging Processing and Analyses

High-resolution T1-weighted MPRAGE structural scans were acquired for neuroimaging participants using a 3T Siemens Tim Trio scanner with an 8-channel head coil, with TR=1620 ms, TE=3.09 ms, flip angle=15°, 192×256 matrix, and 1 mm³ voxels. T1-weighted MRI images were then preprocessed using Advanced Normalization Tools (ANTs) software [76]. Each individual dataset was deformed into a standard local template space in a canonical stereotactic coordinate system. ANTs provides a highly accurate registration routine using symmetric and topology-preserving diffeomorphic deformations to minimize bias toward the reference space and to capture the deformation necessary to aggregate images in a common space. We then used N4 bias correction to minimize heterogeneity [77] and the ANTs Atropos tool to segment images into six tissue classes (cortex, white matter, cerebrospinal fluid, subcortical grey structures, brainstem, and cerebellum) using template-based priors and to generate probability maps of each tissue. Voxel-wise cortical thickness was measured in millimeters (mm) from the pial surface and then transformed into Montreal Neurological Institute (MNI) space, smoothed using a three sigma full-width half-maximum Gaussian kernel, and downsampled to 2 mm isotropic voxels.
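The permutation logic behind the *randomise* analyses described next can be sketched in plain Python. This is a simplified illustration, not FSL: it omits covariates (which *randomise* handles with its own permutation scheme) and TFCE, and it shows only the maximum-statistic idea that, as we understand it, underlies FWE-corrected permutation p-values. The function name and the two-voxel data are hypothetical.

```python
import math
import random
import statistics


def _pearson(a, b):
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den


def max_stat_fwe_p(score, thickness, n_perm=1000, seed=0):
    """FWE-corrected p-values for a negative score-thickness association per voxel.

    thickness[i][v] is subject i's cortical thickness at voxel v. Each permutation
    shuffles the score across subjects and records the largest statistic over
    all voxels; comparing every voxel with that maximum controls the FWE rate.
    """
    columns = [[row[v] for row in thickness] for v in range(len(thickness[0]))]
    observed = [-_pearson(score, col) for col in columns]  # larger = more thinning
    rng = random.Random(seed)
    shuffled = list(score)
    max_null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        max_null.append(max(-_pearson(shuffled, col) for col in columns))
    # Add one to numerator and denominator so a p-value is never exactly zero.
    return [(1 + sum(m >= t for m in max_null)) / (1 + n_perm) for t in observed]


# Hypothetical data: 20 subjects; voxel 0 thins as the score rises, voxel 1 thickens.
score = list(range(20))
thickness = [[3.0 - 0.05 * s + 0.01 * ((7 * s) % 5 - 2), 2.0 + 0.05 * s] for s in score]
p_values = max_stat_fwe_p(score, thickness, n_perm=500)
```

Because every voxel is judged against the same distribution of maxima, a voxel can only reach significance if its statistic beats what chance produces anywhere in the mask, which is why this approach addresses multiple comparisons without a separate correction step.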
We used *randomise* software from FSL to perform nonparametric, permutation-based statistical analyses of cortical thickness images from the UPenn Biobank neuroimaging cohort. Permutation-based testing is robust to concerns regarding multiple comparisons because, rather than a traditional comparison of two sample distributions, it compares the observed assignment of factors (e.g. wPRS) to cortical thickness against many (e.g. 10,000) random reassignments [78]. First, we used *randomise* with 10,000 permutations to identify reduced cortical thickness in ALS patients relative to healthy controls. We constrained this analysis using an explicit mask restricted to high-probability cortex (>0.4) and report clusters that survive *p*<0.05 threshold-free cluster enhancement [79] corrected for FWE. Next, we again used *randomise* with 10,000 permutations to identify regions of reduced cortical thickness associated with wPRS in ALS patients, constraining the analysis to an explicit mask defined by regions of reduced cortical thickness in ALS patients relative to controls (see above). The statistical model for this analysis included covariate adjustment for age, disease duration, and scanner acquisition. We report clusters that survive uncorrected *p*<0.01 with a cluster extent threshold of 10 voxels; we employ an uncorrected threshold to minimize the chance of Type II error (failing to detect a true effect).

### Neuropathology Processing and Analyses

The extent of neuronal loss and of phosphorylated TDP-43 intraneuronal inclusions (dots, wisps, skeins) in sampled regions from the middle frontal, cingulate, motor, and superior/middle temporal cortices and the CA1/subiculum of the hippocampus was assessed on an ordinal scale: 0=none/rare, 1=mild, 2=moderate, 3=severe/numerous. All neuropathological ratings were performed by an expert neuropathologist (J.Q.T., E.B.L.) blinded to patient genotype.
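For reference, the ordinal logistic regression described next can be written as a proportional-odds (cumulative logit) model. This is our reading of the specification, using the sign convention documented for `MASS::polr`; the symbols are ours and do not appear in the original analysis code.

$$
\operatorname{logit} \Pr(Y_i \le k) = \zeta_k - \left(\beta_{\mathrm{wPRS}}\,\mathrm{wPRS}_i + \beta_{\mathrm{age}}\,\mathrm{age}_i + \beta_{\mathrm{dur}}\,\mathrm{duration}_i\right), \qquad k \in \{0, 1, 2\},
$$

where $Y_i \in \{0, 1, 2, 3\}$ is the rating for case $i$ in one sampled region, $\zeta_0 < \zeta_1 < \zeta_2$ are cutpoints, and $\Pr(Y_i = 3) = 1 - \Pr(Y_i \le 2)$. Under this convention a positive $\beta_{\mathrm{wPRS}}$ shifts probability toward more severe ratings, and $e^{\beta_{\mathrm{wPRS}}}$ is the multiplicative change, per unit increase in wPRS, in the odds of a rating above any given level.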
We conducted ordinal logistic regression using the *MASS* package in *R* to investigate whether the extent of neuronal loss, rated using hematoxylin and eosin (H&E) staining, and the burden of TDP-43 pathology, rated using mAbs p409/410 or 171 [80,81] immunohistochemistry, differed according to wPRS, with covariate adjustment for age and disease duration at death.

## Data & Code Availability

All R software code generated to perform the reported analyses has been deposited online ([https://github.com/pennbindlab/PolygenicALSCognitive](https://github.com/pennbindlab/PolygenicALSCognitive)). Please review the associated README file for details of data access. Briefly, associated datasets can be obtained as follows: The Clinical Research in ALS and Related Disorders for Therapeutic Development (CReATe) Consortium Phenotype-Genotype Biomarker (PGB) Study data will be deposited at the NIH-supported Data Management and Coordinating Center (DMCC) and the Database of Genotypes and Phenotypes (dbGaP) using procedures outlined by the Rare Disease Clinical Research Network (RDCRN) of the National Institutes of Health (NIH). As detailed in the patient consent process, “Only researchers with an approved study may be able to see and use your information…. Only de-identified data, which does not include anything that might directly identify you, will be shared with study investigators and approved investigators from the general scientific community for research purposes.” If you would like to access this data, please contact the CReATe Consortium at ProjectCReATe@miami.edu for a data request form. De-identified raw T1-weighted MRI and voxelwise cortical thickness images will be made available to researchers through an approved request pending review by the Penn Neurodegenerative Data Sharing Committee. To request access please complete the following online data request form: [https://www.pennbindlab.com/data-sharing](https://www.pennbindlab.com/data-sharing).
Neuropathological data and associated data fields have been deposited along with all associated statistical code in an online repository ([https://github.com/pennbindlab/PolygenicALSCognitive](https://github.com/pennbindlab/PolygenicALSCognitive)).
## Author Contributions Study concept/design: K.P., M.B., C.T.M. Acquisition, analysis, or interpretation of data: K.P., M.B., J.W., E.R., L.H., V.V.D., D.J.I., L.E., L.M., C.Q., V.G., J.S., T.B., J.R., A.S., J.K., E.P., C.J., J.C., Y.S., S.M., D.W., E.B.L., J.Q.T., P.C., J.G., J.S., A.C.N., R.R., G.W., J.P.T., and C.T.M. Drafting/revising manuscript: K.P., M.B., C.T.M., M.G., E.B.L., D.W., V.G., A.N., E.R., G.W., W.C., J.S., T.B. ## Competing Interests statement The following authors declare competing interests: C.T.M. receives financial support from Biogen and has provided consulting for Axon Advisors. M.B. reports grants from National Institutes of Health, the ALS Association, the Muscular Dystrophy Association, the Centers for Disease Control and Prevention, the Department of Defense, and Target ALS during the conduct of the study; personal fees from Mitsubishi Tanabe Pharma, AveXis, Prilenia, Genentech, and Roche outside the submitted work. In addition, M.B. has a provisional patent entitled ‘Determining Onset of Amyotrophic Lateral Sclerosis,’ and serves as a site investigator on clinical trials funded by Biogen and Orphazyme. All other authors declare no competing interests. Supplementary Figure 1. Principal Components in the CReATe PGB cohort. Scatterplot showing the first principal component plotted against the second principal component from a principal components analysis conducted in the CReATe PGB cohort. Supplementary Figure 2. Grid search for sCCA L1 parameters. Each column indicates 1 of 100 unique combinations of L1 parameters (ranging from 0.1 to 1) applied to the clinical and genetic datasets, and each row lists a variable entered into the sCCA. The heatmap denotes the canonical weight strength for each variable; warmer colors indicate positive weights and cooler colors indicate negative weights. Supplementary Figure 3.
Bootstrapped sCCA modeling. Each column indicates 1 of 10,000 iterations of sCCA; in each iteration, a randomly bootstrapped subsample of 75% of participants in the CReATe PGB cohort was employed. Each row lists a variable entered into the sCCA. The heatmap denotes the canonical weight strength for each variable; warmer colors indicate positive weights and cooler colors indicate negative weights. Supplementary Figure 4. *p* value calculation for sCCA modeling. Histogram showing the frequency of canonical correlations achieved by sCCA modeling under the null hypothesis (randomly permuted data). The vertical turquoise line denotes the median canonical correlation achieved under true sCCA modeling, and the *p* value indicates the proportion of times sCCA modeling under the null hypothesis reached or exceeded the median canonical correlation achieved under true modeling. Supplementary Figure 5. Variables selected in sCCA modeling. A) Bar graphs demonstrating the proportion of times out of 10,000 iterations that each of the 11 clinical variables was selected by sCCA under true modeling (turquoise) and modeling under the null hypothesis (coral). B) Bar graphs demonstrating the number of times out of 10,000 randomly bootstrapped sCCAs that each of the 45 SNPs was selected by sCCA under true modeling (turquoise) and modeling under the null hypothesis (coral). SNPs are organized according to prior genome-wide association with ALS or joint association with ALS and FTD. Supplementary Figure 6. Variables selected in sCCA modeling excluding patients with primary lateral sclerosis and progressive muscular atrophy. A) Bar graphs demonstrating the proportion of times out of 10,000 iterations that each of the 11 clinical variables was selected by sCCA under true modeling (turquoise) and modeling under the null hypothesis (coral).
B) Bar graphs demonstrating the number of times out of 10,000 randomly bootstrapped sCCAs that each of the 45 SNPs was selected by sCCA under true modeling (turquoise) and modeling under the null hypothesis (coral). SNPs are organized according to prior genome-wide association with ALS or joint association with ALS and FTD. Supplementary Figure 7. wPRS calculated with and without variants in high LD. Scatterplot depicting wPRS calculated using all genetic variables relative to wPRS calculated excluding SNPs in high linkage disequilibrium (LD). Supplementary Figure 8. Univariate SNP associations with clinical variables. Fixed effects of each SNP on each clinical measure from linear mixed effects modeling: A) Heatmap of beta weights associated with the fixed effect of each SNP, with warmer colors representing positive values and cooler colors representing negative values, and B) Heatmap of the corresponding *p* values associated with the beta weights in A, with brighter colors representing smaller values and darker colors representing larger values. Differences in baseline performance and rate of decline on each clinical measure for each participant; the heatmap indicates each participant’s standard deviation (SD) from the group mean. B) Spearman’s correlations between baseline performance and rate of decline for all clinical measures. Supplementary Figure 9. Reduced cortical thickness in ALS patients relative to healthy controls. ALS patients from the UPenn Biobank neuroimaging cohort displayed widespread cortical thinning in the frontal and temporal lobes relative to age-, sex-, and education-matched healthy controls. The heatmap indicates the associated T-statistic for each voxel, with light yellow representing the highest value. Supplementary Figure 10. Magnitude of neuronal loss in ALS patients relative to wPRS.
Beeswarm boxplots of ordinal measures of neuronal loss in ALS cases from the UPenn Biobank autopsy cohort relative to wPRS in the cingulate cortex, motor cortex, middle frontal cortex, superior / middle temporal cortex, and hippocampus. Supplementary Figure 11. Magnitude of TDP-43 pathology in ALS patients relative to wPRS. Beeswarm boxplots of ordinal measures of TDP-43 pathology in ALS cases from the UPenn Biobank autopsy cohort relative to wPRS in the cingulate cortex, motor cortex, middle frontal cortex, superior / middle temporal cortex, and hippocampus. ## Supplementary Tables Supplementary Table 1: List of genetic variants analyzed in the CReATe PGB Study. Supplementary Table 2: Linkage disequilibrium among studied single nucleotide polymorphisms. Supplementary Table 3: Peak voxel coordinates for regions of reduced cortical thickness in ALS patients relative to healthy controls, and peak voxel coordinates for regions of reduced cortical thickness associated with higher weighted polygenic risk score (wPRS) in patients with ALS from the UPenn Biobank neuroimaging cohort. Supplementary Table 4: Number of UPenn Biobank ALS autopsy cases for each neuropathological measurement in each sampled neuroanatomical region. ## Acknowledgements The CReATe Consortium (U54NS092091) is part of the Rare Diseases Clinical Research Network (RDCRN), an initiative of the National Center for Advancing Translational Sciences (NCATS) Office of Rare Diseases Research (ORDR).
Additional research support was provided by the National Institutes of Health (NS106754, AG017586, NS092091, AG054060). The genomics sequencing was funded by St. Jude Children’s Research Hospital American Lebanese Syrian Associated Charities (ALSAC), with additional support from the ALS Association for biorepository and sequencing costs (grants 17-LGCA-331 and 16-TACL-242). * Received December 23, 2019. * Revision received September 15, 2020. * Accepted September 16, 2020. * © 2020, Posted by Cold Spring Harbor Laboratory The copyright holder for this pre-print is the author. All rights reserved. The material may not be redistributed, re-used or adapted without the author's permission. ## References 1. 1.Montuschi A, lazzolino B, Calvo A, Moglia C, Lopiano L, Restagno G, Brunetti M, Ossola I, Presti Lo A, Cammarosano S, et al. (2015) Cognitive correlates in amyotrophic lateral sclerosis: a population-based study in Italy. Journal of Neurology, Neurosurgery & Psychiatry 86: 168–173. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NDoiam5ucCI7czo1OiJyZXNpZCI7czo4OiI4Ni8yLzE2OCI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDIwLzA5LzE2LzIwMTkuMTIuMjMuMTkwMTQ0MDcuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 2. 2.Beeldman E, Raaphorst J, Twennaar MK, de Visser M, Ben A Schmand, de Haan RJ (2016) The cognitive profile of ALS: a systematic review and meta-analysis update. Journal of Neurology, Neurosurgery & Psychiatry 87: 611–619. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NDoiam5ucCI7czo1OiJyZXNpZCI7czo4OiI4Ny82LzYxMSI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDIwLzA5LzE2LzIwMTkuMTIuMjMuMTkwMTQ0MDcuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 3. 
3.Elamin M, Bede P, Byrne S, Jordan N, Gallagher L, Wynne B, O’Brien C, Phukan J, Lynch C, Pender N, et al. (2013) Cognitive changes predict functional decline in ALS: A population-based longitudinal study. Neurology 80: 1590–1597. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1212/WNL.0b013e31828f18ac&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=23553481&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F16%2F2019.12.23.19014407.atom) 4. 4.Crockford C, Newton J, Lonergan K, Chiwera T, Booth T, Chandran S, Colville S, Heverin M, Mays I, Pal S, et al. (2018) ALS-specific cognitive and behavior changes associated with advancing disease stage in ALS. Neurology 91: e1370–e1380. [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F16%2F2019.12.23.19014407.atom) 5. 5.Hu WT, Shelnutt M, Wilson A, Yarab N, Kelly C, Grossman M, Libon DJ, Khan J, Lah JJ, Levey Al, et al. (2013) Behavior Matters—Cognitive Predictors of Survival in Amyotrophic Lateral Sclerosis. PLoS ONE 8: e57584. [PubMed](http://medrxiv.org/lookup/external-ref?access_num=23460879&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F16%2F2019.12.23.19014407.atom) 6. 6.Caga J, Hsieh S, Lillo P, Dudley K, Mioshi E (2019) The Impact of Cognitive and Behavioral Symptoms on ALS Patients and Their Caregivers. Front Neurol 10: 942. 7. 7.DeJesus-Hernandez M, Mackenzie IR, Boeve BF, Boxer AL, Baker M, Rutherford NJ, Nicholson AM, Finch NA, Flynn H, Adamson J, et al. (2011) Expanded GGGGCC Hexanucleotide Repeat in Noncoding Region of C9ORF72 Causes Chromosome 9p-Linked FTD and ALS. Neuron 72: 245–256. 
[CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.neuron.2011.09.011&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=21944778&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F16%2F2019.12.23.19014407.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000296224000008&link_type=ISI) 8. 8.Renton AE, Majounie E, Waite A, Simon-Sanchez J, Rollinson S, Gibbs JR, Schymick JC, Laaksovirta H, Van Swieten JC, Myllykangas L, et al. (2011) A hexanucleotide repeat expansion in C9ORF72 is the cause of chromosome 9p21-linked ALS-FTD. Neuron 72: 257–268. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.neuron.2011.09.010&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=21944779&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F16%2F2019.12.23.19014407.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000296224000009&link_type=ISI) 9. 9.Van Deerlin VM, Leverenz JB, Bekris LM, Bird TD, Yuan W, Elman LB, Clay D, Wood EM, Chen-Plotkin AS, Martinez-Lage M, et al. (2008) TARDBP mutations in amyotrophic lateral sclerosis with TDP-43 neuropathology: a genetic and histopathological analysis. The Lancet Neurology 7: 409–416. 10. 10.Freischmidt A, Wieland T, Richter B, Ruf W, Schaeffer V, Müller K, Marroquin N, Nordin F, Hübers A, Weydt P, et al. (2015) Haploinsufficiency of *TBK1* causes familial ALS and fronto-temporal dementia. Nat Neurosci 18: 631–636. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/nn.4000&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25803835&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F16%2F2019.12.23.19014407.atom) 11. 11.van Rheenen W, Shatunov A, Dekker AM, McLaughlin RL, Diekstra FP, Pulit SL, van der Spek RAA, Võsa U, de Jong S, Robinson MR, et al. 
(2016) Genome-wide association analyses identify new risk variants and the genetic architecture of amyotrophic lateral sclerosis. Nat Genet 48: 1043–1048. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng.3622&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=27455348&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F16%2F2019.12.23.19014407.atom) 12. 12.Nicolas A, Kenna KP, Renton AE, Faghri F, Chia R, Dominov JA, Kenna BJ, Nalls MA, Keagle P, Rivera AM, et al. (2018) Genome-wide Analyses Identify KIF5A as a Novel ALS Gene. Neuron 97: 1268–1282.e6. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.neuron.2018.02.027&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F16%2F2019.12.23.19014407.atom) 13. 13.van Es MA, Veldink JH, Saris CGJ, Blauw HM, van Vught PWJ, Birve A, Lemmens R, Schelhaas HJ, Groen EJN, Huisman MHB, et al. (2009) Genome-wide association study identifies 19p13.3 (UNC13A) and 9p21.2 as susceptibility loci for sporadic amyotrophic lateral sclerosis. Nat Genet 41: 1083–1087. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng.442&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=19734901&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F16%2F2019.12.23.19014407.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000270330400010&link_type=ISI) 14. 14.Diekstra FP, Van Deerlin VM, Van Swieten JC, Al-Chalabi A, Ludolph AC, Weishaupt JH, Hardiman O, Landers JE, Brown RH, van Es MA, et al. (2014) C9orf72 and UNC13A are shared risk loci for amyotrophic lateral sclerosis and frontotemporal dementia: A genome wide meta-analysis. Ann Neurol 76: 120–133. 
[CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/ana.24198&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=24931836&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F16%2F2019.12.23.19014407.atom) 15. 15.Karch CM, Wen N, Fan CC, Yokoyama JS, Kouri N, Ross OA, Höglinger G, Müller U, Ferrari R, Hardy J, et al. (2018) Selective Genetic Overlap Between Amyotrophic Lateral Sclerosis and Diseases of the Frontotemporal Dementia Spectrum. JAMA Neurol 75: 860–16. 16. 16.Turner MR, Al-Chalabi A, Chiò A, Hardiman O, Kiernan MC, Rohrer JD, Rowe J, Seeley W, Talbot K (2017) Genetic screening in sporadic ALS and FTD. Journal of Neurology, Neurosurgery & Psychiatry 88: 1042–1044. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NDoiam5ucCI7czo1OiJyZXNpZCI7czoxMDoiODgvMTIvMTA0MiI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDIwLzA5LzE2LzIwMTkuMTIuMjMuMTkwMTQ0MDcuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 17. 17.Van Deerlin VM, Leverenz JB, Bekris LM, Bird TD, Yuan W, Elman LB, Clay D, Wood EM, Chen-Plotkin AS, Martinez-Lage M, et al. (2008) TARDBP mutations in amyotrophic lateral sclerosis with TDP-43 neuropathology: a genetic and histopathological analysis. The Lancet Neurology 7: 409–416. 18. 18.Vance C, Rogelj B, HortobáGyi T, De Vos KJ, Nishimura AL, Sreedharan J, Hu X, Smith B, Ruddy D, Wright P, et al. (2009) Mutations in FUS, an RNA Processing Protein, Cause Familial Amyotrophic Lateral Sclerosis Type 6. Science 323: 1208–1211. 
[Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6Mzoic2NpIjtzOjU6InJlc2lkIjtzOjEzOiIzMjMvNTkxOC8xMjA4IjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjAvMDkvMTYvMjAxOS4xMi4yMy4xOTAxNDQwNy5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 19. 19.Kenna KP, van Doormaal PTC, Dekker AM, Ticozzi N, Kenna BJ, Diekstra FP, van Rheenen W, van Eijk KR, Jones AR, Keagle P, et al. (2016) NEK1 variants confer susceptibility to amyotrophic lateral sclerosis. Nat Genet 48: 1037–1042. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng.3626&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=27455347&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F16%2F2019.12.23.19014407.atom) 20. 20.Rosen DR, Siddique T, Patterson D, Figlewicz DA, Sapp P, Hentati A, Donaldson D, Goto J, O’Regan JP, Deng H-X, et al. (1993) Mutations in Cu/Zn superoxide dismutase gene are associated with familial amyotrophic lateral sclerosis. Nature 362: 59–62. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/362059a0&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=8446170&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F16%2F2019.12.23.19014407.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1993KP97600059&link_type=ISI) 21. 21.Umoh ME, Fournier C, Li Y, Polak M, Shaw L, Landers JE, Hu W, Gearing M, Glass JD (2016) Comparative analysis of C9orf72 and sporadic disease in an ALS clinic population. Neurology 87: 1024–1030. 22. 22.McLaughlin RL, Schijven D, van Rheenen W, van Eijk KR, O’Brien M, Kahn RS, Ophoff RA, Goris A, Bradley DG, Al-Chalabi A, et al. (2017) Genetic correlation between amyotrophic lateral sclerosis and schizophrenia. Nat Commun 8: 14774. 23. 
23.Ciga SB, Noyce AJ, Hemani G, Nicolas A, Calvo A, Mora G, Tienari PJ, Stone DJ, Nalls MA, Singleton AB, et al. (2019) Shared polygenic risk and causal inferences in amyotrophic lateral sclerosis. Ann Neurol 85: 470–481. [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F16%2F2019.12.23.19014407.atom) 24. 24.Placek K, Baer GM, Elman L, McCluskey L, Hennessy L, Ferraro PM, Lee EB, Lee VMY, Trojanowski JQ, Van Deerlin VM, et al. (2019) UNC13A polymorphism contributes to frontotemporal disease in sporadic amyotrophic lateral sclerosis. Neurobiology of Aging 73: 190–199. 25. 25.Vass R, Ashbridge E, Geser F, Hu WT, Grossman M, Clay-Falcone D, Elman L, McCluskey L, Lee VMY, Van Deerlin VM, et al. (2011) Risk genotypes at TMEM106B are associated with cognitive impairment in amyotrophic lateral sclerosis. Acta Neuropathologica 121: 373–380. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1007/s00401-010-0782-y&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=21104415&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F16%2F2019.12.23.19014407.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000287318000007&link_type=ISI) 26. 26.Hagenaars SP, Radakovic R, Crockford C, Fawns-Ritchie C, IFGC IF-GC, Harris SE, Gale CR, Deary IJ (2018) Genetic risk for neurodegenerative disorders, and its overlap with cognitive ability and physical function. PLoS ONE 13: e0198187. 27. 27.Witten DM, Tibshirani R, Hastie T (2009) A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics 10: 515–534. 
[CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/biostatistics/kxp008&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=19377034&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F16%2F2019.12.23.19014407.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000267213700010&link_type=ISI) 28. 28.Witten DM, Tibshirani RJ (2009) Extensions of sparse canonical correlation analysis with applications to genomic data. Stat Appl Genet Mol Biol 8: Article28. 29. 29.Parkhomenko E, Tritchler D, Beyene J (2009) Sparse canonical correlation analysis with application to genomic data integration. Stat Appl Genet Mol Biol 8: Article1–Article34. 30. 30.Avants BB, Libon DJ, Rascovsky K, Boller A, McMillan CT, Massimo L, Coslett HB, Chatterjee A, Gross RG, Grossman M (2014) Sparse canonical correlation analysis relates network-level atrophy to multivariate cognitive measures in a neurodegenerative population. NeuroImage 84: 698–711. [PubMed](http://medrxiv.org/lookup/external-ref?access_num=24096125&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F16%2F2019.12.23.19014407.atom) 31. 31.Avants BB, Cook PA, Ungar L, Gee JC, Grossman M (2010) Dementia induces correlated reductions in white matter integrity and cortical thickness: a multivariate neuroimaging study with sparse canonical correlation analysis. NeuroImage 50: 1004–1016. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.neuroimage.2010.01.041&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=20083207&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F16%2F2019.12.23.19014407.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000275408200015&link_type=ISI) 32. 
32.Du L, Liu K, Zhu L, Yao X, Risacher SL, Guo L, Saykin AJ, Shen L, Alzheimer’s Disease Neuroimaging Initiative (2019) Identifying progressive imaging genetic patterns via multi-task sparse canonical correlation analysis: a longitudinal study of the ADNI cohort. Bioinformatics 35: i474–i483. 33. 33.Hao X, Li C, Yan J, Yao X, Risacher SL, Saykin AJ, Shen L, Zhang D, Alzheimer’s Disease Neuroimaging Initiative (2017) Identification of associations between genotypes and longitudinal phenotypes via temporally-constrained group sparse canonical correlation analysis. Bioinformatics 33: i341–i349. 34. 34.Hu W, Lin D, Cao S, Liu J, Chen J, Calhoun VD, Wang Y-P (2018) Adaptive Sparse Multiple Canonical Correlation Analysis With Application to Imaging (Epi)Genomics Study of Schizophrenia. IeEe Trans Biomed Eng 65: 390–399. 35. 35.McMillan CT, Toledo JB, Avants BB, Cook PA, Wood EM, Suh E, Irwin DJ, Powers J, Olm C, Elman L, et al. (2014) Genetic and neuroanatomic associations in sporadic frontotemporal lobar degeneration. Neurobiology of Aging 35: 1473–1482. [PubMed](http://medrxiv.org/lookup/external-ref?access_num=24373676&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F16%2F2019.12.23.19014407.atom) 36. 36.Toledo JB, Van Deerlin VM, Lee EB, Suh E, Baek Y, Robinson JL, Xie SX, McBride J, Wood EM, Schuck T, et al. (2014) A platform for discovery: The University of Pennsylvania Integrated Neurodegenerative Disease Biobank. Alzheimer’s & Dementia 10: 477–484.e1. 37. 37.Abrahams S, Newton J, Niven E, Foley J, Bak TH (2014) Screening for cognition and behaviour changes in ALS. Amyotrophic Lateral Sclerosis and Frontotemporal Degeneration 15: 9–14. 38. 38.Kim W-K, Liu X, Sandner J, Pasmantier M, Andrews J, Rowland LP, Mitsumoto H (2009) Study of 962 patients indicates progressive muscular atrophy is a form of ALS. Neurology 73: 1686–1692. 
[CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1212/WNL.0b013e3181c1dea3&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=19917992&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F16%2F2019.12.23.19014407.atom) 39. 39.de Vries BS, Rustemeijer LMM, Bakker LA, Schröder CD, Veldink JH, van den Berg LH, Nijboer TCW, van Es MA (2019) Cognitive and behavioural changes in PLS and PMA:challenging the concept of restricted phenotypes. Journal of Neurology, Neurosurgery & Psychiatry 90: 141–147. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NDoiam5ucCI7czo1OiJyZXNpZCI7czo4OiI5MC8yLzE0MSI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDIwLzA5LzE2LzIwMTkuMTIuMjMuMTkwMTQ0MDcuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 40. 40.Lulé D, Böhm S, Müller H-P, Aho-Özhan H, Keller J, Gorges M, Loose M, Weishaupt JH, Uttner I, Pinkhardt E, et al. (2018) Cognitive phenotypes of sequential staging in amyotrophic lateral sclerosis. CORTEX 101: 163–171. [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F16%2F2019.12.23.19014407.atom) 41. 41.Agosta F, Ferraro PM, Riva N, Spinelli EG, Chiò A, Canu E, Valsasina P, Lunetta C, Iannaccone S, Copetti M, et al. (2016) Structural brain correlates of cognitive and behavioral impairment in MND. Hum Brain Mapp 37: 1614–1626. [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F16%2F2019.12.23.19014407.atom) 42. 42.Müller H-P, Kassubek J (2018) MRI-Based Mapping of Cerebral Propagation in Amyotrophic Lateral Sclerosis. Front Neurosci 12: 655. 43. 43.Byrne S, Elamin M, Bede P, Shatunov A, Walsh C, Corr B, Heverin M, Jordan N, Kenna K, Lynch C, et al. 
(2012) Cognitive and clinical characteristics of patients with amyotrophic lateral sclerosis carrying a C9orf72 repeat expansion: a population-based cohort study. The Lancet Neurology 11: 232–240. 44. 44.Irwin DJ, McMillan CT, Suh E, Powers J, Rascovsky K, Wood EM, Toledo JB, Arnold SE, Lee VMY, Van Deerlin VM, et al. (2014) Myelin oligodendrocyte basic protein and prognosis in behavioral-variant frontotemporal dementia. Neurology 83: 502–509. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1212/WNL.0000000000000668&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=24994843&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F16%2F2019.12.23.19014407.atom) 45. 45.Burnett B, Li F, Pittman RN (2003) The polyglutamine neurodegenerative protein ataxin-3 binds polyubiquitylated proteins and has ubiquitin protease activity. Human Molecular Genetics 12: 3195–3205. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/hmg/ddg344&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=14559776&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F16%2F2019.12.23.19014407.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000186587100015&link_type=ISI) 46. 46.Yokoyama JS, Karch CM, Fan CC, Bonham LW, Kouri N, Ross OA, Rademakers R, Kim J, Wang Y, Höglinger GU, et al. (2017) Shared genetic risk between corticobasal degeneration, progressive supranuclear palsy, and frontotemporal dementia. Acta Neuropathol 133: 825–837. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1007/s00401-017-1693-y&link_type=DOI) 47. 47.Ferrari R, Wang Y, Vandrovcova J, Guelfi S, Witeolar A, Karch CM, Schork AJ, Fan CC, Brewer JB, International FTD-Genomics Consortium (IFGC),, et al. (2017) Genetic architecture of sporadic frontotemporal dementia and overlap with Alzheimer‘s and Parkinson’s diseases. Journal of Neurology, Neurosurgery & Psychiatry 88: 152–164. 
[Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NDoiam5ucCI7czo1OiJyZXNpZCI7czo4OiI4OC8yLzE1MiI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDIwLzA5LzE2LzIwMTkuMTIuMjMuMTkwMTQ0MDcuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 48. 48.Desikan RS, Schork AJ, Wang Y, Witoelar A, Sharma M, McEvoy LK, Holland D, Brewer JB, Chen C-H, Thompson WK, et al. (2015) Genetic overlap between Alzheimer‘s disease and Parkinson’s disease at the MAPT locus. Mol Psychiatry 20: 1588–1595. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/mp.2015.6&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25687773&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F16%2F2019.12.23.19014407.atom) 49. 49.Irwin DJ, McMillan CT, Xie SX, Rascovsky K, Van Deerlin VM, Coslett HB, Hamilton R, Aguirre GK, Lee EB, Lee VMY, et al. (2018) Asymmetry of post-mortem neuropathology in behavioural-variant frontotemporal dementia. Brain 141: 288–301. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/brain/awx319&link_type=DOI) 50. 50.Giannini LAA, Xie SX, McMillan CT, Liang M, Williams A, Jester C, Rascovsky K, Wolk DA, Ash S, Lee EB, et al. (2019) Divergent patterns of TDP-43 and tau pathologies in primary progressive aphasia. Ann Neurol 85: 630–643. 51. 51.Irwin DJ, Brettschneider J, McMillan CT, Cooper F, Olm C, Arnold SE, Van Deerlin VM, Seeley WW, Miller BL, Lee EB, et al. (2016) Deep clinical and neuropathological phenotyping of Pick disease. Ann Neurol 79: 272–287. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/ana.24559&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=26583316&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F16%2F2019.12.23.19014407.atom) 52. 
52. Irwin DJ, Byrne MD, McMillan CT, Cooper F, Arnold SE, Lee EB, Van Deerlin VM, Xie SX, Lee VMY, Grossman M, et al. (2015) Semi-Automated Digital Image Analysis of Pick’s Disease and TDP-43 Proteinopathy. Journal of Histochemistry & Cytochemistry 64: 54–66.
53. Prudlo J, König J, Schuster C, Kasper E, Büttner A, Teipel S, Neumann M (2016) TDP-43 pathology and cognition in ALS: A prospective clinicopathologic correlation study. Neurology 87: 1019–1023.
54. Sugrue LP, Desikan RS (2019) What Are Polygenic Scores and Why Are They Important? JAMA 321: 1820–1821.
55. Wald NJ, Old R (2019) The illusion of polygenic disease risk prediction. Genetics in Medicine 2019 319: 1.
56. Duncan L, Shen H, Gelaye B, Meijsen J, Ressler K, Feldman M, Peterson R, Domingue B (2019) Analysis of polygenic risk score usage and performance in diverse human populations. Nat Commun 10: 3328–3329.
57. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D (2006) Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 38: 904–909. doi:10.1038/ng1847
58. Lillo P, Mioshi E, Zoing MC, Kiernan MC, Hodges JR (2010) How common are behavioural changes in amyotrophic lateral sclerosis? Amyotrophic Lateral Sclerosis 12: 45–51.
59. Simon N, Goldstein LH (2019) Screening for cognitive and behavioral change in amyotrophic lateral sclerosis/motor neuron disease: a systematic review of validated screening methods. Amyotrophic Lateral Sclerosis and Frontotemporal Degeneration 20: 1–11.
60. Cedarbaum JM, Stambler N, Malta E, Fuller C, Hilt D, Thurmond B, Nakanishi A (1999) The ALSFRS-R: a revised ALS functional rating scale that incorporates assessments of respiratory function. Journal of the Neurological Sciences 169: 13–21. doi:10.1016/S0022-510X(99)00210-5
61. Strong MJ, Abrahams S, Goldstein LH, Woolley S, McLaughlin P, Snowden J, Mioshi E, Roberts-South A, Benatar M, Hortobágyi T, et al. (2017) Amyotrophic lateral sclerosis - frontotemporal spectrum disorder (ALS-FTSD): Revised diagnostic criteria. Amyotrophic Lateral Sclerosis and Frontotemporal Degeneration 18: 153–174.
62. Li H, Durbin R (2010) Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26: 589–595. doi:10.1093/bioinformatics/btp698
63. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, et al. (2010) The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res 20: 1297–1303.
64. Hunt SE, McLaren W, Gil L, Thormann A, Schuilenburg H, Sheppard D, Parton A, Armean IM, Trevanion SJ, Flicek P, et al. (2018) Ensembl variation resources. Database (Oxford) 2018: 1193.
65. Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alföldi J, Wang Q, Collins RL, Laricchia KM, Ganna A, Birnbaum DP, et al. (2019) Variation across 141,456 human exomes and genomes reveals the spectrum of loss-of-function intolerance across human protein-coding genes. bioRxiv 49: 531210.
66. Tan A, Abecasis GR, Kang HM (2015) Unified representation of genetic variants. Bioinformatics 31: 2202–2204. doi:10.1093/bioinformatics/btv112
67. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P, de Bakker PIW, Daly MJ, et al. (2007) PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses. The American Journal of Human Genetics 81: 559–575. doi:10.1086/519795
68. Xia CH, Ma Z, Ciric R, Gu S, Betzel RF, Kaczkurkin AN, Calkins ME, Cook PA, de la Garza AG, Vandekar SN, et al. (2018) Linked dimensions of psychopathology and connectivity in functional brain networks. Nat Commun 1–14.
69. Brooks BR, Miller RG, Swash M, Munsat TL, World Federation of Neurology Research Group on Motor Neuron Diseases (2000) El Escorial revisited: revised criteria for the diagnosis of amyotrophic lateral sclerosis. In pp 293–299.
70. Neumann M, Sampathu DM, Kwong LK, Truax AC, Micsenyi MC, Chou TT, Bruce J, Schuck T, Grossman M, Clark CM, et al. (2006) Ubiquitinated TDP-43 in Frontotemporal Lobar Degeneration and Amyotrophic Lateral Sclerosis. Science 314: 130–133.
71. Mackenzie IRA, Neumann M, Baborie A, Sampathu DM, du Plessis D, Jaros E, Perry RH, Trojanowski JQ, Mann DMA, Lee VMY (2011) A harmonized classification system for FTLD-TDP pathology. Acta Neuropathol 122: 111–113. doi:10.1007/s00401-011-0845-8
72. Suh E, Lee EB, Neal D, Wood EM, Toledo JB, Rennert L, Irwin DJ, McMillan CT, Krock B, Elman LB, et al. (2015) Semi-automated quantification of C9orf72 expansion size reveals inverse correlation between hexanucleotide repeat number and disease duration in frontotemporal degeneration. Acta Neuropathologica 130: 363–372.
73. Wood EM, Falcone D, Suh E, Irwin DJ, Chen-Plotkin AS, Lee EB, Xie SX, Van Deerlin VM, Grossman M (2013) Development and Validation of Pedigree Classification Criteria for Frontotemporal Lobar Degeneration. JAMA Neurol 70: 1411–1417.
74. McCarthy S, Das S, Kretzschmar W, Delaneau O, Wood AR, Teumer A, Kang HM, Fuchsberger C, Danecek P, Sharp K, et al. (2016) A reference panel of 64,976 haplotypes for genotype imputation. Nat Genet 48: 1279–1283. doi:10.1038/ng.3643
75. Das S, Forer L, Schönherr S, Sidore C, Locke AE, Kwong A, Vrieze SI, Chew EY, Levy S, McGue M, et al. (2016) Next-generation genotype imputation service and methods. Nat Genet 48: 1284–1287. doi:10.1038/ng.3656
76. Tustison NJ, Cook PA, Klein A, Song G, Das SR, Duda JT, Kandel BM, van Strien N, Stone JR, Gee JC, et al. (2014) Large-scale evaluation of ANTs and FreeSurfer cortical thickness measurements. NeuroImage 99: 166–179. doi:10.1016/j.neuroimage.2014.05.044
77. Tustison NJ, Avants BB, Cook PA, Zheng Y, Egan A, Yushkevich PA, Gee JC (2010) N4ITK: Improved N3 Bias Correction. IEEE Trans Med Imaging 29: 1310–1320. doi:10.1109/TMI.2010.2046908
78. Winkler AM, Ridgway GR, Webster MA, Smith SM, Nichols TE (2014) Permutation inference for the general linear model. NeuroImage 92: 381–397. doi:10.1016/j.neuroimage.2014.01.060
79. Smith SM, Nichols TE (2009) Threshold-free cluster enhancement: addressing problems of smoothing, threshold dependence and localisation in cluster inference. NeuroImage 44: 83–98. doi:10.1016/j.neuroimage.2008.03.061
80. Lippa CF, Rosso AL, Stutzbach LD, Neumann M, Lee VMY, Trojanowski JQ (2009) Transactive Response DNA-Binding Protein 43 Burden in Familial Alzheimer Disease and Down Syndrome. Arch Neurol 66: 1483–1488. doi:10.1001/archneurol.2009.277
81. Neumann M, Kwong LK, Lee EB, Kremmer E, Flatley A, Xu Y, Forman MS, Troost D, Kretzschmar HA, Trojanowski JQ, et al. (2009) Phosphorylation of S409/410 of TDP-43 is a consistent feature in all sporadic and familial forms of TDP-43 proteinopathies. Acta Neuropathologica 117: 137–149. doi:10.1007/s00401-008-0477-9