Abstract
Background Suicide is the leading cause of death in youth worldwide.1 Identifying children with high risk for suicide remains challenging.2 Here we test the extents to which genome-wide polygenic scores (GPS) for common traits and psychiatric disorders are linked to the risk for suicide in young children.
Methods We constructed GPSs of 24 traits and psychiatric disorders broadly related to suicidality from 8,212 US children with ages of 9 to 10 from the Adolescent Brain Cognitive Development study. We performed multiple logistic regression to test the association between childhood suicidality, defined as suicidal ideation or suicidal attempt, and the GPSs. Machine learning techniques were used to test the predictive utility of the GPSs and other phenotypic outcomes on suicide and suicidal behaviors in the youth.
Outcomes We identified three GPSs significantly associated with childhood suicidality: Attention deficit hyperactivity disorder (ADHD) (P = 2.83×10−4; odds ratio (OR) = 1.12, FDR correction), general happiness with belief that own life is meaningful (P = 1.30×10−3; OR = 0.89) and autism spectrum disorder (ASD) (P = 1.81×10−3; OR = 1.14). Furthermore, the ASD GPS showed significant interaction with ELS such that a greater polygenic score in the presence of a greater ELS has even greater likelihood of suicidality (with active suicidal ideation, P = 1.39×10−2, OR = 1.11). In machine learning predictions, the cross validated and optimized model showed an ROC-AUC of 0.72 and accuracy of 0.756 for the hold-out set of overall suicidal ideation prediction, and showed an ROC-AUC of 0.765 and accuracy of 0.750 for the hold-out set of suicidal attempts.
Interpretation Our results show that childhood suicidality is linked to the GPSs for psychiatric disorders, ADHD and ASD, and for a common trait, general happiness, respectively; and that GPSs for ASD and insomnia, respectively, have synergistic effects on suicidality via an interaction with early life stress. By providing the quantitative account of the polygenic and environmental factors of childhood suicidality in a large, representative population, this study shows the potential utility of the GPS in investigation of childhood suicidality for early screening, intervention, and prevention.
Introduction
In 2020, in every 10 minutes someone in the U.S. died by suicide. In youth, suicide is the leading cause of death worldwide.1 Identifying children with high risk for suicide remains challenging with poor predictive validity.2 With high estimated heritability of 30% to 55%, suicide is polygenic.3 Although recent Genome-Wide Association Studies (GWAS) revealed several genome-wide significant loci,4,5 they are not adequtely powered, remain to be replicatedand have yet to offer risk prediction.
Another key outstanding question regarding childhood suicidality is the existence of the gene-by-environment interaction. Early life stress (ELS) or childhood adversity contributes to psychopathology6 and suicidality7 through the epigenetic mechanism. In fight with childhood suicidality, testing whether ELS and genetic factors together act synergistically on childhood suicidality will provide an insight into the biological pathway, as well as an actionable target for screening and intervention. No literature reports this.
The capability of risk stratification for suicide using genetics and interacting environmental variables would thus help prevent youth suicide. To this end, we test the extents to which genome-wide polygenic scores8,9 for common traits and psychiatric disorders, and their interaction with ELS is linked to the risk for suicide in young children.
Methods
Study design and participants
We used data from the ABCD (Adolescent Brain and Cognitive Development) study, the nationwide multisite prospective, longitudinal study. The enrolled samples are 11,878 children with ages of 9 to 10 years old in the US across 21 sites recruited between 2015 and 2019. Ethical approval for the study was obtained from Seoul National University IRB. Full details of the study, measures, and samples can be found elsewhere.10
Genotype data
The saliva samples of the participants were collected and genotyped at Rutgers University Cell and DNA Repository using Affymetrix SmokeScreen Array consisting of 733,293 single nucleotide polymorphisms (SNPs). After removing the SNP with genotype call rate <95%, sample call rate <95%, and minor allele frequency (MAF) <1%, raw genotypes were imputed toward 1000G Phase5 reference panel using the Michigan Imputation Server and phased with Eagle v2.4. Additional quality control (QC) process removed SNPs with genotype call rate <95%, Hardy-Weinberg Equilibrium p-value <1E-06, sample missingness >5%, and MAF <1%. We also removed samples with extreme heterozygosity having F coefficient bigger than 3 standard deviation of the population mean. We conducted Principal Component Analysis (PCA) on genetic variants that are pruned for variants in linkage disequilibrium with an r2 > 0.25 in a 200kb window. Since ABCD participants reside on a continuum of genetic ancestry as American population (1000Genome super population), rather than distinct population groups, and none of them belongs to either African or East Asian populations, we did not exclude any genetic outliers based on PCA biplot (Supplementary Figure). Proportional identify-by-descent analysis was performed with the same SNP pruning using PLINK11 and we removed moderately related individuals (pihat > 0.18). After QC, genotype data were available for 8,496 children.
Construction of Genome-wide Polygenic Scores (GPS)
For GPS generation, we selected 24 target outcomes that are thought to be related to suicidality. These include personality, cognitive, psychological traits, and psychiatric disorders that are known to be broadly related to suicidality: general happiness,12,13 insomnia,14 depression,15 risk behaviors,16 risk tolerance, educational attainment,17,18 cognitive performance,16,17 snoring,19 worry,20 IQ,17 behaviors: cannabis usage,21,22 drink per week,23 smoker,24 Attention deficit hyperactivity disorder (ADHD),25 Autism spectrum disorder (ASD),26 major depressive disorder (MDD),27 schizophrenia,28 bipolar disorder,29 post-traumatic stress disorder (PTSD).30 For general happiness, four kinds of GPS were built and tested for the study: the participants were asked with different mental health questionnaires for measuring the level of subjective well-being, such as (1) two different GWAS of “how happy are you in general”, (2) “how happy are you with your health in general”, and (3) “to what extent do you feel your life to be meaningful”. All the GWAS summary statistics of the aforementioned outcomes are publically available and have been collected for GPS generation.
We performed clumping and pruning of SNPs using Polygenic Risk Score software 231 with a clumping window of 500 kb, clumping r2 of 0.2, and no thresholding of p-value significance on the summary statistics since we wanted to fully incorporate the effects of all the SNPs. The polygenic score for each individual was then computed as a sum of their SNPs adjusting for first 10 PCs, with each SNP being weighted by the effect in the discovery samples.31
Outcomes
Suicidal ideation and attempt was derived from the computerized version of Kiddie Schedule for Affective Disorders and Schizophrenia (KSADS).32,33 Besides the GPSs and the suicidality outcome variable, the following variables were considered for covariates in logistic regression or the base model for the classification task. For psychopathology, ABCD Parent Child Behavior Checklist; for intelligence, NIH toolbox;34 for family environment, Youth Family Environment Scale; and for early life stress, physical and sexual abuse, household challenges such as family substance abuse, family mental illness, family criminals, parental separation or divorce, and violently treated mothers and emotional and physical neglect (a total composite score was used). Demographic variables include sex, age, race/ethnicity, study site, years of parental education, and parental marital status.
Statistical Analysis
Associations of the GPSs with suicidal variables within the ABCD children were tested using logistic regression with the following covariates: sex, age, site for sample collection, and maternal education level. After QC and GPS construction, the complete dataset of phenotypic outcomes, GPS, and covariate data were available for 8,212 children and used for statistical analysis. To test the extents to which the genome-wide polygenic scores are useful to predict the suicidal risk in children, we trained stacked ensemble models or super learners35 based on the several machine learning models, such as lightGBM, xgboost, general linear model (Driverless AI version 1.8, H2O.ai Inc., CA, USA),36 and, as a benchmark to the stacked ensemble model, Gradient Boosting Machine (Scikit-learn 0.21; https://scikit-learn.org/). The input variables included the GPSs, sociodemographic, self-reported, and behavioral variables (e.g., psychopathology, cognitive, early life stress). We rationalized that the multitude of the genetic profile would account for the multi-dimensional risk for suicide. For model training and evaluation, we split the data into 80% and 20% in which samples were balanced (bootstrapping without replacement). Within the 80% training data, we conducted five-fold stratified cross validation and optimized hyperparameters in cross validation sets.
Results
Children with suicidal ideation or attempts showed similar sociodemographic, behavioral, clinical characteristics to those without suicidal ideation or attempts (Table 1). Out of the pre-selected 24 GPSs, in suicidal ideation, a greater GPS of ADHD correlated with a greater likelihood (P = 2.83×10−4, odds ratio (OR) = 1.12; FDR correction). Likewise, the ADHD GPS significantly correlated with active (P= 9.75×10−4, OR=1.14) and passive suicidal ideation (P=6.63×10−4, OR=1.13), respectively (Table 2). In active suicidal ideation, a greater GPS of ASD correlated with a greater likelihood of suicidal ideation (P= 1.81×10−3, OR=1.14). Conversely, in passive suicidal ideation, a less GPS of general happiness correlated with a greater likelihood of suicidal ideation (P = 1.30×10−3, OR = 0.89).
Of those three GPSs showing significant correlations with suicidality, we further tested an interaction with ELS. We found a significant ASD-GPS-by-ELS interaction on active suicidal ideation: a greater ASD GPS in the presence of ELS correlated with a synergistic increase in these likelihood of suicidality (P=1.39×10−2, OR=1.11; FDR correction for the three analyses). Effects were adjusted for the covariates. No significant associations were found between ELS and suicidality (Ps > 0.24). When stratified by sex, no significant associations were found.
We tested whether the multiple GPSs contributed to prediction of childhood suicidality using machine learning. The following input features were used: the GPSs, socio-demographic (sex, age, marital status, parental education, ethnicity, study site), psychological (Child behavior checklist), cognitive phenotypic (NIH toolbox), family environment (Youth Family Environment Scale), and ELS. In predicting suicidal ideation, the cross-validated and optimized stacked ensemble model showed an ROC-AUC of 0.720, accuracy of 0.756, sensitivity of 0.837, specificity of 0.058, positive predictive value of 0.538, negative predictive value of 0.214 in the balanced held-out test set (N= 476; bootstrap samples) (Figure 2). Adding the GPS to the phenotype model resulted in a 2.4% boost in ROC-AUC. In predicting suicidal attempts, the cross validated and optimized stacked ensemble model showed an ROC-AUC of 0.765, accuracy of 0.750, sensitivity of 0.583, specificity of 0.25, positive predictive value of 0.70, negative predictive value of 0.167 in the balanced held-out test set (N= 32; bootstrap samples) (Figure 2). Adding the GPS to the phenotype model resulted in a 3.2% boost in ROC-AUC. The stacked ensemble models always outperformed the benchmark Gradient Boosting Machine model.
Discussion
We tested the utility of the GPS in identifying children with risk for suicide. Using large-scale, nation-wide, representative children samples, we found novel associations between childhood suicidality and the GPS of ADHD, ASD -- positive associations -- and general happiness -- a negative association. We next found significant gene-by-environment interactions between the GPS of ASD and ELS,6 a known risk factor for childhood suicidality, together acting synergistically on childhood suicidality. In our data-driven predictive modeling, together with the self-reported questionnaires for psychopathology (e.g., CBCL), intelligence, family environment, and socio-demographic variables, inclusion of the multiple GPSs permitted to classify children with suicidality with moderate accuracy. To sum, this study sheds light on the genetic approach to childhood suicidality for better understanding of the etiologic pathway and for prevention and intervention.
Using the multi-GPS method we discovered the association of childhood suicidality with the genetic profiles of psychiatric disorders, i.e., ADHD and ASD, particularly relevant to childhood psychopathology, and of a common trait, i.e., general happiness. For ADHD and ASD, a greater polygenic score correlates with a greater likelihood of suicide, whereas for general happniness, a smaller polygenic score correlates with a greater likelihood of suicide. This is in line with the literature of youth suicide. ADHD in adults has been implicated in completed suicide,37 and in youth in suicidal attempts.38 Our GPS results may provide the genomic evidence of the link between ADHD and childhood suicidality. The explained variance (McFadden’s pseudo-R2)39 of suicidal phenotypes by ADHD GPS was approximately 1.5% in children. This estimation is higher than the results from past PRS studies of suicidality, which showed maximum variance explained by 0.13-0.20% with self-harm behaviors40 or up to 0.30-0.70% of the phenotypic variance for suicide attempt explained by depression-based GPS.41,42
The GPS of ASD not only correlated with suicidal ideation, but also showed a significant interaction with ELS. These results corroborate previous findings of children with ASD being at an increased risk for suicidality.43,44 One study shows that children with ASD are 28 times more likely to exhibit suicidal thoughts or behaviors than that of typical counterparts.45 This strong link between risk of suicidality and elevated autistic traits could be explained by specific behavioral attributes of both phenotypes, such as poor socialization and problem-solving skill, or increased levels of anxiety.44 Although literature has reported several social risk factors underlying the ASD-suicidality link,45,46 the genetic and environmental contribution to the overlap of ASD and suicidal behaviors remained unknown.47 Our results fill this gap by presenting the shared genetic factors for ASD and suicidality, also in environmental (childhood experiences) dimensions. The results thus further suggest that protective, as opposed to adverse, childhood experiences are particularly important for those with genetic risk of ASD. This might be a potential actionable target for intervention in that population.
In line with our findings, previous literature has reported high suicidal risk in individuals who in addition to ASD also have ADHD47. As 40-50% of individuals with ASD are known to have comorbid ADHD,48 our significant results of ASD and ADHD GPSs associated with suicidality suggest the genetic liability of psychiatric comorbidity for high suicidal risk in children.
The association of a less GPS of general happiness (believing that her or his life is meaningful) and a greater likelihood of childhood suicidal ideation is novel. Of several measures of subjective well-being, only the belief of own meaningful life is significantly linked to childhood suicidality. Prior studies report a negative correlation between subjective happiness and suicide.12,13. Our results corroborate the link by showing genetic underpinning of the relationship.
Our study contributes in several ways to our genomic understanding of suicidal phenotypes and provides a basis for utilizing multiple GPSs for suicidal prediction. The combination of the non-suicidal GPSs, behavioral, psychological scales, and ELS allowed accurate prediction of suicidality with a ROC-AUC up to 0.720 for suicidal ideation and 0.765 for suicidal attempts on the balanced test set. Several previous studies have tested different methods for suicidal risk prediction and showed substantial classification ability having AUC ranging from 0.74 to 0.88, using known sociodemographic and psychiatric risk factors.49–52 Although quite successful, some of the models included long-term measurement of risk indices such as 12-month or lifelong risk factors, which are not readily measured in naturalistic clinical practices. Our models showed relatively lower performances than previous studies, but our methodology is promising in that they incorporated short-term or cross-sectional factors that could be accessible and readily measured in clinical settings, on top of the genetic factors. Despite the acceptable accuracy and sensitivity, however, there is much room for improvement regarding specificity and negative predictive value of the model.
Despite the potential utility of the GPS in childhood suicide research discovered here, towards the practical utility in early screening further methodological improvement is needed. Firstly in generating the GPS, searching for the optimal hyper-parameters (e.g.,p-value cutoff or linkage disequilibrium criteria) may lead to less training errors and an improved predictive power. Secondly, future research should find an optimal selection of genetic and environmental risk factors to accurately predict childhood suicidality to prevent suicidal behaviors in youth from the clinical context.
By providing the first quantitative account of the polygenic and environmental factors of childhood suicidality in a large, representative population, this study shows the importance and potential utility of the GPS approach in childhood suicidality in early screening. These findings motivate the future investigation of more effective screening and intervention strategies for children suicidality.
Data Availability
Codes and data is freely available for reproducibility.
https://github.com/Transconnectome/ConnectomeLab/blob/master/suicide
Author contributions
Study concept and design: Y.J., J.C.
Acquisition, analysis, or interpretation of data: Y.J., H.K., S.M., H.W.
Drafting of the manuscript: Y.J., S.M., H.W., J.C.
Critical revision of the manuscript for important intellectual content: Y.J., J.K., I.C., J.C., W.-Y.A.
Statistical analysis: Y.J., S.M., H.W.
Obtained funding: I.C., J.C.
Study supervision: J.C.
Data and Code Availability
Codes and data is freely available for reproducibility (https://github.com/Transconnectome/ConnectomeLab/blob/master/suicide).
Supplementary Materials
Acknowledgement
This work was supported by the New Faculty Startup Fund from Seoul National University, Research grant from the Center for Happiness via the Center for Social Sciences at Seoul National University at Seoul National University, and the H2O.ai Academic Program at H2O inc., CA, USA.