The CADM2 gene and behavior: A phenome-wide scan in UK-Biobank ============================================================== * Joëlle A. Pasman * Zeli Chen * Jacqueline M. Vink * Michel C. Van Den Oever * Tommy Pattij * Taco J. De Vries * Abdel Abdellaoui * Karin J.H. Verweij ## Abstract This phenome-wide association study examined SNP and gene–based associations of the *CADM2* gene with 242 psycho-behavioral traits (N=12,211–453,349). We found significant associations with 51 traits, many more than for other genes. We replicated previously reported associations with substance use, risk-taking, and health behavior, and uncovered novel associations with sleep and dietary traits. Accordingly, *CADM2* is involved in many psycho-behavioral traits, suggesting a common denominator in their biology. In the last 15 years, genome-wide association studies (GWASs) have identified tens of thousands of associations between genetic variants and a range of human behavioral and physical traits. One gene that has popped up surprisingly often in behavioral GWASs is the Cell Adhesion Molecule 2 gene *(CADM2)*. Common variations (single nucleotide polymorphisms, SNPs) in the *CADM2* gene have been implicated in various traits, including personality1, cognition and educational attainment2,3, risk-taking behavior4, reproductive success5, autism spectrum disorders6, substance use7,8, physical activity9, and BMI/obesity10. *CADM2* encodes a member of the synaptic cell adhesion molecules (SynCAMs) involved in synaptic organization and signaling, suggesting that alterations in *CADM2* expression affect neuronal connectivity. *CADM2* is abundant in brain regions important for reward processing and addiction, including the frontal anterior cingulate cortex3, substantia nigra, and insula11. Given its common appearance in GWASs and its central role in brain functioning, *CADM2* is a gene that warrants further exploration. In this study we perform a phenome-wide association analysis (PheWAS), in which we test for associations of *CADM2* (on SNP and gene level) with a comprehensive selection of psycho-behavioral phenotypes as measured in the UK Biobank cohort. Results will provide insights about whether the role of *CADM2* is confined to a specific set of traits or is involved in a wider range of phenotypes. This will inform future studies on the function of *CADM2* and the neurobiological underpinnings of different psycho-behavioral traits. An additional advantage is that the multiple testing burden is reduced as compared to genome-wide studies, resulting in higher power levels. UK Biobank is a nationwide study in the United Kingdom containing phenotypic and genetic information for up to 500,000 individuals12. We analyzed 12,211 to 453,349 UK Biobank participants with European ancestry for whom genetic and phenotypic data were available. About half (54.3%) of the sample was female, and mean age was M=56.8 (range 39-73, SD=8.0). We extracted the *CADM2* region 250 kb up- and downstream (bp 84,758,133 to 86,373,579 on 3p12.1, GRCh37/hg19) and selected 4,265 SNPs with call rate>95%, minor allele frequency >1%, and p-value for violation of Hardy-Weinberg equilibrium of *p**HWE*>10−6 (QC details are described in Abdellaoui, 202013). We selected 242 psychological and behavioral phenotypes from the UK Biobank with an effective sample size above ![Graphic][1]. In order to maximize sample size, we used the first available measurement for each individual; if the first instance was not available, we took the second, otherwise the third, etc. In addition, we included eight traits that were derived for recent genetic studies, including seven substance use traits and educational attainment in years (for an overview of all included traits, see Supplemental Table S1). Continuous phenotypes were cleaned such that theoretical implausible values were set on missing and extreme values more than 4 SDs away from the mean were winsorized at the maximum value of 4SDs. Binary and ordinal variables were left unchanged. Ordinal variables were analyzed as continuous. The SNP-based association analyses were performed in fastGWA14, taking into account genetic relatedness. Analyses were controlled for effects of age, sex, and 25 genetic principal components (PCs, to control for genetic ancestry15). We used linear mixed modeling for all traits and Haseman-Elston regression to estimate the genetic variance component. To test the significance of *CADM2*-associations on gene-level we conducted a MAGMA gene-based test16, which aggregates the SNP effects (regardless of direction) in a single test of association. We used the default SNP-wise mean procedure (averaging SNP effects across the gene) and checked the results of the SNP-wise top procedure for comparison (more sensitive when only a small proportion of SNPs has an effect). As significance threshold for the SNP-based test we adopted a genome-wide significance threshold of *p*<5E-08. As this is rather stringent given that we test within a single gene, we also used a significance threshold of .05 corrected for the number of independent SNPs (n=133, at R2=0.10 and 250kb) and the number of traits, resulting in .05/(133*242)=1.55E-06. For the gene-based test we used a threshold of 2.62E-05, corresponding to .05 divided by the total number of genes (19,082). To provide an estimation of the effect size, we used ![Graphic][2], as described in 17, with adaptations for binary traits as described in8. Given that our analyses were highly powered, we assessed whether the high number of associations discovered for *CADM2* was unusual or similar to those found for other genes. We therefore selected a random set of 50 genes (that were up to 50% smaller or larger in size), repeated the SNP-based analysis for these genes and compared the number of traits with significant associations.. On the SNP-level, 38 traits reached significant associations at a genome-wide corrected p-value (5E-08), and 61 traits at the lenient threshold of *p*<1.55E-06 (Figure 1a, Table 1). In the gene-based test, 51 out of 242 traits showed significant associations under a p-value of 2.62E-06 (Figure 1b, Table 1). The strongest associations were found for cognitive ability, risk taking, diet, BMI, daytime sleeping, sedentary behaviors, nervousness-like traits, reproductive traits, and substance use. There were fewer associations with occupational, traumatic experiences, social connection and non-worry related depression traits. Full SNP- and gene-based results are provided in Table S2 and S3a. Table S3b shows the gene-based results for the SNP-wise top procedure. There were some differences with the SNP-wise mean results, with only 33 significant associations and a correlation of r=.64 between the *p*-values from the respective tests. View this table: [Table 1.](http://medrxiv.org/content/early/2021/04/16/2021.04.16.21255141/T1) Table 1. Phenotypes with a significant association with *CADM2* according to the MAGMA gene-based test (SNP-wise mean) at p<2.62E-06. The top-SNP for the phenotype is given with the minor allele (A1), beta (β), *p*-value (*p*), and percentage of explained variance in the respective trait (R2. (%)). Most top-SNPs were significant at *p*<1.55E-6 (bold-faced). ![Figure 1.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/04/16/2021.04.16.21255141/F1.medium.gif) [Figure 1.](http://medrxiv.org/content/early/2021/04/16/2021.04.16.21255141/F1) Figure 1. PheWAS results. Panel **a**) shows the subset of significant associations of the SNP-based test (61 out of 242 traits). The x-axis shows the traits (colored by trait category) and the y-axis the p-values of the association. Each dot represents a SNP association. SNPs exceeding the red horizontal line have a *p*-value significant at a genome-wide threshold of *p*=5E-08. The blue horizontal line represents the suggestive threshold of *p*=1.55E-06. Full SNP-based results are given in Supplementary Figure 1. Panel **b**) shows the subset of significant results of the MAGMA gene-based test (51 out of 242 traits), with *p*-values on the y-axis. The red dotted line represents a threshold of *p*=2.62E-06. The full gene-based results are depicted in Supplementary Figure S2. The SNPs that showed the highest number of significant trait-associations (with a maximum of 30 traits at *p*<1.55E-6, Table S4) clustered around loci at 85.53 and 85.62 Mb. As can be seen in Figure 2, most SNPs that were independently (LD R2<0.01, distance >250kb) significantly associated with at least one trait cluster in the middle of the gene, a region rich in eQTLs. ![Figure 2.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/04/16/2021.04.16.21255141/F2.medium.gif) [Figure 2.](http://medrxiv.org/content/early/2021/04/16/2021.04.16.21255141/F2) Figure 2. The top 100 most significant SNPs for each trait with at least 1 significant SNP. The x-axis represents the base pair position, and the panel below shows information on the *CADM2* transcripts as derived from [https://www.ensembl.org/](https://www.ensembl.org/). As shown in Figure S3, the high number of associations discovered for *CADM2* was exceptional. Most of the 50 comparison genes contained fewer than 5 trait associations, with an average of 2.8 associated traits per gene and a maximum of 13 (as compared to 38 for *CADM2*; Table S6). This PheWAS showed that *CADM2* was involved in a wide spectrum of traits, thereby replicating and extending on previous findings. Interestingly, comparison with 50 other genes showed that this number of trait-associations was exceptionally high, emphasizing the distinctive role of *CADM2* in psycho-behavioral traits. Many of the associations we found have been reported in previous literature (Table S4). However, some specific associations, such as those with specific dietary traits, daytime sleeping, number of live births, and mother’s age at death have not been reported before (to our knowledge). Table S5 summarizes putatively novel SNPs for which no phenotypic associations have been reported before, providing an indication of which may be functionally involved in biological pathways. The variance explained by *CADM2* was highest for lifetime cannabis use, followed by number of children fathered, age at first sexual intercourse, and risk taking. Overall, effect sizes were small (less than 0.05% for cannabis initiation), in range with what is normally found for single variants. Few associations were found in the social interaction, sleep, traumatic experiences, and occupational categories. Also, there were not many mental health traits that showed an association (8 out 53 traits). It is interesting to note the significant associations with worry and nervousness-like traits in the absence of association with (other) depression- and anxiety-related traits. There may be something specific to these seemingly overlapping traits, translating to distinct biological pathways. It needs to be noted that sample sizes for the phenotypes differed substantially (from N=12,211 to 453,349), and as such, it is possible that the pattern of associations was driven in part by differences in power. The correlation between sample size and *p*-value of the gene-based test was moderate and significant, *r*=-.38 (*p*=1.42E-9) showing that well-powered traits were more likely to result in a significant association. It is clear that high power was a requirement: the effect sizes of *CADM2* were diminutive, as is expected for single genes and complex traits. Also, our tests were limited to the psycho-behavioral traits measured in the UK-Biobank; inclusion of more measures, such as longitudinal or non-self-report measures could contribute to a more complete picture. Still, the range of tested traits was quite broad and enabled us to discern interesting patterns. More research is needed to elucidate these links between *CADM2* and this spectrum of psycho-behavioral traits in terms of neurobiological mechanisms. For example, it could be that *CADM2* is important for the learning aspects of behavior, given its role in synaptic connectivity. Speculatively, *CADM2* could then contribute to reward-learning and associative learning, giving rise to risky behavior and substance use 18. This study presents the first comprehensive and rigorous test of associations between *CADM2* and psycho-behavioral traits, showing strong associations for a wide range of traits (many akin to health behavior). Results could be used as starting point for future research into the function of *CADM2*. Research on the trait-associations and function of *CADM2* will further our understanding of the biology of behavior. ## Supporting information Supplementary Tables [[supplements/255141_file05.xlsx]](pending:yes) Supplementary Figures [[supplements/255141_file06.pdf]](pending:yes) ## Data Availability UK Biobank data are available for researchers upon request at UK Biobank. Summary statistics from the current project are provided in the supplementary materials. ## Author contributions ZC and JAP were responsible for the analyses, under supervision of AA and KJHV. JAP, ZC, AA, and KJHV wrote the manuscript. The study was conceived by KJHV and AA. Figures were created by AA. All authors read the manuscript draft and provided input. ## Competing interests statement The authors do not report any competing interests. ## Acknowledgments KJHV and AA are supported by the Foundation Volksbond Rotterdam. AA is supported by ZonMw grant 849200011 from The Netherlands Organisation for Health Research and Development. This project was supported by a grant from Amsterdam Neuroscience (2019). We acknowledge SURFsara for the usage of the Cartesius cluster computer (supported by NWO, EINF-457). This project was conducted under UK-Biobank application 40310. ## Footnotes * * shared first author * ⍰ shared last author * Received April 16, 2021. * Revision received April 16, 2021. * Accepted April 16, 2021. * © 2021, Posted by Cold Spring Harbor Laboratory The copyright holder for this pre-print is the author. All rights reserved. The material may not be redistributed, re-used or adapted without the author's permission. ## References 1. Boutwell, B. et al. Replication and characterization of CADM2 and MSRA genes on human behavior. Heliyon 3, e00349–e00349, doi:10.1016/j.heliyon.2017.e00349 (2017). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.heliyon.2017.e00349&link_type=DOI) 2. Lee, J. J. et al. Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals. Nat. Genet. 50, 1112–1121, doi:10.1038/s41588-018-0147-3 (2018). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41588-018-0147-3&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=30038396&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F04%2F16%2F2021.04.16.21255141.atom) 3. Ibrahim-Verbaas, C. et al. GWAS for executive function and processing speed suggests involvement of the CADM2 gene. Molecular psychiatry 21, 189 (2016). 4. Strawbridge, R. J. et al. Genome-wide analysis of self-reported risk-taking behaviour and cross-disorder genetic correlations in the UK Biobank cohort. 8, 1–11 (2018). 5. Day, F. R. et al. Physical and neurobehavioral determinants of reproductive onset and success. Nat. Genet. 48, 617–623, doi:10.1038/ng.3551 (2016). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng.3551&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=27089180&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F04%2F16%2F2021.04.16.21255141.atom) 6. Casey, J. P. et al. A novel approach of homozygous haplotype sharing identifies candidate genes in autism spectrum disorder. Hum. Genet. 131, 565–579, doi:10.1007/s00439-011-1094-6 (2012). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1007/s00439-011-1094-6&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=21996756&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F04%2F16%2F2021.04.16.21255141.atom) 7. Liu, M. et al. Association studies of up to 1.2 million individuals yield new insights into the genetic etiology of tobacco and alcohol use. Nat. Genet. 51, 237–244, doi:10.1038/s41588-018-0307-5 (2019). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41588-018-0307-5&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F04%2F16%2F2021.04.16.21255141.atom) 8. Pasman, J. A. et al. GWAS of lifetime cannabis use reveals new risk loci, genetic overlap with psychiatric traits, and a causal influence of schizophrenia. Nat. Neurosci. 21, 1161–1170 (2018). 9. Klimentidis, Y. C. et al. Genome-wide association study of habitual physical activity in over 377,000 UK Biobank participants identifies multiple variants including CADM2 and APOE. Int. J. Obesity 42, 1161–1176, doi:10.1038/s41366-018-0120-3 (2018). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41366-018-0120-3&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F04%2F16%2F2021.04.16.21255141.atom) 10. Locke, A. E. et al. Genetic studies of body mass index yield new insights for obesity biology. Nature 518, 197–206, doi:10.1038/nature14177 (2015). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/nature14177&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25673413&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F04%2F16%2F2021.04.16.21255141.atom) 11. Ndiaye, F. K. et al. The expression of genes in top obesity-associated loci is enriched in insula and substantia nigra brain regions involved in addiction and reward. 1–5 (2019). 12. Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209, doi:10.1038/s41586-018-0579-z (2018). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41586-018-0579-z&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=30305743&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F04%2F16%2F2021.04.16.21255141.atom) 13. Abdellaoui, A. Regional differences in reported Covid-19 cases show genetic correlations with higher socio-economic status and better health, potentially confounding studies on the genetics of disease susceptibility. 2020.2004.2024.20075333, doi:10.1101/2020.04.24.20075333 %J medRxiv (2020). [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NzoibWVkcnhpdiI7czo1OiJyZXNpZCI7czoyMToiMjAyMC4wNC4yNC4yMDA3NTMzM3YxIjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjEvMDQvMTYvMjAyMS4wNC4xNi4yMTI1NTE0MS5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 14. Jiang, L. et al. A resource-efficient tool for mixed model association analysis of large-scale data. Nat. Genet. 51, 1749–1755, doi:10.1038/s41588-019-0530-8 (2019). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41588-019-0530-8&link_type=DOI) 15. Abdellaoui, A. et al. Genetic correlates of social stratification in Great Britain. Nature Human Behaviour 3, 1332–1342, doi:10.1038/s41562-019-0757-5 (2019). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41562-019-0757-5&link_type=DOI) 16. de Leeuw, C. A., Mooij, J. M., Heskes, T. & Posthuma, D. MAGMA: generalized gene-set analysis of GWAS data. PLoS Comp. Biol. 11, e1004219 (2015). 17. Shim, H. et al. A Multivariate Genome-Wide Association Analysis of 10 LDL Subfractions, and Their Response to Statin Treatment, in 1868 Caucasians. PLoS One 10, e0120758, doi:10.1371/journal.pone.0120758 (2015). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1371/journal.pone.0120758&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25898129&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F04%2F16%2F2021.04.16.21255141.atom) 18. Volkow, N. D., Koob, G. F. & McLellan, A. T. Neurobiologic Advances from the Brain Disease Model of Addiction. 374, 363–371, doi:10.1056/NEJMra1511480 (2016). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1056/NEJMra1511480&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=26816013&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F04%2F16%2F2021.04.16.21255141.atom) [1]: /embed/inline-graphic-1.gif [2]: /embed/inline-graphic-2.gif