Machine learning for genetic prediction of psychiatric disorders: a systematic review

Bracher-Smith, Matthew; Crawford, Karen; Escott-Price, Valentina

doi:10.1038/s41380-020-0825-2

Review Article
Published: 26 June 2020


Molecular Psychiatry volume 26, pages 70–79 (2021)

Abstract

Machine learning methods have been employed to make predictions in psychiatry from genotypes, with the potential to improve prediction of outcomes in psychiatric genetics; however, their current performance is unclear. We aimed to systematically review machine learning methods for predicting psychiatric disorders from genetics alone and to evaluate their discrimination, bias and implementation. Medline, PsycInfo, Web of Science and Scopus were searched on 10 September 2019 for terms relating to genetics, psychiatric disorders and machine learning, including neural networks, random forests, support vector machines and boosting. Following PRISMA guidelines, two authors independently screened articles for inclusion; data were then extracted and studies assessed for risk of bias. Overall, 63 full texts were assessed from a pool of 652 abstracts. Data were extracted for 77 models of schizophrenia, bipolar disorder, autism or anorexia across 13 studies. Performance of machine learning methods varied widely (0.48–0.95 AUC) and differed between schizophrenia (0.54–0.95 AUC), bipolar disorder (0.48–0.65 AUC), autism (0.52–0.81 AUC) and anorexia (0.62–0.69 AUC). This variation is likely due to the high risk of bias identified in study design and analysis. Choices of predictor selection, hyperparameter search and validation methodology, and viewing of the test set during training, were common causes of high risk of bias in analysis. Key steps in model development and validation were frequently not performed or were unreported. Comparison of discrimination across studies was constrained by heterogeneity of predictors, outcomes and measurement, in addition to sample overlap within and across studies. Given the widespread high risk of bias and the small number of studies identified, it is important to ensure that established analysis methods are adopted. We emphasise best practices in methodology and reporting for improving future studies.
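Predictor selection and viewing of the test set during training are named above as common sources of bias. The safeguard can be sketched as follows, assuming a simple additive score built from the k SNPs most correlated with case status. This model is a hypothetical choice for illustration, not the method of any reviewed study, and the names `select_predictors` and `fit_and_score` are likewise illustrative. Every decision that looks at the outcome (which SNPs to keep and how to weight them) uses training rows only; held-out rows are scored once, after the model is fixed.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_predictors(X, y, train_idx: List[int], k: int) -> List[int]:
    """Rank SNP columns by |correlation| with the outcome using TRAINING rows only."""
    y_train = [y[i] for i in train_idx]
    assoc = []
    for j in range(len(X[0])):
        col = [X[i][j] for i in train_idx]
        assoc.append((abs(pearson(col, y_train)), j))
    assoc.sort(key=lambda t: (-t[0], t[1]))  # strongest first, ties by column index
    return [j for _, j in assoc[:k]]


def fit_and_score(X, y, train_idx, test_idx, k) -> Tuple[List[int], List[float]]:
    """Select SNPs and estimate weights on training rows, then score test rows."""
    selected = select_predictors(X, y, train_idx, k)
    y_train = [y[i] for i in train_idx]
    weights: Dict[int, float] = {
        j: pearson([X[i][j] for i in train_idx], y_train) for j in selected
    }
    # Test rows are used only here, after all modelling decisions are fixed.
    scores = [sum(w * X[i][j] for j, w in weights.items()) for i in test_idx]
    return selected, scores


# Simulated genotypes (0/1/2 allele counts) and random case/control labels.
rng = random.Random(0)
n, p = 60, 200
X = [[rng.choice((0, 1, 2)) for _ in range(p)] for _ in range(n)]
y = [rng.choice((0, 1)) for _ in range(n)]
idx = list(range(n))
rng.shuffle(idx)
train_idx, test_idx = idx[:40], idx[40:]
selected, scores = fit_and_score(X, y, train_idx, test_idx, k=10)
```

Flipping the labels of the held-out individuals leaves both the selected SNPs and the test scores unchanged, a quick check that the test set played no part in training. The same rule would apply to hyperparameters such as k: tune them by cross-validation within the training rows rather than on the held-out set.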

**Fig. 1: Discrimination for all models.**
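The abstract reports discrimination as AUC. This is the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control. An AUC of 0.5 is chance, so the lower ends of the reported ranges (for example 0.48 for bipolar disorder) are at chance level. A minimal sketch computing AUC directly from that pairwise definition (the Mann–Whitney formulation), assuming labels coded 1 for cases and 0 for controls; `auc` is an illustrative name, not a library function.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(case score > control score), counting ties as 0.5."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

The pairwise loop costs O(cases × controls), which is fine for illustration; a rank-based computation gives the same value more efficiently on large cohorts.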

Change history

16 September 2020
Following publication of this article, the authors noticed that the Supplementary Figures were accidentally omitted. The Supplementary Information file has now been updated to include the figures.

Acknowledgements

The authors wish to thank the Dementia Research Institute (UKDRI-3003) and the MRC Centre for Neuropsychiatric Genetics and Genomics (Centre grant MR/L010305/1 and Program grant MR/P005748/1).

Author information

Authors and Affiliations

MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
Matthew Bracher-Smith, Karen Crawford & Valentina Escott-Price
Dementia Research Institute, School of Medicine, Cardiff University, Cardiff, UK
Karen Crawford & Valentina Escott-Price

Corresponding author

Correspondence to Valentina Escott-Price.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary

About this article

Cite this article

Bracher-Smith, M., Crawford, K. & Escott-Price, V. Machine learning for genetic prediction of psychiatric disorders: a systematic review. Mol Psychiatry 26, 70–79 (2021). https://doi.org/10.1038/s41380-020-0825-2

Received: 15 May 2020
Revised: 09 June 2020
Accepted: 16 June 2020
Published: 26 June 2020
Issue Date: January 2021
DOI: https://doi.org/10.1038/s41380-020-0825-2
