medRxiv

Identification and prediction of ALS subgroups using machine learning

Faraz Faghri, Fabian Brunn, Anant Dadu, PARALS, ERRALS, Elisabetta Zucchi, Ilaria Martinelli, Letizia Mazzini, Rosario Vasta, Antonio Canosa, Cristina Moglia, Andrea Calvo, Michael A. Nalls, Roy H. Campbell, Jessica Mandrioli, Bryan J. Traynor, Adriano Chiò
doi: https://doi.org/10.1101/2021.04.02.21254844
Faraz Faghri
1Neuromuscular Diseases Research Section, Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD 20892, USA
2Center for Alzheimer’s and Related Dementias, National Institute on Aging, Bethesda, MD, 20892, USA
3Data Tecnica International, Glen Echo, MD, 20812, USA
4Department of Computer Science, University of Illinois at Urbana–Champaign, Champaign, IL 61801, USA
Fabian Brunn
4Department of Computer Science, University of Illinois at Urbana–Champaign, Champaign, IL 61801, USA
Anant Dadu
4Department of Computer Science, University of Illinois at Urbana–Champaign, Champaign, IL 61801, USA
Elisabetta Zucchi
5Department of Biomedical, Metabolic and Neural Sciences, University of Modena and Reggio Emilia, 41124 Modena, Italy
Ilaria Martinelli
6Neurology Unit, Department of Neurosciences, Azienda Ospedaliero Universitaria di Modena, Modena 41125, Italy
Letizia Mazzini
7ALS Center, Department of Neurology, Maggiore della Carità University Hospital, Novara 28100 Italy
Rosario Vasta
8‘Rita Levi Montalcini’ Department of Neuroscience, University of Turin, Turin 10126, Italy
Antonio Canosa
8‘Rita Levi Montalcini’ Department of Neuroscience, University of Turin, Turin 10126, Italy
Cristina Moglia
8‘Rita Levi Montalcini’ Department of Neuroscience, University of Turin, Turin 10126, Italy
Andrea Calvo
8‘Rita Levi Montalcini’ Department of Neuroscience, University of Turin, Turin 10126, Italy
Michael A. Nalls
2Center for Alzheimer’s and Related Dementias, National Institute on Aging, Bethesda, MD, 20892, USA
3Data Tecnica International, Glen Echo, MD, 20812, USA
Roy H. Campbell
4Department of Computer Science, University of Illinois at Urbana–Champaign, Champaign, IL 61801, USA
Jessica Mandrioli
5Department of Biomedical, Metabolic and Neural Sciences, University of Modena and Reggio Emilia, 41124 Modena, Italy
6Neurology Unit, Department of Neurosciences, Azienda Ospedaliero Universitaria di Modena, Modena 41125, Italy
Bryan J. Traynor
1Neuromuscular Diseases Research Section, Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD 20892, USA
9Department of Neurology, Johns Hopkins University Medical Center, Baltimore, MD 21287, USA
10Reta Lila Weston Institute, UCL Queen Square Institute of Neurology, University College London, London WC1N 1PJ, UK
  • For correspondence: traynorb@mail.nih.gov
Adriano Chiò
8‘Rita Levi Montalcini’ Department of Neuroscience, University of Turin, Turin 10126, Italy
11Institute of Cognitive Sciences and Technologies, C.N.R., Rome 00185, Italy
12Neurology 1 and ALS Center, Azienda Ospedaliero Universitaria Città della Salute e della Scienza, Turin 10126, Italy

SUMMARY

Background The disease entity known as amyotrophic lateral sclerosis (ALS) is now recognized as a collection of overlapping syndromes. A better understanding of this heterogeneity and the ability to distinguish ALS subtypes would improve the clinical care of patients and enhance our understanding of the disease. Subtype profiles could be incorporated into clinical trial design to improve our ability to detect a therapeutic effect. A variety of classification systems have been proposed over the years based on empirical observations, but it is unclear to what extent they genuinely reflect ALS population substructure.

Methods We applied machine learning algorithms to a prospective, population-based cohort consisting of 2,858 Italian patients diagnosed with ALS for whom detailed clinical phenotype data were available. We replicated our findings in an independent population-based cohort of 1,097 Italian ALS patients.

Findings We found that semi-supervised machine learning based on UMAP applied to the output of a multi-layered perceptron neural network produced the optimum clustering of the ALS patients in the discovery cohort. These clusters roughly corresponded to the six clinical subtypes defined by the Chiò classification system (bulbar ALS, respiratory ALS, flail arm ALS, classical ALS, pyramidal ALS, and flail leg ALS). The same clusters were identified in the replication cohort. A supervised learning approach based on ensemble learning identified twelve clinical parameters that predicted ALS clinical subtype with high accuracy (area under the curve = 0·94).
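As a rough illustration of the reported metric, the sketch below computes an area under the ROC curve for a multi-class problem by averaging one-vs-rest AUCs. The summary does not state how the 0·94 figure was aggregated across subtypes, so the macro-averaged one-vs-rest scheme, the function names (`binary_auc`, `macro_ovr_auc`), and the toy probabilities are all assumptions made for illustration, not the authors' code or data.

```python
from typing import Dict, List, Sequence


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case is scored
    above a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(true_subtypes: Sequence[str],
                  probabilities: Sequence[Dict[str, float]]) -> float:
    """Unweighted mean of one-vs-rest AUCs over the subtypes present.
    (Assumed aggregation; the source does not specify one.)"""
    aucs: List[float] = []
    for subtype in sorted(set(true_subtypes)):
        labels = [1 if t == subtype else 0 for t in true_subtypes]
        scores = [p[subtype] for p in probabilities]
        aucs.append(binary_auc(labels, scores))
    return sum(aucs) / len(aucs)


# Hypothetical toy predictions for four patients and two subtypes.
toy_true = ["bulbar", "bulbar", "classical", "classical"]
toy_probs = [
    {"bulbar": 0.9, "classical": 0.1},
    {"bulbar": 0.4, "classical": 0.6},
    {"bulbar": 0.5, "classical": 0.5},
    {"bulbar": 0.1, "classical": 0.9},
]
print(macro_ovr_auc(toy_true, toy_probs))
```

The pairwise formulation is quadratic in the number of patients, which is acceptable for a sketch; a rank-based computation would be preferable for cohorts of several thousand.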

Interpretation Our data-driven study provides insight into the ALS population’s substructure and demonstrates that the Chiò classification system robustly identifies ALS subtypes. We provide an interactive website (https://share.streamlit.io/anant-dadu/machinelearningforals/main) so that clinical researchers can predict the clinical subtype of an ALS patient based on a small number of clinical parameters.

Funding National Institute on Aging and the Italian Ministry of Health.

Evidence before this study We searched PubMed for articles published in English from database inception until January 5, 2021, about the use of machine learning and the identification of clinical subtypes within the amyotrophic lateral sclerosis (ALS) population, using the search terms “machine learning” AND “classification” AND “amyotrophic lateral sclerosis”. This inquiry identified twenty-nine studies. Most previous studies used machine learning to diagnose ALS (based on gait, imaging, electromyography, gene expression, proteomic, and metabolomic data) or to improve brain-computer interfaces. One study used machine learning algorithms to stratify ALS postmortem cortex samples into molecular subtypes based on transcriptome data. Kueffner and colleagues crowdsourced the development of machine learning algorithms to approximately thirty teams to obtain a consensus on ALS patient subpopulations. In addition to clinical trial information in the PRO-ACT database (www.ALSdatabase.org), this effort used data from the Piedmont and Valle d’Aosta Registry for ALS (PARALS). Four ALS patient categories were identified: slow progressing, fast progressing, early stage, and late stage. This approach’s clinical relevance was unclear, as all ALS patients will necessarily pass through an early and late stage of the disease.

Furthermore, no attempt was made to discern which of the existing clinical classification systems, such as the El Escorial criteria, the Chiò classification system, and the King’s clinical staging system, can identify ALS subtypes. We concluded that there remained an unmet need to identify the ALS population’s substructure in a data-driven, non-empirical manner. Building on this, there was a need for a tool that reliably predicts the clinical subtype of an ALS patient. This knowledge would improve our understanding of the clinical heterogeneity associated with this fatal neurodegenerative disease.
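The PubMed search described at the start of this panel could, in principle, be expressed as a request to the NCBI E-utilities `esearch` endpoint. The sketch below only builds the request URL and does not contact the server. The lower date bound of 1800/01/01 (standing in for "database inception"), the `english[Language]` filter syntax, and the helper name `build_search_url` are assumptions; the exact query string the authors used is not given in the source.

```python
from urllib.parse import parse_qs, urlencode, urlparse

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_search_url(terms, max_date, min_date="1800/01/01",
                     language="english"):
    """Build an esearch URL that ANDs quoted phrases together and
    restricts results by publication date (YYYY/MM/DD)."""
    query = " AND ".join(f'"{t}"' for t in terms)
    if language:
        # Assumed language filter; the source only says "in English".
        query = f"({query}) AND {language}[Language]"
    params = {
        "db": "pubmed",
        "term": query,
        "datetype": "pdat",   # filter on publication date
        "mindate": min_date,  # assumed stand-in for database inception
        "maxdate": max_date,
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


url = build_search_url(
    ["machine learning", "classification", "amyotrophic lateral sclerosis"],
    max_date="2021/01/05",
)
# Decode the query string again to show the search term that was encoded.
print(parse_qs(urlparse(url).query)["term"][0])
```

Because PubMed is updated continuously, a query run today would not necessarily return the same number of records reported above.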

Added value of this study This study developed a machine learning algorithm to detect ALS patients’ clinical subtypes using clinical data collected from the 2,858 Italian ALS patients in PARALS. Ascertainment of these patients within the catchment area was near complete, meaning that the dataset truly represented the ALS population. We replicated our approach using clinical data obtained from an independent cohort of 1,097 Italian ALS patients that had also been collected in a population-based, longitudinal manner. Semi-supervised learning based on Uniform Manifold Approximation and Projection (UMAP) applied to a multilayer perceptron neural network provided the optimum results based on visual inspection. The observed clusters equated to the six clinical subtypes previously defined by the Chiò classification system (bulbar ALS, respiratory ALS, flail arm ALS, classical ALS, pyramidal ALS, and flail leg ALS). Using a small number of clinical parameters, an ensemble learning approach could predict the ALS clinical subtype with high accuracy (area under the curve = 0·94).
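The correspondence between clusters and clinical subtypes is described here as judged by visual inspection. One hypothetical way to put a number on such a correspondence is cluster purity: the fraction of patients whose clinician-assigned subtype matches the majority subtype of their cluster. The sketch below is not taken from the study; the function names and the toy assignments are assumptions for illustration only.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence


def contingency(cluster_ids: Sequence[Hashable],
                subtypes: Sequence[str]) -> Dict[Hashable, Counter]:
    """Count how many patients of each subtype fall in each cluster."""
    if len(cluster_ids) != len(subtypes):
        raise ValueError("cluster_ids and subtypes must have equal length")
    table: Dict[Hashable, Counter] = defaultdict(Counter)
    for cluster, subtype in zip(cluster_ids, subtypes):
        table[cluster][subtype] += 1
    return table


def purity(cluster_ids: Sequence[Hashable], subtypes: Sequence[str]) -> float:
    """Fraction of patients that carry their cluster's majority subtype."""
    if not cluster_ids:
        raise ValueError("at least one patient is required")
    table = contingency(cluster_ids, subtypes)
    majority_total = sum(max(counts.values()) for counts in table.values())
    return majority_total / len(cluster_ids)


# Hypothetical toy data: six patients assigned to three clusters.
toy_clusters = [0, 0, 0, 1, 1, 2]
toy_subtypes = ["bulbar", "bulbar", "classical",
                "flail arm", "flail arm", "pyramidal"]
print(purity(toy_clusters, toy_subtypes))
```

Purity rises trivially as the number of clusters grows, so it is most informative when the cluster count is fixed in advance, for example at the six subtypes of the Chiò classification.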

Implications of all available evidence Additional validation is required to determine these algorithms’ accuracy and clinical utility in assigning clinical subtypes. Nevertheless, our algorithms offer a broad insight into the clinical heterogeneity of ALS and help to determine the actual subtypes of disease that exist within this fatal neurodegenerative syndrome. The systematic identification of ALS subtypes will improve clinical care and clinical trial design.

Competing Interest Statement

BJT holds patents on the clinical testing and therapeutic intervention for the hexanucleotide repeat expansion of C9orf72 and has received research grants from The Myasthenia Gravis Foundation, the Robert Packard Center for ALS Research, the ALS Association (ALSA), the Italian Football Federation (FIGC), the Centers for Disease Control and Prevention (CDC), the Muscular Dystrophy Association (MDA), Merck, and Microsoft Research. BJT receives funding through the Intramural Research Program at the National Institutes of Health. JM has received research grants from the Fondazione Italiana di Ricerca per la Sclerosi Laterale Amiotrofica (ARISLA), the Agenzia Italiana del Farmaco (AIFA), the Italian Ministry of Health, the Emilia Romagna Regional Health Authority, and Pfizer.

Funding Statement

This work was supported in part by the Intramural Research Programs of the NIH, National Institute on Aging (Z01-AG000949-02). This work was also supported in part by the Italian Ministry of Health (Ministero della Salute, Ricerca Sanitaria Finalizzata, grant RF-2016-02362405), the European Commission's Health Seventh Framework Programme (FP7/2007-2013 under grant agreement 259867), and the Joint Programme - Neurodegenerative Disease Research (Strength, ALS-Care and Brain-Mend projects), granted by the Italian Ministry of Education, University and Research. This study was performed under the Department of Excellence grant of the Italian Ministry of Education, University and Research to the 'Rita Levi Montalcini' Department of Neuroscience, University of Torino, Italy. The Emilia Romagna Registry for ALS (ERRALS) is supported by a grant from the Emilia Romagna Regional Health Authority. This study was also partly funded by the AGING Project for Department of Excellence at the Department of Translational Medicine (DIMET), Università del Piemonte Orientale, Novara, Italy. This study used the high-performance computational capabilities of the Biowulf Linux cluster at the National Institutes of Health, Bethesda, Maryland, USA (http://biowulf.nih.gov). We thank the Laboratory of Neurogenetics (NIH) staff for their collegial support and technical assistance.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

Informed written consent was obtained from the participants. The studies were approved by the ethics committees of Azienda Ospedaliera Universitaria City of Health and Science of Turin and Azienda Ospedaliero Universitaria of Modena. All records were anonymized according to the Italian code for the protection of personal data, and data were handled in accordance with the EU General Data Protection Regulation 2016/679 (GDPR).

All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.

Yes

Data Availability

To facilitate replication and expansion of our work, we have made the notebook publicly available under the GPLv3 license on Google Colaboratory and GitHub at https://github.com/ffaghri1/ALS-ML. This code includes the rendered Jupyter notebook with a full step-by-step description of the data pre-processing, statistical, and machine learning analyses used in this study. We have also developed an interactive website (https://share.streamlit.io/anant-dadu/machinelearningforals/main) as an open-access, cloud-based platform that allows clinical researchers to determine an ALS patient's future clinical subtype based on a small number of clinical parameters.

https://github.com/ffaghri1/ALS-ML

https://share.streamlit.io/anant-dadu/machinelearningforals/main

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available for use under a CC0 license.
Posted April 07, 2021.
Subject Area

  • Neurology