Identification and prediction of ALS subgroups using machine learning

Faraz Faghri; Fabian Brunn; Anant Dadu; PARALS; ERRALS; Elisabetta Zucchi; Ilaria Martinelli; Letizia Mazzini; Rosario Vasta; Antonio Canosa; Cristina Moglia; Andrea Calvo; Michael A. Nalls; Roy H. Campbell; Jessica Mandrioli; Bryan J. Traynor; Adriano Chiò

doi:10.1101/2021.04.02.21254844

SUMMARY

Background The disease entity known as amyotrophic lateral sclerosis (ALS) is now known to represent a collection of overlapping syndromes. A better understanding of this heterogeneity and the ability to distinguish ALS subtypes would improve the clinical care of patients and enhance our understanding of the disease. Subtype profiles could be incorporated into the clinical trial design to improve our ability to detect a therapeutic effect. A variety of classification systems have been proposed over the years based on empirical observations, but it is unclear to what extent they genuinely reflect ALS population substructure.

Methods We applied machine learning algorithms to a prospective, population-based cohort consisting of 2,858 Italian patients diagnosed with ALS for whom detailed clinical phenotype data were available. We replicated our findings in an independent population-based cohort of 1,097 Italian ALS patients.

Findings We found that semi-supervised machine learning based on UMAP applied to the output of a multilayer perceptron neural network produced the optimum clustering of the ALS patients in the discovery cohort. These clusters roughly corresponded to the six clinical subtypes defined by the Chiò classification system (bulbar ALS, respiratory ALS, flail arm ALS, classical ALS, pyramidal ALS, and flail leg ALS). The same clusters were identified in the replication cohort. A supervised learning approach based on ensemble learning identified twelve clinical parameters that predicted ALS clinical subtype with high accuracy (area under the curve = 0·94).

Interpretation Our data-driven study provides insight into the ALS population’s substructure and demonstrates that the Chiò classification system robustly identifies ALS subtypes. We provide an interactive website (https://share.streamlit.io/anant-dadu/machinelearningforals/main) so that clinical researchers can predict the clinical subtype of an ALS patient based on a small number of clinical parameters.

Funding National Institute on Aging and the Italian Ministry of Health.

Evidence before this study We searched PubMed for articles published in English from database inception until January 5, 2021, about the use of machine learning and the identification of clinical subtypes within the amyotrophic lateral sclerosis (ALS) population, using the search terms “machine learning” AND “classification” AND “amyotrophic lateral sclerosis”. This inquiry identified twenty-nine studies. Most previous studies used machine learning to diagnose ALS (based on gait, imaging, electromyography, gene expression, proteomic, and metabolomic data) or to improve brain-computer interfaces. One study used machine learning algorithms to stratify ALS postmortem cortex samples into molecular subtypes based on transcriptome data. Kueffner and colleagues crowdsourced the development of machine learning algorithms to approximately thirty teams to obtain a consensus on ALS patient subpopulations. In addition to clinical trial information in the PRO-ACT database (www.ALSdatabase.org), this effort used data from the Piedmont and Valle d’Aosta Registry for ALS (PARALS). Four ALS patient categories were identified: slow progressing, fast progressing, early stage, and late stage. This approach’s clinical relevance was unclear, as all ALS patients will necessarily pass through an early and late stage of the disease.

Furthermore, no attempt was made to discern which of the existing clinical classification systems, such as the El Escorial criteria, the Chiò classification system, and the King’s clinical staging system, can identify ALS subtypes. We concluded that there remained an unmet need to identify the ALS population’s substructure in a data-driven, non-empirical manner. Building on this, there was a need for a tool that reliably predicts the clinical subtype of an ALS patient. This knowledge would improve our understanding of the clinical heterogeneity associated with this fatal neurodegenerative disease.

Added value of this study This study developed a machine learning algorithm to detect ALS patients’ clinical subtypes using clinical data collected from the 2,858 Italian ALS patients in PARALS. Ascertainment of these patients within the catchment area was near complete, meaning that the dataset truly represented the ALS population. We replicated our approach using clinical data obtained from an independent cohort of 1,097 Italian ALS patients that had also been collected in a population-based, longitudinal manner. Semi-supervised learning based on Uniform Manifold Approximation and Projection (UMAP) applied to a multilayer perceptron neural network provided the optimum results based on visual inspection. The observed clusters equated to the six clinical subtypes previously defined by the Chiò classification system (bulbar ALS, respiratory ALS, flail arm ALS, classical ALS, pyramidal ALS, and flail leg ALS). Using a small number of clinical parameters, an ensemble learning approach could predict the ALS clinical subtype with high accuracy (area under the curve = 0·94).
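To make the reported summary metric concrete, the following is a minimal sketch of how an area under the curve can be computed for a multi-class subtype prediction. It is not the study's code: the source does not state how the AUC was averaged across the six subtypes, so the one-vs-rest macro average used here is an assumption, the function names `binary_auc` and `macro_ovr_auc` are hypothetical, and the predicted probabilities are made-up toy values covering only three subtypes.

```python
from itertools import product


def binary_auc(scores, labels):
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(prob_rows, true_labels, classes):
    """Unweighted mean of one-vs-rest AUCs, one per class.
    ASSUMPTION: the averaging scheme is illustrative, not taken from the study."""
    aucs = []
    for k, cls in enumerate(classes):
        scores = [row[k] for row in prob_rows]      # predicted P(class k)
        labels = [t == cls for t in true_labels]    # class k versus the rest
        aucs.append(binary_auc(scores, labels))
    return sum(aucs) / len(aucs)


# Toy, made-up predictions (NOT study data); columns follow `classes`.
classes = ["bulbar", "classical", "flail arm"]
true_labels = ["bulbar", "bulbar", "classical", "classical",
               "flail arm", "flail arm"]
prob_rows = [
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.2, 0.6, 0.2],
    [0.3, 0.4, 0.3],
    [0.1, 0.2, 0.7],
    [0.2, 0.3, 0.5],
]
auc = macro_ovr_auc(prob_rows, true_labels, classes)
```

Under this reading, a value near 1 means that, for each subtype, patients who truly belong to it are almost always assigned a higher predicted probability than patients who do not; a value of 0·5 corresponds to chance-level ranking.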

Implications of all available evidence Additional validation is required to determine these algorithms’ accuracy and clinical utility in assigning clinical subtypes. Nevertheless, our algorithms offer a broad insight into the clinical heterogeneity of ALS and help to determine the actual subtypes of disease that exist within this fatal neurodegenerative syndrome. The systematic identification of ALS subtypes will improve clinical care and clinical trial design.

Competing Interest Statement

BJT holds patents on the clinical testing and therapeutic intervention for the hexanucleotide repeat expansion of C9orf72 and has received research grants from The Myasthenia Gravis Foundation, the Robert Packard Center for ALS Research, the ALS Association (ALSA), the Italian Football Federation (FIGC), the Centers for Disease Control and Prevention (CDC), the Muscular Dystrophy Association (MDA), Merck, and Microsoft Research. BJT receives funding through the Intramural Research Program at the National Institutes of Health. JM has received research grants from the Fondazione Italiana di Ricerca per la Sclerosi Laterale Amiotrofica (ARISLA), the Agenzia Italiana del Farmaco (AIFA), the Italian Ministry of Health, the Emilia Romagna Regional Health Authority, and Pfizer.

Funding Statement

This work was supported in part by the Intramural Research Programs of the NIH, National Institute on Aging (Z01-AG000949-02). This work was also supported in part by the Italian Ministry of Health (Ministero della Salute, Ricerca Sanitaria Finalizzata, grant RF-2016-02362405), the European Commission's Health Seventh Framework Programme (FP7/2007-2013 under grant agreement 259867), and the Joint Programme - Neurodegenerative Disease Research (Strength, ALS-Care and Brain-Mend projects), granted by the Italian Ministry of Education, University and Research. This study was performed under the Department of Excellence grant of the Italian Ministry of Education, University and Research to the 'Rita Levi Montalcini' Department of Neuroscience, University of Torino, Italy. The Emilia Romagna Registry for ALS (ERRALS) is supported by a grant from the Emilia Romagna Regional Health Authority. This study was also partly funded by the AGING Project for Department of Excellence at the Department of Translational Medicine (DIMET), Università del Piemonte Orientale, Novara, Italy. This study used the high-performance computational capabilities of the Biowulf Linux cluster at the National Institutes of Health, Bethesda, Maryland, USA (http://biowulf.nih.gov). We thank the Laboratory of Neurogenetics (NIH) staff for their collegial support and technical assistance.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

Informed written consent was obtained from the participants. The studies were approved by the ethics committees of Azienda Ospedaliera Universitaria City of Health and Science of Turin and Azienda Ospedaliero Universitaria of Modena. All records were anonymized according to the Italian code for the protection of personal data, and data were treated following the EU 2016/679 General Data Protection Regulation (GDPR).

All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.

Yes

Data Availability

To facilitate replication and expansion of our work, we have made the notebook publicly available under the GPLv3 license on Google Colaboratory and GitHub at https://github.com/ffaghri1/ALS-ML. The repository includes the rendered Jupyter notebook with a full step-by-step description of the data pre-processing, statistical, and machine learning analyses used in this study. We have also developed an interactive website (https://share.streamlit.io/anant-dadu/machinelearningforals/main) as an open-access, cloud-based platform that allows clinical researchers to determine an ALS patient's future clinical subtype based on a small number of clinical parameters.


The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available for use under a CC0 license.