PT - JOURNAL ARTICLE
AU - Bokan Bao
AU - Vahid H. Gazestani
AU - Yaqiong Xiao
AU - Raphael Kim
AU - Austin W.T. Chiang
AU - Srinivasa Nalabolu
AU - Karen Pierce
AU - Kimberly Robasky
AU - Nathan E. Lewis
AU - Eric Courchesne
TI - A Highly Accurate Ensemble Classifier for the Molecular Diagnosis of ASD at Ages 1 to 4 Years
AID - 10.1101/2021.07.08.21260225
DP - 2021 Jan 01
TA - medRxiv
PG - 2021.07.08.21260225
4099 - http://medrxiv.org/content/early/2021/07/09/2021.07.08.21260225.short
4100 - http://medrxiv.org/content/early/2021/07/09/2021.07.08.21260225.full
AB - Importance: ASD diagnosis remains behavior-based, and the median age of first diagnosis remains unchanged at ∼52 months, which is nearly 5 years after its first-trimester origin. Long delays between ASD’s prenatal onset and eventual diagnosis are likely a missed opportunity. However, accurate and clinically translatable early-age diagnostic methods do not exist due to ASD genetic and clinical heterogeneity. There is a need for early-age diagnostic biomarkers of ASD that are robust against its heterogeneity. Objective: To develop a single blood-based molecular classifier that accurately diagnoses ASD at the age of first symptoms. Design, Setting, and Participants: Participants were N=264 ASD, typically developing (TD), and language-delayed (LD) toddlers for whom clinical, diagnostic, and leukocyte RNA data were collected. Datasets included Discovery (n=175 ASD, TD subjects), Longitudinal (n=33 ASD, TD subjects), and Replication (n=89 ASD, TD, LD subjects). We developed an ensemble of ASD classifiers by testing 42,840 models composed of 3,570 feature selection sets and 12 classification methods. Models were trained on the Discovery dataset with 5-fold cross-validation. Results were used to construct a Bayesian model averaging-based (BMA) ensemble classifier model that was tested in Discovery and Replication datasets.
Data were collected from 2007 to 2012 and analyzed from August 2019 to April 2021. Main Outcomes and Measures: Primary outcomes were (1) comparisons of the performance of 42,840 classifier models in correctly identifying ASD vs TD and LD in Discovery and Replication datasets; and (2) performance of the ensemble model composed of 1,076 models and weighted by the Bayesian model averaging technique. Results: Of 42,840 models trained in the Discovery dataset, 1,076 averaged AUC-ROC > 0.8. These 1,076 models used 191 different feature routes and 2,764 gene features. Using weighted BMA of these features and routes, an ensemble classifier model was constructed that demonstrated excellent performance in Discovery and Replication datasets with ASD classification AUC-ROC scores of 84% to 88%. ASD classification accuracy was comparable against LD and TD subjects and in the Longitudinal dataset. ASD toddlers with ensemble scores above and below the ASD ensemble mean had similar diagnostic and psychometric scores, but those below the ASD ensemble mean had more prenatal risk events than TD toddlers. Ensemble features include genes with immune/inflammation, response to cytokines, transcriptional regulation, mitotic cell cycle, and PI3K-AKT, RAS, and Wnt signaling pathways. Conclusions and Relevance: An ensemble ASD molecular classifier has high and replicable accuracy across the spectrum of ASD clinical characteristics and across toddlers aged 1 to 4 years, which has potential for clinical translation. Question: Since ASD is genetically and clinically heterogeneous, can a single blood-based molecular classifier accurately diagnose ASD at the age of first symptoms? Findings: To address heterogeneity, we developed an ASD classifier method testing 42,840 models.
An ensemble of 1,076 models using 191 different feature routes and 2,764 gene features, weighted by Bayesian model averaging, demonstrated excellent performance in Discovery and Replication datasets, producing ASD classification with area under the receiver operating characteristic curve (AUC-ROC) scores of 84% to 88%. Features include genes with immune/inflammation, response to cytokines, transcriptional regulation, mitotic cell cycle, and PI3K-AKT, RAS, and Wnt signaling pathways. Meaning: An ensemble gene expression ASD classifier has high accuracy across the spectrum of ASD clinical characteristics and across toddlers aged 1 to 4 years. Competing Interest Statement: The authors have declared no competing interest. Funding Statement: This work was supported by NIMH grant no. R01-MH110558 (E.C., N.E.L.), NIMH grant no. R01-MH080134 (K.P.), NIMH grant no. R01-MH104446 (K.P.), an NFAR grant (K.P.), NIMH grant no. P50-MH081755 (E.C.), and generous funding from the Novo Nordisk Foundation through the Center for Biosustainability at the Technical University of Denmark (grant no. NNF10CC1016517 to N.E.L.). Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. The details of the IRB/oversight body that provided approval or exemption for the research described are given below: UC San Diego IRB Project #202115X, Discovering Biomarkers, Causes and Treatment of ASD through Clinical and Biological Studies. All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov.
I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes. All data have been archived at the NIMH NDA.