TY - JOUR
T1 - Early Prediction of Alzheimer’s Disease and Related Dementias Using Electronic Health Records
JF - medRxiv
DO - 10.1101/2020.06.13.20130401
SP - 2020.06.13.20130401
AU - Xi Yang
AU - Qian Li
AU - Yonghui Wu
AU - Jiang Bian
AU - Tianchen Lyu
AU - Yi Guo
AU - David Marra
AU - Amber Miller
AU - Elizabeth Shenkman
AU - Demetrius Maraganore
Y1 - 2020/01/01
UR - http://medrxiv.org/content/early/2020/06/16/2020.06.13.20130401.abstract
N2 - Alzheimer’s disease (AD) and AD-related dementias (ADRD) are a class of neurodegenerative diseases affecting about 5.7 million Americans. There is no cure for AD/ADRD. Current interventions have modest effects and focus on attenuating cognitive impairment. Detection of patients at high risk of AD/ADRD is crucial for timely interventions to modify risk factors and primarily prevent cognitive decline and dementia, and thus to enhance the quality of life and reduce health care costs. This study investigates both knowledge-driven (where domain experts identify useful features) and data-driven (where machine learning models select useful features among all available data elements) approaches for early prediction of AD/ADRD using real-world electronic health records (EHR) data from the University of Florida (UF) Health system. We identified a cohort of 59,799 patients and examined four widely used machine learning algorithms following a standard case-control study design. We also examined the early prediction of AD/ADRD using patient information from 0, 1, 3, and 5 years before the disease onset date. The experimental results showed that models based on Gradient Boosting Trees (GBT) achieved the best performance for the data-driven approach and models based on Random Forests (RF) achieved the best performance for the knowledge-driven approach. 
Among all models, GBT using a data-driven approach achieved the best area under the curve (AUC) scores of 0.7976, 0.7192, 0.6985, and 0.6798 for 0-, 1-, 3-, and 5-year prediction, respectively. We also examined the top features identified by the machine learning models and compared them with the knowledge-driven features identified by domain experts. Our study demonstrated the feasibility of using electronic health records for the early prediction of AD/ADRD and identified potential challenges for future investigations.
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement: This study was partially supported by an Ed and Ethel Moore Alzheimer's Disease Research Program from the Florida Department of Health (FL DOH #9AZ14) and a Patient-Centered Outcomes Research Institute (PCORI) Award (ME-2018C3-14754). The content is solely the responsibility of the authors and does not necessarily represent the official views of the funding institutions.
Author Declarations: All relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. The University of Florida IRB (IRB201900182) approved this study and assigned it to the exempt category. All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Data sharing is not applicable to this article.
ER - 