TY  - JOUR
T1  - Predicting morbidity by Local Similarities in Multi-Scale Patient Trajectories
JF  - medRxiv
DO  - 10.1101/2020.09.14.20194464
SP  - 2020.09.14.20194464
AU  - Lucía A Carrasco-Ribelles
AU  - Jose Ramón Pardo-Mas
AU  - Salvador Tortajada
AU  - Carlos Sáez
AU  - Bernardo Valdivieso
AU  - Juan M García-Gómez
Y1  - 2020/01/01
UR  - http://medrxiv.org/content/early/2020/09/15/2020.09.14.20194464.abstract
N2  - Healthcare predictive models generally rely on static snapshots of patient information. Patient Trajectories (PTs) model the evolution of patient conditions over time and are a promising source of information for predicting future morbidities. However, PTs are highly heterogeneous among patients in terms of length and content, so only aggregated versions that include the most frequent events have been studied. Further, the use of longitudinal multi-scale data in PT models, such as the integration of EHR coded data with laboratory results, is yet to be explored. Our hypothesis is that local similarities on small chunks of PTs can identify similar patients with respect to their future morbidities. The objectives of this work are (1) to develop a methodology that identifies local similarities between PTs prior to the occurrence of morbidities, in order to predict those morbidities for new query individuals; and (2) to validate this methodology by imputing the risk of cardiovascular diseases (CVD) in patients with diabetes. We propose a novel formal definition of PTs based on sequences of multi-scale data over time, so that each patient has their own PT including all the data available in their EHR. Thus, patients do not need to follow, partly or completely, a single pre-defined trajectory built from the most frequent events in a population; they only need to have events in common with any other patient. A dynamic programming methodology to identify local alignments on PTs for predicting future morbidities is proposed.
The proposed methodology for PT definition and the alignment algorithm are generic enough to be applied to other clinical domains. We tested this solution for predicting CVD in patients with diabetes and achieved a positive predictive value of 0.33, a recall of 0.72 and a specificity of 0.38. Therefore, in the diabetes use case, the proposed solution can be of utmost utility for patient screening.
Highlights: Local similarities between patient trajectories can potentially be used to predict morbid conditions. A formal definition of patient trajectories comprising heterogeneous clinical observations, biomedical tests and time gaps is proposed. A novel dynamic programming methodology, based on the Smith-Waterman alignment algorithm and a set of customized scoring matrices, is proposed to find similar patients.
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement: This work was supported by the CrowdHealth project (COLLECTIVE WISDOM DRIVING PUBLIC HEALTH POLICIES (727560)) and the MTS4up project (DPI2016-80054-R).
Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below: Approved by the Ethical Committee of Hospital Universitario y Politecnico La Fe under the Project Modelos y tecnicas de simulación para identificar factores asociados a la diabetes presented by Dr. Bernardo Valdivieso with code: 2015/0458.
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov.
I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes
The data is not available to the public.
ER  - 