TY  - JOUR
T1  - Use of unstructured text in prognostic clinical prediction models: a systematic review
JF  - medRxiv
DO  - 10.1101/2022.01.17.22269400
SP  - 2022.01.17.22269400
AU  - Tom M. Seinen
AU  - Egill Fridgeirsson
AU  - Solomon Ioannou
AU  - Daniel Jeannetot
AU  - Luis H. John
AU  - Jan A. Kors
AU  - Aniek F. Markus
AU  - Victor Pera
AU  - Alexandros Rekkas
AU  - Ross D. Williams
AU  - Cynthia Yang
AU  - Erik van Mulligen
AU  - Peter R. Rijnbeek
Y1  - 2022/01/01
UR  - http://medrxiv.org/content/early/2022/01/18/2022.01.17.22269400.abstract
N2  - Objective: This systematic review aims to assess how information from unstructured clinical text is used to develop and validate prognostic risk prediction models. We summarize the prediction problems and methodological landscape and assess whether using unstructured clinical text data in addition to more commonly used structured data improves the prediction performance. Materials and Methods: We searched Embase, MEDLINE, Web of Science, and Google Scholar to identify studies that developed prognostic risk prediction models using unstructured clinical text data published in the period from January 2005 to March 2021. Data items were extracted and analyzed, and a meta-analysis of model performance was carried out to assess the added value of text to structured-data models. Results: We identified 126 studies that described 145 clinical prediction problems. Combining text and structured data improved model performance, compared to using only text or only structured data. In these studies, a wide variety of dense and sparse numeric text representations were combined with both deep learning and more traditional machine learning methods. External validation, public availability, and explainability of the developed models were limited. Conclusion: Overall, the use of unstructured clinical text data in the development of prognostic prediction models has been found beneficial in addition to structured data in most studies.
The EHR text data is a source of valuable information for prediction model development and should not be neglected. We suggest a future focus on explainability and external validation of the developed models, promoting robust and trustworthy prediction models in clinical practice. Competing Interest Statement: The authors have declared no competing interest. Clinical Protocols: https://osf.io/gw628 Funding Statement: This work has received support from the European Health Data and Evidence Network (EHDEN) project. EHDEN has received funding from the Innovative Medicines Initiative 2 Joint Undertaking (JU) under grant agreement No 806968. The JU receives support from the European Union's Horizon 2020 research and innovation programme and EFPIA. Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes. The data underlying this work are available as supplementary material.
ER  -