Abstract
The prediction of mortality in critically ill patients has stimulated the development of many severity scoring algorithms. Most of these models use physiological measurements obtained during the first hours of admission (e.g., heart rate, arterial blood pressure, or respiratory rate). In this study, we propose to improve the performance of current scoring systems by including free text from patients’ medical history. Although the primary outcome was in-hospital mortality, we chose a model architecture that simultaneously predicts ICD-9 codes and groupings. We hypothesized that including patients’ medical history with a multitask learning approach would improve model performance. We compared the predictive performance of our approach with that of the best models previously proposed in the literature (baseline models). We used the publicly available MIMIC database, which includes > 60,000 ICU admissions between 2001 and 2012. The patients’ condition at admission was accounted for by the preliminary diagnosis at admission and the medical history extracted from the discharge summary notes. Unstructured data were processed through a Gated Recurrent Unit (GRU) layer with pre-trained word embeddings, and the hidden states were concatenated with the remaining structured tabular data. Baseline models achieved results similar to those in previously published work, but our artificial neural network models showed significant improvement in the classification of mortality (AUC-ROC = 0.90). Including the medical history improved all tasks, with a relatively larger gain for ICD-9 code prediction than for mortality prediction. The clinical prediction model presented here could be used to identify patient risk groups, which would improve the quality of ICU care and further help to allocate hospital resources efficiently.
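The architecture described in the abstract (a GRU over embedded free text, concatenated with tabular features and feeding several task outputs) can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: every name (TinyGRU, multitask_forward), every dimension, and the randomly initialised weights are hypothetical, biases are omitted, and using only the final hidden state as the text representation is an assumption made for brevity.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class TinyGRU:
    """Single GRU layer, forward pass only (untrained, biases omitted)."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        # One (input, recurrent) weight pair per gate:
        # update gate z, reset gate r, candidate state n.
        self.W = {g: rand_matrix(hidden_dim, input_dim, rng) for g in "zrn"}
        self.U = {g: rand_matrix(hidden_dim, hidden_dim, rng) for g in "zrn"}

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in
             zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in
             zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a + b) for a, b in
             zip(matvec(self.W["n"], x), matvec(self.U["n"], reset_h))]
        # Blend the previous state and the candidate state via the update gate.
        return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

    def encode(self, embedded_tokens):
        """Run the sequence through the GRU and return the final hidden state."""
        h = [0.0] * self.hidden_dim
        for x in embedded_tokens:
            h = self.step(x, h)
        return h


def multitask_forward(token_ids, tabular, embeddings, gru, heads):
    """Return (mortality probability, per-code ICD-9 probabilities)."""
    embedded = [embeddings[t] for t in token_ids]   # word-embedding lookup
    text_vec = gru.encode(embedded)                 # free-text representation
    fused = text_vec + list(tabular)                # concatenate with tabular data
    mortality = sigmoid(sum(w * x for w, x in zip(heads["mortality"], fused)))
    icd9 = [sigmoid(s) for s in matvec(heads["icd9"], fused)]  # multi-label head
    return mortality, icd9


# Hypothetical sizes, chosen only to keep the example small.
EMBED_DIM, HIDDEN_DIM, N_TABULAR, N_ICD9, VOCAB = 4, 3, 2, 5, 10
rng = random.Random(0)
embeddings = rand_matrix(VOCAB, EMBED_DIM, rng)  # stand-in for pre-trained embeddings
gru = TinyGRU(EMBED_DIM, HIDDEN_DIM, rng)
heads = {
    "mortality": rand_matrix(1, HIDDEN_DIM + N_TABULAR, rng)[0],
    "icd9": rand_matrix(N_ICD9, HIDDEN_DIM + N_TABULAR, rng),
}

mortality_prob, icd9_probs = multitask_forward(
    token_ids=[1, 4, 7], tabular=[0.5, -1.2],
    embeddings=embeddings, gru=gru, heads=heads,
)
print(mortality_prob, icd9_probs)
```

Because both outputs are computed from the same fused vector, a training signal from the ICD-9 task would also shape the shared text representation used for mortality, which is the usual motivation for a multitask design.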
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
Lara Reichmann was partially funded by https://wamri.ai/.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
This study uses the MIMIC dataset and relies on the MIMIC IRB approval. This study was approved by the Institutional Review Boards of Beth Israel Deaconess Medical Center (Boston, MA) and the Massachusetts Institute of Technology (Cambridge, MA). The requirement for individual patient consent was waived because the study did not impact clinical care and all protected health information was de-identified. De-identification was performed in compliance with Health Insurance Portability and Accountability Act (HIPAA) standards in order to facilitate public access to MIMIC-II. Deletion of protected health information (PHI) from structured data sources (e.g., database fields that provide patient name or date of birth) was straightforward. Additionally, PHI was removed from the discharge summaries and diagnostic reports, as well as from the approximately 700,000 free-text nursing and respiratory notes in MIMIC-II, using an automated algorithm previously shown to outperform clinicians in detecting PHI.
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
We used publicly available data from the MIMIC dataset.