Abstract
Background The increasing availability of electronic health records has made it possible to construct and implement machine learning models for predicting intensive care unit (ICU) mortality. However, the algorithms used are not clearly described, and model performance remains low owing to the many missing values that are unavoidable in large databases.
Methods We developed an algorithm for subgrouping patients based on missing event patterns, using the Philips eICU Research Institute (eRI) database as an example. The eRI database contains data on 200,859 ICU admissions from more than 400 hospitals and is freely available. We then constructed a random forest classifier for each subgroup and integrated the resulting models. Finally, we compared the performance of the integrated model with that of the Acute Physiology and Chronic Health Evaluation (APACHE) scoring system, one of the best-known predictors of patient mortality, and with that of an imputation-based model.
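The abstract does not spell out the subgrouping algorithm. As a minimal sketch of the general idea, assuming patients are grouped by their exact pattern of observed versus missing variables and that each patient is scored by the model of their own subgroup, the integration step might look like the following. The feature names, the `SubgroupEnsemble` class, and the fallback for unseen patterns are hypothetical. The per-subgroup "model" is a stand-in (the subgroup's observed mortality rate), not the random forest classifier used in the study, and the study's algorithm yields a few clinically interpretable subgroups rather than one subgroup per exact pattern.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical feature names; a record maps each to a value or None (missing).
FEATURES = ["lactate", "creatinine", "heart_rate"]
Record = Dict[str, Optional[float]]
Pattern = Tuple[bool, ...]


def missing_pattern(record: Record) -> Pattern:
    """True where a feature is observed, False where it is missing."""
    return tuple(record.get(f) is not None for f in FEATURES)


def subgroup_by_pattern(records: List[Record]) -> Dict[Pattern, List[int]]:
    """Group record indices by their exact missing-value pattern."""
    groups: Dict[Pattern, List[int]] = defaultdict(list)
    for i, rec in enumerate(records):
        groups[missing_pattern(rec)].append(i)
    return dict(groups)


class SubgroupEnsemble:
    """Integrated model holding one sub-model per missing pattern.

    Stand-in sub-model: the subgroup's observed mortality rate. In practice each
    subgroup would get its own classifier trained only on its observed features.
    """

    def __init__(self) -> None:
        self.models: Dict[Pattern, float] = {}
        self.fallback = 0.0

    def fit(self, records: List[Record], died: List[int]) -> "SubgroupEnsemble":
        for pattern, idx in subgroup_by_pattern(records).items():
            self.models[pattern] = sum(died[i] for i in idx) / len(idx)
        self.fallback = sum(died) / len(died)  # overall mortality rate
        return self

    def predict_proba(self, record: Record) -> float:
        # Route the patient to the sub-model for their pattern; patterns not
        # seen during training fall back to the overall mortality rate.
        return self.models.get(missing_pattern(record), self.fallback)


# Toy usage with made-up values.
records = [
    {"lactate": 4.1, "creatinine": 2.0, "heart_rate": 120},
    {"lactate": None, "creatinine": 1.1, "heart_rate": 80},
    {"lactate": 3.5, "creatinine": 1.8, "heart_rate": 110},
    {"lactate": None, "creatinine": 0.9, "heart_rate": 75},
]
died = [1, 0, 0, 0]
model = SubgroupEnsemble().fit(records, died)
print(model.predict_proba({"lactate": 2.0, "creatinine": 1.0, "heart_rate": 90}))  # 0.5
```

The key design point is that nothing is imputed: each sub-model only ever sees the variables actually recorded for its subgroup, which is the contrast with the imputation approach that the abstract draws.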
Results Subgrouping and mortality prediction were performed separately for two groups: the sepsis group (patients whose ICU admission diagnosis was sepsis) and the non-sepsis group (all remaining patients). The subgrouping algorithm identified unique, clinically interpretable missing event patterns and divided the sepsis and non-sepsis groups into five and seven subgroups, respectively. The integrated model, comprising five subgroup models for the sepsis group and seven for the non-sepsis group, greatly outperformed APACHE IV/IVa, with an area under the receiver operating characteristic curve (AUROC) of 0.91 (95% confidence interval 0.89–0.92) versus 0.79 (0.76–0.81) for APACHE in the sepsis group, and 0.90 (0.89–0.91) versus 0.86 (0.85–0.87) in the non-sepsis group. Our model also outperformed the imputation-based model, which had an AUROC of 0.85 (0.83–0.87) in the sepsis group and 0.87 (0.86–0.88) in the non-sepsis group.
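The AUROCs above are reported with 95% confidence intervals, but the abstract does not state how the intervals were computed. A minimal sketch, assuming a percentile bootstrap (DeLong's method is a common alternative), computes the AUROC as the probability that a randomly chosen non-survivor receives a higher risk score than a randomly chosen survivor, counting ties as one half. The function names and defaults are illustrative, not taken from the study.

```python
import random
from typing import List, Sequence, Tuple


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """P(score of a random death > score of a random survivor); ties count 1/2.

    This pairwise form is O(n_pos * n_neg): fine for illustration, but a
    rank-based formula is preferable for cohorts of ~200,000 admissions.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(
    y_true: Sequence[int],
    scores: Sequence[float],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap (1 - alpha) confidence interval for the AUROC."""
    n = len(y_true)
    if not 0 < sum(y_true) < n:
        raise ValueError("both outcome classes must be present")
    rng = random.Random(seed)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample patients with replacement
        yb = [y_true[i] for i in idx]
        if 0 < sum(yb) < n:  # skip resamples that contain only one class
            stats.append(auroc(yb, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
print(auroc(y, p))  # 0.75
print(bootstrap_ci(y, p, n_boot=200))
```

Resampling whole patients preserves the pairing between outcome and score; resamples that happen to contain only survivors or only deaths are skipped because the AUROC is undefined for them.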
Conclusions We developed a method to predict patient mortality based on missing event patterns. Our method predicts patient mortality more accurately than both the APACHE system and an imputation-based model. Our results indicate that subgrouping based on missing event patterns, rather than imputation, is essential and effective for machine learning in the face of patient heterogeneity.
Trial registration Not applicable.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This work was supported by JSPS KAKENHI Grant Number JP 20K17834.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
Prior to requesting access to the Philips eICU Research Institute (eRI) database, researchers are required to complete the CITI Data or Specimens Only Research course. We have completed the course and received approval to use 31 CSV files from the eRI database. Regarding the ethics oversight body that gave ethical approval for the collection of the original data: the original database is released under the Health Insurance Portability and Accountability Act (HIPAA) safe harbor provision. The re-identification risk was certified as meeting safe harbor standards by Privacert (Cambridge, MA) (HIPAA Certification no. 1031219-2).
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Footnotes
E-mail: Shoji Tatsuma: t-shoji{at}dna-chip.co.jp, Yonekura Hiroshi: hiroshi.yonekura{at}fujita-hu.ac.jp, Sato Yoshiharu: yo-sato{at}dna-chip.co.jp, Kawashiki Yohei: ykawasaki{at}chiba-u.jp
Data Availability
The datasets generated and/or analyzed during the current study are available in the eICU repository.
List of abbreviations
- APACHE: Acute Physiology and Chronic Health Evaluation
- ICU: Intensive Care Unit
- eRI: eICU Research Institute
- APS: Acute Physiology Score
- ROC: Receiver Operating Characteristic
- AUROC: Area Under the ROC Curve