Abstract
Machine learning (ML) holds great promise to support, improve, and automate clinical decision-making in hospitals. Data protection regulations, however, prevent abundantly available routine data from being shared across sites for model training. Generative models can overcome this limitation by learning to synthesize hospital data from a target population while ensuring data privacy. Clinical time series acquired during intensive care are, however, difficult to model with established techniques, particularly because of their uneven sampling intervals. Here we introduce GHOSTS (Generator of Hospital Time Series), a novel generator of synthetic patient trajectories that produces heterogeneous hospital data, including realistic time series with uneven sampling intervals. We further design a suite of novel benchmarks, GHOSTS-Bench. We train GHOSTS on a large cohort of patients from the MIMIC-IV critical care dataset and assess the quality of the generated data by three criteria: how faithfully the distributions of individual features in the real data are approximated, how well spatio-temporal dynamics in the multivariate time series are preserved, and how well ML models trained on the generated data solve a clinical prediction task on the real data. We observe that GHOSTS outperforms a state-of-the-art approach, DoppelGANger, with respect to these criteria.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This work was performed within the "Metrology for Artificial Intelligence in Medicine (M4AIM)" programme funded by the German Federal Ministry for Economic Affairs and Climate Action (BMWK) within the framework of the QI-Digital Initiative, and received further support from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (Grant agreement No. 758985). Niklas Giesa is funded by the German Academic Scholarship Foundation.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The Institutional Review Board at the Beth Israel Deaconess Medical Center waived informed consent and approved the sharing of the MIMIC dataset.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
All data produced in the present study will be available upon reasonable request to the authors.