Generalizing intensive care AI across time scales in resource-limited settings
==============================================================================

* Akshaya Devadiga
* Pradeep Singh
* Jhuma Sankar
* Rakesh Lodha
* Tavpritesh Sethi

## Abstract

Temporal resolution of physiological monitoring in intensive care varies widely across healthcare systems. Artificial intelligence models assume a uniform and fixed frequency of sampling, thus limiting the generalizability of models, especially to resource-limited settings. Here, we propose a novel resolution-transfer task for physiological time series and ask whether models trained on high-resolution data can generalize to a low data-density setting without the need to retrain them. SafeICU, a novel longitudinal pediatric intensive care dataset spanning ten years from a tertiary care hospital in India, was used to test this hypothesis. Self-supervised transformer models were trained on 144,271 patient-hours of high-resolution physiological signals from 984 pediatric ICU stays to learn representations of heart rate, respiratory rate, oxygen saturation, and arterial blood pressure. Transfer of this model to low-resolution data established robust performance in clinically relevant lower-frequency intervals, consistently outperforming models trained directly at coarser resolutions. Further, these representations generalized across patient populations, maintaining performance when evaluated on adult intensive care cohorts from the MIMIC-III and eICU databases without retraining. In a downstream task of early shock prediction, models achieved strong discrimination in the pediatric cohort (area under the receiver operating characteristic curve (AUROC) 0.87; area under the precision-recall curve (AUPRC) 0.92) and retained stable performance across monitoring intervals from 10 to 60 minutes (AUROC 0.78-0.88).

Together, these results demonstrate that physiological representations learned from high-resolution data enable time-scale-robust and transferable AI for intensive care. The publicly released SafeICU dataset, comprising longitudinal vital signs, laboratory measurements, treatment records, microbiology, and admission and discharge, provides a foundation for developing and deploying generalizable clinical AI in resource-limited settings.

## 1. Main

Subtle changes in physiological signals often precede clinical deterioration in critically ill children. Continuous monitoring of vital signs is therefore central to pediatric intensive care, enabling early recognition of conditions such as sepsis, shock, and respiratory failure [1–2]. Physiological time-series data have consequently become an important foundation for data-driven early warning and risk stratification models in intensive care. Despite rapid progress in such approaches, translation into routine clinical practice remains limited because model performance often deteriorates when deployed outside the environment in which training data were collected [3–6]. Differences in patient populations, data quality, and clinical workflows contribute to this lack of portability, limiting the reliability of decision-support tools across diverse care environments.

Heterogeneity in monitoring frequency is a significant and sometimes overlooked deployment hurdle. Intensive care units differ substantially in how frequently physiological variables are recorded, reflecting variation in bedside monitoring infrastructure, staffing capacity, and documentation practices [7–8]. In many hospitals, particularly in low- and middle-income settings, vital signs are charted intermittently at coarse temporal resolution rather than captured continuously. However, most machine learning pipelines for physiological time-series modeling assume consistent sampling frequency between model development and deployment. Models are typically trained and evaluated at fixed aggregation intervals determined by dataset availability rather than real-world deployment conditions. This mismatch between training resolution and deployment resolution can lead to unstable model behavior, reduced discrimination, and limited clinical trust, yet it has received relatively little systematic investigation in pediatric critical care.

Pediatric intensive care presents additional challenges for robust model development. Children have age-dependent normal ranges and heterogeneous disease trajectories and exhibit rapidly changing physiology, producing time-series data that are sparse, noisy, and highly variable [9]. These challenges are compounded by limited access to large, high-quality pediatric ICU datasets compared with adult critical care, where several widely publicized resources exist [8–11]. As a result, many prediction models for critical illness are developed using adult cohorts from high-income settings, raising concerns regarding fairness and validity when applied to pediatric populations or resource-limited environments [4,6]. Although the Pediatric Intensive Care (PIC) database from China represents an important step toward addressing pediatric data scarcity [8], publicly available pediatric datasets with high-temporal-resolution physiological monitoring remain limited, particularly in low- and middle-income countries. In addition, only a small number of studies have explored self-supervised representation learning approaches for physiological time-series in pediatric critical care, and few have systematically evaluated how temporal-resolution mismatch affects the stability and transferability of learned representations across monitoring frequencies.

To support generalisable pediatric critical care research, there is a growing need for openly accessible datasets that reflect diverse clinical populations and real-world monitoring practices [12–13]. In response to this need, we introduce SafeICU, an open-access pediatric intensive care unit database derived from routinely collected clinical data at the All India Institute of Medical Sciences (AIIMS), New Delhi. SafeICU contains longitudinal high-frequency physiological time series together with laboratory results, microbiology records, treatment charts, clinical notes, and outcomes, and has been de-identified using established privacy-preserving protocols [14]. By providing high-temporal-resolution monitoring data from a tertiary-care center in a low- and middle-income country, SafeICU enables systematic evaluation of temporal robustness in clinical time-series modelling.

Recent advances in self-supervised representation learning offer a potential pathway to improve the portability of clinical prediction models. In natural language processing, masked language modelling has enabled the learning of reusable representations that generalise across tasks and domains [15–17]. Similar approaches have begun to emerge for structured physiological time-series data. However, the robustness of learned physiological representations under real-world constraints, particularly temporal resolution mismatch and cross-cohort transfer, remains poorly understood, especially in pediatric settings [18].

In this study, we investigate whether physiological representations learned from high-temporal-resolution pediatric ICU data remain stable when applied across clinically relevant monitoring frequencies and patient populations without retraining. Using the SafeICU database together with publicly available adult ICU datasets, we train self-supervised representation models on fine-grained vital-sign sequences and systematically evaluate their robustness to temporal resolution mismatches. We further examine downstream clinical relevance using early shock prediction as an illustrative application. By releasing the SafeICU database and pretrained representation models, this work aims to support scalable, resolution-agnostic deployment of data-driven intensive care decision-support tools across diverse clinical environments.

## 2. Methodology

### 2.1 Study design and data source

This study was conducted as a retrospective observational cohort study using data from SafeICU, a pediatric intensive care unit (PICU) database curated from routinely collected clinical data at the AIIMS, New Delhi, India. SafeICU comprises patient-level data from a single-center 8-bedded PICU. Despite its limited bed capacity, the database spans nearly a decade of admissions and captures high-frequency physiological monitoring data generated during routine clinical care. The analyses reported here used data collected between January 2015 and December 2024; data collection for the SafeICU database is ongoing, with future updates planned.

The SafeICU database comprises longitudinal, patient-level data collected as part of standard clinical care, integrating information on demographics, diagnoses, treatments, laboratory and microbiology results, outcomes, and continuously recorded physiological signals. Continuous vital signs, including heart rate (HR), respiratory rate (RR), oxygen saturation (SpO₂), arterial blood pressure (ABP), and temperature, were recorded at 15-second temporal resolution using bedside monitoring systems. Each record in SafeICU corresponds to a single PICU admission.

The study protocol was reviewed and approved by the Ethics Committee of AIIMS, New Delhi (approval numbers IEC/NP-211/08·05·2015 and IEC-787/07.10.2022, RP-14/2022). The requirement for individual informed consent was waived because the study involved retrospective analysis of routinely collected clinical data, and all data were anonymized before use. This observational study was conducted and reported in accordance with the STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) guidelines [19]. As the data used were routinely collected during standard clinical care, reporting was also guided by the RECORD (REporting of studies Conducted using Observational Routinely-collected health Data) statement [20].

To assess generalizability beyond the pediatric cohort and institutional setting, adult ICU data were obtained from the Medical Information Mart for Intensive Care III (MIMIC-III) database [10] and the eICU Collaborative Research Database [11], both publicly available through PhysioNet. These adult ICU datasets were used for external evaluation and cross-cohort generalization analyses.

### 2.2 Study population

The source population consisted of all pediatric patients aged 0 to 18 years admitted to the PICU during the study period. The analytic cohort used for vitals-based representation learning included PICU admissions with available continuous physiological monitoring data for HR, ABP, RR, and SpO₂.

Admissions were excluded if any of the required vital signs were entirely missing or if the duration of available physiological data was insufficient for analysis, defined as fewer than six recorded data points for any vital sign. These criteria were applied to ensure that included admissions had adequate longitudinal data to support temporal aggregation across monitoring intervals. Although the SafeICU database includes additional clinical data streams, including laboratory, microbiology, and treatment, inclusion in the cohort for this study was based solely on the availability of continuous vital sign data. For patients with multiple PICU admissions, each stay was included independently. In addition, interruptions in physiological monitoring exceeding 24 hours were considered as a new ICU stay and were treated as separate admissions.

For downstream evaluation, the pediatric cohort definition and outcome labeling were held consistent with a previously published study derived from the SafeICU database. Adult ICU cohorts used for external generalization analyses were derived from the MIMIC-III and eICU Collaborative Research Database and were restricted to patients with available continuous recordings of the same physiological variables.

### 2.3 Data elements and variables

#### Data integration across clinical streams

SafeICU integrates routinely collected structured clinical data and high-frequency physiological time-series data into a unified relational database. The database schema was designed to be broadly compatible with the MIMIC-IV framework to facilitate usability and comparison with established critical care datasets. Data were organized into core tables, including patient demographics (PATIENT), admissions (ADMISSION), treatment chart documentation (NOTE EVENTS), continuously recorded physiological signals (VITAL SIGNS), laboratory results (LAB EVENTS), and microbiology results (MICROBIOLOGY EVENTS). Records across tables were linked using a unique patient identifier (SUBJECT_ID) and an admission-level identifier.

Following extraction from source hospital systems, records were harmonized and filtered to ensure internal consistency across clinical streams. Basic data quality checks were performed, including validation of age ranges, removal of physiologically unacceptable values, and handling of extreme outliers. Because time ranges differed across source systems, integration was performed using SUBJECT_ID-based linkage to retain clinically coherent patient episodes. An overview of the SafeICU database structure and integration workflow is provided in Figure 1, database characteristics are summarised in Supplementary Material 4, and a comparative summary of SafeICU against widely used publicly available critical care databases is provided in Supplementary Material 1.

![Figure 1.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2026/04/24/2026.04.23.26351588/F1.medium.gif)

[Figure 1.](http://medrxiv.org/content/early/2026/04/24/2026.04.23.26351588/F1)

Figure 1. SafeICU database structure and integration workflow.
Routinely collected clinical data streams were harmonised using unique patient identifiers to construct a unified de-identified research database.

#### Data anonymization

Before incorporation into the SafeICU database, all data were de-identified in accordance with Health Insurance Portability and Accountability Act (HIPAA) guidelines [21]. Structured de-identification was performed through the removal of direct identifiers, including patient names, clinician names, and other free-text identifiers. Dates were consistently shifted forward for each patient using a randomly generated offset, preserving within-patient temporal intervals while preventing identification through calendar dates.

Unique identifiers and institutional codes were pseudonymised using a secure hashing approach (PBKDF2 with HMAC-SHA256), ensuring that original identifiers could not be recovered from the released dataset. De-identification procedures were applied uniformly across all clinical streams before database integration. Because the cohort is restricted to patients younger than 18 years, additional age hiding procedures used in adult datasets (eg, masking ages >89 years) were not required.

### 2.4 Temporal resolution and aggregation strategy

Physiological signals in SafeICU were originally recorded at 15-second temporal resolution. Missingness in the continuous physiological time-series was assessed across the four primary vital signs (HR, RR, SpO₂, and ABP). Admissions with extensive missingness were excluded; specifically, admissions were removed if overall missingness across the four vital sign streams exceeded 50%. For the remaining admissions with less than 50% missingness, missing values were imputed at the native resolution using a Kalman smoothing approach implemented via the na_kalman function from the imputeTS package. Imputation was performed independently for each physiological signal to preserve within-signal temporal structure.

To evaluate robustness under heterogeneous monitoring frequencies, we first segmented the continuous 15-second physiological time series into overlapping sliding windows, with a 30-minute overlap between consecutive segments to preserve temporal continuity and contextual information. Following this sharding step, coarser time-series representations were generated by aggregating the native 15-second signals into clinically relevant temporal intervals of 5, 10, 15, 30, and 60 minutes. For each vital sign, values within each aggregation window were summarised using the median to reduce sensitivity to outliers and transient artefacts. Windows with insufficient recorded values were treated as missing.

To simulate real-world deployment scenarios where monitoring is intermittently recorded at a lower frequency, models trained at one temporal resolution were evaluated on data aggregated at different resolutions without retraining. This enabled systematic assessment of temporal-resolution mismatch. Temporal aggregation was performed independently for each admission after segmentation of continuous monitoring into distinct PICU stays.

### 2.5 Symbolic encoding and self-supervised representation learning

To enable self-supervised learning on physiological time-series data, continuous vital sign measurements were transformed into discrete symbolic sequences. For each physiological variable (HR, RR, SpO₂, and ABP), values were discretised into a fixed number of bins to generate integer-valued tokens. Discretisation was performed using a clustering-based approach, where cluster centroids defined data-driven physiological state boundaries.

Self-supervised representation learning was performed using a masked modelling framework, where a subset of tokens within each physiological sequence was randomly masked, and the model was trained to predict the masked tokens using surrounding context. During training, 15% of tokens in each sequence were randomly selected for masking. Sequences were constructed from temporally ordered symbolic tokens derived from the aggregated vital sign time series. Masking was applied at random positions across the sequence, enabling the model to learn contextual dependencies and temporal patterns.

A transformer-based encoder architecture was used to learn contextualised physiological representations. The model was trained using a cross-entropy loss between predicted and true masked tokens. After pretraining, the learned embeddings were extracted as fixed-dimensional representations for downstream evaluation, including temporal resolution transfer experiments and early shock prediction. Hyperparameters for pretraining were optimised using Optuna through Bayesian optimisation to minimise validation loss (Supplementary Material 2). The final model configuration was selected based on the best-performing trial on the validation set.

### 2.6 Downstream evaluation: early shock prediction

To evaluate whether the learned physiological representations retained clinically meaningful discriminative information under temporal resolution transfer, we performed a downstream early shock prediction task. Shock prediction was selected as an illustrative application because it is clinically relevant in intensive care and has been widely used in prior machine learning studies for evaluating the clinical utility of learned physiological representations. In this study, shock prediction was used to assess the stability of pretrained embeddings across temporal resolutions and patient cohorts rather than to characterise the full clinical determinants of shock.

Shock onset was defined using the Shock Index (SI), calculated as the ratio of heart rate to systolic blood pressure, which has been validated for bedside risk stratification and prognostication [22,23]. In adult populations, SI values greater than 0.7 are commonly used to indicate shock. Given that SafeICU is a pediatric data resource, age-adjusted SI thresholds were applied to account for physiological variation across developmental stages. Shock was defined as SI >2.3 (≤3 months), >1.7 (4-6 months), >1.5 (7-12 months), >1.2 (13-36 months), >1.15 (37-72 months), >0.95 (73-144 months), and >0.77 (>144 months). The prediction horizon was set to 8 hours before shock onset [24]. The pediatric cohort definition, including inclusion and exclusion criteria, data splits, and outcome labeling procedure, is described in Supplementary Material 3 and was held constant across all analyses to enable direct comparison while isolating the effect of temporal resolution.

Pretrained physiological embeddings were extracted independently for each vital sign using the corresponding pretrained representation model. For temporal resolution transfer experiments, embeddings were generated using a model pretrained at 5-minute resolution and applied to data aggregated at coarser resolutions (10-60 minutes) without retraining. Embeddings from all physiological signals were concatenated to form a unified feature vector. Dimensionality reduction was performed using principal component analysis, and age and sex were included as additional covariates. No fine-tuning of the pretrained representation models was performed during downstream classifier training.

Several standard classifiers were evaluated, including logistic regression, support vector machines, random forests, and gradient boosting. Model selection was performed using five-fold cross-validation on the training set. Based on cross-validated performance, gradient boosting was selected as the final classifier. The selected model was retrained on the combined training and validation sets and evaluated on a held-out test set. The same feature construction and evaluation pipeline was applied to both pediatric and adult ICU cohorts to enable direct comparison of downstream performance under temporal resolution mismatch and across patient populations.

### 2.7 Evaluation metrics and statistical analysis

Model performance was summarised descriptively across physiological signals, temporal resolutions, and patient cohorts. For self-supervised representation learning, masked modelling performance was evaluated using MLM accuracy and validation loss for each vital sign and aggregation interval. For downstream early shock prediction, discrimination was assessed using the area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve (AUPRC), with performance compared across temporal resolutions to quantify the effect of temporal-resolution mismatch at inference. All analyses were performed using Python-based scientific computing libraries, including PyTorch for model training and scikit-learn for downstream classification and evaluation.

## 3 Results

### 3.1 SafeICU cohort characteristics

The pretraining cohort comprised 621 unique pediatric patients from the SafeICU database, corresponding to 984 PICU stays after temporal segmentation of physiological monitoring streams, where interruptions in recording exceeding 24 hours were treated as separate stays. This analytic cohort represents the subset of SafeICU admissions with sufficient continuous physiological monitoring data across the four primary vital signs (HR, RR, SpO₂, and ABP) after missingness-based filtering and imputation.

Data were partitioned at the patient level into training (434 patients), validation (63 patients), and test (124 patients) sets for language model training, hyperparameter optimisation, and masked modelling evaluation, respectively.

### 3.2 Pretraining performance across temporal resolutions

The performance of representation learning differed markedly between baseline and hyperparameter-optimized configurations. As shown in Figure 2, models trained with baseline hyperparameters exhibited higher training and validation loss, whereas optimized models converged rapidly and achieved substantially lower loss values, indicating improved optimization stability and more effective representation learning.

![Figure 2.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2026/04/24/2026.04.23.26351588/F2.medium.gif)

[Figure 2.](http://medrxiv.org/content/early/2026/04/24/2026.04.23.26351588/F2)

Figure 2. Training and validation loss during masked modelling pretraining.
Loss trajectories for MLM pretraining on heart rate sequences at 5-minute resolution. (A) Baseline hyperparameters. (B) Best-performing Optuna trial (Trial 64). (C) Final model trained using optimized hyperparameters, demonstrating faster convergence and lower validation loss.

Consistent with these training dynamics, hyperparameter optimization resulted in significant improvements in MLM accuracy across all physiological signals in the SafeICU pediatric cohort (Table 1). At 5-minute temporal resolution, median MLM accuracy increased from 0.42 to 0.79 for HR, from 0.45 to 0.74 for RR, from 0.55 to 0.84 for SpO₂, and from 0.50 to 0.92 for ABP.

View this table:
[Table 1.](http://medrxiv.org/content/early/2026/04/24/2026.04.23.26351588/T1)

Table 1. MLM accuracy across temporal resolutions in the SafeICU pediatric cohort.
Median (IQR) MLM accuracy for HR, RR, SpO₂, and ABP at aggregation intervals from 5 to 60 minutes. Baseline and Optuna-optimized results are shown for 5-minute training; all other resolutions reflect optimized models.

Across optimized models, MLM accuracy exhibited a systematic dependence on temporal resolution. Performance was highest at finer resolutions and declined progressively as temporal aggregation increased for all physiological signals. For example, median HR accuracy decreased from 0.79 at 5-minute resolution to 0.53 at 60-minute resolution, while SpO₂ accuracy declined from 0.84 to 0.40 over the same range. Similar resolution-dependent trends were observed for RR and ABP. Because models were trained and evaluated at matched temporal resolutions, these results isolate the effect of temporal aggregation on representation learning performance.

Corresponding analyses in the adult ICU cohort derived from the combined MIMIC-III and eICU datasets are reported in Supplementary Material 5. In the adult cohort, optimized models similarly outperformed baseline configurations across all signals, with median MLM accuracy reaching 0.87 for HR, 0.74 for RR, 0.79 for SpO₂, and 0.79 for ABP at 5-minute resolution. As in the pediatric cohort, performance declined at coarser resolutions, with HR accuracy decreasing to 0.57 and ABP accuracy to 0.20 at 60-minute resolution.

### 3.3 Resolution Transfer

Representations pretrained at fine temporal resolution (5-minute aggregation), termed PhysioStack, demonstrated robust transfer when evaluated at coarser temporal resolutions without retraining. Within the pediatric cohort, PhysioStack achieved consistently comparable or higher masked modelling performance than resolution-matched models across evaluation intervals from 10 to 60 minutes (**Figure 3**), with the largest performance improvements observed at the coarsest resolutions. For example, for heart rate, PhysioStack maintained higher median MLM accuracy at 30-minute and 60-minute evaluations compared with models trained directly at those resolutions (0.63 vs 0.61 at 30 minutes; 0.58 vs 0.53 at 60 minutes), indicating improved robustness under temporal aggregation.

![Figure 3.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2026/04/24/2026.04.23.26351588/F3.medium.gif)

[Figure 3.](http://medrxiv.org/content/early/2026/04/24/2026.04.23.26351588/F3)

Figure 3. High-resolution pretraining improves robustness under temporal aggregation and cross-cohort transfer.
MLM accuracy of PhysioStack representations pretrained at 5-minute resolution and evaluated on coarser aggregation intervals (10-60 minutes) without retraining. Performance is shown for the SafeICU pediatric cohort and adult ICU cohorts (eICU) and compared with resolution-matched baselines trained separately at each temporal resolution.

Cross-cohort evaluation further showed that pediatric-trained PhysioStack representations generalised effectively to adult ICU data. When evaluated on adult cohorts, PhysioStack achieved a median MLM accuracy of 0.87 at 5 minutes, decreasing gradually to 0.67 at 60 minutes, closely matching the performance of adult-trained high-temporal-resolution representations across all evaluation resolutions (Supplementary Material 6). These findings suggest that high-resolution pretraining captures stable physiological structure that remains informative under temporal downsampling and transfers across patient populations.

### 3.4 Downstream evaluation: Early Warning System for Predicting Shock

Gradient boosting classifiers trained on 5-minute PhysioStack embeddings achieved robust early shock prediction performance in the SafeICU pediatric cohort (Table 2). Under five-fold cross-validation, the model demonstrated strong discrimination, with a median AUROC of 0.87 (IQR 0.86-0.88) and AUPRC of 0.92 (0.91-0.93). When evaluated under temporal resolution mismatch without retraining, predictive performance remained stable across coarser aggregation intervals from 10 to 60 minutes, with AUROC ranging from 0.78 to 0.88 and AUPRC from 0.88 to 0.94, indicating that high-resolution embeddings preserved clinically relevant signal despite substantial downsampling at inference.

View this table:
[Table 2.](http://medrxiv.org/content/early/2026/04/24/2026.04.23.26351588/T2)

Table 2. Early shock prediction performance using 5-minute PhysioStack embeddings in the SafeICU pediatric cohort.
Performance of gradient boosting classifiers evaluated under temporal resolution mismatch by applying a model trained on 5-minute embeddings to test data aggregated at 5-60 minute intervals without retraining. Values are reported as mean ± SD.

Consistent temporal robustness was observed in adult ICU cohorts (Supplementary Material 7). Adult-trained 5-minute embeddings achieved a median AUROC of 0.87 (0.86-0.88) and AUPRC of 0.92 (0.91-0.93) under five-fold cross-validation. When applied to coarser temporal resolutions without retraining, AUROC decreased gradually from 0.89 at 5 minutes to 0.78 at 60 minutes, while AUPRC decreased from 0.94 to 0.87. Full classification metrics across resolutions are provided in Table 2 and Supplementary Material 7.

## Discussion

This study demonstrates that physiological representations learned from high-temporal-resolution pediatric intensive care data remain robust when applied across heterogeneous monitoring frequencies and patient cohorts without retraining. Using the SafeICU pediatric ICU database and external adult ICU datasets, we show that representations pretrained at fine temporal resolution preserve informative structure when evaluated on progressively coarser aggregation intervals. This finding addresses a practical barrier to real-world deployment of data-driven intensive care decision-support tools, as monitoring frequency varies widely across hospitals due to infrastructure limitations, staffing constraints, and workflow-driven charting practices.

Most existing approaches to intensive care modelling rely on fixed temporal resolutions defined during dataset construction, implicitly assuming that sampling frequency during deployment will match training conditions. In practice, this assumption is frequently violated, particularly in low- and middle-income settings where continuous bedside monitoring is not consistently available. Our results extend prior work by demonstrating that high-resolution pretraining captures physiological patterns that remain informative under temporal downsampling, resulting in gradual rather than abrupt degradation as monitoring frequency becomes coarser. This resolution robustness addresses a limitation of fixed-resolution modelling approaches that has received limited empirical evaluation in pediatric critical care.

A persistent challenge in clinical machine learning is limited external validity, with many models demonstrating strong internal performance but reduced reliability when applied to new patient populations or clinical settings [9][25]. This issue is particularly relevant in pediatric critical care, where publicly available datasets are scarce, and model development often relies on adult cohorts [26]. Notably, we observed that representations learned from pediatric ICU data generalised effectively to adult ICU cohorts across temporal resolutions without cohort-specific fine-tuning. This suggests that the learned embeddings capture physiologically meaningful patterns that extend beyond population-specific characteristics, and highlights the potential of representation-level robustness as a pathway toward more portable clinical models [9][25][27].

To assess whether resolution-robust representations retained clinically meaningful information, we evaluated their utility for early shock prediction as an illustrative downstream task. Predictive performance remained stable when embeddings trained at 5-minute resolution were applied to data aggregated at 10-60-minute intervals without retraining. Although this analysis was not intended to establish a definitive shock prediction model, it supports the interpretation that resolution robustness is not limited to substitute pretraining metrics and can translate into stable discrimination for clinically relevant tasks.

This study has several limitations. First, the pediatric cohort was derived from a single tertiary-care PICU, which may restrict the representativeness of case mix, monitoring practices, and missingness patterns. Second, although external evaluation was performed using adult ICU cohorts, additional pediatric cohorts from other regions would strengthen conclusions regarding portability. Third, the evaluation focused on four continuously monitored vital signs and does not address robustness for other modalities such as laboratory trajectories, medications, or clinical notes. Finally, while resolution robustness supports portability, prospective evaluation is required to determine whether such representations improve real-world clinical workflow integration and patient outcomes.

Despite these limitations, the findings have practical implications for the scalable deployment of intensive care decision-support systems. Resolution-robust physiological representations could reduce the need for site-specific retraining and enable the reuse of pretrained models across hospitals with heterogeneous monitoring infrastructure. This is particularly relevant in pediatric critical care, where data scarcity and variable monitoring practices have historically limited the development and external validation of generalisable models. By releasing the SafeICU dataset and pretrained models, this work provides a resource for benchmarking temporal robustness and for supporting the development of portable intensive care AI in diverse settings.

## Conclusion and future directions

The findings of this study have direct relevance for the deployment of data-driven decision-support tools in intensive care. In routine clinical environments, particularly in low- and middle-income countries, continuous high-frequency monitoring is often unavailable, and physiological data are frequently documented at irregular or coarser temporal resolutions. The observation that physiological representations pretrained on high-resolution data remain stable when transferred to lower monitoring frequencies suggests that models developed in well-resourced settings can potentially be reused across hospitals with heterogeneous monitoring practices without requiring resolution-specific retraining.

From a clinical systems perspective, resolution-robust representations may reduce implementation barriers for early warning and risk stratification tools by enabling a shared pretrained embedding backbone to support multiple downstream applications. This approach could improve consistency of model behaviour across different settings and reduce the need for repeated local redevelopment when monitoring infrastructure differs. Although shock prediction was used here only as an illustrative downstream task, the preservation of discrimination across temporal resolutions supports the broader hypothesis that representation-level robustness can improve reliability under real-world data constraints. Future work should focus on prospective evaluation in additional pediatric ICUs with differing monitoring infrastructure and case mix to establish clinical usability and external validity. Extending this framework to multimodal ICU data, including laboratory measurements, medications, and clinical notes, will be important to determine whether resolution-robust pretraining can support more comprehensive decision-support systems. Finally, integration of such representations into clinician-facing workflows will require careful evaluation of interpretability, calibration, and safety to ensure alignment with bedside decision-making.

## Data Availability

The SafeICU pediatric intensive care unit database released as part of this study is available at https://safeicu.aiims.edu.in . Access to the dataset is available to credentialed users upon completion of a data use agreement, subject to institutional and ethical requirements. Pretrained PhysioStack representation models, together with code for data preprocessing, model training, and evaluation, are publicly available at https://github.com/tavlab-iiitd/PhysioStack .

[https://github.com/tavlab-iiitd/PhysioStack](https://github.com/tavlab-iiitd/PhysioStack) 

[https://safeicu.aiims.edu.in](https://safeicu.aiims.edu.in) 

## Declarations

### i) Funding

This research work was supported by the SafeICU Initiative, funded by the Indian Council of Medical Research (ICMR) under grant award number A1-Ad-hoc/18/2022-A1-Cell; the Centre of Excellence in Healthcare (COEH) at IIIT-Delhi; and a Wellcome Trust/DBT India Alliance Fellowship (IA/CPHE/14/1/501504) awarded to Prof Tavpritesh Sethi.

### ii) Conflicts of interest

The authors declare that they have no competing interests

### iii) Ethics approval

*   The study received ethical approval from the Ethics Committee of AIIMS Delhi under the reference numbers IEC/NP-211/08·05·2015 and IEC-787/07.10.2022, RP-14/2022.

*   This observational study was conducted and reported in accordance with the STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) guidelines.

*   As the data used were routinely collected during standard clinical care, reporting was also guided by the RECORD (REporting of studies Conducted using Observational Routinely-collected health Data) statement.

### iv) Consent to participate

*   Ethics approval was obtained for this study, and the requirement for individual consent was waived because all data were anonymized before use.

### v) Consent for publication

Not applicable

### vi) Code availability

The code underlying this study is available at [https://github.com/tavlab-iiitd/PhysioStack](https://github.com/tavlab-iiitd/PhysioStack)

### vii) Availability of data and materials

The SafeICU pediatric intensive care unit database is released as part of this study and is available at safeicu.aiims.edu.in. Access to the dataset is granted to credentialed users following completion of a data use agreement (DUA), in accordance with institutional and ethical requirements.

Pretrained PhysioStack representation models, along with code for data preprocessing, model training, and evaluation, are publicly available at [https://github.com/tavlab-iiitd/PhysioStack](https://github.com/tavlab-iiitd/PhysioStack). Detailed documentation and usage instructions are provided in the accompanying README files to facilitate reproducibility and reuse.

### viii) Authors’ contributions

AD, PS, JS, RL, and TS conceived the study and defined the overall research objectives. PS, AD, TS, and RL managed data curation. AD and PS conducted the formal analysis and visualized the results. AD and PS led the data preprocessing, representation learning experiments, temporal resolution analyses, and downstream evaluations, and performed all statistical analyses. TS, JS, and RL supervised the study, guided the methodological design and interpretation of results, and contributed to manuscript writing and critical revision for intellectual content. AD and PS supported the SafeICU data release by developing and maintaining the data publication backend, managing server infrastructure, handling access control and credentialing workflows, and assisting with deployment, storage, and IP management required for controlled data sharing. The first draft of the report was written by AD and PS, with critical revisions made for important intellectual content by TS, RL, AD, and PS. All authors read and approved the final version of the manuscript and agree to be accountable for all aspects of the work.

## Supplementary Material

View this table:
[Supplementary Material 1.](http://medrxiv.org/content/early/2026/04/24/2026.04.23.26351588/T3)

Supplementary Material 1. Summarizes key characteristics of the SafeICU pediatric intensive care unit database in comparison with widely used critical care databases, including MIMIC-IV, the eICU Collaborative Research Database, and the Pediatric Intensive Care (PIC) database.

View this table:
[Supplementary Material 2.](http://medrxiv.org/content/early/2026/04/24/2026.04.23.26351588/T4)

Supplementary Material 2. Hyperparameter Optimization for Representation Learning.
To identify optimal training configurations for representation learning, automated hyperparameter optimization was performed using Optuna, a Bayesian optimization framework. The optimization objective was to maximize MLM accuracy on a held-out validation set.

Each optimization trial involved training a transformer-based model with a unique combination of hyperparameters sampled from predefined search spaces. The hyperparameters explored included optimizer type, learning rate, batch size, weight decay, and learning rate scheduler. Optimization trials were parallelized across available graphics processing units to accelerate convergence. To limit unnecessary computation, early stopping was applied, and training was terminated if validation performance did not improve for 25 consecutive epochs.

Hyperparameter optimization was performed independently for each physiological signal and temporal resolution. The resulting optimized configurations were used for all representation learning experiments reported in the main manuscript unless otherwise specified.

**Supplementary Material 3.** Cohort Construction

Data from the Electronic ICU (eICU) database and the SafeICU pediatric database were used to construct the study cohorts. The eICU cohort included patients admitted to multiple ICU types, including medical (MICU), surgical (SICU), medical-surgical ICU, coronary care (CCU), cardiac ICU, cardiac surgery ICU (CSICU), cardiothoracic ICU (CTICU), and neuro-ICU units.

For both datasets, inclusion criteria required continuous vital sign monitoring for at least 7.5 hours during the ICU stay. Exclusion criteria were applied consistently and included: (i) absence of invasive arterial blood pressure measurements during the observation window; (ii) missing demographic information (age or sex), or age greater than 89 years where applicable; (iii) evidence of shock within the initial observation period to avoid bias from patients already exhibiting the outcome of interest; and (iv) missing data exceeding 10% of the observation window.

Since SafeICU is a pediatric data resource, different SI thresholds were used for various age groups (SI >2.3 (ages <= 3 months), >1.7 (4-6 months), >1.5 (7-12 months), >1.2 (13-36 months), >1.15 (37-72 months), >0.95 (73-144 months), >0.77 (>144 months))(22).

View this table:
[Supplementary Material 4.](http://medrxiv.org/content/early/2026/04/24/2026.04.23.26351588/T5)

Supplementary Material 4. Characteristics of PICU Admissions Included in the SafeICU Database.
Summarizes demographic characteristics, length of stay, and outcomes for pediatric intensive care unit admissions included in the SafeICU database between January 2015 and December 2024.

View this table:
[Supplementary Material 5.](http://medrxiv.org/content/early/2026/04/24/2026.04.23.26351588/T6)

Supplementary Material 5. Representation Learning Performance Across Temporal Resolutions in the Adult ICU Cohort.
Summarizes MLM performance across physiological signals and temporal resolutions in the combined adult ICU cohort derived from the MIMIC-III and eICU databases. Median performance (interquartile range) is reported for HR, RR, SpO₂, and ABP. Results at 5-minute resolution are shown for both baseline and hyperparameter-optimized models; results at all other resolutions reflect optimized models.

**Supplementary Material 6.** Resolution Transfer and Generalization of Adult-Trained Representations

This supplement presents the temporal resolution transfer and generalization behavior of physiological representations learned using adult ICU data. When adult-trained high-temporal-resolution representations were evaluated on adult ICU data (within-cohort evaluation), median MLM accuracy remained stable across temporal resolutions: 0.87 (Q1-Q3, 0.78-0.95) at 5 minutes and 0.67 (0.50-0.81) at 60 minutes, with intermediate values of 0.83 (0.72-0.92) at 10 minutes, 0.79 (0.67-0.89) at 15 minutes, and 0.74 (0.57-0.85) at 30 minutes.

Cross-cohort evaluation of the same adult-trained model on pediatric ICU data showed a consistent but lower performance profile across all resolutions, with median accuracy ranging from 0.74 (0.61-0.86) at 5 minutes to 0.50 (0.33-0.62) at 60 minutes, and intermediate values of 0.68 (0.55-0.82) at 10 minutes, 0.61 (0.50-0.74) at 15 minutes, and 0.58 (0.44-0.71) at 30 minutes.

For reference, a resolution-matched adult baseline is included, in which models were trained and evaluated at the same temporal resolution. Performance under this baseline is closely aligned with within-cohort evaluation using the 5-minute adult model across resolutions, confirming that representations learned at fine temporal resolution retain utility when applied to coarser adult ICU data.

![Supplementary Material 7.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2026/04/24/2026.04.23.26351588/F4.medium.gif)

[Supplementary Material 7.](http://medrxiv.org/content/early/2026/04/24/2026.04.23.26351588/F4)

Supplementary Material 7. 
Shock Prediction Model Selection and resolution transfer on adult data.

### Shock Classification Models

Four standard classification models were evaluated for early shock prediction using physiological embeddings as inputs:

*   Gradient Boosting Classifier (GBC)

*   Support Vector Machine (SVM)

*   Random Forest (RF)

*   Logistic Regression (LR)

All models were trained using embeddings extracted from pretrained language models and evaluated under consistent data splits, as described in the main Methods.

Class Imbalance Handling. For all models, class imbalance in the training set was addressed using random undersampling to achieve a 1:1 ratio of shock and non-shock samples. Undersampling was applied only to training data and not to validation or test sets.

### Five-Fold Cross-Validation Performance for Shock Prediction Models

Summarizes five-fold cross-validation performance of shock prediction models evaluated using physiological embeddings derived from pretrained language models. Median performance and interquartile range are reported across folds. Embeddings were generated from 5-minute resolution data using resolution-matched pretrained language models.

View this table:
[Table7](http://medrxiv.org/content/early/2026/04/24/2026.04.23.26351588/T7)

### Resolution Transfer in Shock Prediction Without Retraining (Adult ICU Data)

Shock prediction performance was evaluated under resolution mismatch by applying a shock model trained on 5-minute embeddings to test data aggregated at coarser temporal resolutions (10-60 minutes). Embeddings for all test sets were generated using the same pretrained 5-minute language model trained on the combined adult ICU cohort.

View this table:
[Table8](http://medrxiv.org/content/early/2026/04/24/2026.04.23.26351588/T8)

## ix) Acknowledgments

The authors acknowledge funding support from the Indian Council of Medical Research (ICMR; AI-Adhoc-18/2022-AI Cell), the Wellcome Trust/DBT India Alliance Fellowship (IA/CPHE/14/1/501504) awarded to Dr. Tavpritesh Sethi, and the Centre of Excellence in Healthcare (COEH), IIIT-Delhi. The authors also thank the Department of Computational Biology, IIIT-Delhi, for infrastructural support. We sincerely thank the technical assistance of the D5-ICU/MCB-PICU team, Anil Sharma, and Varun Prakash at the Pediatric Intensive Care Unit of the AIIMS, New Delhi. We also acknowledge the AIIMS computer facility for providing the computational infrastructure and secure access required for data storage, processing, and analysis. We specially thank Ranjeet Singh (AIIMS, New Delhi IT) and Bhawani Shah (IIIT-Delhi IT) for their support in establishing the network infrastructure required for SafeICU database development. Finally, we acknowledge Deepankar Verma, Likhith Devadiga, Pawan Yadav, and Shivam Mishra for their assistance with database vulnerability testing and security clearance.

## List of Abbreviations

ABP
:   Arterial Blood Pressure
AIIMS
:   All India Institute of Medical Sciences
HIPAA
:   Health Insurance Portability and Accountability
Act HMAC
:   Hash-based Message Authentication Code
HR
:   Heart Rate
ICUs
:   Intensive Care Units
MIMIC
:   Medical Information Mart for Intensive Care
MLM
:   Masked Language Model
PBICDF2
:   Password-Based Key Derivation Function 2
PICU
:   Pediatric Intensive Care Unit
RECORD
:   REporting of studies Conducted using Observational Routinely-collected health Data
RR
:   Respiratory Rate
SHA256
:   Secure Hash Algorithm256 SI - Shock Index
SpO₂
:   Oxygen Saturation
STROBE
:   Strengthening The Reporting of Observational studies in Epidemiology

*   Received April 23, 2026.
*   Revision received April 23, 2026.
*   Accepted April 24, 2026.


*   © 2026, Posted by openRxiv

The copyright holder for this pre-print is the author. All rights reserved. The material may not be redistributed, re-used or adapted without the author's permission.

## References

1.  [1].Alsabri, M., Mourid, M.R., Saleh, A.R. et al. Vital signs as biomarkers of early clinical deterioration in pediatric emergency departments: physiology, interpretation, and innovations: a narrative review. Int J Emerg Med 19, 6 (2026). doi:10.1186/s12245-025-01107-8.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1186/s12245-025-01107-8&link_type=DOI) 

2.  [2].Mayampurath, Anoop PhD1,2; Jani, Priti MD, MPH1; Dai, Yangyang MA2; Gibbons, Robert PhD3; Edelson, Dana MD, MS3; Churpek, Matthew M. MD, MPH, PhD3. A Vital Sign-Based Model to Predict Clinical Deterioration in Hospitalized Children*. Pediatric Critical Care Medicine 21(9):p 820–826, September 2020. | DOI: 10.1097/PCC.0000000000002414.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1097/PCC.0000000000002414&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=32511200&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2026%2F04%2F24%2F2026.04.23.26351588.atom) 

3.  [3].Moor M., Banerjee O., Abad Z.S.H. et al. Foundation models for generalist medical artificial intelligence. Nature 616, 259–265 (2023). doi:10.1038/s41586-023-05881-4.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41586-023-05881-4&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=37045921&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2026%2F04%2F24%2F2026.04.23.26351588.atom) 

4.  [4].Röösli E., Bozkurt S., Hernandez-Boussard T. Peeking into a black box, the fairness and generalizability of a MIMIC-III benchmarking model. Sci Data 9, 24 (2022). doi:10.1038/s41597-021-01110-7.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41597-021-01110-7&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=35075160&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2026%2F04%2F24%2F2026.04.23.26351588.atom) 

5.  [5].Hong N., Liu C., Gao J., et al. State of the Art of Machine Learning–Enabled Clinical Decision Support in Intensive Care Units: Literature Review. JMIR Med Inform 2022;10(3):e28781, DOI: 10.2196/28781.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.2196/28781&link_type=DOI) 

6.  [6].Zhang K., Fan Y., Long K., Lan Y., Gao P. Research Hotspots and Trends of Deep Learning in Critical Care Medicine: A Bibliometric and Visualized Study. J Multidiscip Healthc. 2023;16:2155–2166. doi:10.2147/JMDH.S420709.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.2147/JMDH.S420709&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=37539364&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2026%2F04%2F24%2F2026.04.23.26351588.atom) 

7.  [7].de Andrade JBC, de Medeiros Cavalcante MAO, Lopes TLM, Treigher JMS, Balsells MD, Vasconcelos JL, Monteiro LC, da Silveira Mota DD. Discovery of data quality issues in electronic health records: profound consequences for critical care medicine applications - a systematized review. Crit Care. 2026 Jan 8;30(1):19. doi: 10.1186/s13054-025-05677-0. PMID: 41508097; PMCID: PMC12784561.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1186/s13054-025-05677-0&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=41508097&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2026%2F04%2F24%2F2026.04.23.26351588.atom) 

8.  [8].Zeng X., Yu G., Lu Y. et al. PIC, a paediatric-specific intensive care database. Sci Data 7, 14 (2020). doi:10.1038/s41597-020-0355-4.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41597-020-0355-4&link_type=DOI) 

9.  [9].Ganatra HA. Machine Learning in Pediatric Healthcare: Current Trends, Challenges, and Future Directions. J Clin Med. 2025 Jan 26;14(3):807. doi: 10.3390/jcm14030807. PMID: 39941476; PMCID: PMC11818243.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.3390/jcm14030807&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=39941476&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2026%2F04%2F24%2F2026.04.23.26351588.atom) 

10. [10].Johnson A., Pollard T., Shen L. et al. MIMIC-III, a freely accessible critical care database. Sci Data 3, 160035 (2016). doi:10.1038/sdata.2016.35.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/sdata.2016.35&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=27219127&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2026%2F04%2F24%2F2026.04.23.26351588.atom) 

11. [11].Pollard T., Johnson A., Raffa J. et al. The eICU Collaborative Research Database, a freely available multi-center database for critical care research. Sci Data 5, 180178 (2018). doi:10.1038/sdata.2018.178.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/sdata.2018.178&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=30204154&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2026%2F04%2F24%2F2026.04.23.26351588.atom) 

12. [12].Olatunji G., Kokori E., Aderinto N., et al. Challenges and Strategies in Pediatric Critical Care: Insights From Low-Resource Settings. Global Pediatric Health. 2024;11. doi:10.1177/2333794X241285964.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1177/2333794X241285964&link_type=DOI) 

13. [13].Declerck J., Lee J., Sen A., et al. The Potential to Leverage Real-World Data for Pediatric Clinical Trials: A Proof-of-Concept Study. J Med Internet Res 2025;27:e72573. DOI: 10.2196/72573
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.2196/72573&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=40446289&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2026%2F04%2F24%2F2026.04.23.26351588.atom) 

14. [14]. El Emam K., Rodgers S., Malin B. Anonymising and sharing individual patient data. BMJ (Clinical Research ed.). 2015 Mar;350:h1139. DOI: 10.1136/bmj.h1139. PMID: 25794882; PMCID: PMC4707567.
    
    [FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiRlVMTCI7czoxMToiam91cm5hbENvZGUiO3M6MzoiYm1qIjtzOjU6InJlc2lkIjtzOjE3OiIzNTAvbWFyMjBfMS9oMTEzOSI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDI2LzA0LzI0LzIwMjYuMDQuMjMuMjYzNTE1ODguYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 

15. [15].Devlin J., Chang M., Lee K., Toutanova K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. North American Chapter of the Association for Computational Linguistics.
    
    
16. [16].Brown TB., Mann B., Ryder N, et al. 2020. Language models are few-shot learners. In Proceedings of the 34th International Conference on Neural Information Processing Systems (NIPS ’20). Curran Associates Inc., Red Hook, NY, USA, Article 159, 1877–1901.
    
    
17. [17].Bommasani, Rishi et al. “On the Opportunities and Risks of Foundation Models.” ArXiv abs/2108.07258 (2021): n. Pag
    
    
18. [18].Liu Z., Alavi A., Li M., Zhang X. Self-Supervised Contrastive Learning for Medical Time Series: A Systematic Review. Sensors. 2023; 23(9):4221. doi:10.3390/s23094221
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.3390/s23094221&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=37177423&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2026%2F04%2F24%2F2026.04.23.26351588.atom) 

19. [19].von Elm E., Altman DG., Egger M., Pocock SJ., Gøtzsche PC., Vandenbroucke JP. STROBE Initiative. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. Lancet. 2007 Oct 20;370(9596):1453–7. doi: 10.1016/S0140-6736(07)61602-X. PMID: 18064739.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/S0140-6736(07)61602-X&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=18064739&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2026%2F04%2F24%2F2026.04.23.26351588.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000250386000022&link_type=ISI) 

20. [20].Benchimol EI., Smeeth L., Guttmann A., Harron K., Moher D, Petersen I, Sørensen HT, von Elm E, Langan SM; RECORD Working Committee. The REporting of studies Conducted using Observational Routinely-collected health Data (RECORD) statement. PLoS Med. 2015 Oct 6;12(10):e1001885. doi: 10.1371/journal.pmed.1001885. PMID: 26440803; PMCID: PMC4595218.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1371/journal.pmed.1001885&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=26440803&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2026%2F04%2F24%2F2026.04.23.26351588.atom) 

21. [21].Atchinson BK., Fox DM. The politics of the Health Insurance Portability and Accountability Act. Health Aff (Millwood). 1997 May-Jun;16(3):146-50. doi: 10.1377/hlthaff.16.3.146. PMID: 9141331.
    
    [FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6MzoiUERGIjtzOjExOiJqb3VybmFsQ29kZSI7czo5OiJoZWFsdGhhZmYiO3M6NToicmVzaWQiO3M6ODoiMTYvMy8xNDYiO3M6NDoiYXRvbSI7czo1MDoiL21lZHJ4aXYvZWFybHkvMjAyNi8wNC8yNC8yMDI2LjA0LjIzLjI2MzUxNTg4LmF0b20iO31zOjg6ImZyYWdtZW50IjtzOjA6IiI7fQ==) 

22. [22].Koch E, Lovett S, Nghiem T, Riggs RA, Rech MA. Shock index in the emergency department: utility and limitations. Open Access Emerg Med OAEM. 2019;11:179–99.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.2147/OAEM.S178358&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=31616192&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2026%2F04%2F24%2F2026.04.23.26351588.atom) 

23. [23].Cheng TH, Sie YD, Hsu KH, Goh ZNL, Chien CY, Chen HY, et al. Shock Index: A Simple and Effective Clinical Adjunct in Predicting 60-Day Mortality in Advanced Cancer Patients at the Emergency Department. Int J Environ Res Public Health. 2020 Jul;17(13):4904.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.3390/ijerph17134904&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=32646021&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2026%2F04%2F24%2F2026.04.23.26351588.atom) 

24. [24].Birkhahn RH, Gaeta TJ, Tloczkowski J, Terry D, Bove JJ. The shock index in early acute hypovolemia. Acad Emerg Med. 2003;10(5):494–5.
    
    
25. [25].Večurkovská, I., Roškovičová, V. & Kaťuchová, J. Challenges in the Clinical Application of Machine Learning for Pancreatic Cancer. Bratisl. Med. J. 126, 2437–2450 (2025). doi:10.1007/s44411-025-00312-4.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1007/s44411-025-00312-4&link_type=DOI) 

26. [26].Stein, Dan Fredman, et al. Predicting Cardiovascular deterioration in a paediatric intensive care unit (PicEWS): a machine learning modelling study of routinely collected health-care data. eClinicalMedicine, Volume 85, 103255.
    
    
27. [27].Zech JR, Badgeley MA, Liu M, Costa AB, Titano JJ, et al. (2018) Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: A cross-sectional study. PLOS Medicine 15(11): e1002683. doi:10.1371/journal.pmed.1002683
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1371/journal.pmed.1002683&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2026%2F04%2F24%2F2026.04.23.26351588.atom)