Automated development of clinical prediction models enables real-time risk stratification with exemplar application to hypoxic-ischaemic encephalopathy
=======================================================================================================================================================

* Matthew S. Lyon
* Heather White
* Tom R. Gaunt
* Deborah Lawlor
* David Odd

## Abstract

Real-time updated risk prediction of disease outcomes could lead to improvements in patient care and better resource management. Established monitoring during pregnancy at antenatal and intrapartum periods could be particularly amenable to benefits of this approach. This proof-of-concept study compared automated and manual prediction modelling approaches using data from the Collaborative Perinatal Project with exemplar application to hypoxic-ischaemic encephalopathy (HIE). Using manually selected predictors identified from previously published studies we obtained high HIE discrimination with logistic regression applied to antenatal only (0.71 AUC [95% CI 0.64-0.77]), antenatal and intrapartum (0.70 AUC [95% CI 0.64-0.77]), and antenatal, intrapartum and birthweight (0.73 AUC [95% CI 0.67-0.79]) data. In parallel, we applied a range of automated modelling methods and found penalised logistic regression had best discrimination and was equivalent to the manual approach but required little human input giving 0.75 AUC for antenatal only (95% CI 0.69, 0.81), 0.70 AUC for antenatal and intrapartum (95% CI 0.63, 0.78), and 0.74 AUC using antenatal, intrapartum, and infant birthweight (95% CI 0.65, 0.81). These results demonstrate the feasibility of developing automated prediction models which could be applied to produce disease risk estimates in real-time. This approach may be especially useful in pregnancy care but could be applied to any disease.

## Introduction

In healthcare, prediction is used to stratify individuals into different groups of risk to optimise prevention and treatment of disease1. This approach has been successfully applied for decades to identify individuals who will benefit from statin medication, reducing population levels of cardiovascular disease2. Prediction modelling involves combining several variables in a statistical model (for example a multivariable regression model) to estimate the occurrence of a future outcome1. Clinical prediction models should be accurate3 (i.e., discriminate between those who go on to get the disease and those who remain healthy, and calibrated3 such that the proportion predicted to get disease is similar to that subsequently observed), feasible (i.e., using variables that are relatively easy to obtain) and cost-efficient. New prediction models should be compared to those already used in terms of accuracy, feasibility, and cost.

Deciding which variables to include in a prediction model is one of the most important and difficult steps in the model development process4. Traditionally, prediction models have been based on established causal risk factors, for example cardiovascular risk prediction tools in common use internationally, such as the pooled cohort equation5 and Q-risk6, include established cardiovascular risk factors such as cigarette smoking, systolic blood pressure and diabetes. However, identifying risk factors is time consuming and laborious and may be subject to selection biases if these variables were preselected using prior knowledge. This process may not be optimal or practical when applied to large datasets and must be repeated with every new dataset. Furthermore, it is increasingly recognised that whilst some risk factors are good predictors, disease predictors do not need to be (causal) risk factors.

The availability of large multidimensional data such as multi ‘omics and electronic health records (EHRs) has enabled the potential to take a hypothesis-free approach to prediction which may identify more variables that could improve prediction accuracy. Automated feature selection methods are often used in these studies7–9 to reduce the potential number of variables that remain in the model. These are needed as the scale of the data (several 100s or 1000s of variables) would be beyond conventional approaches comparing different multivariable regression models with a smaller number of variables selected on the basis of ‘independence’ based on a p-value and/or regression coefficient threshold. Several popular feature selection approaches exist including penalised regression10,11, decision tree ensembles12 and recursive feature elimination (RFE)13, among others.

With the increased use of EHRs and widespread availability of computing resources in primary and secondary care it should be feasible to implement dynamic prediction models3. These are tools that enable ‘real-time-updated’ risk stratification for different health outcomes that refresh on each clinic appointment, or as new data are collected so treatment is tailored to that new risk. For example, a recent study demonstrated a pipeline for use with structured EHRs and its application to predicting risk at regular time intervals, such as every 6-hours in patients with acute hospital admissions14. In parallel with this, new data from all patients could be used to regularly update the underlying prediction models for different health outcomes. Such ideas would require major changes to the understanding of prediction and risk stratification by the public and healthcare providers, together with major operational changes to healthcare systems and provision. Determining the extent to which automated prediction modelling applied to EHRs could improve prediction of health outcomes over and above established risk factors would be useful for knowing whether the aim of ‘real-time updated prediction’ for ‘personalised’ healthcare is feasible, and if so, for which conditions.

Real-time updated risk stratification could be particularly beneficial in pregnancy care as data are collected at regular intervals during antenatal and intrapartum (i.e., around the time of labour and birth) periods. However, unlike application to acute hospital admissions, pregnancy data are collected by a wide range of healthcare providers leading to rich but highly unstructured data that require robust prediction approaches.

A significant cause of perinatal brain injury is perinatal asphyxia, leading to hypoxic-ischaemic encephalopathy (HIE) affecting 2-6 per 1000 live births15. HIE is often devastating, with life-long impacts for the infant16 and their family, as well as costing society millions of pounds in medical compensation, lost earningsearnings, fare support17. While preventative interventions such as induction of labour or operative delivery can be employed if the risks of continuing the pregnancy are higher than those of early delivery18 there is a lack of clear data on when to intervene.

Badawi *et al*19,20 identified thirty-five potential (independent) antenatal and intrapartum risk factors for HIE. As these factors were identified individually (using univariate analyses) in a case-control study, the predictive accuracy of models combining these risk factors were not explored in the original studies and to the best of our knowledge predictive accuracy of these factors for HIE has not been explored in publications since then.

The aim of this study is to explore the feasibility of undertaking real-time updated risk stratification in pregnancy using a large linked antenatal and intrapartum dataset applied to prediction of HIE, as an exemplar of the potential of health record clinic data to improve risk prediction. To do this we used data from the large Collaborative Perinatal Project and applied automated feature selection approaches to prepare and evaluate prediction models using antenatal only and antenatal and intrapartum variables. These models were compared with the 35 potential risk factors manually identified by Badawi *et al*.

## Methods

### Participants

The Collaborative Perinatal Project (CPP) is a prospective cohort study of approximately 60,000 pregnancies, and 58,000 live born infants born between 1959 and 1965 throughout the prenatal period, labour, and delivery, postpartum and as the child grew21. Participants were removed from analyses if the pregnancy was preterm (< 37 weeks), post-term (> 42 weeks), or of young mothers (< 16 years). CPP data files are freely available for download from the National Archives Catalog ([https://www.archives.gov/research/electronic-records/nih.html](https://www.archives.gov/research/electronic-records/nih.html)). Ethical approval was not required for this study as the data were publicly available for unrestricted use as detailed in this note ([https://catalog.archives.gov/OpaAPI/media/643929/content/arcmedia/electronic-records/rg-443/CPP/304.1-3ND.pdf](https://catalog.archives.gov/OpaAPI/media/643929/content/arcmedia/electronic-records/rg-443/CPP/304.1-3ND.pdf)).

### Outcome

HIE was defined as having definite seizures, hypertonia, jitteriness, hypotonia, abnormal reflexes, or abnormal cry (provided in the CPP dataset without guidance on clinical criteria used); after having a low 5 minute Apgar score (<7)22. Data were collected by independent observers onto dedicated forms independently of the clinical team.

### Defining training and testing datasets

Pregnancies were ordered chronologically and split into two equal subsets for training (infants born 1959-1962) and testing purposes (infants born 1963 to 1965). All variables were categorised as either antenatal (measurable before 37 weeks’ gestation), intrapartum (measures only available at or after 37 weeks, up to the point of delivery), and infant birthweight, and classified as either unordered categorical, ordinal, or numerical. The earlier pregnancies were used for training to emulate the use of existing records in predicting future events.

### Feature preparation

Variables were excluded if they contained >10% of missing data values (n=28/518) in the whole dataset leaving a potential n=490 exposure data fields for the prediction models. Differences in missingness between training and testing datasets were generally low except for private health insurance which was 13.5% missing in the training dataset and 0% in the testing cohort (**Supplementary Figure 1**). Unordered categorical variables were recoded as dummy variables producing n=2126 additional fields (**Figure 2**). Variables with zero variance were removed from analysis (n=189), the remaining features were split into antenatal (n=1638), antenatal and intrapartum (n=2122), and antenatal, intrapartum and birthweight (n=2427). All features were standardised to have zero mean and unit variance.

![Supplementary Figure 1.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/04/01/2022.04.01.22273313/F4.medium.gif)

[Supplementary Figure 1.](http://medrxiv.org/content/early/2022/04/01/2022.04.01.22273313/F4)

Supplementary Figure 1. Variable missingness in testing and training datasets
Proportion of variable missingness in testing and training datasets. Variables with an absolute difference in missingness greater than 10% were labelled. Private_0304, private health insurance.

Broadly, two approaches were used to select features for developing HIE prediction models (**Figure 2**). First, a manually defined set were prepared by taking 20-35 potential risk factors for HIE (**Supplementary Table 1**) identified in Badawi et al19,20. Second, automated feature selection approaches applied to training dataset were used to rank features in terms of predictive performance to select the top n=20, n=40 or n=60 predictors for each selection method. The feature selection methods used were logistic regression with RFE, Elastic net and least absolute shrinkage and selection operator (LASSO) regression, random forest classifier, and linear support vector classifier with LASSO regularisation (SVC), as outlined below and in Supplementary Table 2.

View this table:
[Supplementary Table 1.](http://medrxiv.org/content/early/2022/04/01/2022.04.01.22273313/T2)

Supplementary Table 1. Established risk factors for HIE

View this table:
[Supplementary Table 2.](http://medrxiv.org/content/early/2022/04/01/2022.04.01.22273313/T3)

Supplementary Table 2. Feature selection methods

#### Reverse feature elimination13 with cross-validation

During the first iteration all input variables were included as predictors in a logistic regression model trained using five-fold cross validation. On each subsequent iteration the five weakest predictors (determined by the smallest absolute coefficient) were eliminated, with iterations continuing until only one predictor remained. This approach uses the effect of each predictor on the cross-validation mean AUC to rank features. The presented rankings were inversed so that the metric was positively correlated with feature predictability and to allow between-method comparisons.

#### LASSO10 and Elastic net11 regression

Logistic regression with L1 (shrinks weak coefficients to zero, LASSO) or L1 & L2 (combined shrinking and removal of weak coefficients, Elastic net) regularisation penalties was trained using five-fold cross-validation to determine the optimal value of alpha. The Elastic net mixing parameter (l1_ratio, representing the ratio of L1 [shrinks weak coefficients to zero] to L2 [removes weak coefficients] penalties) was set to 0.5. The penalty term shrinks weak predictors to zero which were subsequently eliminated from downstream analyses. The remaining features were ranked by their absolute regression coefficient (smallest is least predictive).

#### Random forest classifier12

The random forest classifier was trained using five-fold cross-validation and default parameters to estimate the feature importance metric which was used to rank features (smallest is least predictive).

#### Penalised linear support vector classifier23

The linear SVC was trained with five-fold cross-validation using the AUC metric. The default L1 penalty term (C=1.0) was applied to shrink weak coefficients to zero. The coefficients were taken as a measure of feature importance (absolute value; smallest is least predictive).

### Binary classification

The primary classification analysis was performed using logistic regression models trained with training data and applied to estimate HIE probability using the testing data. Follow on analyses compared logistic regression with a range of other classifiers using default hyperparameters: random forest, naïve Bayes, and support vector machine (radial basis function kernel) and neural network (one hidden layer with number of nodes equal to number of predictors using the rectified linear activation function and Adam optimiser24). Model discrimination was compared using area under the receiver-operator curves (ROC) with 95% confidence interval.

### Software

Data were pre-processed using Stata (v13) and Python (v3.8) with pandas ([https://pandas.pydata.org](https://pandas.pydata.org); v1.3.5) and numpy25 (v1.21.5) packages. Prediction models were developed using Stata, scikit-learn26 (v1.0.2) and Tensorflow (v1.15.0; [www.tensorflow.org](http://www.tensorflow.org)). R (v4.0.2) was used to estimate AUROC and 95% confidence interval with pROC package27 (v1.16.2) and prepare plots with ggplot228 (3.3.3) and GGally ([https://github.com/ggobi/ggally](https://github.com/ggobi/ggally); v2.0.0) packages.

## Results

### Participants

The dataset was based on the full CPP variable file dataset, containing data on 58,760 infants (**Figure 1**). A total of 12,005 infants were born preterm (<37 weeks of completed gestation), 5476 were born after 42 weeks, and 964 were born to a mother of younger than 16 years age; leaving a total of 40,315 for the analyses. 19,487 infants were born between 1959 and 1962 (and were placed in the first cohort), while 20,828 were born between 1963 and 1966 (and were placed in the second).

![Figure 1.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/04/01/2022.04.01.22273313/F1.medium.gif)

[Figure 1.](http://medrxiv.org/content/early/2022/04/01/2022.04.01.22273313/F1)

Figure 1. Participant inclusion flowchart

Table 1 shows characteristics of mothers and infants by HIE status. Overall, 209 (0.5%) had evidence of HIE, 549 (1.4%) died in the perinatal period, 1228 (3.1%) had a low Apgar score at 5 minutes and 2013 (5.1%) required resuscitation after birth. Measures varied by many of the demographic measures recorded, with infants with HIE more likely to have older mothers (e.g., they were less likely to be employed [11.2% vs 15.5%]), have private health insurance (2.7% vs 7.0%) or be of white ethnicity (30.1% vs 49.9%). Mothers were also more likely to have antenatal complications in their pregnancy (e.g., more preeclampsia (9.1% vs 3.3%) and pre-labour bleeding [33.8% vs 28.1%]). Infants with HIE also differed in their birth characteristics (e.g., breech presentation 15.3% vs 2.7%), their sex (62.1% vs 50.6% male) and were smaller.

View this table:
[Table 1.](http://medrxiv.org/content/early/2022/04/01/2022.04.01.22273313/T1)

Table 1. Demographics of study population split by HIE

### Automated feature preparation

Of the 518 variables provided in the CPP dataset (**Figure 2**), 28 (5%) were removed due to high missingness (>10%). The variables were split by type into ordinal (n=26), continuous (n=27) and unordered categorical (n=437). Unordered categorical variables were split into n=2563 dummy variables. The combined set were filtered to remove features with zero variance leaving n=2427 features which were split by collection point during pregnancy into antenatal (n=1638), antenatal and intrapartum (n=2122) and antenatal, intrapartum and birthweight (n=2427).

![Figure 2.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/04/01/2022.04.01.22273313/F2.medium.gif)

[Figure 2.](http://medrxiv.org/content/early/2022/04/01/2022.04.01.22273313/F2)

Figure 2. Feature engineering and selection workflow
LASSO, least absolute shrinkage and selection operator. RFE, reverse feature elimination. SVC, penalised linear support vector classification.

Feature rankings for Elastic net, LASSO, penalised linear SVC, and random forest (Spearman’s correlation coefficient 0.23 to 0.71) were most highly correlated, while RFE and LASSO/Elastic net were less well correlated (Spearman’s correlation coefficient 0.17 to 0.34; **Supplementary Figures 2-4**).

![Supplementary Figure 2.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/04/01/2022.04.01.22273313/F5.medium.gif)

[Supplementary Figure 2.](http://medrxiv.org/content/early/2022/04/01/2022.04.01.22273313/F5)

Supplementary Figure 2. Distribution and correlation of automated feature selection scores using antenatal features
L1 penalised regression (Elastic net & LASSO), absolute regression coefficient. Random forest classifier (Tree), feature importance. L1 penalised linear support vector classifier (SVC), absolute coefficient. Inverse reverse feature elimination (RFE) ranking using logistic regression. Corr, Spearman’s rho rank correlation coefficient.

![Supplementary Figure 3.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/04/01/2022.04.01.22273313/F6.medium.gif)

[Supplementary Figure 3.](http://medrxiv.org/content/early/2022/04/01/2022.04.01.22273313/F6)

Supplementary Figure 3. Distribution and correlation of automated feature selection scores using antenatal and intrapartum feature set
L1 penalised regression (Elastic net & LASSO), absolute regression coefficient. Random forest classifier (Tree), feature importance. L1 penalised linear support vector classifier (SVC), absolute coefficient. Inverse reverse feature elimination (RFE) ranking using logistic regression. Corr, Spearman’s rho rank correlation coefficient.

![Supplementary Figure 4.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/04/01/2022.04.01.22273313/F7.medium.gif)

[Supplementary Figure 4.](http://medrxiv.org/content/early/2022/04/01/2022.04.01.22273313/F7)

Supplementary Figure 4. Distribution and correlation of automated feature selection scores using antenatal, intrapartum and birthweight feature set
L1 penalised regression (Elastic net & LASSO), absolute regression coefficient. Random forest classifier (Tree), feature importance. L1 penalised linear support vector classifier (SVC), absolute coefficient. Inverse reverse feature elimination (RFE) ranking using logistic regression. Corr, Spearman’s rho rank correlation coefficient.

### Model performance

The discriminatory ability of each feature set was estimated using the AUC on the second (later) half of pregnancies with logistic regression (**Figure 3**).

![Figure 3.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/04/01/2022.04.01.22273313/F3.medium.gif)

[Figure 3.](http://medrxiv.org/content/early/2022/04/01/2022.04.01.22273313/F3)

Figure 3. HIE prediction using logistic regression with a range of feature selection approaches
Area under ROC for prediction of hypoxic-ischaemic encephalopathy using logistic regression and a range of feature selection approaches. The model was trained using the first 50% infants (born 1959-1962) and evaluated using the latter 50% of infants (born 1963 to 1965). AUROC, area under the receiver operator curve. CI, confidence interval. LASSO, logistic regression least absolute shrinkage and selection operator. RFE, reverse feature elimination using logistic regression. SVC, penalised linear support vector classification.

Established predictors of HIE gave equally good discrimination using features collected at each of the pregnancy time points. For example, antenatal features gave an AUC of 0.71 (95% CI 0.64-0.77; n=20 predictors), antenatal and intrapartum measures had an AUC 0.70 (95% CI 0.64-0.77; n=34 predictors), and antenatal, intrapartum, and infant birthweight was 0.73 AUC (95% CI 0.67-0.79; n=35 predictors).

The automated feature selection approach was applied to select the top n=20, n=40 or n=60 features using antenatal only, antenatal and intrapartum, and antenatal, intrapartum, and infant birthweight measures and modelled using logistic regression for comparison with established predictors (**Figure 3**). Broadly, there was no strong difference in discrimination between the clinically defined and Elastic net, LASSO, penalised linear SVC, and random forest feature selection approaches. RFE performed worse than the clinically defined features at antenatal and antenatal and intrapartum timepoints. Overall, there was a trend towards better discrimination with a larger number of features included in the model. However, the models derived using features identified by Badawi et al gave better performance than the automated approaches with similar numbers of predictors. Adding intrapartum and birthweight variables had no strong impact on discrimination.

There was no strong improvement in discrimination using a range of classifiers in comparison with logistic regression (**Supplementary Figure 5**).

![Supplementary Figure 5.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/04/01/2022.04.01.22273313/F8.medium.gif)

[Supplementary Figure 5.](http://medrxiv.org/content/early/2022/04/01/2022.04.01.22273313/F8)

Supplementary Figure 5. Discrimination of HIE with a range of classifiers
Sixty features selected using logistic Elastic net from the antenatal, intrapartum and birthweight dataset. AUROC, area under the receiver-operator curve. CI, confidence interval. SVC, support vector machine. RBF, radial basis function.

## Discussion

Here we demonstrate the feasibility of developing automated and low-cost prediction models with minimal human input as a proof of concept for providing real-time updated patient risk stratification which could be applied to any disease outcome. As an exemplar this approach was applied to classify HIE using variables collected at different timepoints throughout pregnancy and benchmarked with manually prepared models.

Automated development of logistic regression models using features ranked by LASSO gave best performance of all automated feature selection methods. These were 0.75 AUC (95% CI 0.69, 0.81) using antenatal only, 0.70 AUC (95% CI 0.63, 0.78) for antenatal and intrapartum and 0.73 AUC (95% CI 0.65, 0.81) for antenatal, intrapartum, and infant birthweight which were not substantially different from logistic regression models developed using features selected manually by Badawi *et al*. Meanwhile, feature ranking with RFE produced models with lower discriminative capacity and this approach may not be suitable for providing real-time updated risk estimation. Our results demonstrate the utility of automated feature selection approaches in developing accurate risk prediction models. This process could be optimised to produce models with little or no human input and updated in real-time as new data are made available. This is in contrast with the manual approach taken by Badawi et al which is time-consuming and laborious, requires expert knowledge and would also need to be repeated for any new dataset, or if risk factors change over time.

Previous research using automated methods for HIE model development with EHRs collected before delivery obtained AUCs of 0.87 (95% CI 0.86, 0.88) with logistic Elastic net29, AUC of 0.78 (95% CI 0.77, 0.79) using deep learning30 and 0.72 AUC (95% CI 0.71, 0.73) with logistic ridge regression30. While the latter results are in line with findings of this study, data were collected over different time periods and cannot be compared easily.

Our strategy for model development shared similarities with a recent published protocol for estimating risk of acute kidney injury using EHRs14, including data quality control, defined participant inclusion/exclusion criteria, separate training and test data to avoid overfitting, feature engineering such as creating dummy variables from unordered categorical variables, use of model performance metrics that balance sensitivity with specificity and comparison of multiple model architectures. Assessment of calibration was not suitable for this data given its age. The CPP data lacked sufficient granularity for effective application of recurrent neural network designs which make use of temporal relationships between features14 and are ideal for real-time updated risk stratification using data collected throughout the patient pathway. Instead, our models were trained using data equivalent to measures obtained during clinic assessment but do demonstrate feasibility with this different type of data.

Although our results do not support a significant improvement in HIE discrimination following the addition of intrapartum or birthweight variables, our findings illustrate the possibility of updating risk estimation using data collected at different time points throughout patient care as has been done in real-time for acute kidney injury demonstrating the potential for this approach14. A key concern with automated model development is the potential for classifiers to learn from physician behaviour and beliefs rather than provide new diagnostic insight of disease process31. Predictions based on physicians’ behaviour are unlikely to provide improvements in individual-level care beyond the average action taken for an average patient31. These challenges also apply in perinatal prediction, for example women with risk factors for stillbirth are delivered earlier and therefore prediction of stillbirth after 37 weeks will be ineffective32. To ensure models are unaffected by these biases, extensive external validation in different clinical environments where policies and procedures differ is needed.

While automated and manual feature selection processes gave equivalent classification performance, the automated approaches required up to three-times more features (n=60 vs n=20-35) suggesting some redundancy that clinical insight can circumvent. Simpler models are more desirable because fewer variables are required to obtain a risk estimate and have wider utility in the presence of missing data. Overfitting is also a concern when the number of features is large compared with the sample size which can produce inflated coefficients and imprecise standard errors resulting in poor generalisability and model instability33. Provided appropriate external validation and quality control procedures are in place these limitations should not be a barrier to clinical utility.

Using top Elastic net feature rankings, we explored a range of classifiers capable of modelling non-linear relationships and interactions but found no strong improvement in classification performance compared with logistic regression. These results were anticipated as the data were mostly categorical and linearity assumptions were justified. Conditioning on linearity and residual homogeneity through feature selection with Elastic net could also explain a lack of improvement. These findings are consistent with other studies34,35 and highlight that logistic regression is well suited for automated risk estimation using this type of data. Regression models are also desirable for providing well-calibrated risk estimates that are easy to interpret and already familiar with clinicians in contrast with non-parametric methods.

We used a dataset collected in the 1950s and our specific findings on HIE prediction may not translate into current clinical practise. Studies of real-time updated risk stratification applied to pregnancy outcomes in contemporary data are needed. Although the dataset was large, the number of HIE cases was small creating class imbalance issues that are known to adversely affect model performance36. We explored subsampling of the training dataset to address this issue but found no improvement in discrimination (not shown).

This study explored the concept of real-time updated risk stratification applied to disease prediction with exemplar application to HIE using data collected through pregnancy. While we find no substantial improvement in prediction using intrapartum or birthweight in combination with antenatal factors, this study does demonstrate the feasibility of such approaches. Future studies are needed to evaluate real-time updated risk stratification of other outcomes.

## Data Availability

Stata, R and Python3 code to replicate these analyses are available from https://github.com/MRCIEU/hie-ml. The CPP data files and documentation are available for download from the National Archives Catalog (https://www.archives.gov/research/electronic-records/nih.html)

[https://github.com/MRCIEU/hie-ml](https://github.com/MRCIEU/hie-ml) 

## Data and open-source code availability

Stata, R and Python3 code to replicate these analyses are available from [https://github.com/MRCIEU/hie-ml](https://github.com/MRCIEU/hie-ml).

The CPP data files and documentation are available for download from the National Archives Catalog ([https://www.archives.gov/research/electronic-records/nih.html](https://www.archives.gov/research/electronic-records/nih.html))

## Funding

This study was funded by the NIHR Biomedical Research Centre at University Hospitals Bristol and Weston NHS Foundation Trust and the University of Bristol, the UK Medical Research Council as part of the MRC Integrative Epidemiology Unit (MC\_UU_00011/4 and MC_UU_00011/6). The funders had no role in the study design, collection of data or interpretation of results. The views expressed are those of the author(s) and not necessarily those of the NIHR, the Department of Health and Social Care or Medical Research Council.

## Competing interest

T.R.G receives funding from Biogen for unrelated research. D.A.L has received support from Medtronic Ltd and Roche Diagnostics for unrelated research.

*   Received April 1, 2022.
*   Revision received April 1, 2022.
*   Accepted April 1, 2022.


*   © 2022, Posted by Cold Spring Harbor Laboratory

This pre-print is available under a Creative Commons License (Attribution 4.0 International), CC BY 4.0, as described at [http://creativecommons.org/licenses/by/4.0/](http://creativecommons.org/licenses/by/4.0/)

## References

1.  1.Collins, G. S., Reitsma, J. B., Altman, D. G. & Moons, K. G. M. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): The TRIPOD Statement. BMC Med. 13, 1–10 (2015).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1186/s12916-014-0241-z&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25563062&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F04%2F01%2F2022.04.01.22273313.atom) 

2.  2.Stewart, J., Manmathan, G. & Wilkinson, P. Primary prevention of cardiovascular disease: A review of contemporary guidance and literature. JRSM Cardiovasc. Dis. 6, 204800401668721 (2017).
    
    
3.  3.Loftus ID, T. J. et al. Ideal algorithms in healthcare: Explainable, dynamic, precise, autonomous, fair, and reproducible. PLOS Digit. Heal. 1, e0000006 (2022).
    
    
4.  4.Chowdhury, M. Z. I. & Turin, T. C. Variable selection strategies and its importance in clinical prediction modelling. Fam. Med. Community Heal. 8, 262 (2020).
    
    
5.  5.Goff, D. C. et al. 2013 ACC/AHA guideline on the assessment of cardiovascular risk: A report of the American college of cardiology/American heart association task force on practice guidelines. Circulation. 129 S49–S73 (2014).
    
    [FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiRlVMTCI7czoxMToiam91cm5hbENvZGUiO3M6MTQ6ImNpcmN1bGF0aW9uYWhhIjtzOjU6InJlc2lkIjtzOjE4OiIxMjkvMjVfc3VwcGxfMi9TNDkiO3M6NDoiYXRvbSI7czo1MDoiL21lZHJ4aXYvZWFybHkvMjAyMi8wNC8wMS8yMDIyLjA0LjAxLjIyMjczMzEzLmF0b20iO31zOjg6ImZyYWdtZW50IjtzOjA6IiI7fQ==) 

6.  6.Hippisley-Cox, J., Coupland, C. & Brindle, P. Development and validation of QRISK3 risk prediction algorithms to estimate future risk of cardiovascular disease: Prospective cohort study. BMJ 357, (2017).
    
    
7.  7.McBride, N. et al. Do nuclear magnetic resonance (NMR)-based metabolomics improve the prediction of pregnancy-related disorders? Findings from a UK birth cohort with independent validation. BMC Med. 18, 366 (2020).
    
    
8.  8.McBride, N. et al. Do Mass Spectrometry-Derived Metabolomics Improve the Prediction of Pregnancy-Related Disorders? Findings from a UK Birth Cohort with Independent Validation. Metabolites 11, 530 (2021).
    
    
9.  9.Sun, L. et al. Polygenic risk scores in cardiovascular risk prediction: A cohort study and modelling analyses. PLOS Med. 18, e1003498 (2021).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1371/journal.pmed.1003498&link_type=DOI) 

10. 10.Tibshirani, R. Regression Shrinkage and Selection Via the Lasso. J. R. Stat. Soc. Ser. B 58, 267–288 (1996).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.2307/2346178&link_type=DOI) 

11. 11.Zou, H. & Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B Stat. Methodol. 67, 301–320 (2005).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1111/j.1467-9868.2005.00503.x&link_type=DOI) 

12. 12.Breiman, L. Random Forests. Mach. Learn. 45, 5–32 (2001).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1023/A:1010933404324&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=28752533&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F04%2F01%2F2022.04.01.22273313.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000170489900001&link_type=ISI) 

13. 13.Guyon, I., Weston, J., Barnhill, S. & Vapnik, V. Gene Selection for Cancer Classification using Support Vector Machines. Mach. Learn. 46, 389–422 (2002).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1023/A:1012487302797&link_type=DOI) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000171501800018&link_type=ISI) 

14. 14.Tomašev, N. et al. Use of deep learning to develop continuous-risk models for adverse event prediction from electronic health records. Nat. Protoc. 16, 2765–2787 (2021).
    
    
15. 15.Kurinczuk, J. J., White-Koning, M. & Badawi, N. Epidemiology of neonatal encephalopathy and hypoxic–ischaemic encephalopathy. Early Hum. Dev. 86, 329–338 (2010).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.earlhumdev.2010.05.010&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=20554402&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F04%2F01%2F2022.04.01.22273313.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000280452000002&link_type=ISI) 

16. 16.Azzopardi, D. V. et al. Moderate Hypothermia to Treat Perinatal Asphyxial Encephalopathy. N. Engl. J. Med. 361, 1349–1358 (2009).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1056/NEJMoa0900854&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=19797281&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F04%2F01%2F2022.04.01.22273313.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000270300000009&link_type=ISI) 

17. 17.Odd, D. E., Gunnell, D., Lewis, G. & Rasmussen, F. Long-term impact of poor birth condition on social and economic outcomes in early adulthood. Pediatrics 127, e1498–e1504 (2011).
    
    [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6MTA6InBlZGlhdHJpY3MiO3M6NToicmVzaWQiO3M6MTE6IjEyNy82L2UxNDk4IjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjIvMDQvMDEvMjAyMi4wNC4wMS4yMjI3MzMxMy5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 

18. 18.Molina, G. et al. Relationship between cesarean delivery rate and maternal and neonatal mortality. JAMA - J. Am. Med. Assoc. 314, 2263–2270 (2015).
    
    
19. 19.Badawi, N. et al. Intrapartum risk factors for newborn encephalopathy: the Western Australian case-control study. BMJ 317, 1554–1558 (1998).
    
    [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6MzoiYm1qIjtzOjU6InJlc2lkIjtzOjEzOiIzMTcvNzE3Mi8xNTU0IjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjIvMDQvMDEvMjAyMi4wNC4wMS4yMjI3MzMxMy5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 

20. 20.Badawi, N. et al. Antepartum risk factors for newborn encephalopathy: the Western Australian case-control study. BMJ 317, 1549–1553 (1998).
    
    [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6MzoiYm1qIjtzOjU6InJlc2lkIjtzOjEzOiIzMTcvNzE3Mi8xNTQ5IjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjIvMDQvMDEvMjAyMi4wNC4wMS4yMjI3MzMxMy5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 

21. 21.Taylor, P. M. The First Year of Life: The Collaborative Perinatal Project of the National Institute of Neurological and Communicative Disorders and Stroke. JAMA 244, 1503–1503 (1980).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1001/jama.1980.03310130075046&link_type=DOI) 

22. 22.Odd, D. E., Lewis, G., Whitelaw, A. & Gunnell, D. Resuscitation at birth and cognition at 8 years of age: a cohort study. Lancet 373, 1615–1622 (2009).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/S0140-6736(09)60244-0&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=19386357&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F04%2F01%2F2022.04.01.22273313.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000266066300032&link_type=ISI) 

23. 23.Fan, R.-E., Chang, K.-W., Hsieh, C.-J., Wang, X.-R. & Lin, C.-J. LIBLINEAR: A Library for Large Linear Classification. J. Mach. Learn. Res. 9, 1871–1874 (2008).
    
    
24. 24.Kingma, D. P. & Ba, J. L. Adam: A Method for Stochastic Optimization. Int. Conf. Learn. Represent. (2014) doi:10.48550/arxiv.1412.6980.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.48550/arxiv.1412.6980&link_type=DOI) 

25. 25.Harris, C. R. et al. Array programming with NumPy. Nature 585, 357–362 (2020).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41586-020-2649-2&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=32939066&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F04%2F01%2F2022.04.01.22273313.atom) 

26. 26.Pedregosa, F. et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.cpc.2010.04.018&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=23755062&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F04%2F01%2F2022.04.01.22273313.atom) 

27. 27.Robin, X. et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics 12, 77 (2011).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1186/1471-2105-12-77&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=21414208&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F04%2F01%2F2022.04.01.22273313.atom) 

28. 28.Wickham, H. ggplot2: Elegant Graphics for Data Analysis. (Springer-Verlag New York, 2016).
    
    
29. 29.Li, T. et al. Predicting Neonatal Encephalopathy From Maternal Data in Electronic Medical Records. AMIA Summits Transl. Sci. Proc. 2018, 359 (2018).
    
    
30. 30.Gao, C., Yan, C., Osmundson, S., Malin, B. A. & Chen, Y. A deep learning approach to predict neonatal encephalopathy from electronic health records. 2019 IEEE Int. Conf. Healthc. Informatics 1–7 (2019) doi:10.1109/ICHI.2019.8904667.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1109/ICHI.2019.8904667&link_type=DOI) 

31. 31.Beaulieu-Jones, B. K. et al. Machine learning for patient risk stratification: standing on, or looking over, the shoulders of clinicians? npj Digit. Med. 4, (2021).
    
    
32. 32.Townsend, R. et al. Can risk prediction models help us individualise stillbirth prevention? A systematic review and critical appraisal of published risk models. BJOG An Int. J. Obstet. Gynaecol. 128, 214–224 (2021).
    
    
33. 33.Stoltzfus, J. C. Logistic Regression: A Brief Primer. Acad. Emerg. Med. 18, 1099–1104 (2011).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1111/j.1553-2712.2011.01185.x&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=21996075&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F04%2F01%2F2022.04.01.22273313.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000296050100013&link_type=ISI) 

34. 34.Christodoulou, E. et al. A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. Journal of Clinical Epidemiology. 110 12–22 (2019).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.jclinepi.2019.02.004&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=30763612&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F04%2F01%2F2022.04.01.22273313.atom) 

35. 35.Benedetto, U. et al. Can machine learning improve mortality prediction following cardiac surgery? Eur. J. Cardio-thoracic Surg. 58, 1130–1136 (2020).
    
    
36. 36.He, H. & Garcia, E. A. Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 21, 1263–1284 (2009).
    
    
37. 37.Nembrini, S., König, I. R. & Wright, M. N. The revival of the Gini importance? Bioinformatics 34, 3711 (2018).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/bioinformatics/bty373&link_type=DOI)