Prediction Models for Severe Manifestations and Mortality due to COVID-19: A Rapid Systematic Review

Background Throughout 2020, the coronavirus disease 2019 (COVID-19) has become a threat to public health on national and global level. There has been an immediate need for research to understand the clinical signs and symptoms of COVID-19 that can help predict deterioration including mechanical ventilation, organ support, and death. Studies thus far have addressed the epidemiology of the disease, common presentations, and susceptibility to acquisition and transmission of the virus; however, an accurate prognostic model for severe manifestations of COVID-19 is still needed because of the limited healthcare resources available. Objective This systematic review aims to evaluate published reports of prediction models for severe illnesses caused COVID-19. Methods Searches were developed by the primary author and a medical librarian using an iterative process of gathering and evaluating terms. Comprehensive strategies, including both index and keyword methods, were devised for PubMed and EMBASE. The data of confirmed COVID-19 patients from randomized control studies, cohort studies, and case-control studies published between January 2020 and July 2020 were retrieved. Studies were independently assessed for risk of bias and applicability using the Prediction Model Risk Of Bias Assessment Tool (PROBAST). We collected study type, setting, sample size, type of validation, and outcome including intubation, ventilation, any other type of organ support, or death. The combination of the prediction model, scoring system, performance of predictive models, and geographic locations were summarized. Results A primary review found 292 articles relevant based on title and abstract. After further review, 246 were excluded based on the defined inclusion and exclusion criteria. Forty-six articles were included in the qualitative analysis. Inter observer agreement on inclusion was 0.86 (95% confidence interval: 0.79 - 0.93). When the PROBAST tool was applied, 44 of the 46 articles were identified to have high or unclear risk of bias, or high or unclear concern for applicability. Two studied reported prediction models, 4C Mortality Score from hospital data and QCOVID from general public data from UK, and were rated as low risk of bias and low concerns for applicability. Conclusion Several prognostic models are reported in the literature, but many of them had concerning risks of biases and applicability. For most of the studies, caution is needed before use, as many of them will require external validation before dissemination. However, two articles were found to have low risk of bias and low applicability can be useful tools.


INTRODUCTION Background
COVID-19 is the disease caused by SARS-CoV-2 that emerged in China in December 2019. 1 Throughout 2020, COVID-19 has become an increasing threat to public health on national and global level. 2 There has been an immediate need for research to help predict the clinical deterioration including death, mechanical ventilation, and organ support. 3 A precise risk stratification of individuals with COVID-19 enables allocation of appropriate resources such as hospitalization, intensive care admission, ventilatory support, and antiviral therapy. 4 As we start to understand the unique nature of this infection, an accurate risk stratification tool is urgently needed.
Studies thus far have addressed the epidemiology of the disease, common presentations, and susceptibility to disease acquisition and transmission of the virus. 5,6,7 However, the short-and long-term implications of developing severe illnesses caused by COVID-19 is still widely unknown. Many studies with methodological shortcomings are published to facilitate the dissemination of rapidly evolving science of COVID-19. 8,9 The quality of prediction models related to COVID-19 also suffered, as many studies did not have adequate validation steps after models were developed. 9 The purpose of this review is to consolidate published data on validated predictive models of severe illnesses caused by COVID-19. We undertook a rapid systematic review following the format given by the King et al 10

Goals of this investigation
The goal of our review was to collectively evaluate published findings of prediction models for severe illness caused by COVID-19 to inform healthcare providers and the general population of the existing evidence in the midst of the pandemic.

METHODS
The review protocol was registered to PROSPERO (registration number CRD42020201484).
Our study adheres to the Preferred Reporting Items for Systematic Reviews and Meta-analyses . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) guidelines for systematic reviews and was performed in accordance with best-practice guidelines (PRISMA). 11 Search strategies were developed with the assistance of a health sciences librarian with expertise in systematic reviews. Searches were developed by the primary author and librarian using an iterative process of gathering and evaluating terms. Searches were finalized in July 2020.
Comprehensive strategies, including both index and keyword methods, were devised for PubMed and EMBASE. In order to maximize sensitivity, no pre-established database filters were used other than English language. We also screened references of articles identified by these systematic search strategies to look for any additional records potentially useful for the aim of this review. Lastly, we examined additional articles through a referral from investigators and online journal updates . The completed PubMed strategy is shown in Supplemental file 1. PubMed Strategy was then adapted for EMBASE and is available upon request. The last search was conducted on 7/27/2020.
We included studies of randomized controlled trials, cohort studies, and case-control studies that discuss the possible short-term and long-term consequences of contracting COVID-19 in any clinical setting. All studies were considered regardless of publication status (preprint or peer review manuscript), as long as those are included to PubMed or EMBASE. The definition of positive COVID-19 included a positive nasopharyngeal or oropharyngeal swab for COVID-19 by polymerase-chain reaction (PCR) or antigen tests, serological test for COVID-19, or those who were clinically diagnosed as COVID-19 based on clinical presentation and epidemiologic information. Outcomes included intubation, mechanical ventilation, any other type of organ support (for example, hemodialysis), and/or death. We included studies that developed and validated a multivariable model or scoring system, based on individual participant level data, to predict any COVID-19 related outcome or externally validated a known scoring system. We excluded review articles, case reports, editorials, and comments. Studies reporting the predictive ability of a single test, for example, a study only focused on the prognostic value of lactate dehydrogenase (LDH), were excluded from the review, since we were primarily interested in the combination of test results with any internal or external validation. Epidemiological studies that aimed to model disease transmission or diagnostic test accuracy, and predictor finding studies were also excluded.

Data Collection and Processing
Titles, abstracts, and full texts were screened in duplicate for eligibility by independent reviewers (JM and SL), and discrepancies were resolved through discussion moderated by the third reviewer (MG). The reasons for any exclusions were recorded for those that required full article review.
Two investigators (JM, MT) independently extracted data from the included studies. The investigators underwent initial training and extracted data into a predesigned data collection form. The following information was abstracted: author, year, setting, outcome, predictors in final model, sample size for development set, type of validation, sample size, performance, and geographical region.
When data were missing or ambiguous, we contacted the authors for clarification. Studies were independently assessed for risk of bias by two investigators (JM, MT) using the Prediction Model Risk Of Bias Assessment Tool (PROBAST) tool. Any discrepancies were resolved by consensus.

Primary Data Analysis
A narrative synthesis of the characteristics of the included studies, including the outcomes, sample size for development and validation, type of case ascertainment, relationship between the final model, and the development of severe illnesses, were summarized in a table. We listed risk of bias and applicability using the PROBAST 12 tool. To maximize the synthesis of results, we reported the studies based on the low risk of bias, low concern for applicability, and geographical location for future use.

Study selection
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted February 1, 2021. ; https://doi.org/10.1101/2021.01.28.21250718 doi: medRxiv preprint We included a total of 292 of 3399 articles based on the title and abstract. Then, we included a total of 42 out of 292 during the secondary full text screening up to July 2020. We identified an additional four articles from an investigator, online journal update, and references. The details of the selection process are listed in Figure 1. Overall, kappa was 0.86 (95% 0.79 -0.93) between two reviewers.
There were 38 (82%) retrospective cohort studies, and four (9%) prospective cohort studies, and four (9%) retrospective and prospective studies. The majority (n=25, 54%) were conducted at a single institution, 21 studies (46%) were conducted at two or more sites. There were two studies that were conducted in multiple nations. 12, 13 The total sample was 8,621,479 patients, and the mean sample size used for model derivation in the 46 articles was 187,423 (range 66-8,256,158).

Critical appraisal of quality
Next, using the PROBAST, we evaluated the risk of bias for each study and concern for applicability using the low, high, or unclear categories. The articles where the PROBAST tool did not apply were categorized as not applicable.
The PROBAST composite scores of the 46 articles, at the domain and item level, are presented in Figure 2. The overall risk of bias was low in four (9%) studies, high in 34 (74%) studies, and unclear in eight (17%) studies. The overall concern for applicability was low in 22 (48%) studies, high in 17 (37%) studies, and unclear in seven (15%) studies. (Figure 2) The lowest risk of bias was seen in the participants domain (29 studies with low risks of bias, 63%), and highest risk of bias was seen in the analysis domain (32 studies with high risks of bias, 70%). The lowest concern for applicability was seen in the participants domain (38 studies with low concern, 83%), and the highest risk of concern for applicability was seen in the outcome . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted February 1, 2021. ; https://doi.org/10.1101/2021.01.28.21250718 doi: medRxiv preprint domain (high concern, n=16, 35%). When the PROBAST evaluation was stratified to multi-site studies, there were increase in the overall applicability ( Figure 3).
Two studies (4.3%) reported the prediction from images and seven (15%) studies reported the combinations of clinical data and images. These studies included chest X-ray and Computed Tomography (CT findings) either by itself or in conjunction with clinical data. The predictive ability of these models (AUC) reached 0.8 to 0.9, but most of these studies had a high or unclear risk of bias, which was similar to the recent review reported by Wynants et al. 11 Overall, we identified two studies with a low risk of bias and low concern for applicability based on the PROBAST tool, which were Clift et al. and Knight et al. (Table 3) 13,15

Individual Study Characteristics
Detailed characteristics of each of the 46 studies are presented in Table 1

Synthesis of Results
Based on the PROBAST tools, we listed the studies with low risk of bias, low concerns for applicability, and predictive ability, metrics, setting, geographic location, and the link to an online calculator in Table 2.
Two studies were found to have low risk of bias and low concern for applicability (  . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted February 1, 2021. ; https://doi.org/10.1101/2021.01.28.21250718 doi: medRxiv preprint The model shows that he is in the top 10% of the population at the highest risk. Knight et al. 13 developed and validated a risk score based on common parameters that are available at hospital admission. This included factors such as age, sex, comorbidity, respiratory rate, SpO2, GSC, BUN, and CRP. The study was on the inpatient population, and study findings, particularly the probability of mortality, could easily be adapted for use outside the UK. The study reported the low, intermediate, high, and very high risk for mortality based on predictors. For example, patients in the intermediate-risk group (score 4-8, 21.9%) had a mortality rate of 9.9%. Patients in the high-risk group (score 9-14, n=11 664, 52.2%) had a mortality rate of 31.4%. 13 However, one should account for differences in the criteria for admission and the quality of care (Table 2).

DISCUSSION
This systematic review identified a total of 46 articles that predicted severe COVID-19 in the literature. The majority of articles are from China, Europe, and North America where COVID-19 hit the hardest. These prediction models are focused on prognosticating patients with COVID-19.
Overall, models reported the validation AUC from 0.54 to 0.98, but risk of bias was unclear or high for many studies, likely due to limited transparency on the predictor and outcome assessment, limited number of samples and outcomes. Many of these studies are retrospective studies and from early in the pandemic; most of them are limited due to the event rate and unclear case and outcomes definitions. Sample size was variable throughout studies, and lastly, analysis domain, such as overfitting were often seen as the limitation. These are commonly identified problems with building prediction models, and likely causes overfitting. 16 Although many of the studies are reported out of urgency, several of them did not seem to follow TRIPOD guidelines, 17 which likely contributed to the overall risk of bias for the included studies.
After review of the current literature, our impression is that many of the reported prediction models have high/unclear risk of bias, thus it is not recommended that any particular tool should be used until further external validation is completed, unless the risk of bias is low.
This review identified several implications for potential application of these prediction models in the appropriate clinical settings. As stated earlier, there are 41 studies that reported original models during the pandemic, and four studies 24,32,41,45 that reported the external validation . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted February 1, 2021. ; models from the severity prediction models which existed before the COVID-19. Clift et al., Knight et al., reported the models that were developed from nationwide data which enabled robust re-sampling methods to minimize overfitting. 13, 15 We conclude that two of them 13,15 can be useful tools (Table 3).

Strength and Limitations
Our review used a clear inclusion and exclusion criteria, which likely led to almost perfect agreement between two reviewers. Furthermore, the use of PROBAST strengthened our review, as this tool enabled accurate evaluation of risk of bias and applicability for each of the included studies.
There are several limitations. First, we used only two databases to identify the literature, which is less than we need for a traditional systematic review. We undertook this review to disseminate a rapidly progressing science of COVID-19 for healthcare providers. Second, our search was completed in July 2020, but this review required additional time to train reviewers to use the PROBAST tool to evaluate each article. Due to the rapid development of COVID-19 pandemic, it is important to note the gap between our search timeline and the current status at the time of . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted February 1, 2021. ; https://doi.org/10.1101/2021.01.28.21250718 doi: medRxiv preprint article publication. Third, variables and types of prediction models were highly variable among studies, and it was infeasible to conduct meta-analysis or meta-regression, as we specified in the protocol. Our aim was to systematically curate the current state of scientific knowledge to predict severe COVID-19, and we argue that our findings are still informative to healthcare providers without a pooling of results. Fourth, we were primarily interested in multivariable models, which may not have captured innovative studies focused on one predictive marker. Lastly, the volume of literature related to this topic area was a challenge, particularly for preprint articles. Due to concerns with feasibility, database search for preprints were limited to the records included in the mentioned databases.

CONCLUSION
Our review identified several prognostic models for COVID-19, and they showed various discriminative performances. The risk of bias was overall high or unclear for most of the studies, and the risk of bias was highest for the analysis domain. Concern for applicability was low for the majority of included studies. Thus, it is possible that most studies are prone to bias and require external validation before dissemination. The use of prediction models for severe COVID-19 cases requires caution since risk of bias is not negligible. We reported two studies that had a low risk of bias and low concern for applicability, one from a general public population and hospital setting in the UK.   is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted February 1, 2021. . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted February 1, 2021.   is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted February 1, 2021.  . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted February 1, 2021.  . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted February 1, 2021. ; https://doi.org/10.1101/2021.01.28.21250718 doi: medRxiv preprint  CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted February 1, 2021. ; https://doi.org/10.1101/2021.01.28.21250718 doi: medRxiv preprint