Development and validation of blood-based prognostic biomarkers for severity of COVID disease outcome using EpiSwitch 3D genomic regulatory immuno-genetic profiling.

The COVID-19 pandemic has raised several global public health challenges to which the international medical community have responded. Diagnostic testing and the development of vaccines against the SARS-CoV-2 virus have made remarkable progress to date. As the population is now faced with the complex lifestyle and medical decisions that come with living in a pandemic, a forward-looking understanding of how a COVID-19 diagnosis may affect the health of an individual represents a pressing need. Previously, we used whole-genome microarrays to identify 200 3D genomic marker leads that could predict mild or severe COVID-19 disease outcomes from blood samples in a multinational cohort of COVID-19 patients. Here, we focus on the development and validation of a qPCR assay to accurately predict severe COVID-19 disease requiring intensive care unit (ICU) support and/or mechanical ventilation. From the 200 original biomarker leads, we established a classification model containing six markers. The markers were qualified and validated on 38 COVID-19 patients from an independent cohort. Overall, the six-marker model achieved a positive predictive value of 93% and a balanced accuracy of 88% across 116 patients for the prognosis of COVID-19 severity requiring ICU care/ventilation support. The six-marker signature identifies individuals at the highest risk of developing severe complications of COVID-19 with high predictive accuracy and can assist in patient prognosis and clinical management decisions.


Background
The COVID-19 outbreak, which the World Health Organization (WHO) declared a pandemic in March 2020, represents one of the greatest global health crises the world has faced in recent history [1]. In addition to the estimated 130+ million people who have been infected with the SARS-CoV-2 virus to date and the more than 3 million deaths attributed to COVID-19-related causes, the pandemic has placed tremendous strain on healthcare systems, caused devastating mental health crises, and tested global economic resilience [2,3].

[…]

EpiSwitch® 3C libraries, with chromosome conformation analytes converted to sequence-based tags, were prepared from frozen whole blood samples using […]

medRxiv preprint doi: https://doi.org/10.1101/2021.06.21.21259145; this version posted June 28, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

Statistical analysis
The 21 markers screened on 78 individual patient samples were subjected to permuted logistic modelling with bootstrapping over 500 data splits and to non-parametric Rank Product analysis (EpiSwitch® RankProd R library). Two machine learning procedures (eXtreme Gradient Boosting, XGBoost, and CatBoost) were used to further reduce the feature pool and identify the most predictive/prognostic 3D genomic markers. The resulting markers were then used to build the final classifier models using CatBoost and XGBoost. All analysis was performed in the R statistical language with the Caret, XGBoost, SHAPforxgboost and CatBoost libraries.

Identification of the top prognostic 3D genomic markers for severe COVID-19 disease outcomes
In this study we employed a sequential, stepwise strategy to identify a minimal set of biomarkers that were predictive of COVID-19 disease severity […] were procured for a Training cohort, used to build and refine the classifier model, and a Test cohort, used to assess the predictive performance of the model. Clinical characteristics of the patients are shown in Table 1 […] with predictive power to differentiate between COVID-19 patients who required a high degree of medical disease management (e.g. admission to the intensive care unit (ICU), mechanical ventilation) and those who were hospitalized but required less interventional care and support (Supplemental Table 1 […]).

Pathway enrichment analysis of the genes contained within the 21 3D genomic markers revealed that the top two pathways were related to downstream signalling mediated by B-cell receptor activation (Table 3). Importantly, genomic loci encoding proteins involved in hemostasis/clotting were also enriched (Figure 3, Table 3). The 21 3D genomic markers were further refined to a set of 6 markers (Table 4) with predictive ability for COVID-19 severity, which were then applied to an independent Test cohort (Supplemental Table 2).

Testing of the prognostic 3D genomic biomarker panel for severe COVID-19 disease outcomes on independent patient cohorts
To assess the predictive power of the model, the 6-marker 3D genomic panel was validated on an independent Test cohort, made up of samples that were not used to build and refine the model (Figure 4, Supplemental Table 2). Samples were collected upon admission to COVID-19 hospital wards in Peru, the USA, and the Dominican Republic and shipped to OBD's processing facility in Oxford, UK. The EpiSwitch platform readouts for the six-marker classifier model were uploaded to the EpiSwitch Analytical Portal for analysis. Classifier calls for high-risk COVID-19 disease outcomes are shown in Table 5. Clinical outcomes in the Test cohort comprised 10 mild cases and 28 severe cases requiring ventilation and/or ICU support. EpiSwitch prognostic calls based on the 6-marker model achieved a positive predictive value of 90.9% for high-risk disease outcomes in the Test cohort (Figure 4A). Interestingly, two of the mild-case patients (COVID 0696 and 0213; Table 5) who were identified as high risk by the EpiSwitch test subsequently died in hospital within 28 days of admission. This suggests early, pre-symptomatic detection of a hyperinflammatory state leading to fatal outcomes, which is being investigated further. Across all 116 patients used in this study, the test demonstrated a positive predictive value of 92.9% for high-risk disease outcomes, with 88% sensitivity, 87% specificity, and a balanced accuracy of 87.9% (Table 4B and Supplementary Table 3).

[…] we used a sequential, stepwise approach employing a 78-patient Training cohort to refine the marker set and build a predictive classifier model containing six 3D genomic markers. The 6-marker model/assay was tested on an independent Test cohort of 38 COVID-19 patient blood samples.
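The performance figures reported for the Test cohort and for all 116 patients (positive predictive value, sensitivity, specificity, balanced accuracy) are standard functions of a 2×2 confusion matrix that cross-tabulates the classifier's high-risk/low-risk calls against observed outcomes (severe, requiring ICU and/or ventilation support, versus mild). The Python sketch below shows how these quantities relate; the class and function names are hypothetical, and the counts in the example are illustrative only, not data from this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from comparing classifier calls with observed outcomes.

    'Positive' means a high-risk call (classifier) or a severe outcome
    requiring ICU and/or ventilation support (clinical truth).
    """
    tp: int  # called high risk, outcome severe
    fp: int  # called high risk, outcome mild
    tn: int  # called low risk, outcome mild
    fn: int  # called low risk, outcome severe


def prognostic_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return PPV, sensitivity, specificity and balanced accuracy as fractions.

    Assumes every denominator is non-zero.
    """
    ppv = c.tp / (c.tp + c.fp)          # share of high-risk calls that were severe
    sensitivity = c.tp / (c.tp + c.fn)  # share of severe cases called high risk
    specificity = c.tn / (c.tn + c.fp)  # share of mild cases called low risk
    balanced_accuracy = (sensitivity + specificity) / 2
    return {
        "ppv": ppv,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": balanced_accuracy,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not data from this study).
    example = ConfusionCounts(tp=30, fp=5, tn=45, fn=20)
    for name, value in prognostic_metrics(example).items():
        print(f"{name}: {value:.1%}")
```

Balanced accuracy averages sensitivity and specificity, so, unlike raw accuracy, it is not inflated when one outcome class dominates, as severe cases do in the Test cohort (28 severe versus 10 mild). Positive predictive value, by contrast, also depends on the proportion of severe cases in the population being tested.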

The COVID-19 pandemic will remain a major public health crisis for months to come. As a corollary, there remains a pressing need for prognostic testing […] Janssen, and AstraZeneca for use in the US and EU, there are still many individuals who will not be vaccinated due to 1) lack of access, 2) ineligibility, or […]

Extension of the current study to a wider distribution and a larger number of individuals could help define the regional, racial, and epigenetic prevalence of high-risk biomarkers in these populations. A longitudinal observational study with collections before and after resolution of the acute and chronic phases of COVID-19 disease would provide further insight into the mechanisms and the long-term stability of the identified systemic biomarker signature. Early evidence indicates that blood samples collected from patients before the onset of the COVID-19 pandemic reveal high-risk profiles in some individuals. This would suggest that the biomarker profiles identified in this study do not emerge in response to COVID-19 infection, but rather represent a pre-existing default state on the spectrum of outcome susceptibility.

There are several immediate implications of the results reported here. The availability of a simple blood-based assay that provides a readout of the likely disease course upon infection with SARS-CoV-2 is especially helpful for the triage of individuals who either 1) do not have access to COVID-19 vaccines (due to underlying medical conditions, location, or age, for example) or 2) choose to forgo vaccination for other reasons. It is well appreciated that the heterogeneity seen in COVID-19 disease outcomes is largely defined by the host response rather than by the virus or its variants [15]. […]

Here we report on the development and validation of a predictive blood-based assay that can identify, with high accuracy, individuals who are at the highest risk of developing severe complications of COVID-19. The 3D genomic […]

[…] assistance in data analysis. In addition, we acknowledge Boca Biolistics LLC and Reprocell USA Inc. for the timely provision of high-quality clinical blood […]

Competing interests
The authors declare that they have no competing interests.

Consent for publication
Written informed consent for publication was obtained from all authors.

Availability of data and materials
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Ethics approval and consent to participate
All patients signed informed consent forms prior to providing blood samples. All ethical guidelines were followed.