Retrospective in silico HLA predictions from COVID-19 patients reveal alleles associated with disease prognosis

Background: The Human Leukocyte Antigen (HLA) gene locus plays a fundamental role in human immunity, and it is established that certain HLA alleles are disease determinants. Methods: By combining the predictive power of multiple in silico HLA predictors, we have previously identified prevalent HLA class I and class II alleles, including DPA1*02:02, in two small cohorts at the COVID-19 pandemic onset. Since then, newer and larger patient cohorts with controls and associated demographic and clinical data have been deposited in public repositories. Here, we report on HLA-I and HLA-II alleles, along with their associated risk significance, in one such cohort of 126 patients, including COVID-19 positive (n=100) and negative patients (n=26). Results: We recapitulate an enrichment of DPA1*02:02 in the COVID-19 positive cohort (29%) when compared to the COVID-19 negative control group (Fisher’s exact test [FET] p=0.0174). Having this allele, however, does not appear to put this cohort’s patients at an increased risk of hospitalization. Inspection of COVID-19 disease severity outcomes reveals nominally significant risk associations with A*11:01 (FET p=0.0078), C*04:01 (FET p=0.0087) and DQA1*01:02 (FET p=0.0121). Conclusions: While enrichment of these alleles falls below statistical significance after Bonferroni correction, COVID-19 patients with the latter three alleles tend to fare worse overall. This is especially evident for patients with C*04:01, where disease prognosis measured by mechanical ventilation-free days was statistically significant after multiple hypothesis correction (Bonferroni p = 0.0023), and may hold potential clinical value.
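To make the Bonferroni statement concrete, the sketch below multiplies each nominal FET p-value quoted above by a number of tests. It assumes the test counts are the 17 class I and 11 class II alleles named in the methods for the log-rank correction; whether the same counts were applied to the FET p-values is an assumption of this sketch, not something the text states. The function name `bonferroni` is illustrative.

```python
def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p-value: p multiplied by the number of tests, capped at 1."""
    return min(1.0, p_value * n_tests)

# ASSUMPTION: number of alleles tested per class, taken from the methods
# (class I, n=17; class II, n=11).
N_CLASS_I = 17
N_CLASS_II = 11

# Nominal Fisher's exact test p-values quoted in the abstract.
nominal = {
    "DPA1*02:02": (0.0174, N_CLASS_II),
    "A*11:01": (0.0078, N_CLASS_I),
    "C*04:01": (0.0087, N_CLASS_I),
    "DQA1*01:02": (0.0121, N_CLASS_II),
}

adjusted = {allele: bonferroni(p, n) for allele, (p, n) in nominal.items()}

for allele, p_adj in adjusted.items():
    print(f"{allele}: adjusted p = {p_adj:.4f}, below 0.05: {p_adj < 0.05}")
```

Under these assumed test counts, every adjusted value exceeds 0.05, which is consistent with the statement that the enrichment does not survive correction.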


We downloaded Illumina NOVASEQ-6000 paired-end (50 bp) RNA-Seq reads from libraries prepared from the blood samples of 126 hospitalized patients, with (n=100) or [...] as described [19]. We tallied HLA class I (HLA-I) and class II (HLA-II, supported by Seq2HLA and HLAminer only) allele predictions, and for each patient we report the most likely HLA allele (4-digit resolution), indicating HLA predictor tool support (Additional file 1, tables S1 and S2).

Looking at class I and II alleles predicted in 10% or more of COVID-19 positive patients (class I, n=17; class II, n=11), we calculated Fisher's Exact Test (FET), first testing for enrichment in COVID-19 positive vs. negative patients (R function fisher.test, alternative = "greater"). For those same alleles (found in ≥ 10% of patients) and inspecting only the COVID-19 positive cohort, we tested for the probability of patient [...] post 45 day followup (days)" (HFD-45) metric reported by the study author as a proxy for disease severity, with lower HFD-45 numbers indicating worse outcomes. Similarly, we ran the Kaplan-Meier (KM) estimator using another metric of disease severity, "ventilator-free days", which captures the most severe cases, with COVID-19 patients suffering respiratory deterioration and requiring mechanical ventilation. On each set we calculated the log-rank p-value (R library survminer) and corrected for multiple hypothesis testing (Bonferroni correction) using the number and patient abundance rank of class I (n=17) or class II (n=11) HLA alleles observed in 10% or more of COVID-19 patients. We also inspected the combined influence of HLA alleles and patient demographics data ( [...] (Tables 1 and 2, ICU+), and those who were not (Tables 1 and 2, ICU-). When computing FET statistics, we find HLA-I [...]
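The one-sided enrichment test can be illustrated with a minimal sketch. The analysis itself used R's fisher.test with alternative = "greater"; the Python function below is not the authors' code but a standard-library re-implementation of the same upper-tail hypergeometric probability. The function name `fisher_exact_greater` and the carrier count in the negative group are hypothetical; only the 29 carriers among 100 positive patients and the group sizes come from the text.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows: COVID-19 positive / negative patients.
    Columns: allele carriers / non-carriers.
    Returns P(X >= a) under the hypergeometric null with all margins fixed,
    i.e. the probability of seeing at least `a` carriers in the first row.
    """
    row1 = a + b          # size of the first group
    col1 = a + c          # total carriers across both groups
    n = a + b + c + d     # all patients
    max_a = min(row1, col1)
    total = comb(n, row1)
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k)
               for k in range(a, max_a + 1))
    return tail / total

# From the text: 29% of the 100 COVID-19 positive patients carry DPA1*02:02.
pos_carriers, pos_total = 29, 100
# HYPOTHETICAL: the carrier count among the 26 negative patients is not given
# in the text; this value is for illustration only.
hypothetical_neg_carriers, neg_total = 2, 26

p = fisher_exact_greater(
    pos_carriers, pos_total - pos_carriers,
    hypothetical_neg_carriers, neg_total - hypothetical_neg_carriers,
)
print(f"one-sided FET p-value (illustrative counts): {p:.4f}")
```

Because the negative-group count is invented, the printed value should not be compared with the p-values reported in this study.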
This version of the preprint was posted November 2, 2020, and was not certified by peer review. The copyright holder is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

However, although it is well established that patient HLA profiles play a significant role in the onset and progression of infectious diseases in general, we caution against drawing overreaching conclusions from regional, and often limited, observations. We note that recently published studies associating HLA alleles and COVID-19, by and large, disagree in their findings. We expect future studies with larger cohort sizes will help bring a clearer picture of the role of patient HLA profiles, if any, in COVID-19 susceptibility and disease outcomes.

[...] Tables 1 and 2). First, looking at the influence of demographic characteristics such as sex (male/female), age (65 years old or above/less than 65 years old) and ethnicity (minority/white ethnic background) on the susceptibility of patients with these alleles to test positive for COVID-19 (lower two panels), and on the risk associated with ICU hospitalization (upper three panels). Red asterisks indicate significant demographic characteristics (Fisher's Exact Test) not corrected for multiple hypothesis tests.

medRxiv preprint doi: https://doi.org/10.1101/2020.10.27.20220863