Association of HLA class I genotypes with age at death of COVID-19 patients

HLA class I molecules play a crucial role in the development of a specific immune response to viral infections by presenting viral peptides on the cell surface, where they are recognized by T cells. In the present manuscript we explored whether HLA class I genotype is associated with a critical course of COVID-19 by searching for connections between the genotypes of deceased patients and their age at death. HLA-A, HLA-B and HLA-C genotypes of n = 111 deceased patients with COVID-19 (Moscow, Russia) and n = 428 volunteers were identified with targeted next-generation sequencing. Deceased patients were split into two groups according to age at death: n = 26 adult patients who died at or below 60 completed years of age and n = 85 elderly patients who died over 60. Using HLA class I genotypes, we developed a risk score associated with the ability of an individual's set of HLA class I molecules to present SARS-CoV-2 peptides. The resulting risk score was significantly higher in the group of deceased adults compared to deceased elderly adults (p = 0.00348, AUC ROC = 0.68). In particular, presence of the HLA-A*01:01 allele was associated with high risk, while HLA-A*02:01 and HLA-A*03:01 mainly contributed to the low risk group. The analysis of homozygous patients made these results even more pronounced: homozygosity for HLA-A*01:01 mainly accompanied early deaths, while only one HLA-A*02:01 homozygote died before 60. The obtained results suggest an important role of HLA class I peptide presentation in the development of a specific immune response to COVID-19. While HLA class I genotype predicted age at death with reliable performance, including HLA class II genotype in future studies may improve it further.

HLA class I molecules are among the key mediators of the earliest steps in the development of a specific immune response to COVID-19. Right after entering the cell, SARS-CoV-2 induces the translation of its proteins. Some of these proteins enter the proteasomes of the infected cell, are cleaved into peptides 8-12 amino acid residues long and bind to HLA class I receptors. After binding, the complex of the HLA class I molecule and the peptide is transported to the surface of the infected cell, where it can interact with the T cell receptor of CD8+ T lymphocytes. In response to this interaction the CD8+ T lymphocyte is activated and starts to divide; within 5-7 days a population of virus-specific cytotoxic CD8+ T lymphocytes capable of destroying infected cells using perforins and serine proteases is formed [1].

There are three main types of HLA class I receptors: HLA-A, HLA-B and HLA-C. Each individual carries two variants of every receptor type, one inherited from each parent. Each HLA class I gene exists in many allelic variants, and every allele has an individual ability to recognize various foreign proteins. The distribution of alleles is population- and country-specific [2].

Individual combinations of HLA class I receptors substantially affect the severity of multiple infectious diseases, including malaria [3], tuberculosis [4], HIV [5] and viral hepatitis [2]. A number of connections between HLA genotype and susceptibility to SARS-CoV have been reported. For example, the alleles HLA-B*07:03 [6], HLA-B*46:01 [7] and HLA-C*08:01 [8] predispose to a severe form of the disease, while the allele HLA-C*15:02 is associated with a mild form [9].

Information on the connection between HLA class I genotype and the severity of the new coronavirus infection caused by SARS-CoV-2 is sparse.
A sample of 45 patients with varying severity of COVID-19 was used to validate the results of theoretical modeling of the interaction of SARS-CoV-2 peptides with various HLA-I alleles [10]. It was demonstrated that the number of peptides with high binding affinity depends on the individual HLA genotype: the more viral peptides bind to HLA class I with high affinity, the milder the course of the disease. It was also shown that the frequencies of the HLA-A*01:01 and HLA-A*02:01 alleles are related to the number of infections and the mortality rate in different regions of Italy [11].

In the present study we explored whether HLA class I genotype can be a factor contributing to the critical course of COVID-19. To this end, we performed HLA genotyping for n = 111 deceased patients with COVID-19 as well as a control group (n = 428), and searched for putative associations between genotypes and age at death. Since the total number of distinct HLA class I genotypes is too high for frequency-based analysis, we assigned a score to each allele based on its capability to present SARS-CoV-2 peptides. The obtained scores allowed us to make valid statistical comparisons of HLA genotypes between the groups of deceased adults (age at death not greater than 60 completed years, n = 26), deceased elderly adults (age at death over 60, n = 85) and the control group. Special attention was paid to "extreme" cases formed by individuals homozygous for some HLA genes. Additionally, we assessed the contribution of each viral protein to the constructed risk model.

… or bronchoalveolar lavage. Patients with pathologies that lead to greater morbidity or who had additional immunosuppression (HIV, active cancer treated with chemotherapy, immunodeficiency, autoimmune diseases treated with immunosuppressants, transplant recipients) were not included in the study.
Blood (2 ml) was collected post-mortem by the medical practitioner from the right ventricle into an EDTA vial. Patients were divided into two groups according to their age at death: adults (age ≤ 60, n = 26) and elderly adults (age > 60, n = 85).

The control group of 428 volunteers was established using electronic HLA genotype records of the Federal register of bone marrow donors (Pirogov Russian National Research Medical University). All patients or their next of kin gave informed consent for participation in the study.

The study protocol was reviewed and approved by the Local Ethics Committee at … sequences IMGT/HLA v3.41.0 [12]. Processed genotype data are available in S1 Table.

SARS-CoV-2 protein sequences

Publicly available SARS-CoV-2 proteomes derived from patients infected in Moscow (n = 79) were obtained from GISAID [13] (the full list of IDs is presented in S2 Table). Clustal Omega v1.2.4 was used to construct a multiple sequence alignment for each viral protein [14]. The obtained alignment had no gaps and few mutations: only 117 out of 9719 positions (1.2%) contained more than one amino acid variant. Moreover, the distribution of non-major amino acid fractions at mutation sites was also concentrated near zero: the maximum fraction was 22.8% (18 out of 79 viruses), the 0.95 quantile was 5.1% (4 viruses) and the upper quartile was 1.3% (one virus with a mismatched amino acid).

Prediction of viral peptides and assessment of their binding affinities to HLA class I molecules

We applied the procedure described by Nguyen et al. [15] to the consensus protein sequences of viruses isolated from patients in Moscow. Specifically, for each amino acid of each viral protein we assessed the probability of proteasomal cleavage in the … taking all possible 8-mers to 12-mers having a proteasomal cleavage probability not less than 0.1 at both ends of the sequence.
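The two sequence-processing steps above can be sketched in Python as follows: a majority-rule consensus over a gap-free alignment (also reporting polymorphic columns and the non-major residue fraction at each of them), and enumeration of 8- to 12-mers filtered by cleavage probability. This is a minimal sketch, not the authors' pipeline: the function names are hypothetical, the per-residue cleavage probabilities are assumed to come from an external predictor (the source text describing that step is incomplete), and we assume the 0.1 threshold is checked at the first and last residue of each peptide.

```python
from collections import Counter


def consensus_and_polymorphism(alignment):
    """Majority-rule consensus of a gap-free alignment plus per-column polymorphism stats.

    Returns the consensus string, the indices of columns with more than one
    amino acid, and, for each such column, the fraction of sequences that
    carry a non-major residue.
    """
    n_seqs = len(alignment)
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    consensus, polymorphic, minor_fractions = [], [], []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment)
        major, major_count = counts.most_common(1)[0]
        consensus.append(major)
        if len(counts) > 1:
            polymorphic.append(col)
            minor_fractions.append(1 - major_count / n_seqs)
    return "".join(consensus), polymorphic, minor_fractions


def candidate_peptides(sequence, cleavage_prob, min_len=8, max_len=12, min_prob=0.1):
    """All min_len..max_len-mers whose first and last residues pass the cleavage threshold.

    ASSUMPTION: cleavage_prob[i] is a per-residue proteasomal cleavage
    probability produced by an external predictor (not shown here).
    """
    if len(cleavage_prob) != len(sequence):
        raise ValueError("need one cleavage probability per residue")
    peptides = []
    for start in range(len(sequence)):
        if cleavage_prob[start] < min_prob:
            continue
        for k in range(min_len, max_len + 1):
            end = start + k - 1
            if end >= len(sequence):
                break
            if cleavage_prob[end] >= min_prob:
                peptides.append(sequence[start:end + 1])
    return peptides
```

For the Moscow alignment, `len(polymorphic) / len(consensus)` computed over all proteins would correspond to the reported 117 of 9719 positions (1.2%).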

Binding affinities were predicted using netMHCpan v4.1 [17] for all viral peptides (n = 15314) and all HLA alleles present in our cohorts of deceased patients and controls (n = 107). Peptides with weak binding affinity to all considered alleles (IC50 values above 500 nM, as recommended by the netMHCpan developers) were discarded. For the remaining 6548 peptides, all affinities were inverted, multiplied by 500 and log10-transformed, i.e. score = log10(500 nM / IC50). Thus, the resulting score equals zero at the weak binding threshold (500 nM) and one at the strong binding threshold (50 nM). Raw and processed matrices are presented in S3 Table.

Statistical analysis

Allele frequencies in the considered cohorts were estimated by dividing the number of occurrences of a given allele by twice the total number of individuals (i.e. identical alleles of homozygous individuals were counted as two occurrences). The following functions from the scipy.stats Python module [18] were used for statistical testing: fisher_exact for Fisher's exact test and mannwhitneyu for the Mann-Whitney U test.
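The affinity transform and the allele-frequency estimate described above are small enough to state as code. The sketch below is illustrative and the helper names are hypothetical. The text does not say how IC50 values above 500 nM were handled for peptides that were retained because they bind some other allele; this sketch simply leaves such scores negative rather than clipping them.

```python
import math
from collections import Counter

WEAK_BINDER_NM = 500.0  # netMHCpan weak-binding IC50 threshold used in the text


def affinity_score(ic50_nm):
    """Log-scaled score log10(500 / IC50): 0 at 500 nM, 1 at 50 nM, higher for tighter binders."""
    return math.log10(WEAK_BINDER_NM / ic50_nm)


def is_retained(ic50_by_allele):
    """A peptide is kept unless its IC50 exceeds 500 nM for every considered allele."""
    return any(ic50 <= WEAK_BINDER_NM for ic50 in ic50_by_allele.values())


def allele_frequencies(genotypes):
    """Allele frequencies at one locus from (allele1, allele2) pairs.

    Each individual contributes two alleles, so a homozygote is counted twice
    and the denominator is twice the number of individuals.
    """
    counts = Counter()
    for allele1, allele2 in genotypes:
        counts[allele1] += 1
        counts[allele2] += 1
    total = 2 * len(genotypes)
    return {allele: n / total for allele, n in counts.items()}
```

For example, one HLA-A*01:01 homozygote plus one HLA-A*01:01/HLA-A*02:01 heterozygote give frequencies of 0.75 and 0.25.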

The Benjamini-Hochberg procedure was used for multiple testing correction.
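For reference, the Benjamini-Hochberg adjustment can be written in a few lines of pure Python. This is a generic sketch of the standard step-up procedure, not the authors' code; in practice an equivalent library routine such as statsmodels' `multipletests(..., method="fdr_bh")` would typically be used. Adjusted values are capped at 1, which is how corrected p-values "equal to 1" arise.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original order (step-up procedure, capped at 1)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For p-values [0.01, 0.04, 0.03, 0.5] this returns approximately [0.04, 0.0533, 0.0533, 0.5].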

A permutation test for assessing the significance of area under the receiver operating characteristic curve (AUC ROC) values was performed with n = 10^6 label permutations.
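A label-permutation test for AUC ROC can be sketched as below. AUC is computed as the probability that a randomly chosen positive (here, an adult patient) has a higher score than a randomly chosen negative, with ties counting one half; this equals the Mann-Whitney U statistic divided by the product of group sizes. The one-sided alternative (AUC at least as large as observed) and the (hits + 1) / (n_perm + 1) estimator are assumptions, since the text does not specify them. The O(n^2) AUC is fine for illustration, but 10^6 permutations would call for a rank-based AUC.

```python
import random


def auc_roc(scores, labels):
    """AUC = P(random positive outscores random negative); ties count as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = 0.0
    for s_pos in pos:
        for s_neg in neg:
            wins += 1.0 if s_pos > s_neg else 0.5 if s_pos == s_neg else 0.0
    return wins / (len(pos) * len(neg))


def permutation_auc_pvalue(scores, labels, n_perm=10_000, seed=0):
    """Observed AUC and a one-sided permutation p-value obtained by shuffling labels."""
    rng = random.Random(seed)
    observed = auc_roc(scores, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if auc_roc(scores, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```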

Distribution of HLA class I gene alleles in the cohort of deceased COVID-19 patients and the control group

We performed HLA class I genotyping for n = 111 deceased patients with confirmed COVID-19 (Moscow, Russia) and the control group consisting of volunteers (n = 428). Deceased patients were divided into two groups: adults (age at death less than or equal to 60 years) and elderly adults (age at death over 60 years). Demographic and clinical data of these cohorts are summarized in Table 1. Although patients with severe comorbidities were excluded from the study, 76.6% of deceased patients had at least one underlying disease. Only cerebrovascular disease had a statistically significant odds ratio when comparing the groups of adults and elderly adults (3.8% versus 34.1%, Fisher's exact test p = 1.89 × 10^-3). Other cardiovascular diseases such as coronary artery disease and heart failure were also more frequent among elderly patients, although these differences were not statistically significant. Interestingly, arterial hypertension was diagnosed in 11.5% of adult patients and 24.7% of elderly adults, which was generally lower than the population level in Russia (about 50%) [21]. The percentage of diabetes cases was about 3.5% in both analyzed groups, which is a typical value for the current population of Russia [22]. Also, the frequencies of chronic kidney disease (stages 4-5) in both groups (23.1% for adults and 16.5% for elderly adults) were much higher than the background population value (about 0.05%) [23].

First, we tested whether the frequency of a single allele can differentiate individuals from three groups: adult patients who died from COVID-19, elderly patients who died from COVID-19 and the control group. The distribution of major HLA-A, HLA-B and HLA-C alleles in these three groups is summarized in Fig 1. Fisher's exact test was used for formal statistical comparisons.
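As an illustration of the Fisher's exact test comparison above, the sketch below calls `scipy.stats.fisher_exact` on a 2×2 table for cerebrovascular disease. The counts are our back-calculation from the reported percentages (1 of 26 adults = 3.8%, 29 of 85 elderly patients = 34.1%) and are an assumption, not values stated in the text. Note that `fisher_exact` reports the sample odds ratio (a·d)/(b·c).

```python
from scipy.stats import fisher_exact

# ASSUMPTION: counts back-calculated from the reported percentages.
adults_with, adults_total = 1, 26      # 1/26 = 3.8%
elderly_with, elderly_total = 29, 85   # 29/85 = 34.1%

table = [
    [adults_with, adults_total - adults_with],     # adults: with / without disease
    [elderly_with, elderly_total - elderly_with],  # elderly: with / without disease
]
odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
print(f"OR = {odds_ratio:.3f}, p = {p_value:.2e}")
```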
As a result, we found that for all possible group comparisons not a single allele had an odds ratio that could be considered statistically significant after multiple testing correction (all corrected p-values were equal to 1). However, a few alleles were differentially enriched if no multiple testing correction was applied (S4 Table).

(medRxiv preprint, this version posted November 22, 2020; not certified by peer review. The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.)

Binding affinities of viral peptides to HLA class I molecules

Since the sizes of the considered cohorts were insufficient for frequency analysis at the level of full HLA class I genotypes, we transformed patient genotypes from discrete … HLA-A and HLA-C we found four principal components (PCs), each of which explained at least 5% of the data variance, while for HLA-B the number of essential components was equal to five (S5 Table). Signs of the components were chosen so as to achieve positive … Table). Interestingly, the difference between RS distributions in the cohort of adult patients and the control group was also statistically significant (U test p = 3.31 × 10^-3), while the difference between the elderly and control groups was not (p = 0.283).
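The "essential components" rule above (keep every PC explaining at least 5% of variance) can be sketched with NumPy. Because the text describing the input matrix is incomplete, we assume rows are the alleles of one gene and columns are log-scaled peptide affinity scores; the function name is hypothetical, and the sign convention for components is not reproduced because its description is truncated in the source.

```python
import numpy as np


def essential_components(matrix, min_explained=0.05):
    """PCA via SVD on an (alleles x peptides) score matrix.

    Keeps the leading principal components whose explained-variance ratio is
    at least `min_explained`, and returns per-allele coordinates on them
    together with their explained-variance ratios.
    """
    X = np.asarray(matrix, dtype=float)
    X = X - X.mean(axis=0)  # center each peptide column
    _, s, vt = np.linalg.svd(X, full_matrices=False)
    explained = s**2 / np.sum(s**2)  # sorted in descending order by SVD
    k = int(np.sum(explained >= min_explained))
    scores = X @ vt[:k].T  # allele coordinates on the kept PCs
    return scores, explained[:k]
```

Applied separately to HLA-A, HLA-B and HLA-C, this rule would yield the reported four, five and four components if the assumed input matches the authors' matrices.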

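The low/medium/high partition of RS (Fig 5) chooses each cut-off by minimizing a Fisher's exact test p-value. The sketch below scans candidate cut-offs for one side. Following the wording of the text literally, each table compares the adult/elderly counts inside the candidate group against those in the whole deceased cohort (comparing against the complement of the group is a common alternative). Function and argument names are hypothetical.

```python
from scipy.stats import fisher_exact


def best_threshold(rs_adult, rs_elderly, side="low"):
    """Return (threshold, p) minimizing Fisher's exact p-value for one tail of RS.

    side="low" puts RS <= t into the group; side="high" puts RS >= t into it.
    Each 2x2 table is [[adults in group, elderly in group],
                       [adults in cohort, elderly in cohort]].
    """
    candidates = sorted(set(rs_adult) | set(rs_elderly))
    n_adult, n_elderly = len(rs_adult), len(rs_elderly)
    best_p, best_t = 1.0, None
    for t in candidates:
        if side == "low":
            a = sum(x <= t for x in rs_adult)
            e = sum(x <= t for x in rs_elderly)
        else:
            a = sum(x >= t for x in rs_adult)
            e = sum(x >= t for x in rs_elderly)
        _, p = fisher_exact([[a, e], [n_adult, n_elderly]])
        if p < best_p:
            best_p, best_t = p, t
    return best_t, best_p
```

Since the cut-off is selected by minimizing the same p-value, that minimal p-value is optimistic and should not be read as an unbiased significance level.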
In order to characterize the association between RS and age at death more precisely, we partitioned the range of RS into three groups: low, medium and high (Fig 5). The lower and higher thresholds were chosen to minimize the p-value of Fisher's exact test applied to the numbers of adult and elderly patients in the whole cohort and in the low/high risk group, respectively. Interestingly, such partitioning led to a significant separation of adult patients from both elderly and control ones in the low and high risk groups, while no significance was found within the middle group (S7 Table).

Then, we performed enrichment analysis to identify alleles significantly contributing to each of the RS groups (Table 2). As can be seen, frequencies of several alleles were …

Table 3: distribution of peptides with the strongest contributions over …

(medRxiv preprint doi: https://doi.org/10.1101/2020.11.19.20234567)

… the median age of deceased patients was 73.0 years. Three previous studies reported average ages of non-survivors of 78.0, 65.8 and 70.7 years, respectively [24][25][26]. Our data are in line with the literature, reaffirming that advanced age is one of the strongest predictors of death in patients with SARS-CoV-2 [27].

The majority of deceased patients aged 60 or younger were men (61.5%), while the population level for this age category in Russia is 48% [28]. Such an imbalance agrees with reports that COVID-19 is more prevalent in men [27]. This trend continued in the group of elderly adults, where the sex distribution was close to uniform, while only 37.5% of the population in this age group are males.

Only 18 deceased elderly patients (21.2%) had no comorbidities. A number of previous studies reported a high percentage of comorbidities in patients with a severe course of COVID-19 [24,28,29]. At the same time, we were unable to find … [30]. It is well known that the frequency of cardiovascular comorbidities increases with age [31], and together with the age-related decrease in T-cell receptor repertoires this negatively affects the prognosis of COVID-19 [32].

Since the available cohort size is insufficient to cover the possible genotypes in depth (two alleles for each of the HLA-A, HLA-B and HLA-C genes), we assigned a numerical value to each allele reflecting the aggregate binding affinity of viral peptides to the corresponding receptor. The obtained risk score (RS) separated adult patients who died of COVID-19 both from elderly ones and from the control group with statistical significance. A conceptually similar technique was used by Iturrieta-Zuazo et al.: the allele score was calculated as the number of tightly binding viral peptides (affinity less than 50 nM) [10]. However, our PC-based approach may be more robust since it does not depend on any threshold.
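The threshold sensitivity mentioned above can be illustrated with two toy allele scores. The first counts peptides below 50 nM, in the spirit of the score attributed to Iturrieta-Zuazo et al.; the second is a hypothetical threshold-free stand-in that sums log10(500 / IC50) over peptides. Neither is the PC-based RS of this study; the sketch only shows that a peptide moving from 51 nM to 49 nM changes the count by a whole unit but the continuous score only slightly.

```python
import math


def tight_binder_count(ic50_values_nm, cutoff_nm=50.0):
    """Threshold-based allele score: number of peptides with IC50 below the cutoff."""
    return sum(1 for v in ic50_values_nm if v < cutoff_nm)


def summed_log_score(ic50_values_nm, weak_nm=500.0):
    """Hypothetical threshold-free alternative: sum of log10(weak_nm / IC50) over peptides."""
    return sum(math.log10(weak_nm / v) for v in ic50_values_nm)
```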

To identify extreme values of RS we split its range into low, medium and high risk groups. Three HLA-A alleles were highly overrepresented in these groups: … individuals. Homozygosity for HLA-A*01:01 specifically, as well as homozygosity for any allele, was significantly associated with an earlier age at death compared to the corresponding heterozygous individuals. Such an observation has already been noted for some other infectious diseases. For example, a limited number of recognized peptides due to HLA class I homozygosity leads to a higher rate of progression from HIV infection to AIDS [34]. On the contrary, we found that only one out of eight HLA-A*02:01 or HLA-A*03:01 homozygous individuals died before 60 years of age. This can also be observed in a dataset recently published by Warren et al. [35]: none of the five COVID-19 patients homozygous for HLA-A*02:01 or HLA-A*03:01 had a severe course of COVID-19 requiring admission to an intensive care unit. Thus, presentation of "important" viral peptides with doubled intensity can enhance the immune response, showing that HLA class I homozygosity can act like a double-edged sword.
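The homozygosity comparisons above reduce to classifying genotypes and counting early deaths. A minimal sketch, assuming genotypes are stored as a mapping from locus name (e.g. "HLA-A") to an allele pair and that allele names carry the locus prefix before "*"; all names and the toy patients are hypothetical. The resulting (early, total) counts could then feed a Fisher's exact test against heterozygous individuals.

```python
def homozygous_loci(genotype):
    """Loci at which both alleles are identical; genotype maps locus -> (allele1, allele2)."""
    return [locus for locus, (a1, a2) in genotype.items() if a1 == a2]


def is_homozygous_for(genotype, allele):
    """True if the individual carries two copies of `allele` (e.g. "HLA-A*01:01")."""
    locus = allele.split("*")[0]  # "HLA-A*01:01" -> "HLA-A" (naming assumption)
    pair = genotype.get(locus)
    return pair is not None and pair[0] == pair[1] == allele


def early_death_counts(patients, predicate, age_cutoff=60):
    """(number who died at age <= age_cutoff, number selected) among patients matching predicate."""
    ages = [age for genotype, age in patients if predicate(genotype)]
    return sum(age <= age_cutoff for age in ages), len(ages)
```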

Since RS was constructed as a linear combination of peptide-HLA binding affinities, it was possible to rank peptides according to their contribution to the RS. Only one protein, NSP8, had a statistically significant fraction of RS-contributing peptides, which, however, was no longer significant after multiple testing correction. Thus, the most "important" peptides were spread across viral proteins in proportion to their total fractions (see … Table 3). These results suggest that spike protein-based vaccines at phase 3 clinical …
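The per-protein check above can be sketched as a one-sided Fisher's exact test per viral protein: are top RS-contributing peptides overrepresented among that protein's peptides? The ranking step is assumed to have been done already (the `top_peptides` input is hypothetical, as is the peptide-to-protein mapping), and the resulting p-values would then go through the Benjamini-Hochberg correction described in the Methods.

```python
from collections import Counter

from scipy.stats import fisher_exact


def protein_enrichment(peptide_protein, top_peptides):
    """One-sided Fisher p-value per protein for overrepresentation of top-contributing peptides.

    peptide_protein maps peptide -> source protein; top_peptides lists the
    peptides with the strongest contributions to RS.
    """
    top = set(top_peptides)
    total_by_protein = Counter(peptide_protein.values())
    top_by_protein = Counter(peptide_protein[p] for p in top)
    n_top, n_all = len(top), len(peptide_protein)
    results = {}
    for protein, n_prot in total_by_protein.items():
        a = top_by_protein.get(protein, 0)  # top peptides from this protein
        b = n_top - a                       # top peptides from other proteins
        c = n_prot - a                      # remaining peptides from this protein
        d = (n_all - n_top) - c             # remaining peptides from other proteins
        _, p = fisher_exact([[a, b], [c, d]], alternative="greater")
        results[protein] = p
    return results
```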