Immuno-proteomic profiling reveals abundant airway CD8 T cells and ongoing epithelial injury in prolonged post-COVD19 respiratory disease

Some patients hospitalized with acute COVID19 suffer respiratory symptoms that persist for many months. To characterize the local and systemic immune responses associated with this form of long COVID, we delineated the immune and proteomic landscape in the airway and peripheral blood of normal volunteers and patients from 3 to 6 months after hospital discharge. The bronchoalveolar lavage (but not peripheral blood) proteome was abnormal in patients with post-COVID19 lung disease with significantly elevated concentration of proteins associated with apoptosis, tissue repair and epithelial injury. This correlated with an increase in cytotoxic lymphocytes (especially tissue resident CD8+ T cells), lactate dehydrogenase and albumin (biomarkers of cell death and barrier integrity). Follow-up of a subset of these patients greater than 1-year post-COVID19 indicated these abnormalities resolved over time. Collectively, these data indicate that COVID-19 results in a prolonged change to the airway immune landscape in those with persistent lung disease, with evidence of cell death and tissue repair linked to ongoing activation of cytotoxic T cells.

Introduction 55 56 Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) related coronavirus 57 disease (COVID19) manifests as a spectrum of acute illnesses ranging from mild 58 respiratory symptoms to severe, sometimes fatal, respiratory failure (Docherty et al.,59 The red module consisted of 37 proteins ( Figure 3B, Supplementary File 1H). 258 Overall, the red module was characterized by proteins associated with chemotaxis, 259 inflammation, cell death and repair. Examination of the levels of proteins in the red 260 module across samples highlighted both upregulation of individual proteins in post-261 COVID19 patients versus healthy controls, and the co-upregulation of groups of 262 related proteins such as the CXCR3 chemokines (CXCL9, CXCL10 and CXCL11), 263 and IL1A (interleukin-1A) and its antagonist IL1RN ( Figure 3B). We used the 264 STRING database to visualize known or predicted relationships between proteins in 265 the module (Figure 3C, Methods). To highlight putative key proteins in the red and 266 blue modules in a data-driven way, we identified hub proteins, defined as those that  (Figure S4B). Of these, LYVE1 and VASN were 288 also identified in the differential abundance analysis. 289 In contrast to the BAL network analysis, no protein modules in plasma were 290 associated with case-control status. These results suggest that persistent post-291 COVID19 respiratory abnormalities have a demonstrable proteomic signature in BAL 292 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted August 10, 2021. ; https://doi.org/10.1101/2021.08.10.21261834 doi: medRxiv preprint that is distinct compared to that of healthy control BAL. In contrast, we were unable 293 to detect changes in the plasma proteome of post-COVID19 patients, even with the 294 enhanced statistical power provided by the WGCNA method.   were more frequent in the airways ( Figure 5C). In comparison, DCs were identifiable 362 in both sites as pDCs, cDC1s and cDC2s, although were infrequent ( Figure 5D). 363 Furthermore, airway monocyte activation status (as determined by CD86 and HLA-364 DR expression) was not dependent on acute disease severity ( Figure 5E). Likewise, 365 correlation of airway LDH and albumin concentrations revealed no strong 366 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted August 10, 2021. ; https://doi.org/10.1101/2021.08.10.21261834 doi: medRxiv preprint associations between the relative abundance of different myeloid cells and ongoing 367 damage in the respiratory tract after COVID19 ( Figure 5F); nor was there any 368 association between these cell types and DPP4 concentrations. Taken together this 369 indicates that while myeloid cell frequencies in the airways are associated with a 370 specific airway proteome landscape, this likely has limited association with ongoing 371 tissue damage after SARS-CoV-2 infection.   in the blood (Figure 6B-C). Marker expression analysis identified these metaclusters 382 (as described in Figure 6B), with the differential clusters being identified as tissue-383 resident memory CD4 and CD8 T cells (metaclusters 1 and 10), and naïve CD4 T 384 cells (metacluster 5). Manual gating of T cell populations showed that while the blood 385 was enriched for naïve CD4 and CD8 T cells, the airways contained populations of 386 antigen-experienced CD4 and CD8 T cells ( Figure 6D). Additionally, the airways 387 contained increased proportions of activated and tissue-resident memory (TRM) CD4 388 and CD8 T cells compared to blood ( Figure 6D). Proportions of γδ T cells and NK 389 cells in the airways and blood were comparable however there was a significant 390 increase in NKT cell proportions compared to blood ( Figure 6E). Comparison of the 391 T cell dynamics in the post-COVID19 airways showed that airway CD4 T cells retain 392 a highly tissue resident phenotype (CD69 + CD45RA -), with few naïve (CD45RA + ) CD4 393 T cells observed, and the frequency of these 2 cell populations remains relatively 394 static, irrespective of the total number of CD4 T cells found in the airways ( Figure  395 6F). In comparison, as CD8 T cell numbers increased in the airways, the proportion 396 displaying a CD45RA -CD69 + phenotype significantly increased, while those with a 397 CD45RA + phenotype declined ( Figure 6F). 398

399
In line with activated CD8 T cells, rather than activated CD4 T cells, being linked to 400 ongoing epithelial damage in the post-COVID19 airway there was a correlation (r > 401 0.4) between CD8, but not CD4, T cell proportions in the airways and the 402 concentrations of airway albumin and LDH ( Figure 6G); this correlation was stronger 403 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted August 10, 2021. ; https://doi.org/10.1101/2021.08.10.21261834 doi: medRxiv preprint when only CD69 + CD8 T cells were analyzed (r = 0.5, p < 0.05 for both Albumin and 404 LDH). LDH most significantly correlated with the proportion of CD69 + CD103 -CD8 T 405 cells in the airways (r = 0.7, p < 0.001), while albumin most significantly correlated 406 with the proportion of CD69 + CD103 + CD8 T cells in the airways (r = 0.6, p < 0.001), 407 highlighting distinct markers of damage were linked to different phenotypes of CD8 T 408 cells in the airway. Of note however, the CD8 T cell frequency did not correlate with 409 the concentration of DPP4 in the airways ( Figure 6G). Deeper analysis revealed that 410 albumin concentrations were associated with reduced expression of molecules 411 associated with T cell activation on both CD103 + and CD103 -CD69 + CD8 T cells, 412 most notable CD28 and CD38 ( Figure 6H). LDH concentrations meanwhile were 413 linked to downregulation of CD127 and, especially on CD103 + CD69 + CD8 T cells, 414 but upregulation of CD69 and CD27 (CD69: r = 0.5, p < 0.05; CD27: r = 0.6, p < 0.05, 415   Table S3). The total number of BAL cells recovered was greatly reduced in all 3 432 patients between the initial bronchoscopy and the 1 year follow up bronchoscopy, 433 down to comparable levels in healthy control airways ( Figure 7A). Similarly, numbers 434 of T cells, B cells, NK cells and NKT cells were reduced to nearly or within the normal 435 range seen in the airways of healthy individuals ( Figure 7B). In the 2 individuals with 436 elevated lymphocytes the ratio of CD4 to CD8 T cells increased ( Figure 7C). In 437 accordance with this the proportion of CD8, but not CD4, T cells trended to decrease, 438 although the proportion of each that were of a TRM or activated (CD69 + ) phenotype, 439 remained similar between the 2 time points (Figure 7D). Fitting with progressive 440 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted August 10, 2021. ; https://doi.org/10.1101/2021.08.10.21261834 doi: medRxiv preprint recovery trajectory airway DPP4 concentrations declined in the 2 patients with 441 elevated concentrations at the first bronchoscopy ( Figure 7E), while CT abnormality 442 was also reduced in all 3 patients between the first CT and the follow up ( Figure 7F). 443 Interestingly LDH was however increased between the first and second visit, while 444 albumin concentrations were unchanged, but also within the range of healthy controls 445 at both time points for all 3 patients analyzed ( Figure 7G). CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. long-term respiratory viral pathology, especially in humans. This is primarily due to 512 the lack of relevant samples collected during the recovery period. However, the data 513 shown here for SARS-CoV-2 may provide a potential insight, supporting the concept 514 that sustained activation of CD8 TRM in the airways, long after recovery from acute 515 disease, may contribute to ongoing immune pathology through sustained damage to 516 the respiratory epithelium. 517

518
The mechanism behind the increased number of CD8 TRM in the airways is unclear, 519 although another study has also reported detecting virus specific CD8 T cells in lung 520 number. They also observed that the lungs of a mouse that had previously 527 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted August 10, 2021. is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted August 10, 2021. ; https://doi.org/10.1101/2021.08.10.21261834 doi: medRxiv preprint and are absent or only present at very low concentrations in the healthy airway, 639 suggesting that their upregulation is more likely to be a consequence of COVID19 640 than a pre-disposing risk factor. 641

642
In summary, our study offers unique and novel insights into ongoing 643 immunopathology post-COVID19. In contrast to the inflammatory pathways observed 644 during acute disease, we found proteins associated with ongoing epithelial damage, 645 cell death, and repair were upregulated in conjunction with ongoing CD8 T cell 646 activation. These data indicate that significant immune pathways operate within the 647 tissue, presumably to facilitate repair and resolution, even in the absence of 648 peripheral inflammation. In the future it will be important to determine how such a 649 substantial shift in the immune landscape of the airways might affect the response to 650 a subsequent respiratory infection such as seasonal influenza. The progressive 651 improvement in respiratory pathology, even in this cohort with substantive 652 radiological abnormalities should be seen as a positive indicator for the long-term 653 prognosis for those with persistent morbidity. Moreover, the involvement of the 654 immune response suggests this recovery could be accelerated using immuno-655 modulatory treatments, especially those designed against TH1 immune responses. 656 657 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted August 10, 2021. is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted August 10, 2021. ; https://doi.org/10.1101/2021.08.10.21261834 doi: medRxiv preprint  CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted August 10, 2021. ; https://doi.org/10.1101/2021.08.10.21261834 doi: medRxiv preprint . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted August 10, 2021.  is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Proteomic assays 936
Plasma and BAL proteomic measurement was performed using the Olink proximity 937 extension immunoassay platform. Five 92-protein multiplex Olink panels were used 938 ('Inflammation', 'Immune Response', 'Cardiometabolic', Cardiovascular 2', 939 'Cardiovascular 3'), providing measurements of 460 protein targets per sample. 940 Cryopreserved BAL and plasma samples were thawed on ice and mixed well by 941 pipetting before plating 88 samples per plate ensuring case/control balance and 942 random well ordering to prevent confounding of technical and biological effects. For 943 BAL samples, a pilot study was performed using three control samples and three 944 post-COVID19 samples (severe group) to determine optimal dilution parameters. 945 Ultimately BAL was used neat. Since a small number of proteins were assayed on 946 more than one panel, we measured a total of 435 unique proteins. We removed 947 duplicate assays at random prior to subsequent analyses. 948 949

Proteomics analyses 950
Proteomic data was normalised using standard Olink workflows to produce relative 951 protein abundance on a log2 scale ('NPX'). BAL and plasma proteomic data were 952 normalised separately. Quality assessment was performed by (1) examination of 953 Olink internal controls and (2) inspection of boxplots, relative log expression plots, 954 and PCA. 955 PCA was performed using singular value decomposition. Following these steps, 2 956 clear outlying samples were removed from the BAL dataset. 957

958
To identify proteins that were differentially abundant between case and controls, for 959 each protein we performed linear regression (lm function in R) with case/control 960 status as the independent variable and protein level (NPX) as the dependent 961 variable. P-values were adjusted for multiple testing using the Benjamini-Hochberg 962 procedure (p.adjust function in R). A 5% false discovery rate was used to define 963 statistical significance. 964

965
We used the WGCNA R package (Langfelder and Horvath, 2008;Zhang and 966 Horvath, 2005) to create a weighted protein correlation network. Prior to WGCNA 967 analysis, protein data were scaled and centred, and missing data were imputed using 968 the R caret package. We used the WCGNA adjacency function to produce a weighed 969 network adjacency matrix, using parameters "type=signed" and "power=13". This 970 soft-thresholding power was selected as the lowest power to achieve approximate 971 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted August 10, 2021. ; https://doi.org/10.1101/2021.08.10.21261834 doi: medRxiv preprint scale-free topology. We next defined a topological overlap matrix of dissimilarity 972 using the TOMdist function. Clusters ('modules') of interconnected proteins were 973 identified using hierarchical clustering and the cutreeDynamic function with 974 parameters: method="hybrid", deepSplit=2, minClusterSize=15. We then tested 975 association of these modules with case/control status. Multiple testing correction was 976 performed to account for the number of modules. We report both Benjamini-977 Hochberg and Bonferroni adjusted p-values to provide two levels of stringency. 978

979
To assess the distribution of p-values from the differential protein abundance 980 analyses, we plotted histograms and constructed QQ plots. Reactome database (Supplementary File 1C). 988 989 Protein modules were visualised using STRING (https://string-db.org/), with known or 990 suspected interconnections between module members displayed as edges in a 991 network diagram. An edge represents a protein-to-protein relationship defined as 992 shared contributions to a particular function, and not necessarily implying physical 993 binding. In Figure 3C, edge colour indicates the type of evidence for the relationship: 994 turquoise represents known interactions from curated databases; magenta 995 represents experimentally determined interactions; green represents predicted 996 Interactions from gene neighbourhood analysis; red represent predicted interactions 997 from gene fusions, blue represent predicted Interactions from gene co-occurrence; 998 light green represents interaction from text-mining; black represents interaction from 999 co-expression data, and violet represents information from protein homology. 1000 1001 CXCR3 chemokine composite score 1002 To create a composite score that reflected the CXCR3 chemokines (CXCL9, 1003 CXCL10 and CXCL11), we used the following approach. For each sample, protein 1004 levels for CXCL9, -10 and -11, were normalised to the median level in healthy 1005 controls (to avoid unduly weighting the score towards chemokines with higher NPX 1006 values). For each sample, the mean of the normalised values for the 3 proteins was 1007 then calculated to provide a summary metric for CXCR3 chemokines. 1008 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.