Abstract
Background For patients with non-small cell lung cancer (NSCLC), the PD-1/PD-L1 blockade treatment were incorporated into first-line treatment commonly. Despite the improved survival observed in PD-1 blockade treatment, a large proportion of patients do not respond while others actually progress during treatment.
Method Transcriptomic profiling was performed on whole blood samples from 30 patients received anti-PD-1 (Tislelizumab) plus chemotherapy. Expression levels of differentially expressed genes (DEGs) identified from two comparisons (post-and pre-treatment, responders and non-responders) were validated by real-time quantitative PCR, analyzed within tissue database and meta-analysis database, then followed by enrichment analysis in high-level representations and in silico leukocyte deconvolution.
Results A panel of blood-based gene signatures (FDR p<0.05, fold change<-2 or >2) were identified (DEG n=155 and 112 in two comparisons) and validated that not only differentially expressed between post- and pre- treatment or responders and non-responders but also in tissue samples between normal and tumor. Genes DLG5 and H3C10 were found negatively associated with overall survival (p<0.05). Enrichment of immunological and metabolism pathways and gene sets indicating activated circulating leukocytes were observed.
Conclusion The molecular and cellular signatures characterized in this study may provide potential blood-based predictors of the response to PD-1 blockade treatment in NSCLC patients.
Introduction
Given the significant benefit checkpoint inhibitors (ICIs) immunotherapy has achieved for various types of cancer in the past decade, it is emerging as a “common denominator” and even in the neoadjuvant (presurgical) setting for cancer therapy [1]. In many countries, ICIs targeting the PD-1/PD-L1 (programmed death-1/programmed death ligand-1) pathway have been gradually approved as monotherapy or in combinations with chemotherapy [2, 3]. Non-small cell lung cancer (NSCLC) as the most common subtype of lung cancer presents a significant health concern with its diagnosis rate and burden highest for male, ever-smoker and the elderly population [4, 5]. Advanced NSCLC was one of the first pioneers becoming a common therapeutic focus of PD-1-targeted immunotherapy as a first-line therapy [6, 7]. It is proved that anti-PD-1 agents with or without chemotherapy could restore anti-tumor activities of multiple immune cell subsets, together leading to an increased overall survival compared with sole conventional chemotherapy [8, 9].
To date, several anti-PD-1 or anti-PD-L1 antibodies have been approved by regulatory authorities covering almost all the common types of cancer indications as monotherapy or in combination with other drugs [10]. Despite the impressive success of PD-1 blockade therapy in cancer, it seems the clinical benefit of this treatment is always and only limited to a small subset of patients [11]. Taking one of the approved anti-PD-1 antibodies Pembrolizumab as an example, the objective response rate for the unselected population with NSCLC was only 19% and the median overall survival was 12 months [12]. All approved PD-1/PD-L1 blockade displayed similar efficacy while combinations of anti-PD-1 therapy with chemotherapy have showed more encouraging results in the up-front treatment of NSCLC [13]. NSCLC as well as other lung cancers are characterized by high degree molecular heterogeneity associated to high level of genetic and phenotypic variations, co-evolving with the complex nature of immune microenvironment [14–16]. The outcome of anti-PD-1 treatment for NSCLC patents is associated with many factors, such as PD-1/PD-L1 expression level, tumor mutation burden, mRNA expression of certain biomarker genes, neo-antigens, and the composition of tumor-infiltrating immune cells [17]. Many clinical studies as well as review articles suggested PD-L1 expression testing should become (if not yet) routine for patients with newly diagnosed NSCLC, while other biomarkers should be evaluated in the future [18–21].
The low response rate to PD-1/PD-L1 blockade treatment suggests better understanding the molecular signatures responding to the therapy is important in the search for potentially reliable biomarkers. Lately, a number of new biomarkers predictive of tumor response to ICIs were proposed, including DNA variant of NK cells [22], CD28 levels in tumor-infiltrating CD8-T cells [23], metabolite from on-treatment serum [24], local or peripheral immune cell clusters [25, 26], or combination of T cell properties and energy metabolism markers in blood [27]. In the present study, comprehensive computational analyses were performed to investigate the molecular signatures and pathway representations in peripheral blood as potential predictors of the response and outcome in NSCLC patients who received PD-1 inhibitor plus chemotherapy. As several innovative in silico deconvolution methods were developed for inference of tumor- infiltrating immune cells in tissue samples [28, 29], we adapted from these new tools and updated a different immune profiling approach from our previous study [30] targeting blood samples. Here, we enumerated a set of key immune cell repertoires in NSCLC patients and responders as we hypothesized that the circuiting immune cell subsets in peripheral blood may ultimately allow discrimination between patients that derive benefit from PD-1 blockade.
Result
Study design and overall analytical workflow
In order to identify the blood-based molecular signature associated with PD-1 blockade treatments in cancer, whole blood samples were collected from 30 NSCLC patients who underwent anti-PD-1 therapy plus chemotherapy. The majority of these patients were male (86.7%), over 60-year-old (66.7%), ever-smokers (80%), and classified as lung squamous cell carcinoma (SqCC) (76.7%) (Table 1). Paired blood samples were collected from 12 patients to construct subgroups for post- and pre-treatment comparisons. A little less than 50% of the patients showed pathologic complete response and were considered as responders in the following analysis.
Blood-based DEGs changed upon combined treatment abnormally expressed in lung cancer tissue
Standard bioinformatics analysis was applied to process RNA-seq reads from 5 pairs of blood samples (pre v post) to identify the genes that differentially expressed in response to anti-PD-1 treatment. Differential expression analysis revealed that a total of 155 protein-coding genes out of almost 30,000 genes (total n =29,195, FDR p <0.05 and fold change >2 or <0.5) were differentially expressed between pre- and post-treated blood samples (Figure 2A). Utilizing the overall RNA-seq profiles, principal components analysis (PCA) constructed complete separation of pre and post sample groups as showed by biplots (Supplementary Figure S1). Hierarchical clustering using the 155 DEGs organized the pre-treatment samples into a distinct group from the post-treatment samples (Figure 2B and Supplementary Table S1a). The mRNA expression difference of 10 top-ranked genes were examined in an independent patient group (n=7 v 7) by qRT-PCR and found all of them were differentially expressed in same patterns (Figure 2C). Out of 10, 8 were confirmed significantly (all p<0.05) changed in pairwise comparison.
Next, we sought to investigate the expression profiles of these top-ranked genes in cancer tissues. Firstly, we searched human tissue database GENT2 that generated from NCBI GEO databases that contains microarray data from >499 lung cancer samples and >1117 normal lung samples. Interestingly, the transcriptional changes of 8 genes (out of top 10 DEGs) from pre to post were consistently seen from tumor to normal lung tissue (p<0.01) (Figure 2D, Supplementary Figure S2 and Supplementary Table S2a). Then we extended our search to a larger lung cancer database that was built from 56 datasets (over 6700 patients) and allows meta-analysis of the gene expression profiles as well as the association between expression and overall survival. Combining statistical power from multiple datasets, a strong agreement to the data above was observed that a consistent tumor-to-normal change was significantly detected in 6 out of the 8 genes identified above (all p<0.05) (Figure 3, Supplementary Figure S3 and Supplementary Table S2a).
DEGs down-regulated in responders were over-expressed in lung cancer and associated with poor overall survival
We hypothesized that the molecular signatures correlated with complete responses may indicate the prognosis of NSCLC patients. RNA-seq data from responders (res) and non-responders (non- res) was collected and submitted for PCA analysis. Several combinations of key principal components (PCs), such as PC1 + PC3, successfully separated res and non-res samples (Figure 4A, yellow highlighted). A total of 112 DEGs as identified by comparing res and non-res subjects and utilized to construct hierarchical clustered heatmap where res and non-res subjects were grouped distinctly (Figure 4B & 4C and Supplementary Table S1b). Our qRT-PCR results confirmed in an independent cohort that the expression differences of 7 top-ranked DEGs (out of 10) were consistently seen between res and non-res subjects (n=7 v 7) (Supplementary Table S1b).
Again, we investigate the tissue expression profiles of the top 10 DEGs in GENT2 and LCE databases. Seven genes were found differently expressed between normal lung and lung cancer tissue according to GENT2 database (Figure 4E and Supplementary Table S2b) and 4 of them showed consistent differences in meta-analysis from LCE databases (Figure 5A and Supplementary Table S2b). For instance, genes DLG5 and H3C10 were down-regulated in responder groups as detected by RNAseq and qRT-PCR (Figure 4B & 4D), and down-regulated in normal lung tissue as compared with tumor tissue (Figure 4E & 5A). Notably, unlike the fact that survival meta-analyses in LCE database provided no significant association found for DEGs from pre v post comparison, our search for top DEGs from res v non-res comparison revealed two cases of negative expression-survival association in lung cancer patients (p<0.01 for DLG5 and p=0.04 for H3C10) (Figure 5B and 5C).
Immunological and metabolism pathways and leukocyte abundances are key signatures in responders after treatment
Pathway and gene ontology (GO) overrepresentation analysis provided an answer about which regulatory mechanisms are statistically significantly associated with the combinational treatment or the treatment responsiveness. KOBAS-i is a newly released tool utilizing machine learning algorithms and integrating functional class scoring approaches into a single ensemble score, to optimize the prioritization of biologically relevant pathways from expression data. The results indicated the co-modulated DEGs from post v pre comparison were significantly enriched in mechanism clusters that commonly connected to each other (C1-C3, C5& C6) (Figure 6A), covering all top-ranked pathways (Figure 6B). Differentially, DEGs from res v non-res comparison were enriched in separated mechanism (Figure 6C). Interestingly, the top 10 overrepresented mechanisms (ranked by corrected p-value) in two comparisons shared pathways such as “Immune System” and “Metabolism of protein” and gene ontologies such as “Protein binding” (Figure 6D). The difference was the significant mechanisms in comparison of post- v pre focused immunological and metabolism pathways while the significant mechanisms in comparison of res v non-res are more diverse.
Interestingly, GSEA analyses focused on immunologic signature gene sets (collection C7, in total 5279 gene sets) identified 52 down-regulated and 14 up-regulated gene sets from comparison of post- and pre-treatment (Figure 7A, pink). Zero down-regulated and 116 up- regulated immune gene sets were recognized using our DEG data from res v non-res comparison (Figure 7A, blue). Overlapped up-regulated gene sets (n=9) of two comparisons indicated a broad spectrum of the major leukocyte subsets were activated in responding patients after treatment, including monocyte, Dendritic cells (DCs), T helper 2 (TH2) cells and B cells (Figure 7A, purple). Moreover, the shared gene sets contained gene set GSE1889 whereas genes are up- regulated after stimulated by TNF as comparing control cells when versus T-reg cells. This finding suggested, in addition to the activated immune cell subsets, inactivation of T-reg cells was also triggered in a successful treatment (Figure 7A, purple).
Finally, an in silico leukocyte deconvolution approach (AImmune) for inference of key circulating immune cells components was modified from our previous study [30] with inclusion of new gene markers adapted from single cell RNAseq data obtained from peripheral blood mononuclear cells (PBMCs) of NSCLC patients and healthy donors. To assess deconvolution performance, the relative expressions and pairwise correlations of these new signature genes across 7 key circulating immune cells were illustrated by heatmap and similarity matrix, respectively (Figure 7B and Supplementary Figure S4). Immune cell scores were generated by AImmune approach utilizing the RNAseq profiles in res and non-res subjects and no significant difference was observed (Figure 7B). We also calculated and compared the scores for a simulated cohort (n=20 each group) based on the individual profiles (Supplementary Figure S5). The results indicated monocytes and both CD4+ and CD8+ T lymphocytes showed significant increase in res subjects as compared to non-res subjects.
Discussion
Given the majority of tumor patients fail respond to anti-PD-1/PD-L1 therapy, a lot of effort made by clinical trials led to the identification of predictive biomarkers to select treatment responders and the discovery of underlying mechanisms during treatment for improving current treatment modalities [31, 32]. Although PD-1/L1 immunohistochemical expression has been widely evaluated as a predictive biomarker, it is not implemented as a routine diagnostic test for anti–PD-1/PD-L1 therapy worldwide, including China [33]. Part of the reason is lack of universal standard of antibody for detecting PD-L1 expression and no mention it is still controversial whether PD-1/L1 expression is an inadequate determinant of treatment benefit with anti-PD-1/L1 therapies.
The overall hypothesis of the present study is the blood-based molecular signatures that differently expressed in blood samples from responders versus non-responders may differ between normal and cancer in tissue as tumor-driving genes, and thus provide additive prognostic value in patient selection. Our integrated analysis workflow (Figure 1) also included innovative approaches as another layer of blood-based strategy to evaluate the leukocyte abundance as non-invasive biomarkers reflecting the host immunity, which ultimately will improve the clinical efficacy of PD-1 blockade therapy. By profiling the patients and responders during the treatment, our data provided new evidence for targetable pathways, driving factors and molecular and cellular determinants of response that is correlated with the beneficial response and clinical efficacy.
At the beginning of this study, investigation of the top-ranked 10 DEGs found 6 genes (up: HBG1, HBG2, GATA2, IFI27; down: LHX4 and ANKRD22) that were not only differentially expressed in blood samples between post and pre subjects but also abnormally expressed in lung cancer tissue (Figures 2C, 2D and Figure 3). These findings were significant as they were, on one hand, validated by multiple approaches (RNA-seq, qRT-PCR and GENT2 microarray database); on the other hand, they were also confirmed by multiple datasets and diverse cohorts (LCE meta-analysis database). This gene signature group contains a modulator of innate immune responses (IFN-I signal-induced gene IFI27) [34, 35], a hematopoietic transcription regulator (GATA2) [36], a metabolism mediator (ANKRD22) [37, 38], and a transcription factor involved in differentiation control (LHX4) [39], which is consistent with the following findings that immunological and metabolism mechanisms are enriched in post-treatment and responder patients (Figure 6 A & 6B). Moreover, several of them were proven to regulate the oncogenesis of lung cancer or respiratory diseases. For instance, ANKRD22 as we found down-regulated in NSCLC patients after therapy, was reported to promote the progression of NSCLC by enhancing cell proliferation [40]. In the present study we showed HBG1/2, as coding genes for hemoglobin subunit gamma, were up-regulated after treatment and also enriched in normal tissue comparing to tumor tissue. This finding is supported by study that stated HBG1/2 expression is upregulated in normal lung tissue by smoking and is downregulated again when tumors form in smokers [41]. More importantly, a recent report also determined HBG1 was consistently upregulated in PBMCs of patients with stage III/IV NSCLC from the 2nd anti-PD-1 antibody treatment through to the 5th treatment [42]. This observation seems dependent on stage and histology as both protein and mRNA levels of HBG1/2 were found down-regulated in stage I NSCLC adenocarcinomas [43]. Together, these findings suggest that the gene signature we identified and investigated in the present study could potentially act as a novel biomarker for effective patient selection.
Next round of comparison (res v non-res) in RNA-seq profiles identified DEGs with diverse function and roles. Three of the top 10 DEGs are histone family genes (H3C10/HIST1H3H, H2BC17/HIST1H2BO, H4C8/HIST1H4H) (Figure 4B & Supplementary Table S1b). A couple of histone genes have been reported to be hypermethylated in lung cancer and associated with oncogenesis in a wide range of cancers, suggesting histone gene could act as a potential universal-cancer-only marker [44]. H3C10 was previously reported up-regulated in breast cancer and its high expression was associated with poor overall survival [45]. H3C10 is rarely studied and its clinical relevance in lung cancer remains unknown. Here we searched two large tissue databases covering >50 studies and it turned out that genes H3C10 and DLG5 are both significantly decreased in normal tissue as compared to tumor tissue (Figures 4D, 4E & 5A). Our survival meta-analysis also showed both genes are significantly associated with poor prognosis of lung cancer patients (Figures 5B-C). Notably, the result of DLG5 is not quite in agreement with an earlier study that claimed DLG5 is down-regulated in SqCC tissue and indicated a favorable prognosis [46]. Nevertheless, the observation that DLG5 is down-regulated in normal tissue is agreement with our original finding that DLG5 was down-regulated in responders as compared with non-responders (Figures 4B). Given the consistent results across multiple platforms and comparisons, we concluded that blood-based gene signatures may have potential predictive value in NSCLC.
In the last part of this study, we sought to understand the transcriptomic profiles from a different perspective that involves high-level representations, such as pathways, gene ontologies and gene sets. KOBAS-i as utilized in our analysis is an intelligent prioritization tool that covering multiple pathway databases (KEGG Pathway, Reactome and PANTHER) and GO databases [47]. It turned out the most top-ranked pathways retrieved from our DEG lists were mostly assigned to pathways collected in Reactome (Figure 6B & 6D upper). Reactome is known to be composed of less broad but more detailed terms as compared to KEGG [48]. In agreement with that, our over-representation analysis on post v pre comparison yielded a complex cluster network showing the significant enriched terms were connected and overlaid (Figure 6A). This result demonstrated the pathways or GO associated with immune system (C1 & C5) and metabolism (C2 & C3) were significantly enriched in post-treatment subjects. In contrast, analysis on res v non-res comparison displayed a lot more metabolism regulatory terms were enriched (such as “O2/CO2 exchange”), suggesting metabolic regulations or metabolic products may play key roles in driving the outcome of anti-PD-1 blockade treatment (Figure 6C). Our findings provided additional insights to the growing body of evidence documenting that combination of metabolic and immune biomarkers in lung cancer is quite valuable for responder prediction [27, 49]. These signaling cascades identified here are also of interest from a therapeutic perspective, combining PD-1 checkpoint inhibitors with a broad range of clinically active partners is a highlighted strategy with potential importance [50, 51].
One limitation of the pathway analysis was only minimum directional effect (absolute values for z-score<2) was observed in the individual pathway, which is probably due to the complex interactions between immune components and regulators within each pathway. To overcome this issue, quantitative analysis was performed to evaluate the directional impact on immunological gene sets and circulating leukocytes. In consistent with the pathway analyses above, the gene set enrichment analysis also highlighted a couple of metabolism-related gene sets (GSE9006) uniquely in responders (Figure 7A, blue), in addition to gene sets representing immune cell subset (GSE22886, GSE3982 and GSE18893) (Figure 7A, purple). It is not surprising a broad- spectrum of leukocyte subsets (Monocytes, DCs, Th2 cells and B cells) were up-regulated or down-regulated (Treg cells) in the responder patients, which was also validated by our immune cell profiling analysis (Figure 7C). This finding is strongly supported by several recent studies and not limited to lung cancer [52–55]. Several key lymphocyte subsets (higher levels of CD4- and CD8-T cells; lower levels of NK cells) measured in blood samples at baseline were correlated with overall survival as well as clinical response in NSCLC patients received anti-PD-1 antibody treatment [53]. Similarly, the cellular signatures such as M1 macrophage and peripheral T cell showed better predictive performance than PD-L1 immunohistochemistry, tumor mutation burden, or tumor-infiltrating lymphocytes in NSCLC [54].
It is important to point out the in silico leukocyte deconvolution analysis (AImmune) presented here only strengthened the conclusions as drawn by other studies but also proposed a novel leukocyte enumeration approach that designed for circulating blood samples, instead of tissue samples as reported by other studies. We have previously used this approach in analyzing PBMC samples from patients with lung infection [30] and demonstrated the abundance of certain leukocyte subsets was lung disease severity (unpublished data). Similar immune profiling technologies have been employed to enumerate tumor-infiltrating immune cells (e.g. TIMER [56] and CIBERSORT [57]). These methods were recently upgraded taking advantage of the fast- growing single cell RNAseq data and well-developed modern machine learning algorithms (e.g. TIMER 2.0 [58]; CIBERSORTx[29]). Here for the first time, we reported an implementation of modified AImmune approach. With the existing gene markers tested and validated in our previous study [30], new gene markers adapted from PBMC-based scRNAseq data of NSCLC patients and healthy donor are now included in AImmune (Figure 7B & Supplementary Figure S5). We believe our results suggest alternative technology option in terms of evaluating leukocytes as blood-based biomarkers for immunotherapy.
In summary, the integrated analyses employed in the current study identified blood-based molecular and cellular signatures that are associated with responsiveness of ICI plus chemotherapy as well as the overall survival. Furthermore, the present study demonstrated that immunological and metabolic pathways may prove a useful panel of targetable biomarkers to improve patient selection and prognosis prediction. Despites its insights, the retrospective and preliminary nature of this small cohort study should be highlighted. Given the high molecular heterogeneity among different NSCLC subtypes, the number of patients sequenced in this study might be too small to provide strong statistical power or drawn informative conclusions. It is encouraged that our current findings and results should be further confirmed by a larger cohort with long-term follow-up and time series analysis.
Method
Study cohort and sample collection
This study was approved by the local Ethics Committee and the Institutional Review Board of Northwest University (approval number: 200402001) and all patients provided written informed consent. This retrospective study collected 42 whole blood samples from 30 patients visited Tianjin Cancer Hospital from 2020 to 2021 (Figure 1). All patients were diagnosed with non- small cell lung cancer (Table 1) and under treatment of anti–PD-1 monoclonal antibody Tislelizumab [59, 60], in combination with standard chemotherapy according to the histology.
Overall study design and data analytical flow
Transcriptomic profiling (RNA-seq) was performed on mRNA samples in two comparison groups: 1) post-treatment (post) versus (v) pre-treatment (pre) and 2) responders versus non- responders (Figure 1). The data analytical flow started by multiple sequential analysis at gene level, such as RNA-seq processing and real-time quantitative PCR (qRT-PCR) validation, which were followed by investigation of differential expression genes (DEGs) in tissue database (Gene Expression database of Normal and Tumor tissues 2, GENT2) [61] and meta-analysis of differential gene expression and expression-survival association in lung cancer database (Lung Cancer Explorer, LCE) [41]. The DEG data produced from the gene-level analyses were then submitted for further computational analyses in higher-level representations, such as canonical pathways and functional gene ontologies (KEGG Orthology Based Annotation System- intelligent version, KOBAS-i) [47], immune-related gene sets (Gene Set Enrichment Analysis, GSEA) [62, 63] and immune cell subsets (AImmune).
AImmune analysis and computational analysis
A novel in silico leukocyte deconvolution method, named AImmune, is a computational approach developed by integrating our established immune cell profiling [30] with published single cell RNAseq data obtained from NSCLC PBMCs (1071 qualified cells from one patient, GSE127471) and healthy PBMCs (8369 and 7687 cells from two donors, 10X Genomics). Briefly, with the additional marker genes included, more than 30 candidate marker genes for each cell subsets in leukocytes (CD4-T cells, CD8-T cells, B cells, Monocytes, DCs, NK cells and NKT cells) were selected based on their expression patterns across immune cell subsets [57] (Figure 7B). The pairwise similarity statistic of all cell subsets (Figure S4) was computed between all pairs of the candidate marker genes within the normalized RNA-seq profiles (FPKM) from whole blood samples. Using the criteria (average Pearson correlation factor >0.60, p<0.01), 10-20 selected marker genes were identified as our final marker genes. The raw cell abundance score was calculated as the sum of the simple averages of the marker genes’ log2 expression, which allows comparison of cell composition across subject groups. Monte Carlo simulation, as previously used in proteomics analysis and produce scores for peptide [64], was employed here to estimate the relation between the raw cell abundance score and generated a new dataset (n=20 in each group) with the same distributions and correlations as the raw scores. This approach also tested a novel deconvolution model (unpublished) built by DNN (deep neural networks) algorithms and then trained by pseudo-bulk samples obtained by randomly subsampling of published scRNA-seq data [65]. Machine learning-based feature extraction (marker gene selection) was integrated for model optimization. Most of the computational analysis procedure was coded by common Python packages, Monte Carlo simulation was performed by R package faux; scRNA-seq data was processed by R package scanpy; machine learning model was developed and tested with Python library Tensorflow. All computational analysis were performed and visualized using R version 3.6.1 or Python version 3.7.9.
RNA isolation, transcriptomic profiling, and qRT-PCR
Total RNA was isolated using Trizol (Invitrogen, Waltham, MA, USA) and the purity and concentration were verified using a NanoDrop ND-1000 instrument (ThermoFisher Scientific, Waltham, MA, USA). The integrity of the RNA was assessed by a 2100 Bioanalyzer gel image analysis system (Agilent, Santa Clara, CA, USA). Only RNA samples with an integrity number of 7 or higher were processed for RNA-seq. At least 100 ng mRNA per sample was submitted for library preparation.
Qualified RNA samples were then enriched and synthesized into two strand cDNA for library preparation. RNA-seq libraries were constructed using the TruSeq RNA Sample Prep Kit (Illumina, San Diego, CA, USA). The libraries from qualified RNA samples were sequenced in the 150 nt paired-end mode on an Illumina HiSeq 6000 platform at Novogene Bioinformatic Technology (Tianjin, China). After quality filtering (FastQC, quality value >5), over 30 billion clean reads were obtained in each library and then used for down-stream analysis.
PCR primers were designed top-ranked genes (10 for each comparison) as obtained by differential gene analysis (Supplemental Table S3). Real-time quantitative PCR was performed in real-time PCR systems (Bio-Rad, Hercules, CA, USA). The 2–ΔΔCT method for relative gene expression analysis was performed to calculate the mRNA levels of individual gene. Two or three replicates were measured for each sample.
Bioinformatics and statistics analyses
All raw RNA-seq reads were filtered by R package trim_fastq to remove adapters, rRNA and low-quality reads. The QC criteria included: removing bases below Phred quality 20, containing over two “N”, or shorter than 75. The output reads were then indexed by aligner STAR and mapped to reference genome by BAM. Normalized read counts were generated and compared between groups to generate DEGs using R package DESeq2. Another R package countToFPKM was employed to produce FPKM for AImmune analysis. Genes that displayed at least two-fold difference in gene expression between comparison groups (false discovery rate [FDR]<0.05) were considered significant DEGs and carried forward in the analysis. Differentially expressed RNAs were illustrated as a volcano plot. Hierarchical clustering was performed to show the gene expression patterns and similarities among samples. PCA was performed to cluster subjects based on the differentially expressed transcripts. GSEA analysis was carried out by searching the established MSigDB gene-set collections (C7). Differences of mRNA levels and cell composition scores were evaluated using independent t-tests or paired t-tests if pairwise samples were given. A p value of <0.05 was considered statistically significant. Bioinformatics and statistics analyses were performed and visualized using R version 3.6.1 or Python version 3.7.9.
Funding
This work was supported by National Natural Science Foundation of China (82071863).
Author contributions
Conceptualization, Xi Zhang; Data curation, Xi Zhang, Rui Chen, Wenqing Li, Guodong Su, Wuhao Huang and Shengguang Wang; Formal analysis, Xi Zhang, Rui Chen, Wenqing Li and Runpeng Han; Funding acquisition, Xi Zhang; Investigation, Xi Zhang, Rui Chen, Wenqing Li and Shengguang Wang; Methodology, Xi Zhang, Runpeng Han, Yuyan Xiong and Shengguang Wang; Project administration, Xi Zhang, Wenqing Li and Shengguang Wang; Resources, Xi Zhang, Rui Chen, Runpeng Han, Wuhao Huang and Shengguang Wang; Software, Rui Chen; Supervision, Xi Zhang, Yuyan Xiong and Shengguang Wang; Validation, Xi Zhang; Visualization, Xi Zhang; Writing – original draft, Xi Zhang; Writing – review & editing, Xi Zhang, Yuru Liu, Yu Cai, Yuyan Xiong and Shengguang Wang.
Data Availability
All data produced in the present study are available upon reasonable request to the authors
Conflicts of Interest Statement
The authors have declared that no conflict of interest exists.
Acknowledgments
We thank all patients who donated blood.