Expression profile analysis of COVID-19 patients

The course of COVID-19 is determined by various factors. Studies worldwide have shown a correlation between changes in the expression profile and the severity of the disease. Thus, an in-depth study of differentially expressed genes will allow a more detailed investigation of the metabolic changes occurring in the background of coronavirus infection. The technique of RNA sequencing and subsequent bioinformatics analysis is suitable for such research tasks. In our study, we compared groups of samples from patients with mild and severe disease course and identified a number of differentially expressed genes. These genes are involved in metabolic pathways responsible for immune response, signaling, intercellular communication, metabolism of various compounds etc. We then identified master regulators whose function and role in pathways enriched by differentially expressed genes makes them potential targets for biochemical and meta-studies.


INTRODUCTION
Work on the peripheral transcriptomic signatures of COVID-19 is being conducted by many research groups.Many different aspects of the disease are being investigated.For example, in a study by Chinese scientists on the peculiarities of protein ubiquitination in COVID-19, comparing gene expression in peripheral blood mononuclear cells of 4 patients with severe disease and 4 healthy controls, 268 differentially expressed genes were identified.As a result, we were able to identify 6 transcription factors and 2 mi-croRNAs that are key factors in the regulation of ubiquitination in patients with severe COVID-19 1 .A study of differentially expressed matrix RNAs, microRNAs and long non-coding RNAs by researchers from China identified 25,482 mRNAs, 23 microRNAs and 410 long non-coding RNAs whose expression levels differed between COVID-19 patients and control donors.mRNAs that are overexpressed in samples of COVID-19 patients are mainly involved in antigen processing and endogenous antigen presentation, positive regulation of T-cell-mediated cytotoxicity, and positive regulation of gamma delta T-cell activation.mRNAs with reduced expression are mainly involved in glycogen biosynthesis 2 .
A study of the blood transcriptome of people of different ages in the context of measuring the expression level of genes whose products interact with SARS-CoV-2 viral structures revealed five genes that change their expression with aging.They are involved in immune response, inflammation, cellular component and cell adhesion, and platelet activation/aggregation.Moreover, the expression profile changed differently in men and women with aging 3 .In general, the transcriptomic signature of COVID-19 overlaps to a large extent with that of influenza and other acute respiratory infections.Studies of dynamic changes in transcriptome of COVID-19 patients are beginning to appear.The data from these studies show that the blood transcriptome undergo dramatic and consistent changes as in the early stage of recovery.And even months after clinical recovery, gene expression levels does not return to the values of a healthy person 4 .The development of interpretable transcriptome panels for profiling the immune response to SARS-CoV-2 infection is already underway, with target genes categorized into groups: immunologic relevance, role in disease progression, and interaction with SARS structures 5 .
Transcriptomic analysis can be a powerful approach to assess the molecular response of the host and can provide valuable information both on the pathophysiology of COVID-19 and find master regulators that could be potential targets for effective therapies.
The aim of this study was to analyse differential gene expression to identify a set of regulatory genes affecting key molecular and cellular pathways involved in the pathogenesis of COVID-19.

Participant Characteristics
The participants of the study were the patients of the infectious disease department of the St. Petersburg State Health Care Institution City Hospital No. 40, Kurortny District who were admitted for treatment with coronavirus infection (confirmed by polymerase chain reaction).The studied patients were divided into two groups according to the severity of the disease (group 1 -mild and moderate course, group 2 -severe and extremely severe course of the disease).Age-sex structure of the groups is presented in Table 1.  .

Results of differential expression analysis
We obtained the following results analysing blood samples from patietns divided by severity.4734 genes passed the 0.01 statistical significance level.806 genes were upregulated and 3925 genes were downregulated in group 2 relative to group 1. Upregulated genes with the lowest p-value are listed in Table 2. Downregulated genes with the lowest p-value are listed in Table 3. GO enrichment analysis showed that upregulated genes are grouped in pathways responsible for adaptive immune response, signalling, intercellular communication, and several others (Table 4).Master regulator analysis showed the following results: CDH2, CXCL9, KIT, IL13, EPHA4, FLT3, DUSP6, FGFR2, IL4 for upregulated genes; RHOH, NFATC3, FXN, ADRB2, NTRK1, NAA30, ZC3H12D, MAF, KLRD1 for downregulated genes.

RNA extraction and sequencing
RNA was isolated manually from blood samples using the phenol-chloroform extraction method.Library preparation was performed using KAPA RiboErase HMR reagents (Roche), RNA Hyper Prep (Roche), KAPA Universal Adapter (Roche), KAPA UDI Primer Mixes (Roche).Conversion was performed using High-Throughput Sequencing Primer Kit (App-C) and MGIEasy Universal Library Conversion Kit (App-A) (MGI) reagents.Sequencing of the resulting libraries was performed on an MGISEQ-2000 sequencer using a 100 base pair-end length read on a DNBSEQ-G400 High-throughput Sequencing Set (PE100, 360 GB) cell (MGI).

Bioinformatics analysis
Quality control was performed for raw reads using the FastQC program.Alignment to the human genome hg38 version was performed using the STAR aligner with standard settings.Counting reads and gene annotation were performed with the featureCounts program.

Statistical analysis
The nonparametric Mann-Whitney test was used to calculate statistically significant differences in gene expression, and the log fold change (logFC) was used to measure the differential expression.Calculations were performed in the R software package.GO pathway enrichment analysis and the master regulator analysis were performed on the ge.genexplain.complatform.
All patients signed informed voluntary consent.The study was approved by the expert ethics board of the St. Petersburg State Health Care Institution "City Hospital No. 40" (protocol No. 171 dated May 18, 2020).Biomaterial from the biobank collection in St. Petersburg State Health Care Institution "City Hospital No. 40" was used to fulfil the study objectives.

Table 1 :
Table1.Age-sex structure of the groups

Table 4 :
Table 4.Top GO terms, enriched with upregulated genes.
Downregulated genes are concentrated in molecular pathways responsible for the processes of cellular nitrogen and heterocycle compound metabolism, gene expression, etc. (Table5).

Table 5 :
Table 5.Top GO terms, enriched with downregulated genes.