Laboratory validation of an RNA/DNA hybrid tagmentation based mNGS workflow on SARS-CoV-2 and other respiratory RNA viruses detection
=====================================================================================================================================

* Feili Wei
* Yanhua Yu
* Zhongjie Hu
* Rui Wang
* Xianghua Guo
* Haiying Jin
* Shan Guo
* Yabo Ouyang
* Ying Shi
* Ronghua Jin
* Dexi Chen

## Abstract

**Background** Acute respiratory infection (ARI) caused by RNA viruses, such as SARS-CoV-2 and Influenza A virus, remains one of the main diseases worldwide. Metagenomic next-generation sequencing (mNGS) is a powerful tool for etiological diagnosis, but its implementation in clinical settings faces challenges such as time-consuming manipulation and a lack of comprehensive analytical validation.

**Methods** We established CATCH, an mNGS method based on RNA/DNA hybrid tagmentation via the Tn5 transposase. Seven respiratory RNA viruses and three subtypes of Influenza A virus were used to test the detection and semi-quantification capabilities of CATCH. The analytical performance for SARS-CoV-2 and Influenza A virus was determined with reference standards. We compared the accuracy of CATCH with that of quantitative real-time PCR using 98 clinical samples from 64 COVID-19 patients.

**Results** We reduced library preparation time to 3 hours and hands-on time to 35 minutes. Duplicate-filtered RPM of the 7 respiratory viruses and 3 Influenza A virus subtypes were significantly correlated with viral concentration (r = 0.60, p < 0.001, Spearman correlation test). The LOD was 39.2 copies/test for SARS-CoV-2 and 278.1 copies/mL for Influenza A virus. Compared with qRT-PCR, the overall accuracy of CATCH was 91.4%, with 84.5% sensitivity and 100% specificity. In addition, the microbial profiles of oropharyngeal swabs differed significantly among critical patients, moderate patients and healthy controls (p < 0.001, PERMANOVA test).

**Conclusion** Although further optimization is needed before CATCH can be rolled out as a routine diagnostic test, we highlight its potential to advance molecular diagnostics for respiratory pathogens.

Key words
*   RNA viruses
*   COVID-19
*   mNGS
*   Tn5 transposon
*   analytical validation

## Background

Respiratory tract infection is a global public health burden that has caused extensive morbidity and mortality worldwide over the past decades. In 2015, it was the third leading cause of death across all ages1. RNA viruses are the causative agents in most cases of severe acute respiratory tract infection2-4. For example, SARS-CoV-2, the novel coronavirus that causes COVID-19, had infected over 3 million people and caused thousands of deaths between December 2019 and the time of writing5,6. Traditional nucleic acid assays for RNA viruses, such as quantitative reverse transcription PCR (qRT-PCR), are sensitive and specific, but they detect only pre-defined species or subtypes.

Metagenomic next-generation sequencing (mNGS), which has recently been studied extensively for pathogen detection, offers a culture-free, sequence-independent approach that does not require diagnostic targets to be defined beforehand7. However, most mNGS protocols involve time-consuming steps and complicated manipulation, including RNA extraction, reverse transcription, second-strand cDNA synthesis, preamplification or isothermal amplification, adapter ligation and PCR amplification. As a result, diagnostic results take a long time to obtain, and the workflows are difficult for clinicians to use on a large scale. The bacterial transposase Tn5 has been employed in next-generation sequencing to cut double-stranded DNA (dsDNA) and ligate specific adaptors to the resulting DNA ends8-10. It has been used successfully in mNGS for pathogen detection in low-input specimens11. Furthermore, second-strand synthesis can be skipped by direct tagmentation of RNA/DNA hybrids, as shown in scRNA-seq and SARS-CoV-2 sequencing12-14. However, the lack of standardization and validation of these workflows makes it difficult to assess the variability of results generated by different laboratories, leading to uncertainty in the results7,11,15,16. Hence, laboratory validation of the limit of detection (LOD), precision, stability, interference and accuracy for RNA viruses in a single assay is essential for implementing mNGS for routine pathogen detection in clinical diagnostic laboratories.

In this study, we optimized a metatranscriptomic protocol based on RNA/DNA hybrid tagmentation for detecting respiratory RNA viruses. We call this method **CATCH** (pathogeni**C** rn**A**/dna hybrid **T**agmentation te**CH**nology). We aimed to build a rapid and accurate mNGS method that can serve as a broad diagnostic tool for viral respiratory diseases, with the potential for pan-pathogen detection. In addition to validating its laboratory performance, we determined its sensitivity and accuracy relative to an existing diagnostic method, qRT-PCR, using clinical samples from hospital patients during the COVID-19 pandemic. Further optimization is needed before CATCH can be rolled out as a routine diagnostic test, but we highlight its potential to advance molecular diagnostics for respiratory pathogens.

## Methods

### CATCH workflow for respiratory RNA virus detection

RNA was extracted using the QIAamp Viral RNA Mini Kit (Qiagen, Cat. No. 52904) following the manufacturer’s instructions. An internal RNA control, the RNA phage Escherichia coli bacteriophage MS2 (ATCC 15597-B1, Hecin Scientific, Inc.), was added to all samples before RNA extraction at a concentration of 1×10³ copies/mL, approximately 10-fold its LOD11. DNA was then removed using the denaturation buffer and DNase of the Repli-g WTA kit (Qiagen, Cat. No. 150063). After DNA removal, RNA/DNA hybrids were synthesized with the SuperScript™ IV First-Strand Synthesis System (Thermo Fisher, Cat. No. 18091050). The RNA/DNA hybrid tagmentation and indexing PCR protocol was optimized from the Nextera XT DNA Library Prep Kit (Illumina, Cat. No. FC-131-1096; Supplementary Protocol). For pooling, each library was quantified individually using the Qubit dsDNA HS Assay Kit (Thermo Fisher, Cat. No. Q32851), and libraries were then combined at equimolar concentrations. The size distribution of the combined pools was determined using the High Sensitivity DNA kit (Agilent, Cat. No. 5067-5583) on an Agilent 4200 Bioanalyzer. Library pools were sequenced on an Illumina NextSeq 500 platform using a single-end (SE) 75-cycle strategy, generating 10–20 million reads per library.
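
Equimolar pooling requires converting each Qubit reading (a mass concentration) into a molar concentration. The sketch below assumes the common approximation of 660 g/mol per base pair of dsDNA and a mean fragment size read from the Bioanalyzer trace; the function names, the 50 fmol target and the example libraries are hypothetical and are not values from this protocol.

```python
def library_molarity_nm(conc_ng_per_ul: float, mean_fragment_bp: float,
                        g_per_mol_per_bp: float = 660.0) -> float:
    """Convert a dsDNA library concentration (ng/uL) to nM.

    nM = ng/uL * 1e6 / (g/mol per bp * mean fragment length in bp)
    """
    return conc_ng_per_ul * 1e6 / (g_per_mol_per_bp * mean_fragment_bp)


def equimolar_volumes(libraries: dict, fmol_per_library: float) -> dict:
    """Volume (uL) of each library that contributes the same number of femtomoles.

    `libraries` maps a library name to (Qubit ng/uL, mean fragment size in bp).
    Because 1 nM equals 1 fmol/uL, volume = fmol / nM.
    """
    volumes = {}
    for name, (conc, size_bp) in libraries.items():
        molarity = library_molarity_nm(conc, size_bp)
        volumes[name] = fmol_per_library / molarity
    return volumes


if __name__ == "__main__":
    # Hypothetical libraries: (Qubit concentration in ng/uL, mean size in bp)
    libs = {"S1": (2.64, 400), "S2": (1.32, 400), "S3": (0.5, 350)}
    for name, vol in equimolar_volumes(libs, fmol_per_library=50).items():
        print(f"{name}: {vol:.2f} uL")
```

Dilute libraries need proportionally larger volumes, which is why very low-yield libraries can limit how much of each sample fits in a fixed pool volume.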

### Bioinformatics workflow and data processing

We first used fastp17 (v0.19.5; parameters: -q 20 -c -l 50) to remove adapters and filter low-quality reads. Low-complexity reads were then removed with Komplexity (-F -k 8 -t 0.2; version: Nov-2019), host reads with bmtagger ([ftp://ftp.ncbi.nlm.nih.gov/pub/agarwala/bmtagger](ftp://ftp.ncbi.nlm.nih.gov/pub/agarwala/bmtagger)), and ribosomal reads with SortMeRNA18 (version 2.1b). Next, we used Kraken219 (version 2.0.8-beta; parameters: --threads 24 --confidence 0.2) for taxonomic assignment against the large NT reference database (version: Nov-2019) supplemented with the current SARS-CoV-2 reference genome (accession ID: NC_045512.2). Duplicate-filtered RPM of each virus was calculated as follows: 1) reads annotated as the targeted viral species (except SARS-CoV-2) were extracted; 2) these reads were aligned to the targeted reference genome with tblastx20 (-max_target_seqs 1 -max_hsps 1 -evalue 1e-5; version:), and only alignments with more than 90% identity to the reference sequence and covering more than 90% of the query read length were retained; 3) reads with the same start position were marked as duplicates and discarded. The remaining reads were used to calculate RPM (reads per million) by normalizing to the size of the sequencing data. For SARS-CoV-2, reads assigned to *Betacoronavirus* were extracted and aligned with blastn20 (-max_target_seqs 1 -max_hsps 1 -evalue 1e-5; version:) against the reference sequences of all *Betacoronavirus*; reads whose best hit was SARS-CoV-2 were kept, and the subsequent steps were the same as for the other viruses. Influenza A virus subtypes were classified using TAEC21 with default parameters and the NCBI influenza virus database22.
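
Steps 2 and 3 of the duplicate-filtered RPM calculation can be sketched as below. The `Hit` record and its fields are hypothetical stand-ins for parsed tblastx/blastn tabular output; reads are treated as duplicates when they share the same reference and start position, and RPM is normalized to a caller-supplied total read count, since the exact denominator is not specified in the text.

```python
from typing import Iterable, NamedTuple


class Hit(NamedTuple):
    """One best-hit alignment of a read to a viral reference (hypothetical record)."""
    read_id: str
    reference: str
    start: int            # alignment start position on the reference
    pct_identity: float   # percent identity of the alignment
    aligned_len: int      # number of query bases covered by the alignment
    read_len: int         # total query read length


def passes_filter(hit: Hit, min_identity: float = 90.0,
                  min_query_cov: float = 0.90) -> bool:
    """Keep alignments with >90% identity that cover >90% of the read length."""
    return (hit.pct_identity > min_identity
            and hit.aligned_len / hit.read_len > min_query_cov)


def duplicate_filtered_rpm(hits: Iterable[Hit], total_reads: int) -> float:
    """Count reads with a unique (reference, start) pair, per million total reads."""
    unique_starts = set()
    for hit in hits:
        if passes_filter(hit):
            # Reads sharing a start position collapse to one entry (duplicates dropped).
            unique_starts.add((hit.reference, hit.start))
    return len(unique_starts) / total_reads * 1e6
```
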

We defined positive detection of bacteria, fungi and viruses by the following threshold criteria: 1) for viruses, more than 3 non-overlapping reads from distinct genomic regions; 2) duplicate-filtered RPM 10-fold higher than that of the no-template control (NTC). All other microbial taxa were considered contamination and discarded.
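
The two positivity criteria can be expressed as a small decision function. In this sketch, "non-overlapping reads from distinct genomic regions" is approximated by the maximum number of pairwise non-overlapping alignments (greedy interval scheduling), "10-fold higher than NTC" is read as sample RPM ≥ 10 × NTC RPM, and a sample with zero RPM is never called positive. All names and these interpretations are assumptions, not the authors' code.

```python
def max_non_overlapping(intervals):
    """Greedy interval scheduling: largest set of pairwise non-overlapping reads.

    `intervals` are (start, end) alignment coordinates with inclusive ends.
    """
    count, last_end = 0, float("-inf")
    for start, end in sorted(intervals, key=lambda iv: iv[1]):
        if start > last_end:
            count += 1
            last_end = end
    return count


def is_positive(read_intervals, sample_rpm, ntc_rpm, is_virus=True,
                min_reads_exclusive=3, fold_over_ntc=10.0):
    """Apply both threshold criteria (hypothetical helper)."""
    # Criterion 1 (viruses only): MORE than 3 non-overlapping reads.
    if is_virus and max_non_overlapping(read_intervals) <= min_reads_exclusive:
        return False
    # Criterion 2: at least 10-fold the NTC signal, and a non-zero signal.
    return sample_rpm > 0 and sample_rpm >= fold_over_ntc * ntc_rpm
```

Sorting by alignment end is what makes the greedy count optimal; sorting by start instead can undercount when one long alignment spans several short ones.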

### Study design for detection of respiratory RNA viral species and subtypes by CATCH

To study the performance of CATCH in detecting respiratory RNA viral species and subtypes, we designed a panel from two reference standard kits. The SARS-CoV-2 Standard (Lot No. GBW(E)091090) was donated by the National Institute of Metrology, China, and the 2nd National Reference Panel for Influenza A Viral Nucleic Acids Detection Kit (Lot No. 370051-201801) was obtained from the National Institutes for Food and Drug Control, China. SARS-CoV-2 and Influenza A virus (A/2009/H1N1) had exact viral concentrations, and we set 10² copies/test and 10⁴ copies/mL, respectively, as their baselines. For the other 6 RNA viruses and 2 Influenza A virus subtypes, the reference standards were first tested at their highest detectable dilution (similar to the LOD), and these concentrations were defined as baseline. For each virus or subtype, we then increased the concentration 10-fold relative to baseline. Each trial was repeated three times.

### Evaluation of analytical performance characteristics

**1) Limit of detection:** SARS-CoV-2 RNA standard was spiked into RNA extracted from the oropharyngeal swab viral transport medium (VTM) matrix of asymptomatic donors at concentrations ranging from 10 to 10⁴ copies/test. Influenza A virus (A/2009/H1N1) reference standard was spiked into oropharyngeal swab VTM matrix at concentrations ranging from 10² to 10⁵ copies/mL. LODs were calculated in R (version 3.5.1) using probit regression analysis23, following approved Clinical and Laboratory Standards Institute (CLSI) guidelines, with 3 to 20 replicates at each concentration.

**2) Precision:** External positive control (PC) and no-template control (NTC) samples were analyzed over 5 independent mNGS runs (inter-run reproducibility) and as 3 independent sets within 1 run (intra-run reproducibility), and were evaluated for quality control metrics and viral detection using the established thresholds.

**3) Interference:** To evaluate the effect of human host background on mNGS assay performance, we spiked SARS-CoV-2 and Influenza A virus (A/2009/H1N1) standards into low (1×10⁴ cells/mL), medium (1×10⁵ cells/mL) and high (1×10⁶ cells/mL) concentrations of human A549 cells.

**4) Stability:** To assess stability, 3 replicates of the external PC were stored at 4°C for 0, 3 and 6 days. After these manipulations, mNGS libraries were generated from the samples and sequenced.

### Patients and clinical samples

Residual RNA samples were obtained from patients at Beijing Youan Hospital with laboratory-confirmed COVID-19 (diagnosed by qRT-PCR) and from patients with suspected COVID-19 who tested negative. Sixty-four patients were enrolled in this study according to the 7th edition of the guideline for the diagnosis and treatment of COVID-19 issued by the National Health Commission of the People’s Republic of China. Residual RNA samples (n = 98) were stored at Beijing Youan Hospital prior to testing. mNGS results were compared with qRT-PCR results, and the sensitivity and specificity of the mNGS assay were calculated relative to the prior qRT-PCR test. In addition, oropharyngeal swabs were collected from 15 healthy volunteers as healthy controls.

### qRT-PCR assays

We performed qRT-PCR on an ABI ViiA7 instrument (Applied Biosystems) according to the CDC EUA-approved protocol. SARS-CoV-2 was detected with the following primers and probes. N gene: forward 5’-CACATTGGCACCCGCAATC-3’, reverse 5’-GAGGAACGAGAAGAGGCTTG-3’ and probe FAM-5’-ACTTCCTCAAGGAACAACATTGCCA-3’-BHQ1. ORF1ab gene: forward 5’-GTGARATGGTCATGTGTGGCGG-3’, reverse 5’-CARATGTTAAASACACTATTAGCATA-3’ and probe VIC-5’-CAGGTGGAACCTCATCAGGAGATGC-3’-BHQ1, as described previously24. All oligonucleotides and probes were obtained from Sangon (Shanghai).

### Statistical analysis

The Mann-Whitney U test or the Kruskal-Wallis rank sum test was used for continuous variables that were not normally distributed. Microbial communities were compared with the PERMANOVA test. Spearman correlation was used to test the association between Ct values or viral concentration and duplicate-filtered RPM.
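
Spearman's rank correlation is the Pearson correlation of the ranks of the two variables. A dependency-free sketch (tied values receive average ranks) is shown below; it returns only rho, whereas `scipy.stats.spearmanr` also reports a p-value. The function names and the Ct/RPM values in the test are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)  # undefined (division by zero) if either input is constant
```

For Ct values a negative rho is expected, because a higher Ct indicates less viral RNA and therefore fewer viral reads.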

### Ethics statement

This study was reviewed and approved by the Ethics Committee of Beijing Youan Hospital, Capital Medical University.

## Data Availability

The raw sequence data from this study, after removal of human reads, will be deposited in the Genome Warehouse of the National Genomics Data Center and in the Sequence Read Archive (SRA) database.

## Conflict of interest

The authors declare no conflict of interest.

## Funding

This work was supported by the COVID-19 Key Technology Research and Development Funding of Beijing Hospital Authority, the National Science and Technology Major Special Program of the 13th Five-Year Plan (2018ZX10732202-004-003), and the Beijing Institute of Hepatology Reform and Development Project (Y-2020HZ-2).


## Author contribution

D. C, Z. H and R. J conceived the project; R. W, X. G, H. J and S. G conducted experiments; Y. O.Y, Y. S and Y. Y analyzed the data; F. W wrote the manuscript with help from all other authors.

## Results

### CATCH workflow for clinical RNA virus detection

To further adapt the workflow for clinical use, we shortened mNGS library preparation and reduced the number of steps by upgrading the reverse transcriptase from SuperScript III to SuperScript IV (Thermo Fisher), combining extension and adaptor PCR into one step, and reducing the PCR extension time in each cycle. This reduced the total library preparation time to 3 hours and the hands-on time to 35 min (Supplementary Protocol, Figure 1A). To confirm that this workflow detects RNA viruses, we tested 7 respiratory RNA viruses common in community-acquired pneumonia (CAP) patients: SARS-CoV-2, Influenza A virus (A/2009/H1N1), Influenza B virus (B/Victoria), respiratory syncytial virus, parainfluenza virus, mumps virus and rubella virus. All 7 RNA viruses were detected through this workflow (Figure 1B). Beyond detection, we tested the semi-quantitative performance of the workflow by increasing the concentration of each virus 10-fold relative to its baseline and calculating the fold change in duplicate-filtered RPM (Methods). We reasoned that if the workflow measures RNA virus concentration quantitatively, viral signals should increase significantly, and the fold change in duplicate-filtered RPM should be close to 10×; a fold change near 10× would indicate strong quantitative capability. At baseline, viral signals of the 7 RNA viruses ranged from 0.18 [0.04, 0.33] to 177.06 [141.03, 213.09] duplicate-filtered RPM, and fold changes relative to baseline ranged from 5.76 [2.96, 8.55] to 19.05 [9.82, 28.29] (Table 1). As expected, all 7 RNA viruses showed a significant fold change in their own trials, and the duplicate-filtered viral RPM generated by CATCH was significantly correlated with viral concentration (r = 0.60, p = 7.83 × 10⁻¹⁶).

However, parainfluenza virus and mumps rubulavirus showed lower fold changes than expected (Mann-Whitney U test, p = 0.029 and p = 0.022), while rubella virus showed a significantly higher fold change than expected (Mann-Whitney U test, p = 0.049, Figure 1B). To explore the ability of CATCH to distinguish and quantify related subtypes, we tested three important Influenza A virus subtypes: pandemic A/2009/H1N1, A/H3N2 and A/H7N9. All three subtypes were detected. However, apart from H3N2, the other two subtypes had lower fold changes in duplicate-filtered RPM than expected (Mann-Whitney U test, p = 0.002, p = 0.29 and p < 0.001, Figure 1C, Table 1).
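
The semi-quantification test compares duplicate-filtered RPM before and after a 10-fold increase in viral input. A minimal sketch of that comparison is below; it assumes fold change is computed per spiked replicate against the mean baseline RPM and reports the log10 deviation from the expected 10× (0 means ideal). The exact averaging and interval method behind Table 1 are not stated, so the function names and this choice are assumptions.

```python
import math
from statistics import mean


def fold_changes(baseline_rpm, spiked_rpm):
    """Per-replicate fold change of spiked RPM over the mean baseline RPM."""
    base = mean(baseline_rpm)
    if base <= 0:
        raise ValueError("baseline RPM must be positive to compute a fold change")
    return [x / base for x in spiked_rpm]


def summarize(baseline_rpm, spiked_rpm, expected=10.0):
    """Mean fold change and its log10 deviation from the expected fold change."""
    mean_fc = mean(fold_changes(baseline_rpm, spiked_rpm))
    return mean_fc, math.log10(mean_fc / expected)
```

Working on the log scale makes under- and over-recovery symmetric: a 5× and a 20× result deviate from 10× by the same magnitude.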


Table 1. 
Detection and quantification performance of 7 RNA viruses and 3 influenza A virus subtypes by CATCH

![Figure 1.](https://www.medrxiv.org/content/medrxiv/early/2020/05/14/2020.05.12.20099754/F1.medium.gif)


Figure 1. 
Schematic of the RNA/DNA hybrid tagmentation workflow and its application to RNA virus detection. (A) Optimization of the RNA/DNA hybrid tagmentation workflow. (B) Detection and semi-quantification of 7 typical respiratory RNA viruses. (C) Detection and semi-quantification of 3 typical Influenza A virus subtypes. IAV: Influenza A virus, IBV: Influenza B virus, RSV: respiratory syncytial virus, PIV: parainfluenza virus, Mumps V: mumps virus and Rubella V: rubella virus.

### Analytical performance of Influenza A virus and SARS-CoV-2

To calculate the 95% limit of detection (LOD), defined as the lowest concentration at which 95% of positive samples were detected, we evaluated SARS-CoV-2 and Influenza A virus separately at 5 concentration levels, testing 3 to 20 replicates at each concentration. From 10⁵ copies/mL to 500 copies/mL, libraries produced 2,163.00 [1,093.23, 3,224.10] duplicate-filtered RPM mapping to Influenza A virus. For SARS-CoV-2, we obtained 1,533.67 [1,015.48, 2,049.86] to 11 [6.03, 15.97] duplicate-filtered RPM from 10⁴ copies/test to 50 copies/test (Figure S1A and B). Across the dilution series, the duplicate-filtered RPM of both viruses was strongly associated with viral copies or titers (Spearman correlation test, p = 1.41 × 10⁻¹¹ and p = 2.12 × 10⁻¹¹). The 95% LODs for SARS-CoV-2 and Influenza A virus were then determined by probit regression analysis (Table 2). We next evaluated the effect of interference from host cells on mNGS assay performance. Host cells at more than 10⁵ cells/mL resulted in a significant reduction in internal control (IC) and Influenza A virus signals compared with no added host cells (Mann-Whitney U test, p = 0.06, p = 0.04 and p = 0.03, Figure S1C). Inter- and intra-assay reproducibility, as well as stability after storage at 4°C for 0, 3 and 6 days, were also assessed (Table 2). We found no significant difference among days 0, 3 and 6 (Mann-Whitney U test, p = 0.23, p = 0.53 and p = 0.55, Figure S1D).


Table 2. 
Performance characteristics for CATCH on Influenza A virus and SARS-CoV-2

### Retrospective validation on swab specimens from COVID-19 patients using qRT-PCR and the mNGS workflow

To evaluate accuracy, a total of 98 RNA samples from 64 COVID-19 patients were tested by both the mNGS assay and qRT-PCR. SARS-CoV-2 reads were detected by mNGS in 44 of the 47 samples that were positive for both genes by qRT-PCR, and in 5 of the 11 samples that were positive for a single gene. No reads classified as SARS-CoV-2 were obtained by mNGS from the 40 qRT-PCR-negative samples. In addition, no other DNA or RNA virus met our positivity threshold in these samples. Overall, CATCH showed 84.5% sensitivity and 100% specificity relative to the original clinical test results (Figure 2A). When only dual-gene-positive samples were considered, sensitivity increased to 93.7%. We also found that the duplicate-filtered RPM of SARS-CoV-2 was more strongly correlated with the Ct value of the N gene (R² = 0.754) than with that of ORF1ab (R² = 0.326, Figure 2B and C). Although this suggests that CATCH may have a 3′ bias, CATCH was still able to detect RNA viruses and semi-quantify viral concentration in clinical samples.
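
The sensitivity and specificity above come from a 2×2 comparison against qRT-PCR. As a check, the sketch below recomputes them from the counts in this paragraph, assuming that every qRT-PCR-positive sample (dual- or single-gene) counts as a reference positive and that no qRT-PCR-negative sample was mNGS-positive (fp = 0). The helper name is hypothetical.

```python
def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> tuple:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP), versus the reference test."""
    return tp / (tp + fn), tn / (tn + fp)


# Counts from the comparison above: dual-gene (44 of 47) and single-gene (5 of 11)
# qRT-PCR positives pooled as reference positives; 40 qRT-PCR negatives, none mNGS-positive.
tp = 44 + 5
fn = (47 - 44) + (11 - 5)
sens, spec = sensitivity_specificity(tp=tp, fn=fn, tn=40, fp=0)
print(f"sensitivity={sens:.1%} specificity={spec:.1%}")  # sensitivity=84.5% specificity=100.0%
```
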

### Microbial profile in COVID-19 patients and healthy control

To explore distinctive features of the microbial profile in COVID-19 patients, we collected oropharyngeal swabs from 15 additional healthy volunteers. In total, we obtained 36,845.1 [27,519.4, 46,170.7] microbial RPM in COVID-19 patients and 109,398.1 [102,779.5, 116,016.8] microbial RPM in healthy controls. Based on the top 50 microbial genera, patients and healthy controls clustered into three groups (Figure S2A). Jensen-Shannon divergence across all microbial taxa was used to calculate distances among the three groups (critical, moderate and healthy; Figure S2B). There was a significant difference between COVID-19 patients (both critical and moderate) and healthy controls (PERMANOVA test, p < 0.001), and the oropharyngeal microbiome also differed between the critical and moderate groups (PERMANOVA test, p = 0.013). To exclude batch effects on the microbial profile data, we re-classified samples according to collection date and library preparation batch (Figure S2A). Clusters grouped by collection date and by library preparation batch differed significantly (PERMANOVA test, p < 0.001 and p < 0.001). However, when only COVID-19 patients were considered, collection date and library preparation batch had less impact on the microbial profile (PERMANOVA test, p = 0.053 and p = 0.182). In healthy controls, we found characteristic taxa such as *Streptococcus*, *Prevotella* and *Veillonella*, which are common in the human upper respiratory tract (Figure S2A). In COVID-19 patients, we found enrichment of opportunistic pathogens such as the genus *Candida* (Mann-Whitney U test, p < 0.001, Figure S2A and C). To confirm this result, we used culture methods to detect *Candida* with the VITEK 2 system25. Two of 4 ICU patients were culture-positive for *Candida*.
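
The distance and PERMANOVA steps can be sketched as follows. This is a from-scratch illustration, not the authors' pipeline: it assumes one abundance vector per sample, uses log base 2, takes the square root of the Jensen-Shannon divergence (the Jensen-Shannon distance) as the PERMANOVA distance, and computes the pseudo-F statistic with a label-permutation p-value. Whether the authors used the divergence itself or its square root, and how many permutations they ran, is not stated; the function names and abundance values are hypothetical.

```python
import math
import random


def jsd(p, q):
    """Jensen-Shannon divergence (log base 2, so 0 <= JSD <= 1) of two abundance vectors."""
    sp, sq = float(sum(p)), float(sum(q))
    p = [x / sp for x in p]
    q = [x / sq for x in q]
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Terms with a_i == 0 contribute nothing; b_i > 0 wherever a_i > 0.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def js_distance_matrix(profiles):
    """Pairwise Jensen-Shannon distances (square root of the divergence)."""
    n = len(profiles)
    return [[math.sqrt(max(jsd(profiles[i], profiles[j]), 0.0)) for j in range(n)]
            for i in range(n)]


def pseudo_f(d, labels):
    """PERMANOVA pseudo-F from a distance matrix and group labels."""
    n = len(labels)
    groups = sorted(set(labels))
    ss_total = sum(d[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    ss_within = 0.0
    for g in groups:
        idx = [i for i, lab in enumerate(labels) if lab == g]
        ss_within += sum(d[i][j] ** 2 for a, i in enumerate(idx) for j in idx[a + 1:]) / len(idx)
    k = len(groups)
    return ((ss_total - ss_within) / (k - 1)) / (ss_within / (n - k))


def permanova(d, labels, n_perm=999, seed=0):
    """Return (pseudo-F, permutation p-value) by shuffling group labels."""
    rng = random.Random(seed)
    f_obs = pseudo_f(d, labels)
    shuffled = list(labels)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(d, shuffled) >= f_obs - 1e-12:  # tolerance for float round-off
            at_least_as_extreme += 1
    return f_obs, (at_least_as_extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical genus-level counts for two small groups of samples
    healthy = [[70, 20, 10], [65, 25, 10], [72, 18, 10], [68, 20, 12], [71, 19, 10]]
    covid = [[10, 20, 70], [12, 18, 70], [10, 25, 65], [11, 19, 70], [9, 21, 70]]
    d = js_distance_matrix(healthy + covid)
    print(permanova(d, ["H"] * 5 + ["C"] * 5))
```

With small groups the smallest attainable p-value is bounded by the number of distinct label arrangements, so permutation p-values should be read with group sizes in mind.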

## Discussion

To date, acute respiratory infection (ARI) caused by RNA viruses remains one of the main diseases worldwide. According to the GLIMP26, EPIC4 and CAP-China27 studies, viral CAP was diagnosed in 37.2% of cases in Asia, 23% in the United States and 39.2% in China. To improve etiological diagnosis of infectious diseases, mNGS has been applied and studied by many researchers and has gradually developed into a powerful tool over the past few years28. However, several challenges must be overcome before mNGS can be implemented in clinical settings, such as laborious wet-lab manipulation and a lack of analytical validation following standard guidelines7,28. In this study, we established CATCH, a rapid, easy-to-use and sensitive clinical mNGS workflow based on RNA/DNA hybrid tagmentation via Tn5, for detecting respiratory RNA viruses. We also provide a series of analytical performance data for SARS-CoV-2 and Influenza A virus. To our knowledge, this is the first study to test the comprehensive performance of an mNGS method based on RNA/DNA hybrid tagmentation for pathogen detection. We believe that this newly described property of Tn5 will facilitate the implementation of mNGS in clinical settings. Solid data on the accuracy and robustness of CATCH in pathogen detection are therefore required for its clinical use.

For many mNGS workflows for pathogen detection, quantification remains a challenge for several reasons29-31. Although CATCH does not provide well-defined linearity of quantification for RNA virus detection, we found that viral signals detected by CATCH (duplicate-filtered RPM) were strongly correlated with viral copy concentrations or titers in both spiked trials and clinical samples (Figure 1B and C, Figure 2B and C). We therefore consider that CATCH can serve as a semi-quantitative tool in many clinical applications and for comparison with other detection methods. In addition, CATCH can distinguish related species and even subtypes; identification of Influenza A virus subtypes is important for clinical diagnosis32.

![Figure 2.](https://www.medrxiv.org/content/medrxiv/early/2020/05/14/2020.05.12.20099754/F2.medium.gif)


Figure 2. 
Accuracy of the mNGS workflow compared with qRT-PCR. (A) 2×2 contingency tables comparing the performance of the mNGS workflow relative to qRT-PCR and clinical testing. The composite qRT-PCR reference standards are: both genes detected (left), both genes negative (middle) and a single gene detected (right). (B) Comparison of qRT-PCR ORF1ab gene Ct values and mNGS viral signals. (C) Comparison of qRT-PCR N gene Ct values and mNGS viral signals.

In analytical performance tests with reference standards, we first found that the limits of detection of CATCH were 39.2 copies/test for SARS-CoV-2 and 278.1 copies/mL for Influenza A virus. The sensitivity for Influenza A virus was of the same order of magnitude as described in previous mNGS studies on the Illumina11 and Nanopore16 platforms. For SARS-CoV-2, we suggest that the small target size of the reference standard (three partial genomic regions, 3 kb) might explain why CATCH did not achieve the same sensitivity as qRT-PCR11,15. Second, our data showed that high host background, a fairly common limitation of mNGS, reduced assay performance. This suggests that negative findings from unbiased mNGS methods may be less useful for excluding infection in samples with high host background11,16. Third, we explored the accuracy of CATCH using clinical oropharyngeal swabs collected during the COVID-19 pandemic. CATCH performed well across a range of input concentrations: although RNA concentration could not be measured in 65.00% of qRT-PCR-negative and 38.46% of qRT-PCR-positive samples (Figure S3), library preparation succeeded for 100% of samples (data not shown). The overall accuracy of CATCH for SARS-CoV-2 detection relative to conventional qRT-PCR was 91.4%, with 84.5% sensitivity and 100% specificity. In dual-gene qRT-PCR-positive samples, sensitivity increased to 93.7%. We suggest that poor integrity of viral RNA might impair the detection efficiency of CATCH. Moreover, as previously reported, RNA/DNA hybrid tagmentation methods, including CATCH, show an obvious 3′ bias without further optimization13. This feature might also reduce the probability of capturing viral fragments in clinical samples, and CATCH needs further optimization to overcome this limitation before wider implementation. Finally, we found that CATCH can detect enrichment of opportunistic pathogens in oropharyngeal swabs of COVID-19 patients compared with healthy controls, suggesting that CATCH could be extended to fungal or bacterial detection in the near future.

We also observed that the oropharyngeal microbiomes of critical patients, moderate patients and healthy people grouped into three clusters. Although we excluded batch effects such as collection date and library preparation batch to some extent, more clinical data are needed to confirm that the oropharyngeal microbiome differs significantly between COVID-19 patients and healthy people, because microbial profiles can be affected by many technical factors33. Nevertheless, these findings suggest that CATCH has strong potential for identifying microbial biomarkers for disease evaluation and estimation34-36.
We also found an interesting phenomenon that microbiome in oropharyngeal swabs among critical, moderate patients and healthy people can be grouped in three clusters. Although we excluded batch effects to some extent such as collection date and library preparation batch, we still need more clinical data to confirm that there were significant difference in microbiome of oropharyngeal swabs between COVID-19 patients and healthy people for the reason that microbial profile might be affect by many artificial factors33. But it inferred that CATCH have a strong potential on finding microbial biomarker for disease evaluation and estimation34-36.

In summary, technological advances in library preparation, sequence generation and computational bioinformatics are enabling faster, more comprehensive and cost-effective metagenomic analyses. We hope that CATCH will contribute to the routine implementation of clinical mNGS in patient care settings.

Figure S1. Analytical characteristics for SARS-CoV-2 and Influenza A virus. (A) LOD of SARS-CoV-2. (B) LOD of Influenza A virus. (C) Interference test with A549 host cells on Influenza A virus and the internal control. (D) Stability test on Influenza A virus and the internal control.

Figure S2. Microbial profiles in oropharyngeal swabs of critical patients, moderate patients and healthy volunteers. (A) Heatmap of the top 50 microbial genera of critical patients, moderate patients and healthy volunteers. Red represents high relative abundance and black represents low relative abundance. (B) PCoA of the oropharyngeal microbiome. (C) Relative abundance of the genus *Candida* in each group.

Figure S3. RNA concentration of clinical oropharyngeal swab specimens after DNase treatment. (A) Distribution of RNA concentrations of throat swabs; points below the Qubit LOD were assigned randomly generated values for display. (B) Proportion of samples with RNA concentration below the Qubit LOD.

## Acknowledgement

We thank Dr. Bin Yang and Mr. Peizhi Li at Illumina China for comprehensive suggestions on this project and for assistance with high-throughput sequencing. The SARS-CoV-2 Standard (Lot No. GBW(E)091090) was donated by the National Institute of Metrology, China.

*   Received May 12, 2020.
*   Revision received May 12, 2020.
*   Accepted May 14, 2020.


*   © 2020, Posted by Cold Spring Harbor Laboratory

This pre-print is available under a Creative Commons License (Attribution-NoDerivs 4.0 International), CC BY-ND 4.0, as described at [http://creativecommons.org/licenses/by-nd/4.0/](http://creativecommons.org/licenses/by-nd/4.0/)

## Reference

1.  Wang HD, Naghavi M, Allen C, et al. Global, regional, and national life expectancy, all-cause mortality, and cause-specific mortality for 249 causes of death, 1980–2015: a systematic analysis for the Global Burden of Disease Study 2015. Lancet 2016;388:1459–544.

2.  van Boheemen S, van Rijn AL, Pappas N, et al. Retrospective Validation of a Metagenomic Sequencing Protocol for Combined Detection of RNA and DNA Viruses Using Respiratory Samples from Pediatric Patients. J Mol Diagn 2020;22:196–207.

3.  Parrish CR, Holmes EC, Morens DM, et al. Cross-species virus transmission and the emergence of new epidemic diseases. Microbiol Mol Biol Rev 2008;72:457–70.

4.  Jain S, Self WH, Wunderink RG, et al. Community-Acquired Pneumonia Requiring Hospitalization among U.S. Adults. N Engl J Med 2015;373:415–27.

5.  Zhu N, Zhang DY, Wang WL, et al. A Novel Coronavirus from Patients with Pneumonia in China, 2019. N Engl J Med 2020;382:727–33.

6.  6.Wu F, Zhao S, Yu B, et al. A new coronavirus associated with human respiratory disease in China. Nature 2020;579:265-+.
    

7. Han D, Li Z, Li R, Tan P, Zhang R, Li J. mNGS in clinical microbiology laboratories: on the road to maturity. Crit Rev Microbiol 2019;45:668–85.
    
    
8. Adey A, Morrison HG, Asan, et al. Rapid, low-input, low-bias construction of shotgun fragment libraries by high-density in vitro transposition. Genome Biol 2010;11:17.
    
    
9. Hennig BP, Velten L, Racke I, et al. Large-Scale Low-Cost NGS Library Preparation Using a Robust Tn5 Purification and Tagmentation Protocol. G3-Genes Genomes Genet 2018;8:79–89.
    
    
10. Picelli S, Björklund AK, Reinius B, Sagasser S, Winberg G, Sandberg R. Tn5 transposase and tagmentation procedures for massively scaled sequencing projects. Genome Res 2014;24:2033–40.
    

11. Miller S, Naccache SN, Samayoa E, et al. Laboratory validation of a clinical metagenomic sequencing assay for pathogen detection in cerebrospinal fluid. Genome Res 2019;29:831–42.
    

12. Di L, Fu YS, Sun Y, et al. RNA sequencing by direct tagmentation of RNA/DNA hybrids. Proc Natl Acad Sci U S A 2020;117:2886–93.
    

13. Chen C, Li J, Di L, et al. MINERVA: A facile strategy for SARS-CoV-2 whole genome deep sequencing of clinical samples. bioRxiv 2020:2020.04.25.060947.
    
    
14. Lu B, Dong L, Yi D, et al. Transposase assisted tagmentation of RNA/DNA hybrid duplexes. bioRxiv 2020:2020.01.29.926105.
    
    
15. Carpenter ML, Tan SK, Watson T, et al. Metagenomic Next-Generation Sequencing for Identification and Quantitation of Transplant-Related DNA Viruses. J Clin Microbiol 2019;57:12.
    
    
16. Lewandowski K, Xu YF, Pullan ST, et al. Metagenomic Nanopore Sequencing of Influenza Virus Direct from Clinical Respiratory Samples. J Clin Microbiol 2020;58:15.
    
    
17. Chen S, Zhou Y, Chen Y, Gu J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 2018;34:i884–90.
    

18. Kopylova E, Noé L, Touzet H. SortMeRNA: fast and accurate filtering of ribosomal RNAs in metatranscriptomic data. Bioinformatics 2012;28:3211–7.
    

19. Wood DE, Lu J, Langmead B. Improved metagenomic analysis with Kraken 2. Genome Biol 2019;20:257.
    

20. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol 1990;215:403–10.
    

21. Sohn MB, An L, Pookhao N, Li Q. Accurate genome relative abundance estimation for closely related species in a metagenomic sample. BMC Bioinformatics 2014;15:242.
    

22. Benson DA, Cavanaugh M, Clark K, et al. GenBank. Nucleic Acids Res 2018;46:D41–7.
    

23. Armbruster D, Pry T. Limit of Blank, Limit of Detection and Limit of Quantitation. Clin Biochem Rev 2008;29 Suppl 1:S49–52.
    
    
24. Vogels CBF, Brito AF, Wyllie AL, et al. Analytical sensitivity and efficiency comparisons of SARS-CoV-2 qRT-PCR primer-probe sets. medRxiv 2020:2020.03.30.20048108.
    
    
25. Kothalawala M, Jayaweera J, Arunan S, Jayathilake A. The emergence of non-albicans candidemia and evaluation of HiChrome Candida differential agar and VITEK2 YST® platform for differentiation of Candida bloodstream isolates in teaching hospital Kandy, Sri Lanka. BMC Microbiol 2019;19:136.
    
    
26. Radovanovic D, Sotgiu G, Jankovic M, et al. An international perspective on hospitalized patients with viral community-acquired pneumonia. Eur J Intern Med 2019;60:54–70.
    
    
27. Zhou F, Wang Y, Liu Y, et al. Disease severity and clinical outcomes of community-acquired pneumonia caused by non-influenza respiratory viruses in adults: a multicentre prospective registry study from the CAP-China Network. Eur Respir J 2019;54.
    
    
28. Chiu CY, Miller SA. Clinical metagenomics. Nat Rev Genet 2019;20:341–55.
    

29. Prachayangprecha S, Schapendonk CM, Koopmans MP, et al. Exploring the potential of next-generation sequencing in detection of respiratory viruses. J Clin Microbiol 2014;52:3722–30.
    

30. van Boheemen S, van Rijn AL, Pappas N, et al. Retrospective Validation of a Metagenomic Sequencing Protocol for Combined Detection of RNA and DNA Viruses Using Respiratory Samples from Pediatric Patients. J Mol Diagn 2020;22:196–207.
    
    
31. Carpenter ML, Tan SK, Watson T, et al. Metagenomic Next-Generation Sequencing for Identification and Quantitation of Transplant-Related DNA Viruses. J Clin Microbiol 2019;57.
    
    
32. Ngaosuwankul N, Noisumdaeng P, Komolsiri P, et al. Influenza A viral loads in respiratory samples collected from patients infected with pandemic H1N1, seasonal H1N1 and H3N2 viruses. Virol J 2010;7:75.
    
    
33. de Goffau MC, Lager S, Sovio U, et al. Human placenta has no microbiome but can contain potential pathogens. Nature 2019;572:329–34.
    

34. Ren L, Zhang R, Rao J, et al. Transcriptionally Active Lung Microbiome and Its Association with Bacterial Biomass and Host Inflammatory Status. mSystems 2018;3.
    
    
35. Clausen ML, Agner T, Lilje B, Edslev SM, Johannesen TB, Andersen PS. Association of Disease Severity With Skin Microbiome and Filaggrin Gene Mutations in Adult Atopic Dermatitis. JAMA Dermatol 2018;154:293–300.
    
    
36. Bassiouni A, Paramasivan S, Shiffer A, et al. Microbiotyping the Sinonasal Microbiome. Front Cell Infect Microbiol 2020;10:137.