Rapid, point‐of‐care antigen and molecular‐based tests for diagnosis of SARS‐CoV‐2 infection

Summary of findings 1. Diagnostic accuracy of point‐of‐care antigen and molecular‐based tests for the diagnosis of SARS‐CoV‐2 infection

Question	What is the diagnostic accuracy of rapid point‐of‐care antigen and molecular‐based tests for the diagnosis of SARS‐CoV‐2 infection?
Population	Adults or children with suspected current SARS‐CoV‐2 infection, or populations undergoing screening for SARS‐CoV‐2 infection, including: asymptomatic contacts of confirmed COVID‐19 cases; community screening
Index test	Any rapid antigen or molecular‐based test for diagnosis of SARS‐CoV‐2 meeting the following criteria: portable or mains‐powered device; minimal sample preparation requirements; minimal biosafety requirements; no requirement for a temperature‐controlled environment; test results available within 2 hours of sample collection
Target condition	Detection of current SARS‐CoV‐2 infection
Reference standard	For COVID‐19 cases: positive RT‐PCR alone, or clinical diagnosis of COVID‐19 based on established guidelines or combinations of clinical features. For non‐COVID‐19 cases: repeated negative RT‐PCR or pre‐pandemic sources of samples.
Action	False negative results mean missed cases of COVID‐19 infection, with either delayed or no confirmed diagnosis and an increased risk of community transmission due to a false sense of security. False positive results lead to unnecessary self‐isolation or quarantine, with the potential for new infection to be acquired.
Quantity of evidence	Number of studies		Total samples		Total samples with confirmed SARS‐CoV‐2
Quantity of evidence	18		3198		1775
Limitations in the evidence
Risk of bias	Participants: high or unclear risk in 16 studies (89%); Index test: high or unclear risk in 14 studies (78%); Reference standard: unclear risk in 10 studies (56%); Flow and timing: high or unclear risk in 15 studies (83%)
Concerns about applicability	Participants: high concerns in 13 studies (72%); Index test: high concerns in 13 studies (72%); Reference standard: high concerns in 17 studies (94%)
Findings
Antigen tests
Evaluations (studies)	Samples		Confirmed SARS‐CoV‐2 samples		Average sensitivity (95% CI) [Range]	Average specificity (95% CI) [Range]
8 (5)	943		596		56.2 (29.5 to 79.8) [0% to 94%]^a	99.5 (98.1 to 99.9) [90% to 100%]
Average sensitivity and specificity applied to a hypothetical cohort of 1000 patients^a
Prevalence of COVID‐19	TP	FP	FN	TN	PPV^b (95% CI)	NPV^c (95% CI)
5%	28^a	5	22^a	945	85% (68% to 95%)^a	98% (97% to 99%)
10%	56^a	5	44^a	896	92% (82% to 97%)^a	95% (94% to 97%)^a
20%	112^a	4	88^a	796	97% (91% to 99%)^a	90% (88% to 92%)^a
Rapid molecular tests
Evaluations (studies)	Samples		Confirmed SARS‐CoV‐2 samples		Average sensitivity (95% CI) [Range]	Average specificity (95% CI) [Range]
13 (11)	2255		1179		95.2 (86.7 to 98.3) [68% to 100%]	98.9 (97.3 to 99.5) [92% to 100%]
Average sensitivity and specificity applied to a hypothetical cohort of 1000 patients
Prevalence of COVID‐19	TP	FP	FN	TN	PPV^b (95% CI)	NPV^c (95% CI)
5%	48	10	2	940	83% (71% to 91%)	100% (99% to 100%)
10%	95	10	5	890	90% (83% to 95%)	99% (99% to 100%)
20%	190	9	10	791	95% (92% to 98%)	99% (98% to 99%)
Pooled results for individual tests
Tests	Evaluations		Samples	SARS‐CoV‐2 cases	Sensitivity (95% CI)	Specificity (95% CI)
Shenzhen Bioeasy Ag assay	2		238	162	89.5 (83.7 to 93.8)	100 (95.3 to 100)
ID NOW	5		1003	496	76.8 (72.9 to 80.3)	99.6 (98.4 to 99.9)
Xpert Xpress	6		919	479	99.4 (98.0 to 99.8)	96.8 (90.6 to 99.0)
Average sensitivity and specificity applied to a hypothetical cohort of 1000 patients where 100 have COVID‐19 infection (10% prevalence)
Tests	TP	FP	FN	TN	PPV^b (95% CI)	NPV^c (95% CI)
Shenzhen Bioeasy Ag assay	90	0	11	900	100% (96% to 100%)	99% (98% to 99%)
ID NOW	77	4	23	896	96% (89% to 99%)	97% (96% to 98%)
Xpert Xpress	99	29	1	871	77% (69% to 84%)	100% (99% to 100%)
Ag: antigen; CI: confidence interval; FN: false negative; FP: false positive; NPV: negative predictive value; PPV: positive predictive value; RT‐PCR: reverse transcription polymerase chain reaction; TN: true negative; TP: true positive
^a As there is high heterogeneity in the estimates of sensitivity, the values observed in practice could vary considerably from these figures.
^b PPV (positive predictive value) defined as the percentage of positive rapid test results that are truly positive according to the reference standard diagnosis.
^c NPV (negative predictive value) defined as the percentage of negative rapid test results that are truly negative according to the reference standard diagnosis.
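The hypothetical‐cohort rows follow directly from the average sensitivity, the average specificity and the assumed prevalence. As a minimal sketch (the class and function names are illustrative, not taken from the review), the Python below derives the four cell counts for a cohort of 1000. PPV and NPV are computed here from unrounded counts, so they can differ by about one percentage point from the tabulated values, and the sketch does not attempt to reproduce the confidence intervals.

```python
from dataclasses import dataclass


@dataclass
class Cohort:
    tp: float
    fp: float
    fn: float
    tn: float

    @property
    def ppv(self) -> float:
        # Share of positive rapid-test results that are truly positive
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        # Share of negative rapid-test results that are truly negative
        return self.tn / (self.tn + self.fn)


def hypothetical_cohort(sens: float, spec: float, prevalence: float, n: int = 1000) -> Cohort:
    """Apply average sensitivity/specificity to n people at a given prevalence."""
    diseased = n * prevalence
    healthy = n - diseased
    tp = sens * diseased
    tn = spec * healthy
    return Cohort(tp=tp, fp=healthy - tn, fn=diseased - tp, tn=tn)


if __name__ == "__main__":
    # Rapid molecular tests: average sensitivity 95.2%, specificity 98.9%
    for prev in (0.05, 0.10, 0.20):
        c = hypothetical_cohort(0.952, 0.989, prev)
        print(f"{prev:.0%}: TP={round(c.tp)} FP={round(c.fp)} FN={round(c.fn)} "
              f"TN={round(c.tn)} PPV={c.ppv:.1%} NPV={c.npv:.1%}")
```

For example, `hypothetical_cohort(0.952, 0.989, 0.10)` gives about 95 TP, 10 FP, 5 FN and 890 TN, matching the rapid molecular row at 10% prevalence.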

Background

Severe acute respiratory syndrome coronavirus 2 (SARS‐CoV‐2) and the resulting COVID‐19 pandemic present important diagnostic evaluation challenges. These range from: understanding the value of signs and symptoms in predicting possible infection; assessing whether existing biochemical and imaging tests can identify infection or people needing critical care; and evaluating whether new biomarker tests can accurately identify current infection, rule out infection, identify people in need of care escalation, or test for past infection and immunity.

We are creating and maintaining a suite of living systematic reviews to cover the roles of tests and patient characteristics in the diagnosis of COVID‐19. This review summarises evidence for the accuracy of rapid antigen and molecular tests suitable for use at the point of care, as alternatives to standard laboratory‐based reverse transcription polymerase chain reaction (RT‐PCR), which is relied on for identifying current infection. If sufficiently accurate, point‐of‐care tests may have a greater impact on public health than RT‐PCR, as they do not require the same technical expertise and laboratory capacity. These tests can be undertaken locally, avoiding the need for centralised testing facilities that rarely meet the needs of patients, caregivers, health workers and society as a whole, especially in low‐ and middle‐income countries. As these are rapid tests, their results can be returned within the same clinical encounter, facilitating timely decisions concerning the need for isolation.

Target condition being diagnosed

COVID‐19 is the disease caused by infection with the SARS‐CoV‐2 virus. The key target conditions for this suite of reviews are current SARS‐CoV‐2 infection, current COVID‐19 disease, and past SARS‐CoV‐2 infection. The tests included in this review concern the identification of current infection.

For current infection, the severity of the disease is of importance. SARS‐CoV‐2 infection can be asymptomatic (no symptoms); mild or moderate (symptoms such as fever, cough, aches, lethargy but without difficulty breathing at rest); severe (symptoms with breathlessness and increased respiratory rate indicative of pneumonia); or critical (requiring respiratory support due to severe acute respiratory syndrome (SARS) or acute respiratory distress syndrome (ARDS)). People with COVID‐19 pneumonia (severe or critical disease) require different patient management, and it is important to be able to identify them. Viral load may also be an indicator of disease severity (Zheng 2020), and whilst the accuracy of antigen and molecular tests has the potential to be affected by participant viral load, the main aim of rapid testing is not to establish viral load. In this review, we therefore consider the role of point‐of‐care tests for detecting SARS‐CoV‐2 infection of any severity.

Index test(s)

The primary consideration for the eligibility of tests for inclusion in this review is that they should detect current infection and should have the capacity to be performed at the ‘point of care’ or in a ‘near‐patient’ testing role. There is ongoing debate around the specific use and definitions of these terms; therefore, for the purposes of this review, we consider ‘point‐of‐care’ and ‘near‐patient’ to be synonymous, but for consistency and to avoid confusion, we use the term ‘point‐of‐care’ throughout.

We have adapted a definition of point‐of‐care testing, namely that it “refers to decentralized testing that is performed by a minimally trained healthcare professional near a patient and outside of central laboratory testing” (WHO 2018), with the additional caveat that test results must be available within a single clinical encounter (Pai 2012). The key criteria for test inclusion are therefore:

the equipment for running or reading the assay, or both, must be portable or easily transported, although mains power may be required;
minimal sample preparation requirements, for example, single‐step mixing, with no requirement for additional equipment or precise sample volume transfer unless a disposable automatic fill or graduated transfer device is used;
minimal biosafety requirements, for example, personal protective equipment (PPE) for sample collector and test operator, good ventilation and a biohazard bag for waste disposal;
no requirement for a temperature‐controlled environment; and
test results available within two hours of sample collection.
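The five criteria work as a conjunction: a test qualifies only if every one of them holds. The sketch below encodes them as a hypothetical Python record; the field names are assumptions made for illustration, and judgements such as what counts as "minimal" preparation or biosafety remain reviewer decisions that the code simply takes as inputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateTest:
    """Hypothetical summary of a candidate assay; field names are illustrative only."""
    portable: bool                    # portable or easily transported (mains power allowed)
    minimal_preparation: bool         # e.g. single-step mixing, no additional equipment
    precise_volume_transfer: bool     # requires precise transfer of a sample volume
    disposable_transfer_device: bool  # automatic-fill or graduated disposable device used
    minimal_biosafety: bool           # PPE, good ventilation and a biohazard bag suffice
    temperature_controlled: bool      # requires a temperature-controlled environment
    hours_to_result: float            # time from sample collection to result


def meets_point_of_care_criteria(t: CandidateTest) -> bool:
    # Precise volume transfer is acceptable only when a disposable
    # automatic-fill or graduated transfer device is used.
    volume_ok = (not t.precise_volume_transfer) or t.disposable_transfer_device
    return (
        t.portable
        and t.minimal_preparation
        and volume_ok
        and t.minimal_biosafety
        and not t.temperature_controlled
        and t.hours_to_result <= 2
    )


if __name__ == "__main__":
    # Hypothetical lateral-flow-style profile with a disposable transfer pipette
    lateral_flow = CandidateTest(True, True, True, True, True, False, 0.25)
    print(meets_point_of_care_criteria(lateral_flow))
```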

Tests for detection of current infection that are currently suitable for use at the point of care include antigen tests and molecular‐based tests. Both types of test use the same respiratory‐tract samples acquired by swabbing, washing or aspiration as for laboratory‐based RT‐PCR. Rapid antigen tests use lateral flow immunoassays, which are disposable devices, usually in the form of plastic cassettes akin to a pregnancy test. Viral antigen is captured by dedicated antibodies that are either colloidal gold‐ or fluorescent‐labelled. Antigen detection is indicated by visible lines appearing on the test strip (colloidal gold‐based immunoassays, or CGIA), or through fluorescence, which can be detected using an immunofluorescence analyser (fluorescence immunoassays, or FIA). Molecular‐based tests to detect viral ribonucleic acid (RNA) have historically been laboratory‐based assays using RT‐PCR technology (see Alternative test(s)). In recent years, automated, single‐step RT‐PCR methods have been developed, as well as other nucleic acid amplification methods, such as isothermal amplification, that do not require the sophisticated thermal cycling involved in RT‐PCR (Carter 2020). These technological advances have allowed molecular technologies to be developed that are suitable for use in a point‐of‐care context (Kozel 2017).

Following the emergence of COVID‐19, there has been prolific industry activity to develop accurate tests. The Foundation for Innovative New Diagnostics (FIND) and the Johns Hopkins Center for Health Security have maintained online lists of these and other tests for SARS‐CoV‐2 (FIND 2020). At the time of writing (19 July 2020), FIND listed 48 rapid antigen tests, 32 of which were described as "commercialized" and 21 of which had been identified as having regulatory approval. A total of 113 molecular tests were described as automated, including both laboratory‐based assays and assays suitable for use outside of a laboratory setting (i.e. near or at the point of care). Further information from FIND indicates that 47 of the 113 assays were categorised as point‐of‐care or near point‐of‐care tests, including 26 with regulatory approval. This classification was based on the information provided to FIND by the test manufacturers and does not necessarily mean that these tests meet the criteria for point‐of‐care tests that we have specified for this review. The numbers of tests of these types will increase over time.

Clinical pathway

Patients may be tested for infection when they present with symptoms, when they have had known exposure to COVID‐19, or during screening for COVID‐19. The standard approach to diagnosis of COVID‐19 infection is through laboratory‐based testing with RT‐PCR of swab samples taken from the upper respiratory tract (e.g. nasopharynx, oropharynx) or lower respiratory tract (e.g. bronchoalveolar lavage or sputum). RT‐PCR is the primary method for detecting infection during the acute phase of the illness while the virus is still present (whether people are symptomatic or asymptomatic), but it can give false negative results (Arevalo‐Rodriguez 2020). Both the World Health Organization (WHO) and the China CDC (National Health Commission of the People's Republic of China) have produced case definitions for COVID‐19 that include the presence of convincing clinical evidence when RT‐PCR is negative (Appendix 1). The most recent case definition from the China CDC also includes positive serology tests.

Prior test(s)

Signs and symptoms are used in the initial diagnosis of suspected COVID‐19 infection and to help identify those who require an RT‐PCR test. A number of key symptoms have been associated with mild to moderate COVID‐19, including: troublesome dry cough (for example, coughing more than usual over a one‐hour period, or three or more coughing episodes in 24 hours), fever greater than 37.8 °C, diarrhoea, headache, breathlessness on light exertion, muscle pain, fatigue, and loss of sense of smell and taste. However, the recently published review of signs and symptoms found that good evidence for the accuracy of these symptoms, alone or in combination, is lacking (Struyf 2020).

Where people are asymptomatic but are being tested on the basis of epidemiological risk factors, such as exposure to someone with confirmed SARS‐CoV‐2, no prior tests will have been conducted.

Role of index test(s)

For most settings in which testing for acute SARS‐CoV‐2 infection takes place, results of laboratory‐based RT‐PCR tests are unlikely to be available within a single clinical encounter. Point‐of‐care tests potentially have a role either as a replacement for RT‐PCR (if sufficiently accurate), or as a means of triaging and rapid management (quarantine or treatment, or both), with confirmatory RT‐PCR testing for negative results. Obtaining quick results within a healthcare visit will allow more appropriate decisions about isolation and healthcare interventions. If accurate, tests may also be considered for screening at‐risk populations, for example in airport settings or in local outbreaks.

Alternative test(s)

This review is one of seven planned reviews that cover the range of tests and characteristics being considered in the management of COVID‐19 (Deeks 2020; McInnes 2020). Full details of the alternative tests and evidence of their accuracy will be summarised in these reviews. Tests that might be considered as alternatives to point‐of‐care tests are briefly described here.

Laboratory‐based molecular tests

RT‐PCR tests for SARS‐CoV‐2 identify viral ribonucleic acid (RNA). Reagents for RT‐PCR were rapidly produced once the viral RNA sequence was published (Corman 2020). Testing is undertaken in central laboratories and can be very labour‐intensive, with several points along the path of performing a single test where errors may occur, although some automation of parts of the process is possible. The amplification process requires thermal cycling equipment to allow multiple temperature changes within a cycle, with cycles repeated up to 40 times until amplified viral genetic material (complementary DNA transcribed from the viral RNA) is detected (Carter 2020). Although the amplification process for RT‐PCR can be completed in a relatively short timeframe, the stages of extraction, sample processing and data management (including reporting) mean that test results are typically only available in 24 to 48 hours. Where testing is undertaken in a centralised laboratory, transport times increase this further. The time to result for fully automated RT‐PCR assays is shorter than for manual RT‐PCR; however, most assays still require sample preparation steps that make them unsuitable for use at the point of care. Other methods that allow amplification at a constant temperature, including loop‐mediated isothermal amplification (LAMP), are also being developed, as are CRISPR‐based nucleic acid detection methods (Carter 2020). These methods have the potential to reduce the time to produce test results after extraction and sample processing to minutes, but the time for the whole process may still be significant. Laboratory‐based molecular tests are most often applied to upper and lower respiratory samples, although they are also being used on faecal and urine samples.
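To make the cycling arithmetic concrete: under the textbook idealisation that each cycle doubles the target (100% efficiency), the number of copies after n cycles is N0(1 + E)^n, so 40 cycles correspond to roughly a trillion‐fold amplification, and a sample with ten times less starting material needs about log2(10) ≈ 3.3 extra cycles to reach the same signal. This is a simplified sketch for illustration only, not a description of any particular assay; real reactions run below perfect efficiency and eventually plateau.

```python
import math


def copies_after_cycles(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealised exponential amplification: N = N0 * (1 + E) ** n.

    efficiency = 1.0 means perfect doubling each cycle (an assumption;
    real reactions are less efficient and plateau in late cycles).
    """
    return initial_copies * (1.0 + efficiency) ** cycles


def extra_cycles_for_fold_lower_input(fold: float, efficiency: float = 1.0) -> float:
    """Additional cycles needed to make up a `fold`-times smaller starting amount."""
    return math.log(fold) / math.log(1.0 + efficiency)


if __name__ == "__main__":
    print(f"{copies_after_cycles(1, 40):.3g} copies from one template after 40 cycles")
    print(f"{extra_cycles_for_fold_lower_input(10):.2f} extra cycles for 10-fold less input")
```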

Antibody tests

Serology tests to measure antibodies to SARS‐CoV‐2 have been evaluated in people with active infection and in convalescent cases (Deeks 2020a). Antibodies are formed by the body's immune system in response to infections, and can be detected in whole blood, plasma or serum. Antibody tests are available for laboratory use, including enzyme‐linked immunosorbent assay (ELISA) methods or more advanced chemiluminescence immunoassays (CLIA). There are also rapid lateral flow assays (LFAs) for antibody testing that use a minimal amount of whole blood, plasma or serum on a testing strip, as opposed to the respiratory specimens that are used for rapid antigen tests; all assays for antibody detection are considered in Deeks 2020a.

Rationale

It is essential to understand the clinical accuracy of tests and diagnostic features to identify the best way they can be used in different settings to develop effective diagnostic and management pathways. The suite of Cochrane 'living systematic reviews' summarises evidence on the clinical accuracy of different tests and diagnostic features, grouped according to the research questions and settings that we are aware of. Estimates of accuracy from these reviews will help inform diagnosis, screening, isolation, and patient‐management decisions.

As the COVID‐19 pandemic progresses, early, fast and reliable detection of active SARS‐CoV‐2 infection is key to reducing community transmission. New biomarker tests are being developed and evidence is accumulating at an unprecedented rate. Point‐of‐care testing provides a potentially attractive route to increasing testing rates; however, its potential to have an impact on patient care and help reduce transmission depends not only on the time it takes to report the test result, but also on test performance and frequency of testing. We are aware of two other reviews on this topic (Green 2020; Subsoontorn 2020). One rapid review of point‐of‐care tests relied on performance data from manufacturers’ instructions for use documents (Green 2020). A systematic review of nucleic acid amplification ‘point‐of‐care tests’ selected studies for inclusion based on the use of isothermal techniques (i.e. not requiring thermal cycling), with apparently no consideration for the feasibility of deploying the tests in a point‐of‐care environment (Subsoontorn 2020). A comprehensive systematic review of the clinical performance of tests suitable for use at the point of care is therefore urgently needed. We will update this review as often as is feasible to ensure that it provides current evidence about the accuracy of point‐of‐care tests.

Please note, this review follows a generic protocol that covers six of the seven Cochrane COVID‐19 DTA reviews (Deeks 2020). The Background and Methods sections of this review therefore use some text that was originally published in the protocol (Deeks 2020), and text that overlaps some of our other reviews (Deeks 2020a; Struyf 2020).

Objectives

To assess the diagnostic accuracy of rapid point‐of‐care antigen and molecular‐based tests to determine if a person presenting in the community or in primary or secondary care has current SARS‐CoV‐2 infection.

Secondary objectives

Where data are available, we will investigate potential sources of heterogeneity that may influence diagnostic accuracy (either by stratified analysis or meta‐regression) according to index test, participant characteristics (length and severity of symptoms, and viral load), study setting, study design and reference standard used.

Methods

Criteria for considering studies for this review

Types of studies

We applied broad eligibility criteria in order to include all patient groups (that is, if patient population was unclear, we included the study) and all variations of a test.

We included studies of all designs that produce estimates of test accuracy or provide data from which we can compute estimates, including the following.

Studies restricted to participants confirmed to either have (or to have had) the target condition (to estimate sensitivity) or confirmed not to have (or have had) the target condition (to estimate specificity). These types of studies may be excluded in later review updates.
Single‐group studies, which recruit participants before disease status has been ascertained.
Multi‐group studies, where people with and without the target condition are recruited separately (often referred to as two‐gate or diagnostic case‐control studies).
Studies based on either patients or samples.

We excluded studies from which we could not extract data to compute either sensitivity or specificity.

We carefully considered the limitations of different study designs in the quality assessment and analyses.

We included studies reported in published articles and as preprints.

Participants

We included studies recruiting people presenting with suspicion of current SARS‐CoV‐2 infection or those recruiting populations where tests were used to screen for disease (for example, contact tracing or community screening).

We also included studies that recruited people known to have SARS‐CoV‐2 infection and known not to have SARS‐CoV‐2 infection (i.e. cases only or multi‐group studies).

We excluded small studies with fewer than 10 samples or participants. Although the size threshold of 10 is arbitrary, such small studies are likely to give unreliable estimates of sensitivity or specificity and may be biased.

Index tests

We included studies evaluating any rapid antigen or molecular‐based test for diagnosis of SARS‐CoV‐2, if it met the criteria outlined in the Background, that is, requiring minimal equipment, sample preparation, and biosafety considerations, with results available within two hours of sample collection.

Target conditions

The target condition was current SARS‐CoV‐2 infection (either symptomatic or asymptomatic). We also refer to SARS‐CoV‐2 infection as ‘COVID‐19 infection’.

Reference standards

We anticipated that studies would use a range of reference standards to define both the presence and absence of SARS‐CoV‐2 infection but were unclear at the start of the review exactly what methods we would encounter. For the QUADAS‐2 assessment (Quality Assessment tool for Diagnostic Accuracy Studies; Whiting 2011), we categorised each method of defining the presence of SARS‐CoV‐2 according to the risk of bias (the chances that it would misclassify the presence or absence of infection) and whether it defined COVID‐19 in an appropriate way that reflected cases encountered in practice. Likewise, we considered the risk of bias in definitions of the absence of SARS‐CoV‐2, and whether the definition included all those who would be tested in practice.

Evaluations of molecular tests generally consider agreement between molecular assays, for example, agreement of a new rapid test against a more standard RT‐PCR test. For the purposes of this review, we considered RT‐PCR to be the ‘reference standard’ against which the rapid tests were compared, and present results as ‘sensitivity’ and ‘specificity’ as opposed to percentage agreement. The results of further RT‐PCR analysis of discrepant cells (samples whose rapid test and RT‐PCR results disagreed) were also considered in sensitivity analyses. As discrepant analysis involves retesting only a subsample of patients selected according to index and reference standard results, it can introduce bias (Hadgu 1999). Retesting all samples with a second test in a composite reference standard would be preferable when there are concerns over the accuracy of the first reference test.
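The direction of this bias follows from the arithmetic. In discrepant analysis only the discordant cells (rapid‐positive/RT‐PCR‐negative and rapid‐negative/RT‐PCR‐positive) are retested, so samples can only move from those cells into agreement with the rapid test; sensitivity and specificity can therefore stay the same or rise, but never fall. The minimal Python sketch below uses hypothetical counts and illustrative names; it is not the review's analysis code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    tp: int
    fp: int
    fn: int
    tn: int

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


def discrepant_resolution(table: TwoByTwo, fp_retest_positive: int,
                          fn_retest_negative: int) -> TwoByTwo:
    """Reclassify only the discordant cells after a second RT-PCR.

    fp_retest_positive: rapid+/RT-PCR- samples positive on retest (moved to TP)
    fn_retest_negative: rapid-/RT-PCR+ samples negative on retest (moved to TN)
    Concordant cells (TP, TN) are never retested, so nothing leaves them.
    """
    if not (0 <= fp_retest_positive <= table.fp and 0 <= fn_retest_negative <= table.fn):
        raise ValueError("reclassified counts must not exceed the discordant cells")
    return TwoByTwo(
        tp=table.tp + fp_retest_positive,
        fp=table.fp - fp_retest_positive,
        fn=table.fn - fn_retest_negative,
        tn=table.tn + fn_retest_negative,
    )


if __name__ == "__main__":
    before = TwoByTwo(tp=90, fp=8, fn=10, tn=92)  # hypothetical counts
    after = discrepant_resolution(before, fp_retest_positive=3, fn_retest_negative=4)
    print(f"sensitivity {before.sensitivity:.3f} -> {after.sensitivity:.3f}")
    print(f"specificity {before.specificity:.3f} -> {after.specificity:.3f}")
```

A composite reference standard, by contrast, would retest concordant samples too, so misclassification could be corrected in either direction.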

Search methods for identification of studies

Electronic searches

We conducted a single literature search to cover our suite of Cochrane COVID‐19 diagnostic test accuracy (DTA) reviews (Deeks 2020; McInnes 2020).

We conducted electronic searches using two primary sources. Both of these searches aimed to identify all published articles and preprints related to COVID‐19, and were not restricted to those evaluating biomarkers or tests. Thus, there are no test terms, diagnosis terms, or methodological terms in the searches. Searches were limited to 2019 and 2020, and for this version of the review have been conducted to 25 May 2020.

Cochrane COVID‐19 Study Register searches

We used the Cochrane COVID‐19 Study Register (covid-19.cochrane.org/), for searches conducted from inception of the Register to 28 March 2020. At that time, the register was populated by searches of PubMed, as well as trials registers at ClinicalTrials.gov and the WHO International Clinical Trials Registry Platform (ICTRP).

Search strategies were designed for maximum sensitivity, to retrieve all human studies on COVID‐19 and with no language limits. See Appendix 2.

COVID‐19 Living Evidence Database from the University of Bern

From 28 March 2020, we used the COVID‐19 Living Evidence database from the Institute of Social and Preventive Medicine (ISPM) at the University of Bern (www.ispm.unibe.ch) as the primary source of records for the Cochrane COVID‐19 DTA reviews. This search includes PubMed, Embase, and preprints indexed in bioRxiv and medRxiv databases. The search strategies are described on the ISPM website (ispmbern.github.io/covid-19/); see Appendix 3. To ensure comprehensive coverage, we also downloaded records from the ‘Bern feed’ from 1 January to 28 March 2020 and de‐duplicated them against those obtained via the Cochrane COVID‐19 Study Register.

The decision to focus primarily on the Bern feed was because of the exceptionally large numbers of COVID‐19 studies available only as preprints. The Cochrane COVID‐19 Study Register has undergone a number of iterations since the end of March and we anticipate moving back to the Register as the primary source of records for subsequent review updates.

Searching other resources

We identified Embase records through the Centers for Disease Control and Prevention (CDC), Stephen B Thacker CDC Library, COVID‐19 Research Articles Downloadable Database (www.cdc.gov/library/researchguides/2019novelcoronavirus/researcharticles.html), and de‐duplicated them against the Cochrane COVID‐19 Study Register up to 28 March 2020. See Appendix 4.

We also checked our search results against two additional repositories of COVID‐19 publications:

the Evidence for Policy and Practice Information and Co‐ordinating Centre (EPPI‐Centre) 'COVID‐19: Living map of the evidence' (eppi.ioe.ac.uk/COVID19_MAP/covid_map_v4.html);
the Norwegian Institute of Public Health 'NIPH systematic and living map on COVID‐19 evidence' (www.nornesk.no/forskningskart/NIPH_diagnosisMap.html).

Both of these repositories allow their contents to be filtered according to studies potentially relating to diagnosis, and both have agreed to provide us with updates of new diagnosis studies added. For this iteration of the review, we examined all diagnosis studies from either source up to 25 May 2020.

We appeal to researchers to supply details of additional published or unpublished studies, which we will consider for inclusion in future updates, at the following email address ([email protected]).

Data collection and analysis

Selection of studies

A team of experienced systematic review authors from the University of Birmingham screened the titles and abstracts of all records retrieved from the literature searches. Two review authors independently screened studies in Covidence. A third, senior review author resolved any disagreements. We tagged all records selected as potentially eligible according to the Cochrane COVID‐19 DTA review(s) that they might be eligible for and we then exported them to separate Covidence reviews for each review title.

We obtained the full texts for all studies flagged as potentially eligible. Two review authors independently screened the full texts for one of the COVID‐19 biomarker reviews (molecular, antigen or antibody tests). We resolved any disagreements on study inclusion through discussion with a third review author.

Data extraction and management

One review author extracted the characteristics of each study, which a second review author checked. Items that we extracted are listed in Appendix 5. In addition, we coded tests according to complexity, regardless of the nature of the test (antigen or molecular test), as follows:

low: one sample preparation step and up to two test steps;
moderate: two sample preparation steps and up to three test steps;
high: more than two sample preparation steps and more than three test steps.

Two review authors independently carried out this classification, with referral to a third review author if necessary.
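The three complexity categories can be written as a small decision rule. Read literally, the definitions leave some combinations uncovered (for example, one preparation step with three test steps), which is where referral to a third review author would apply. In the sketch below, treating zero preparation steps as "low" and returning None for uncovered combinations are assumptions of this illustration, not rules stated in the review.

```python
from typing import Optional


def complexity(prep_steps: int, test_steps: int) -> Optional[str]:
    """Classify assay complexity from counts of sample preparation and test steps.

    Returns "low", "moderate" or "high", or None when the combination falls
    outside the three written definitions and needs adjudication.
    """
    if prep_steps <= 1 and test_steps <= 2:   # assumption: 0 prep steps counts as low
        return "low"
    if prep_steps == 2 and test_steps <= 3:
        return "moderate"
    if prep_steps > 2 and test_steps > 3:
        return "high"
    return None  # not covered by the definitions; refer for adjudication


if __name__ == "__main__":
    for steps in [(1, 2), (2, 3), (3, 4), (1, 3)]:
        print(steps, complexity(*steps))
```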

Two review authors independently extracted data into 2x2 contingency tables of the number of true positives, false positives, false negatives and true negatives, and resolved disagreements by discussion. Where possible, we separately extracted data according to viral load and, for molecular assays, before and after re‐analysis of samples in discrepant cells.
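Extracting data by viral load means that, in practice, only sensitivity can be stratified, since viral load (typically inferred from RT‐PCR cycle threshold (Ct) values) is available only for samples that are positive on the reference standard. The sketch below computes sensitivity within each stratum and after collapsing strata, from hypothetical (TP, FN) counts; the Ct cut‐off used in the labels is an assumed example, not a value from the review.

```python
from typing import Dict, Tuple


def sensitivity(tp: int, fn: int) -> float:
    """Proportion of reference-standard positives that the rapid test detects."""
    return tp / (tp + fn)


def stratified_sensitivity(strata: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Sensitivity within each viral-load stratum; values are (TP, FN) counts."""
    return {label: sensitivity(tp, fn) for label, (tp, fn) in strata.items()}


def overall_sensitivity(strata: Dict[str, Tuple[int, int]]) -> float:
    """Sensitivity after collapsing strata, i.e. summing TP and FN across them."""
    total_tp = sum(tp for tp, _ in strata.values())
    total_fn = sum(fn for _, fn in strata.values())
    return sensitivity(total_tp, total_fn)


if __name__ == "__main__":
    # Hypothetical counts; the Ct <= 25 cut-off is an illustrative assumption.
    counts = {"Ct <= 25 (higher load)": (45, 5), "Ct > 25 (lower load)": (20, 30)}
    print(stratified_sensitivity(counts))
    print(overall_sensitivity(counts))
```

The collapsed estimate depends on the mix of high‐ and low‐load samples in a study, which is one reason sensitivity can vary between study populations.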

We encourage study authors to contact us regarding missing details on the included studies ([email protected]).

Assessment of methodological quality

Two review authors independently assessed risk of bias and applicability concerns using the QUADAS‐2 checklist tailored to this review (Appendix 6; Whiting 2011). The two review authors resolved any disagreements by discussion.

Ideally, studies should prospectively recruit a representative sample of participants presenting with signs and symptoms of COVID‐19, either in community or primary care settings or in a hospital setting, and they should clearly record the time of testing after the onset of symptoms. Studies in asymptomatic people at risk of infection should document time from exposure. Studies should perform tests in their intended use setting, using appropriate samples with or without viral transport medium and within the time period following specimen collection as indicated in the 'instructions for use' document. Tests should be performed by relevant personnel (e.g. healthcare workers), and should be interpreted blinded to the final diagnosis (presence or absence of SARS‐CoV‐2). The reference standard diagnosis should be blinded to the result of the rapid test, and should not incorporate the result of the index test. We did not consider a comparison of a rapid molecular‐based test against an RT‐PCR assay to be at risk of incorporation bias. If the reference standard includes clinical diagnosis of COVID‐19 for RT‐PCR‐negative patients, then established criteria should be used. Studies including samples from participants known not to have COVID‐19 should use pre‐pandemic sources or contemporaneous samples with at least one RT‐PCR‐negative test result. Data should be reported for all study participants, including those where the result of the rapid test was inconclusive, or participants in whom the final diagnosis of COVID‐19 was uncertain. Studies should report whether results relate to participants (one sample per participant) or samples (multiple samples per participant).

Statistical analysis and data synthesis

We analysed rapid antigen and molecular tests separately. If studies evaluated multiple tests in the same samples, we included them multiple times. We present estimates of sensitivity and specificity for each test brand using paired forest plots, and summarise results using average sensitivity and specificity in tables as appropriate. There were sufficient studies to make formal comparisons (based on between‐study comparisons) only for two brands of molecular test (ID NOW (Abbott Laboratories) and Xpert Xpress (Cepheid Inc)).

We estimated summary sensitivities and specificities with 95% confidence intervals (CI) using the bivariate model (Reitsma 2005), via the meqrlogit command of Stata/SE 16.0. When few studies were available, we simplified models by first assuming no correlation between sensitivity and specificity estimates and secondly by setting near‐zero variance estimates of the random effects to zero (Takwoingi 2017). In cases where there was only one study per test, we reported individual sensitivities and specificities with 95% CI constructed using the binomial exact method.
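The binomial exact (Clopper‐Pearson) interval used for single‐study estimates can be sketched as follows. This is an illustrative Python sketch using only the standard library, not the Stata routine used in the review, and the function names are our own. Each bound is found by bisection on a binomial tail probability, which gives the same interval as the usual beta‐quantile formulation.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_decreasing(f, target, iters=100):
    """Bisection for p in [0, 1] with f(p) == target, where f decreases in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f still above target, so the root lies at larger p
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for x successes in n trials."""
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("need n > 0 and 0 <= x <= n")
    # Lower bound solves P(X >= x | p) = alpha/2, i.e. P(X <= x - 1 | p) = 1 - alpha/2.
    lower = 0.0 if x == 0 else _solve_decreasing(
        lambda p: binom_cdf(x - 1, n, p), 1 - alpha / 2)
    # Upper bound solves P(X <= x | p) = alpha/2.
    upper = 1.0 if x == n else _solve_decreasing(
        lambda p: binom_cdf(x, n, p), alpha / 2)
    return lower, upper
```

For example, `clopper_pearson(0, 9)` gives an upper limit of about 0.336 and `clopper_pearson(31, 31)` a lower limit of about 0.888, consistent with the 0% (0 to 33.6) sensitivity for Liming CGIA and the 100% (88.8 to 100) specificity for Beijing Savant FIA in Table 2, assuming those rows rest on 0/9 cases and 31/31 non‐cases respectively. Summing binomial terms directly is adequate for sample sizes of a few hundred, as here; for much larger n a beta‐quantile routine would be more robust.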

Where studies presented only estimates of sensitivity, we fitted univariate random‐effects logistic regression models. In a small number of instances where a model failed to converge (usually when there were very few studies or the sensitivity or specificity estimates were all very high), we computed estimates and CIs by summing the counts of true positives (TP), false positives (FP), false negatives (FN) and true negatives (TN) across 2x2 tables. These analyses are clearly marked in the tables. We present all estimates with 95% CIs.
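When 2x2 tables are summed, pooled sensitivity is simply total TP divided by total TP + FN, and pooled specificity total TN divided by total TN + FP. The review does not state which interval it computed for these pooled counts, so the sketch below assumes a Wald interval on the log‐odds scale; the study counts in the example are hypothetical. Because the log‐odds is undefined when a cell is zero (for example, when every case is detected), the function refuses that case rather than silently applying a correction.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def pool_2x2(tables):
    """Sum (TP, FP, FN, TN) tuples across studies into a single 2x2 table."""
    tp, fp, fn, tn = (sum(col) for col in zip(*tables))
    return tp, fp, fn, tn


def _expit(v):
    """Inverse logit: map log-odds back to a proportion."""
    return 1 / (1 + exp(-v))


def logit_wald_ci(x, n, level=0.95):
    """Proportion x/n with a Wald interval on the log-odds scale (assumed method)."""
    if not 0 < x < n:
        raise ValueError("log-odds undefined when x == 0 or x == n; use an exact interval")
    z = NormalDist().inv_cdf(0.5 + level / 2)
    log_odds = log(x / (n - x))
    se = sqrt(1 / x + 1 / (n - x))
    return x / n, _expit(log_odds - z * se), _expit(log_odds + z * se)


# Two hypothetical studies, each given as (TP, FP, FN, TN).
tp, fp, fn, tn = pool_2x2([(60, 1, 50, 120), (63, 0, 53, 119)])
sensitivity = logit_wald_ci(tp, tp + fn)  # about 0.544 (0.479 to 0.608)
specificity = logit_wald_ci(tn, tn + fp)
```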

Investigations of heterogeneity

We examined heterogeneity between studies by visually inspecting the forest plots of sensitivity and specificity. Where adequate data were available, we investigated heterogeneity related to viral load, test brand and sample type by including indicator variables in the random‐effects logistic regression models, and we report the absolute differences in sensitivity or specificity, with P values, estimated from these models. Where only one study was available per test, or where tests were compared directly after summing the counts of the 2x2 tables, we compared tests using the two‐sample test of proportions.
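The two‐sample test of proportions can be sketched as below. We assume the conventional construction: a Wald confidence interval for the difference using unpooled standard errors, and a z test using the pooled proportion under the null hypothesis. The counts in the example (Xpert Xpress 144/146 and ID NOW 115/145 for sensitivity; 73/75 and 75/75 for specificity) are back‐calculated from the percentages in the direct‐comparison rows of Table 2 and are therefore an assumption. With these counts the sketch reproduces the differences reported there: 19.3 percentage points (12.5 to 26.2; P < 0.001) for sensitivity and −2.7 (−6.3 to 1.0; P = 0.15) for specificity.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_prop_test(x1, n1, x2, n2, level=0.95):
    """Compare two independent proportions x1/n1 and x2/n2.

    Returns (difference, lower, upper, p_value) for p1 - p2. The confidence
    interval uses the unpooled (Wald) standard error; the two-sided P value
    comes from a z test using the pooled proportion.
    """
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    norm = NormalDist()
    z_crit = norm.inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    se_diff = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    p_pool = (x1 + x2) / (n1 + n2)
    se_null = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    if se_null == 0:
        p_value = 1.0  # both proportions are 0 or both are 1: no difference to test
    else:
        p_value = 2 * (1 - norm.cdf(abs(diff) / se_null))
    return diff, diff - z_crit * se_diff, diff + z_crit * se_diff, p_value


# Xpert Xpress versus ID NOW in the same samples (back-calculated counts).
sens = two_sample_prop_test(144, 146, 115, 145)
spec = two_sample_prop_test(73, 75, 75, 75)
```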

Sensitivity analyses

We performed three sensitivity analyses. First, we estimated the sensitivity of molecular tests with and without the studies that evaluated only samples with RT‐PCR‐confirmed SARS‐CoV‐2 (and thus did not estimate specificity). Second, we compared analyses using the primary reference standard with analyses using results adjusted after sample re‐testing with a second RT‐PCR test, either of discrepant samples only (discrepant analysis) or of all samples. Third, we restricted our analysis comparing ID NOW (Abbott Laboratories) and Xpert Xpress (Cepheid Inc) to studies that compared the tests in the same samples.

Assessment of reporting bias

We made no formal assessment of reporting bias.

Summary of findings

We summarised key findings in a 'Summary of findings' table indicating the strength of evidence for each test and findings, and highlighted important gaps in the evidence.

Updating

We are aware of additional studies published since the search date of 25 May 2020 and plan to update this review imminently. We have already completed searches for the update up until 22 June 2020, and screening of those is ongoing.

Results

Results of the search

We screened 19,092 unique records (published or preprints) for inclusion in the complete suite of reviews to assist in the diagnosis of COVID‐19 (Deeks 2020; McInnes 2020). Of 808 records selected for further assessment for inclusion in any of the four molecular, antigen or antibody test reviews, we assessed 90 full‐text reports for inclusion in this review. See Figure 1 for the PRISMA flow diagram of search and eligibility results (McInnes 2018; Moher 2009). We included 18 studies from 22 reports in this review, and we excluded 68 publications that did not meet our inclusion criteria. Exclusions were mainly because of index tests not meeting our criteria for use at the point of care (n = 36) or ineligible study designs (n = 21). The reasons for exclusion of all 68 publications are provided in Characteristics of excluded studies.

Figure 1

Study flow diagram

We contacted the authors of three included studies for further information (Diao 2020; Porte 2020; Weitzel 2020 [A]), and received replies and the requested information in regard to all three.

The 22 included study reports relate to 18 separate studies, four studies having both preprints and subsequent journal publications (Broder 2020; Mertens 2020; Porte 2020; Smithgall 2020 [A]). Of the 18 studies, five are available only as preprints. (Please note when naming studies, we use the letters [A], [B], [C] etc. in square brackets to indicate data on different tests evaluated in the same study).

Description of included studies

The 18 studies include a total of 3198 unique samples, of which 1775 had RT‐PCR‐confirmed SARS‐CoV‐2 (some samples were analysed by more than one index test). Five studies evaluated antigen tests (Diao 2020; Lambert‐Niclot 2020; Mertens 2020; Porte 2020; Weitzel 2020 [A]) and 13 studies evaluated molecular tests (Assennato 2020; Broder 2020; Harrington 2020; Hogan 2020; Lieberman 2020; Loeffelholz 2020; Mitchell 2020; Moore 2020; Moran 2020; Rhoads 2020; Smithgall 2020 [A]; Wolters 2020; Zhen 2020 [A]). Summary study characteristics are presented in Table 1, with further details of study design and index tests in Appendix 7 and Appendix 8. Full details are provided in the Characteristics of included studies table.

Table 1. Description of studies

Characteristic		Antigen tests (5 studies)	Molecular tests (13 studies)
All studies (n = 18)
Overall sample size	Median (IQR)	112 (96 to 198)
	Range	26 to 524
Overall number of SARS‐CoV‐2 positive samples	Median (IQR)	85 (50 to 119)
	Range	13 to 220
Participants
Sample size	Median (IQR)	138 (127 to 239)	103 (88 to 172)
	Range	111 to 328	26 to 524
Number of SARS‐CoV‐2 positive samples	Median (IQR)	94 (82 to 132)	58 (46 to 96)
	Range	80 to 208	13 to 220
Setting	Hospital A&E	2 (40%)	1 (8%)
	Mixed	0 (0%)	3 (23%)
	Unclear	3 (60%)	9 (69%)
Patient group	Acute (A&E presentation)	2 (40%)	1 (8%)
	Unclear	3 (60%)	12 (92%)
Study design
Recruitment structure	Single group ‐ sensitivity and specificity	3 (60%)	6 (46%)
	Single group ‐ sensitivity only	0 (0%)	2 (15%)
	Two or more groups ‐ sensitivity and specificity	2 (40%)	5 (38%)
Reference standard for presence of SARS‐CoV‐2	All RT‐PCR positive	5 (100%)	13 (100%)
Reference standard for absence of SARS‐CoV‐2	COVID suspects (double RT‐PCR negative)	1 (20%)	0 (0%)
	COVID suspects (single RT‐PCR negative)	4 (80%)	11 (85%)
	Not applicable	0 (0%)	2 (15%)
Tests
Number of tests per study	1	4 (80%)	11 (85%)
	2	0 (0%)	2 (15%)
	4	1 (20%)	0 (0%)
Test technology, antigen tests only^a	Colloidal‐gold immunoassay	4 (50%)	N/A
	Fluorescent immunoassay	4 (50%)	N/A
Sample type	Nasal only	0 (0%)	1 (8%)
	Nasopharyngeal only	3 (60%)	6 (46%)
	Nasopharyngeal + oropharyngeal combined	2 (40%)	1 (8%)
	Nasopharyngeal or nasal	0 (0%)	3 (23%)
	Nasopharyngeal or oropharyngeal	0 (0%)	1 (8%)
	Mixed (3 or more types)	0 (0%)	1 (8%)
A&E: accident and emergency; IQR: interquartile range; RT‐PCR: reverse transcriptase polymerase chain reaction

^aAs a % of antigen test evaluations (n = 8).

The median sample size of the included studies is 112 (interquartile range (IQR) 96 to 198) and median number of SARS‐CoV‐2 confirmed samples included is 85 (IQR 50 to 119). The majority of studies (10/18) were conducted in the USA, four in Europe, two in South America, one in China and one study included samples from more than one country.

Participant characteristics

Studies predominantly selected samples from those submitted to laboratories for routine RT‐PCR testing with limited detail of the participants providing the samples. Three studies included samples from participants in emergency department or urgent care settings, three included samples from participants presenting in mixed settings (inpatient, outpatient or emergency department), and 12 did not report any details of setting in which study participants presented.

Four studies included samples from symptomatic patients, only one of which provided any information on the type of symptoms experienced and time from symptom onset (median 2 days; IQR 1 to 4; range 0 to 12; Porte 2020). Three additional studies provided basic demographic data such as age or gender, and the remaining 14 provided no information on participant characteristics.

All five studies evaluating antigen tests reported results for SARS‐CoV‐2‐confirmed samples with high and low viral load, as defined by the cycle threshold (Ct) value from the reference standard. In one study (Diao 2020), the proportion with high viral load was 27% (cut‐off ≤ 30 Ct), and in the other four (using a cut‐off of ≤ 25 Ct) it ranged from 48% to 74% (Appendix 7). Four studies reporting five molecular assay evaluations reported proportions with high viral load ranging from 33% (Mitchell 2020) to 60% (Smithgall 2020 [A]). All four studies defined high viral load as a Ct of 30 or less. Ct values were missing for some samples in Porte 2020.

Study designs

We found it difficult to fully ascertain whether samples were included in studies with or without knowledge of whether patients did or did not have COVID‐19 infection. All studies defined the presence or absence of COVID‐19 infection based on RT‐PCR, with a single (n = 17) or two (n = 1) negative RT‐PCR results used to confirm the absence of infection. One study used paired nasopharyngeal swabs for RT‐PCR and nasal swabs for the index test (Harrington 2020); all other studies used the same respiratory sample for the RT‐PCR and for the index test.

Nine studies appeared to include series of samples submitted for laboratory testing regardless of the RT‐PCR result, but only Harrington 2020 reported including consecutive samples, and only Mertens 2020 randomly selected samples. The number of samples in these single‐group studies ranged from 26 to 524 with between 13 and 208 samples with confirmed SARS‐CoV‐2 (median prevalence 50%; IQR 41% to 68%).

Seven studies described deliberate separate sampling of RT‐PCR‐positive and RT‐PCR‐negative samples, for example, to ‘enrich’ for positive samples, to reach a stated ratio of positive to negative samples, or to represent a range of Ct values on RT‐PCR. We designated these studies as two‐group studies. Sample sizes of these studies ranged from 88 to 481 with between 57 and 220 samples with confirmed SARS‐CoV‐2 (median prevalence 60%; IQR 46% to 66%).

Two studies included only samples with confirmed SARS‐CoV‐2, thus only allowing estimation of sensitivity; 35 samples in Broder 2020, and 96 in Rhoads 2020.

Index tests

Fifteen studies evaluated only one test, and three compared two or more tests using the same samples (two with two tests each, and one with four tests). In total, the 18 studies reported 23 test evaluations. Appendix 9 provides details extracted from the manufacturers' instructions for use documents for all included tests.

Antigen tests

Five studies reported eight evaluations of antigen tests (4 CGIA and 4 FIA), seven of which evaluated one of five commercially produced tests (produced by Beijing Savant, Shenzhen Bioeasy, Coris BioConcept, Liming Bio‐Products and RapiGEN Inc.), and one of which evaluated an in‐house FIA method (full identification details for all tests are provided in Appendix 8). Contact with the study author indicates that this study (Diao 2020) reports the development of the Shenzhen Bioeasy assay, but it is not clear whether the commercially available assay is identical to the one reported in the study or whether it has undergone further refinement. Only two studies provided product codes for the tests evaluated (Porte 2020; Weitzel 2020 [A]; Appendix 8). The Beijing Savant, Coris BioConcept, Shenzhen Bioeasy and in‐house assays all target the nucleocapsid protein; this information was not reported for the Liming Bio‐Products and RapiGEN Inc. assays (Appendix 8). We have not been able to identify any information online for either the Beijing Savant or the Liming Bio‐Products assay.

Two of the five studies used only nasopharyngeal swab samples, two used both nasopharyngeal and oropharyngeal swab samples from all patients (Porte 2020; Weitzel 2020 [A]), and one study (Mertens 2020) used mixed samples including nasopharyngeal swabs, nasopharyngeal aspirate and bronchoalveolar lavage. All studies used samples either in viral transport medium (VTM) (n = 4) or in saline solution (n = 1; Diao 2020). The Coris BioConcept assay, evaluated in two studies (Lambert‐Niclot 2020; Mertens 2020), is the only one whose instructions for use cover swabs in VTM; the use of VTM is not mentioned in the instructions for use documents for any of the other assays (Appendix 9). Samples were tested "soon" after collection in Lambert‐Niclot 2020, after a defined period of refrigerated storage in Porte 2020, and after frozen storage in Weitzel 2020 [A]; two studies did not report sample storage and timing of testing.

Molecular tests

Thirteen studies reported 15 evaluations of four different commercially available rapid molecular tests: six evaluating ID NOW (Abbott Laboratories), seven evaluating Xpert Xpress (Cepheid Inc), and one evaluation each of Accula (Mesa Biotech Inc.) and SAMBA II (Diagnostics for the Real World). None of the studies reported product codes for the tests evaluated (Appendix 8). One study of Xpert Xpress used the 'research use only' (RUO) version of the test, but reported that the RUO version contains the same reagents as the 'emergency use authorisation' (EUA) version. The RUO version allows the user to view the amplification curves for the RdRp gene as well as for the E‐gene and N2 targets, whereas the EUA version restricts the amplification curves to the E and N2 targets only. ID NOW and SAMBA II use isothermal amplification techniques, Xpert Xpress is based on RT‐PCR, and Accula is described as PCR plus a lateral flow assay (LFA).

Of the 13 studies, seven used only nasopharyngeal (n = 6) or nasal (n = 1) swab samples, one used both nasopharyngeal and oropharyngeal swab samples from all patients, and the remaining five used mixed samples including nasopharyngeal or nasal swabs (n = 3), nasopharyngeal or oropharyngeal swabs (n = 1), or multiple sample types including tracheal aspirate (n = 1). One study reported direct swab testing (Harrington 2020), 10 used swabs in viral transport medium (n = 5), viral transport medium or saline (n = 4), or viral transport medium or gelatin‐lactalbumin‐yeast (GLY) medium (n = 1), and two did not report whether any transport medium was used. Five of the 13 studies reported testing immediately (n = 1), or within 48 hours (n = 1) or 72 hours (n = 3) of sample collection. Four studies reported testing after a period of frozen storage, and four did not describe sample storage or timing of testing at all. Two of the four manufacturers document instructions for use for samples in transport medium (for the Xpert Xpress and SAMBA II assays) and two explicitly recommend against the use of viral transport medium (ID NOW and Accula), although at the time of the test evaluations some viral transport media were documented as acceptable for ID NOW. Although immediate sample testing is preferred, all manufacturers document an acceptable period of refrigerated storage, ranging from 24 hours (ID NOW) to seven days (Xpert Xpress). See Appendix 9.

Across the 23 test evaluations of antigen or molecular tests, only one reported testing outside of a centralised laboratory setting, where direct swab testing (using ID NOW (Abbott Laboratories)) was carried out by on‐site medical personnel or laboratory personnel at local laboratories (Harrington 2020).

Our own assessment of test complexity across test types classified SAMBA II as high complexity (more than two sample preparation steps and more than three test steps), Shenzhen Bioeasy FIA, ID NOW and Accula as moderate complexity and the other antigen tests and Xpert Xpress as low complexity (one sample preparation step and up to two test steps).

Methodological quality of included studies

We report the overall methodological quality assessed using the QUADAS‐2 tool for all included studies (n = 18) in Figure 2 (Whiting 2011). See Appendix 10 for a plot of study‐level ratings by quality.

Figure 2

Risk of bias and applicability concerns graph: review authors' judgements about each domain presented as percentages across included studies. Numbers in the bars indicate the number of studies

We considered the risk of bias in the individual studies and whether the results were likely to be applicable to standard use of the tests. We did not judge any study at low risk of bias and we had concerns about the applicability of results in all studies. We considered risk of bias to be high in nine (50%) studies because of how they selected samples and in 13 (72%) because they considered that one negative RT‐PCR was sufficient to confirm the absence of COVID‐19 infection. Lack of details in reporting meant we could not clearly assess whether there was a risk of bias through performance of the index test in 11 (61%) studies, or from the way in which the study was undertaken and analysed in 10 (56%). We judged that there were high concerns about the applicability of the evidence related to participants in 13 (72%) studies, to the index test in 13 (72%) studies and to the reference standard in 17 (94%) studies. We did not observe differences in methodological quality between antigen and molecular test evaluations. Explanations of how we reached these judgements are given below and in the Characteristics of included studies table.

Participant selection

We judged only two studies to be at low risk of bias, and in seven (39%) the risk was unclear because of poor reporting. We judged the remaining 50% (9/18) to be at high risk of bias because of deliberate sampling of participants based on the reference standard result; two of these also included only samples with confirmed COVID‐19 infection. We were not able to judge the appropriateness of study exclusions (16/18) or inclusions (11/18) where selection was based on the availability of laboratory samples with no participant eligibility criteria specified. Numbers per group are not mutually exclusive.

We had high concerns about the applicability of the selected participants in 13/18 studies (72%), meaning that the participants recruited were unlikely to be similar to those in whom the test would be used in clinical practice. This was largely because deliberate sampling, and sample inclusion based on the availability of residual and sometimes frozen samples, created unrepresentative participant samples. We judged only one study recruiting participants presenting to urgent care or emergency departments as likely to have selected an appropriate patient group.

Index tests

Figure 2 demonstrates similar patterns in risk of bias and applicability of the index test for studies of both antigen and rapid molecular tests. We observed low risk of bias in four studies that clearly described interpretation of the index test blinded to results of the reference standard, and used prespecified test thresholds. There was high risk of bias in three studies because the manufacturer’s prespecified threshold for the Xpert Xpress test (re‐testing of samples with presumptive positive results) was not followed. The risk of bias was unclear in 11 studies because we could not judge whether interpretation of the index test was undertaken with knowledge of whether individuals did or did not have COVID‐19 infection.

Thirteen studies did not carry out testing as it would occur in practice: in four, testing was performed by trained, centralised laboratory staff rather than local laboratory or healthcare personnel; in one, the test could not be purchased (Diao 2020); and in 11, the test was not conducted according to the manufacturer's instructions for use (these categories are not mutually exclusive). Among these, four studies tested samples in a viral transport medium not covered by the manufacturer's instructions for use, five used frozen samples, one reported heat inactivation of samples prior to direct testing, and two reported a testing timeframe beyond that recommended.

The remaining five studies provided inadequate information to make a judgement; three of them did conduct the test within the manufacturer instructions for use but none of them clearly described the setting for testing or personnel conducting the test.

Reference standards

Only one study used an appropriate reference standard to define the presence or absence of COVID‐19 infection (two negative RT‐PCR results required to confirm the absence of COVID‐19) and implemented it in ways that prevented bias (Diao 2020). One additional study reported two RT‐PCR results for all study participants (Moore 2020), and two did not include non‐COVID‐19 cases. We considered that the remaining 14 did not use an adequate reference standard, putting them at high risk of bias (Figure 2). Eight studies reported blinded RT‐PCR interpretation and 10 (56%) provided insufficient information about blinding of the reference standard to the index test to judge risk of bias.

RT‐PCR is unlikely to falsely classify participants as having COVID‐19 (low risk of false positives), but it may miss true cases; when a single RT‐PCR result is used as the reference standard, index test positives in such missed cases are counted as false positives. Four studies (22%) used a second RT‐PCR test for samples with discrepant results (FP and FN) to address this. However, selective re‐testing cannot detect additional cases of COVID‐19 infection among samples that were negative on both tests, and is likely to lead to distorted (typically inflated) accuracy estimates. One study (Moore 2020) used a second RT‐PCR test in all samples and additionally carried out a record review for all cases with discrepant results, to verify whether participants were truly considered to have had COVID‐19 infection.
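Why re‐testing only the discrepant cells tends to flatter the index test can be shown with a small sketch on hypothetical counts. Index‐positive, RT‐PCR‐negative samples (FP) can only move to TP if the second RT‐PCR is positive, and FN samples can only move to TN; concordant samples are never re‐tested, so errors hidden there are never corrected. As a result, neither sensitivity nor specificity can fall after discrepant analysis. The function names and counts are illustrative only and do not come from any included study.

```python
def sens_spec(tp, fp, fn, tn):
    """Return (sensitivity, specificity) for one 2x2 table."""
    return tp / (tp + fn), tn / (tn + fp)


def discrepant_resolution(tp, fp, fn, tn, fp_now_positive, fn_now_negative):
    """Reclassify discordant samples after a second RT-PCR on those samples only.

    fp_now_positive: FP samples whose re-test is RT-PCR-positive (they become TP).
    fn_now_negative: FN samples whose re-test is RT-PCR-negative (they become TN).
    TP and TN cells are left untouched because they are never re-tested.
    """
    if not (0 <= fp_now_positive <= fp and 0 <= fn_now_negative <= fn):
        raise ValueError("cannot reclassify more samples than the cell contains")
    return (tp + fp_now_positive, fp - fp_now_positive,
            fn - fn_now_negative, tn + fn_now_negative)


before = (90, 6, 10, 94)  # hypothetical TP, FP, FN, TN
after = discrepant_resolution(*before, fp_now_positive=4, fn_now_negative=3)
print(sens_spec(*before))  # (0.9, 0.94)
print(sens_spec(*after))   # both values are higher than before
```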

We judged 17 of the 18 studies (94%) to raise concerns about applicability because they defined the presence of COVID‐19 infection based on a single RT‐PCR‐positive result. These studies will have excluded individuals who are RT‐PCR‐negative but have exposure and clinical features that meet the case definitions for COVID‐19.

Flow and timing

Only three studies were at low risk of bias for participant flow and timing; one of these (Porte 2020) used a Standards for Reporting of Diagnostic Accuracy Studies (STARD)‐style participant flow diagram and checklist (Bossuyt 2015) to fully report outcomes for all samples. Five studies were at high risk of bias because they excluded samples with invalid index test results without retesting them.

Unclear risk of bias was present in 10 (56%) studies because of lack of clarity around participant inclusion and exclusion from analyses. Six studies were unclear regarding whether the analysis was participant‐based or sample‐based (where there is a possibility of multiple samples per participant overstating the precision of estimates).
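The point about multiple samples per participant can be made concrete with the design effect for clustered data, DE = 1 + (m − 1) × ICC, where m is the average number of samples per participant and ICC is the intraclass correlation between results from the same participant. An analysis that treats all samples as independent understates the standard error by a factor of √DE. The sketch below is illustrative only: the review did not estimate ICCs, and the values used are hypothetical.

```python
from math import sqrt


def design_effect(samples_per_participant, icc):
    """Kish design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1 + (samples_per_participant - 1) * icc


def proportion_se(p, n_samples, deff=1.0):
    """Standard error of a proportion, inflated by sqrt(deff) for clustering."""
    return sqrt(deff * p * (1 - p) / n_samples)


# Hypothetical: 200 samples from 100 participants (2 each), ICC = 0.5.
deff = design_effect(2, 0.5)             # 1.5
n_effective = 200 / deff                 # about 133 independent samples
naive_se = proportion_se(0.9, 200)       # treats every sample as independent
adjusted_se = proportion_se(0.9, 200, deff)
```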

Conflicts of interest

In six studies all authors declared no conflicts of interest, although one study that evaluated an ‘in‐house’ test included a co‐author affiliated to a test manufacturing company. Eight studies did not provide a conflict of interest statement (one of these included co‐authors affiliated to the test manufacturer) and in the four remaining studies at least one author declared conflicts of interest in relation to the test.

Eleven studies provided no funding statement, five reported no funding sources to declare, and two reported one or more public funding sources. Two studies reported receipt of test kits or reagents ‘in kind’ from test manufacturers.

Findings

Of the 18 included studies, three reported evaluations of more than one test using the same samples (Table 1). In order to include all results from all tests in these analyses we have treated results from different tests of the same samples within a study as separate data points, such that data are available on 23 test evaluations (8 evaluations of antigen tests in 5 studies and 15 evaluations of rapid molecular tests in 13 studies). The results table (Table 2), identifies where estimates are based on multiple assessments of the same samples by including both the number of test evaluations and the number of studies. The numbers of true positives, false positives, and total samples with and without confirmed SARS‐CoV‐2 infection are based on test result counts.

Table 2. Summary of analyses of test accuracy

Test	Evaluations (studies)	Samples	Cases	Average sensitivity, % (95% CI)	Average specificity, % (95% CI)
Antigen tests
All	8 (5)	1180	762	56.2 (29.5 to 79.8)	99.5 (98.1 to 99.9)
Subgroup analysis by viral load
High viral load	7 (5)	400	400	93.2 (63.6 to 99.1)	N/A
Low viral load	7 (5)	341	341	32.6 (17.5 to 52.6)	N/A
Difference (95% CI)				−60.6 (−83.0 to −38.2), P < 0.001	N/A
Subgroup analysis by test^a
Beijing Savant FIA	1	109	78	16.7 (9.2 to 26.8)	100 (88.8 to 100)
Coris Bioconcept CGIA^b	2	466	226	54.4 (47.9 to 60.8)	99.6 (97.7 to 99.9)
Liming CGIA	1	19	9	0 (0 to 33.6)	90.0 (55.5 to 99.7)
RapiGEN CGIA	1	109	79	62.0 (50.4 to 72.7)	100 (88.4 to 100)
Shenzhen Bioeasy FIA^b	2	238	162	89.5 (83.8 to 93.3)	100 (95.2 to 100)
In‐house FIA	1	239	208	67.8 (61.0 to 74.1)	100 (88.8 to 100)
Subgroup analysis by sample type
Nasopharyngeal only	3	705	434	59.4 (50.7 to 67.5)	99.6 (97.4 to 99.9)
Molecular tests
All studies with 2x2 data	13 (11)	2194	1113	95.2 (86.7 to 98.3)	98.9 (97.3 to 99.5)
All studies^c,d	15 (13)	1244	1244	95.5 (88.5 to 98.4)	N/A
Sensitivity analysis before and after discrepant analysis
Before	4	1280	536	98.2 (87.0 to 99.8)	97.8 (94.8 to 99.1)
After	4	1280	547	99.5 (79.9 to 100)	99.6 (98.7 to 99.9)
Subgroup analyses by viral load
High viral load^b	5 (4)	151	151	100 (97.5 to 100)	N/A
Low viral load	5 (4)	142	142	93.3 (46.7 to 99.6)	N/A
Subgroup analyses by test^a
Abbott – ID NOW	5	1003	496	76.8 (72.9 to 80.3)	99.6 (98.4 to 99.9)
Cepheid – Xpert Xpress	6	919	479	99.4 (98.0 to 99.8)	96.8 (90.6 to 99.0)
Difference (95% CI)				22.6 (18.8 to 26.3), P < 0.001	−2.8 (−6.4 to 0.8), P = 0.13
Mesa Biotech – Accula	1	100	50	68.0 (53.3 to 80.5)	100 (92.9 to 100)
DRW – SAMBA II	1	172	88	98.9 (93.8 to 100)	96.4 (89.9 to 99.3)
Direct comparisons by test
Abbott – ID NOW^a,b	2	220	145	79.3 (71.8 to 85.6)	100 (95.2 to 100)
Cepheid – Xpert Xpress^a,b	2	221	146	98.6 (95.1 to 99.8)	97.3 (90.7 to 99.7)
Difference (95% CI)^e				19.3 (12.5 to 26.2), P < 0.001	−2.7 (−6.3 to 1.0), P = 0.15
Sample type
Nasopharyngeal only^c	6 (5)	600	343	87.1 (71.6 to 94.7)	100 (98.6 to 100)
CGIA: colloidal gold immunoassay; CI: confidence intervals; FIA: fluorescent immunoassay; DRW: Diagnostics for the Real World

^aSee Appendix 9 for details of product codes, where available (these were not necessarily reported in studies but we obtained them from manufacturer instructions for use documents).
^b2x2 tables combined prior to calculating estimates.
^cSeparate pooling of sensitivity and/or specificity.
^dThis includes two studies that only include COVID‐19 positive cases.
^eTwo‐sample test of proportions.

We undertook analyses separately for antigen tests and for molecular‐based tests. We present results for all analyses in Table 2. Forest plots of study data for the primary analyses are in Figure 3 and Figure 4. Full identification details for all assays are provided in Appendix 8 and Appendix 9; for brevity, the antigen assays are referred to by the manufacturer name. Subgroup analyses according to viral load are in Figure 5 and Figure 6, and rapid molecular test results before and after discrepant analysis are in Figure 7.

Figure 3

Forest plot of studies evaluating antigen tests. Studies grouped by test
(FIA: fluorescence immunoassays; CGIA: colloidal gold‐based immunoassays; NP: nasopharyngeal; OP: oropharyngeal)

Figure 4

Forest plot of studies evaluating rapid molecular tests. Studies grouped by test and sample type
(NP: nasopharyngeal; OP: oropharyngeal; RUO: research use only)

Figure 5

Forest plot of studies evaluating antigen tests according to viral load: high (≤ 25 Ct; ≤ 30 Ct in Diao 2020) versus low viral load (> 25 Ct; > 30 Ct in Diao 2020). Studies grouped by test

Figure 6

Forest plot of studies evaluating rapid molecular tests according to viral load: high (≤ 30 Ct) versus low viral load (> 30 Ct). Studies grouped by test

Figure 7

Forest plot of studies of molecular tests before and after discrepant analysis. Studies grouped by test
(DRW: Diagnostics for the Real World; RUO: research use only)

Accuracy of antigen tests overall and by test

Average sensitivity across the eight evaluations of antigen tests was 56.2% (95% CI 29.5% to 79.8%), and average specificity was 99.5% (95% CI 98.1% to 99.9%; 943 samples, including 596 samples with confirmed SARS‐CoV‐2; Table 2). However, Figure 3 shows considerable heterogeneity in sensitivity, with estimates across studies ranging from 0% to 94%. The average value should therefore be interpreted with caution, as there may be real differences in sensitivity between test brands. The two assays with the lowest sensitivity (Liming Bio‐Products and Beijing Savant) no longer appear to be commercially available. Pooled results for the two tests with two studies each suggested higher sensitivity for the Shenzhen Bioeasy FIA (89.5%, 95% CI 83.7% to 93.8%) than for the Coris BioConcept CGIA (54.4%, 95% CI 47.7% to 61.0%), but these tests were not evaluated in the same studies and other factors may explain the observed differences. Similar, unknown factors may explain differences between the Shenzhen Bioeasy and Coris BioConcept assays and the other tests for which only single studies were available. Specificities were consistently high, with point estimates of 99% or 100% in seven evaluations; the remaining evaluation (Liming Bio‐Products) estimated specificity at 90.0%, with a wide 95% CI (55.5% to 99.7%) reflecting only 10 samples without SARS‐CoV‐2.

Accuracy of rapid molecular tests overall and by test

Average sensitivity and specificity for the 13 rapid molecular test evaluations that included samples with and without SARS‐CoV‐2 were 95.2% (95% CI 86.7% to 98.3%) and 98.9% (95% CI 97.3% to 99.5%), respectively (2255 samples, 1179 with confirmed SARS‐CoV‐2). Adding the two 'cases only' studies made little difference to the average sensitivity (95.5%, 95% CI 88.5% to 98.4%; 1244 cases). We excluded these two studies from further analyses (Broder 2020; Rhoads 2020).

Figure 4 shows heterogeneity in sensitivity estimates (ranging from 68% to 100%), with consistently high specificities (92% to 100%, with upper 95% confidence limits of 99% or 100% in every study). Of the four different molecular tests evaluated, two were evaluated in one study each. The sensitivity and specificity of the Accula test were 68.0% (95% CI 53.3% to 80.5%) and 100% (95% CI 92.9% to 100%; 100 samples, 50 with confirmed SARS‐CoV‐2). For SAMBA II, sensitivity and specificity were 98.9% (95% CI 93.8% to 100%) and 96.4% (95% CI 89.9% to 99.3%; 172 samples, 88 with confirmed SARS‐CoV‐2).

The ID NOW and Xpert Xpress tests were evaluated in five studies (1003 samples, 496 with confirmed SARS‐CoV‐2) and six studies (919 samples, 479 with confirmed SARS‐CoV‐2), respectively. Pooled analysis showed higher sensitivity for Xpert Xpress (99.4%, 95% CI 98.0% to 99.8%) than for ID NOW (76.8%, 95% CI 72.9% to 80.3%), a difference of 22.6 percentage points (95% CI 18.8 to 26.3) (Table 2). Although the specificity of Xpert Xpress (96.8%, 95% CI 90.6% to 99.0%) was marginally lower than that of ID NOW (99.6%, 95% CI 98.4% to 99.9%), a difference of this magnitude could be explained by chance (difference −2.8 percentage points, 95% CI −6.4 to 0.8; P = 0.13). Restricting the analysis, using the two‐sample test of proportions, to the two studies that compared the two tests in the same samples (Smithgall 2020 [A]; Zhen 2020 [A]) gave very similar results: a difference in sensitivity of 19.3 percentage points (95% CI 12.5 to 26.2) and a difference in specificity of −2.7 percentage points (95% CI −6.3 to 1.0), based on 221 samples, 146 with confirmed SARS‐CoV‐2.

Subgroup analyses by sample type

Adequate data for different sample types were available only for studies using nasopharyngeal samples. For three evaluations of antigen tests (705 samples, 434 with confirmed SARS‐CoV‐2), average sensitivity (59.4%, 95% CI 50.7% to 67.5%) and specificity (99.6%, 95% CI 97.4% to 99.9%) were similar to the overall estimates. For six evaluations of molecular tests, average sensitivity appeared lower than the overall pooled estimate (87.1%, 95% CI 71.6% to 94.7%), with little change in specificity (Table 2).

Subgroup analyses by viral load

We extracted sensitivity data according to viral load from seven evaluations of antigen tests (three with the assistance of the study authors) and five evaluations of molecular tests. The cycle threshold (Ct) for high viral load was 25 or less for four of the five antigen studies, and 30 or less for the remaining antigen study and for all of the molecular test evaluations. For antigen tests, average sensitivity was much higher in the high viral load group (93.2%, 95% CI 63.6% to 99.1%; 400 samples with confirmed SARS‐CoV‐2) than in the low viral load group (32.6%, 95% CI 17.5% to 52.6%; 341 samples with confirmed SARS‐CoV‐2). This difference of 60.6 percentage points (95% CI 38.2 to 83.0) was beyond that expected by chance (P < 0.001) (Table 2; Figure 5).

For molecular tests, all sensitivity estimates for the high viral load subgroups were 100% (based on 151 samples with confirmed SARS‐CoV‐2), compared to between 34% and 100% for low viral load subgroups (summary sensitivity 93.3%, 95% CI 46.7% to 99.6%; 142 samples with confirmed SARS‐CoV‐2; Table 2; Figure 6). The two evaluations with the lowest sensitivity were both of ID NOW, with reported sensitivities of 34% (35 samples with confirmed SARS‐CoV‐2 in Smithgall 2020 [A]) and 58% (31 samples with confirmed SARS‐CoV‐2 in Mitchell 2020). Sensitivity in the three evaluations of Xpert Xpress ranged from 97% (35 samples with confirmed SARS‐CoV‐2 in Smithgall 2020 [B]) to 100% (in Lieberman 2020 and Wolters 2020, with 7 and 34 samples with confirmed SARS‐CoV‐2, respectively).

Sensitivity analysis of the impact of discrepant analysis

Four evaluations of molecular tests (in 1566 samples) reported results before and after discrepant analysis where selected samples were re‐tested with either the same (Harrington 2020; Moran 2020), or an alternative RT‐PCR assay (Assennato 2020; Loeffelholz 2020), three of which also reported re‐testing of samples with the index test (Assennato 2020; Harrington 2020; Moran 2020; Table 3; Figure 7).

Table 3. Effect of sample re‐testing

Study	Index test (target genes)	First RT‐PCR	Target gene	Second RT‐PCR	Target gene	False positives	False negatives	Index test re‐test	Reference standard re‐test
Discrepant analysis
Assennato 2020	SAMBA II (ORF1ab, N2)	PHE Cambridge (Wuhan) assay	RdRp, E gene	PHE Colindale RT‐PCR assay	RdRp 'different region'	3 → 0	1 → 1	Yes; same results obtained	Yes; 3 FPs reclassified as TP, all borderline positive for ≥ 1 target gene on either RT‐PCR test; 1 FN remained FN, positive on both RT‐PCR assays
Harrington 2020	ID NOW (RdRp)	Abbott RealTime	Not stated	Same RT‐PCR	Same	2 → 0	47 (not re‐tested)	1 FP reclassified as TN with repeat sampling; 1 FP not re‐tested	1 FP reclassified as TP; 1 FP reclassified as TN (both with repeat sampling)
Loeffelholz 2020	Xpert Xpress (RUO) (E, N2)	RT‐PCR varied by site: New York RT‐PCR assay Quest rRT‐PCR Altona RealStar GeneFinder Seegene Allplex Charité Virology Abbott RealTime DiaSorin Simplexa	By assay N (N1, N2) N (N1, N3) S, E RdRp, E, N RdRp, E, N RdRp RdRp, N ORF1ab, S ORF1ab, S	One of: Hologic Panther Fusion Roche Tib‐Molbiol LightMix CDC assay	By assay: ORF1ab E N1, N2	11 → 3	1 → 0	None reported	1 FN re‐classified as TN (inconclusive positive on Quest assay; negative on CDC assay) 3 FP remained as FP (2 negative on NY assay, 1 negative on Charité Virologie assay; all confirmed negative with Hologic Panther Fusion) 8 FP re‐classified as TP (all negative on Charité Virologie assay; positive on re‐test with Roche Tib Molbiol assay)
Moran 2020	Xpert Xpress (E, N2)	Roche cobas 6800	ORF1, E	Same RT‐PCR	Same	1 → 0	0	1 FP reclassified as TN (was initially E gene negative and low positive for N2; negative for both targets on re‐test)	1 FP 'repeatedly negative' on RT‐PCR re‐test (re‐classified as TN based on index re‐test)
Additional studies reporting sample re‐testing (not discrepant analysis)
Broder 2020	Xpert Xpress (E, N2)	Roche cobas 6800	ORF1a, E	modified CDC protocol	NR	0	1	None reported; no presumptive positive results reported	Yes; 1 FN (became TN)
Hogan 2020	Accula (N)	In‐house SHC assay	E gene	N/A	N/A	0	16	Yes; 1 TP remained as TP; faint positive Accula test line was repeated on re‐test	None reported
Lieberman 2020	Xpert Xpress (E, N2)	CDC EUA‐based in‐house test (positive if 1 of 2 targets detected)	N1, N2	N/A	N/A	0	0	Yes; 1 presumptive positive (E‐gene only positive) became positive (N‐gene only positive) on re‐test	None reported
Moore 2020	ID NOW (RdRp)	Modified CDC RT‐PCR	N1, N2	Abbott RealTime	N, RdRp	0 → 0	25 → 31	None reported	All samples tested with both RT‐PCR assays; 25 FN remained as FN (2 were inconclusive but considered positive on CDC assay, confirmed positive with Abbott RealTime assay); 6 TN reclassified as FN (negative on CDC assay, confirmed positive with Abbott RealTime assay); all 8 discordant results between the two RT‐PCRs were confirmed SARS‐CoV‐2 positive based on record review
Wolters 2020	Xpert Xpress (E, N2)	In‐house assays at three laboratories	By laboratory: E, RdRp E, RdRp; then E only E, RdRp; then E, N1	Same RT‐PCR per laboratory	Same	0 → 2	0	None reported; 1 presumptive positive considered TP by review team	2 TP samples (both positive on only one target; 1 presumptive positive (E positive) and 1 positive (N2 positive)) re‐classified as FP; both considered SARS‐CoV‐2 negative on RT‐PCR re‐test. The authors note that viral loads were at the limit of detection for Xpert Xpress and that multiple freeze‐thaw steps of samples could have had a significant impact on detection.
CDC: Centers for Disease Control and Prevention; EUA: emergency use authorisation; FN: false negative; FP: false positive; N/A: not applicable; NR: not reported; PHE: Public Health England; RT‐PCR: reverse transcriptase polymerase chain reaction; RUO: research use only; TN: true negative; TP: true positive

Because only samples with discordant results are re‐tested, discrepant analysis can only reduce the number of samples classified as false negative or false positive errors; it can never increase them. Discrepant analysis reduced the false negative proportion (1 − sensitivity) from 1.8% to 0.5% and the false positive proportion (1 − specificity) from 2.2% to 0.4%. Three of the four studies reporting initially ‘false positive’ results reported zero false positives after sample re‐testing, and one reported a drop in false positives from 11 to 3 (Loeffelholz 2020; Table 3). Of the two studies reporting re‐testing of initially ‘false negative’ results, one reclassified the false negative as a true negative on re‐testing, and in the other the false negative remained a false negative. Given the bias inherent in choosing the reference test dependent on the observed results, these findings should be interpreted with caution.

An additional study (Moore 2020) tested all samples with two different RT‐PCR assays, and hence applied a more accurate reference standard to every sample rather than only to samples with discrepant results. In this study, six samples initially classified as true negatives were reclassified as false negatives after the second RT‐PCR. Had discrepant analysis been undertaken, these misclassifications would have been missed, further underlining the methodological flaws inherent to discrepant analysis.
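The asymmetry described in the two paragraphs above can be shown with a small toy example. Each hypothetical sample carries an index test result and results from two RT‐PCR assays. Discrepant analysis consults the second assay only when the index test and the first RT‐PCR disagree, whereas a Moore 2020‐style design applies both assays to every sample. Counting a sample as positive when either RT‐PCR is positive is our assumption for the composite reference. The five samples are invented for illustration and are not data from any included study.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Sample:
    index_pos: bool  # rapid (index) test result
    ref1_pos: bool   # first RT-PCR result
    ref2_pos: bool   # second RT-PCR result (hypothetical)


def count_errors(pairs):
    """Return (false positives, false negatives) for (index, reference) result pairs."""
    fp = sum(1 for index, ref in pairs if index and not ref)
    fn = sum(1 for index, ref in pairs if ref and not index)
    return fp, fn


def first_reference(samples):
    """Classify every sample against the first RT-PCR only."""
    return [(s.index_pos, s.ref1_pos) for s in samples]


def discrepant_analysis(samples):
    """Re-test only discordant samples; their second RT-PCR result replaces the first."""
    return [(s.index_pos, s.ref2_pos if s.index_pos != s.ref1_pos else s.ref1_pos)
            for s in samples]


def both_assays_for_all(samples):
    """Apply both RT-PCRs to every sample; positive if either is positive (assumed rule)."""
    return [(s.index_pos, s.ref1_pos or s.ref2_pos) for s in samples]


def never_increases_errors():
    """Check all 8 combinations of three binary results for a single sample."""
    return all(
        sum(count_errors(discrepant_analysis([s]))) <= sum(count_errors(first_reference([s])))
        for s in (Sample(*combo) for combo in product([False, True], repeat=3))
    )


# Hypothetical samples, chosen only to illustrate the mechanism.
toy = [
    Sample(True, True, True),     # concordant positive
    Sample(True, False, True),    # initial false positive, resolved by re-test
    Sample(False, True, False),   # initial false negative, resolved by re-test
    Sample(False, False, True),   # concordant negative that the first RT-PCR missed
    Sample(False, False, False),  # concordant negative
]

if __name__ == "__main__":
    print("first RT-PCR only:   FP, FN =", count_errors(first_reference(toy)))
    print("discrepant analysis: FP, FN =", count_errors(discrepant_analysis(toy)))
    print("both assays for all: FP, FN =", count_errors(both_assays_for_all(toy)))
    print("discrepant analysis never adds errors:", never_increases_errors())
```

In this toy set, discrepant analysis removes both initial errors. Applying both assays to every sample instead leaves two false negatives. One of them, the fourth sample, is invisible to discrepant analysis because the first RT‐PCR agreed with the index test.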

Other sources of heterogeneity

We planned to evaluate the effect of other sources of heterogeneity, including study design, reference standard, length and severity of symptoms, and setting. However, additional formal investigations using meta‐regression were not possible because of limited data, lack of reporting or lack of variability across the studies in these features (Appendix 11). Only one study reported the median time to testing after symptom onset, none reported symptom severity, and three reported the setting in which tests were conducted. All studies used RT‐PCR alone as the reference standard for diagnosing COVID‐19 infection.

We anticipate revisiting the effect of study design and including a more detailed investigation by sample type in future iterations of this review.

Discussion

This is the first version of a Cochrane living review summarising the accuracy of point‐of‐care antigen and molecular tests for detecting current SARS‐CoV‐2 infection. This version of the review is based on published studies, or studies available as preprints, up until 25 May 2020. We are continually identifying new published studies, and plan regular updates of this review.

Summary of main results

We included data from 18 studies including 3198 samples (including 1775 samples with confirmed SARS‐CoV‐2). Five studies, reporting eight test evaluations, considered antigen tests and 13 studies, reporting 15 test evaluations, considered rapid molecular tests. Key findings are presented in the summary of findings Table 1.

We summarise five key findings from this review.

A large proportion of antigen and molecular assays that are suitable for use at the point of care have no published or preprint reports of accuracy. This review has evaluated data from five commercial antigen tests, two of which we could not identify as available for purchase, and four molecular assays. These represent a small proportion of the assays currently available. We have identified 24 additional studies of rapid antigen or molecular tests published or available as preprints up until 22 June 2020, which we will appraise for inclusion in the review update, but there are still no published data for the majority of tests on the current FIND list.
The design and execution of studies limit the strength of the conclusions that we are currently able to draw, for both antigen and molecular tests. It is unclear whether the limitations in the primary studies will lead to over‐ or under‐estimates of test accuracy, so all results we report should be interpreted with a high degree of caution. Half of the studies used deliberate sampling based on the presence or absence of confirmed COVID‐19 infection. The majority selected samples from those submitted to laboratories for routine RT‐PCR testing, with little or no detail about the participants who provided the samples in relation to either symptom status or time from symptom onset. It is impossible to determine the effect of inclusion decisions based on the availability of residual or remnant samples. It was not always clear how many samples were included from each participant, and the analysis had to be undertaken on a per‐sample basis, which will have overestimated the precision of the estimates. RT‐PCR was the only reference standard used for diagnosing the presence of SARS‐CoV‐2 infection, so we are unable to comment on the accuracy of rapid tests for diagnosing infection in those who are RT‐PCR negative but meet case definition criteria for the presence of infection. The use of a second RT‐PCR assay to determine the disease status of samples with discrepant results following rapid molecular testing is likely to introduce further bias.
Three‐quarters of studies conducted tests outside of manufacturers’ instructions for use, particularly with regard to sample storage and use of transport media. Many also conducted tests in centralised laboratories rather than at the point of care, so test accuracy in a clinical setting remains unknown. We considered five tests, including one molecular assay, to have low complexity in terms of minimal sample preparation and test steps, and the other four to have moderate (n = 3) or high (n = 1) complexity, which could also affect how well the observed accuracy translates into practice. We did not include interpretation steps in our assessment of test complexity; however, the use of reader devices, for example for fluorescent immunoassays (FIAs), could be considered to add further complexity.
On average, the sensitivity of antigen tests was relatively poor (56.2%, 95% CI 29.5% to 79.8%), but specificities were consistently high (average 99.5%, 95% CI 98.1% to 99.9%). However, there is considerable heterogeneity in sensitivity between studies, and data for individual tests are limited. We observed large differences in sensitivity according to viral load, and we suspect that differences between studies in the distribution of samples with high and low viral load may have affected overall accuracy estimates. Because of this, together with methodological limitations and other unknown factors, it is not possible to state with any certainty whether any test is superior to the others. Two studies of the Shenzhen Bioeasy fluorescent immunoassay suggest higher sensitivity (89.5%, 95% CI 83.8% to 93.3%), which was maintained in subgroup analysis by viral load (one of the two studies obtained over 90% of samples during the first week of symptoms). An additional study reporting the development of this assay found lower overall sensitivity (68%, 95% CI 61% to 74%). However, it included a much lower proportion of samples with high viral load (27%, compared to 68% to 74% in the other two studies). Subgroup analysis suggested that, within the high and low viral load subgroups, the test performed similarly to the other two studies. All three studies included high percentages of samples with confirmed SARS‐CoV‐2, and more data are needed to determine whether the performance of this assay can be replicated in clinical practice.
On average, the sensitivity of the rapid molecular tests was 95.2% (95% CI 86.7% to 98.3%), with specificity of 98.9% (95% CI 97.3% to 99.5%). Although these average estimates are based on twice as much data as for the antigen tests, the evaluations are subject to the same methodological limitations. We do not know how the assays would perform in any specific clinical setting when used in people suspected of having COVID‐19 infection or of having been exposed to a confirmed case.

Most of the evaluations of molecular tests were of ID NOW or Xpert Xpress. Summary sensitivity for Xpert Xpress (99.4%, 95% CI 98.0% to 99.8%) was 22.6 percentage points higher than that of ID NOW, a difference of similar magnitude to that observed in the two direct comparisons of the two assays (19.3 percentage points). Concerns over risk of bias suggest that this high sensitivity might be an over‐estimate. However, as both sets of studies have similar methodological limitations, it is probably reasonable to presume that some difference in sensitivity between the tests would be maintained if these sources of bias were removed. The difference in specificity between the tests is small (ID NOW being 2.8 percentage points more specific than Xpert Xpress), but potentially important, especially in a low‐prevalence setting. However, this would not be an issue if positive test results were confirmed by a laboratory‐based RT‐PCR assay. Concerns about the applicability of study participants and index tests bring into question whether similar differences in test performance would be observed in practice.

As stated above, we did not undertake a formal comparison between antigen and molecular assays because of the lack of direct head‐to‐head comparisons of the two test types. However, the possible effect of the observed differences in accuracy can be illustrated by applying the summary estimates of test accuracy to a hypothetical cohort of 1000 people suspected of COVID‐19 infection (summary of findings Table 1). Suppose 100 people had confirmed SARS‐CoV‐2 infection (prevalence of 10%). At the average sensitivity and specificity of antigen tests, 5 of 61 people with a positive test result would be false positives (positive predictive value (PPV) 92%), while 44 of 940 people with negative test results would be false negatives (negative predictive value (NPV) 95%). As there is high heterogeneity in the estimates of sensitivity, the values observed in practice could vary considerably from these figures. For molecular assays at the same prevalence, 10 of 105 positive test results would be false positives (PPV 90%), and 5 of 895 negative results would be false negatives (NPV 99%).
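The hypothetical‐cohort figures above can be reproduced with a few lines of Python. The sketch assumes, as the numbers in the summary of findings table suggest, that each cell of the 2×2 table is rounded to a whole person independently (halves rounded up) before the predictive values are calculated. This rounding explains why the antigen row at 10% prevalence sums to 1001 rather than 1000. The code uses exact fractions so that a value such as 4.5 rounds predictably. The function names are hypothetical.

```python
from fractions import Fraction
from math import floor


def round_half_up(x: Fraction) -> int:
    """Round to the nearest whole person, with halves rounded up."""
    return floor(x + Fraction(1, 2))


def hypothetical_cohort(sens: str, spec: str, prevalence: str, n: int = 1000) -> dict:
    """Expected 2x2 counts and predictive values for a hypothetical cohort of n people.

    Inputs are decimal strings so that Fraction represents them exactly.
    """
    se, sp, prev = Fraction(sens), Fraction(spec), Fraction(prevalence)
    cases = n * prev
    non_cases = n - cases
    tp = round_half_up(cases * se)
    fn = round_half_up(cases * (1 - se))
    fp = round_half_up(non_cases * (1 - sp))
    tn = round_half_up(non_cases * sp)
    return {
        "TP": tp, "FP": fp, "FN": fn, "TN": tn,
        "PPV": tp / (tp + fp),  # share of positive results that are true positives
        "NPV": tn / (tn + fn),  # share of negative results that are true negatives
    }


if __name__ == "__main__":
    # Average accuracy estimates from the summary of findings table, 10% prevalence
    for label, sens, spec in [("antigen", "0.562", "0.995"), ("molecular", "0.952", "0.989")]:
        c = hypothetical_cohort(sens, spec, "0.10")
        print(f"{label}: TP={c['TP']} FP={c['FP']} FN={c['FN']} TN={c['TN']} "
              f"PPV={c['PPV']:.0%} NPV={c['NPV']:.0%}")
```

Changing the prevalence argument to "0.05" or "0.20" reproduces the other rows of the summary of findings table in the same way.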

Small decreases (to 5%) or increases (to 20%) in prevalence make little difference to the absolute number of false positive results. However, they have a large relative effect when considered in relation to the number of positive test results (PPV ranging from 85% to 97% for antigen tests and from 83% to 95% for molecular assays). The NPV (percentage of negative test results that are truly negative) for the molecular assays is much less affected by these changes in prevalence, because of their relatively high sensitivity and the relatively low prevalences considered. Wider variation is observed for antigen tests (NPV falling from 98% at 5% prevalence to 90% at 20% prevalence). This shows how, even in a low‐prevalence setting, tests with poor sensitivity can have a considerable impact on the level of confidence that can be placed in a negative test result. However, we emphasise that these numbers are not based on any evidence comparing antigen and molecular tests in the same samples.
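The dependence on prevalence described above follows from Bayes' theorem. Writing Se for sensitivity, Sp for specificity and π for prevalence (our notation, not the review's), the predictive values are given below, with the antigen NPV worked at two prevalences. When evaluated directly, without first rounding to whole people, these formulas can differ from the tabulated percentages by about one point.

```latex
\begin{aligned}
\mathrm{PPV} &= \frac{\mathrm{Se}\,\pi}{\mathrm{Se}\,\pi + (1-\mathrm{Sp})(1-\pi)}, \qquad
\mathrm{NPV} = \frac{\mathrm{Sp}\,(1-\pi)}{\mathrm{Sp}\,(1-\pi) + (1-\mathrm{Se})\,\pi} \\[4pt]
\text{Antigen tests, } \pi = 0.05:\quad \mathrm{NPV} &= \frac{0.995 \times 0.95}{0.995 \times 0.95 + 0.438 \times 0.05} = \frac{0.94525}{0.96715} \approx 0.977 \\[4pt]
\text{Antigen tests, } \pi = 0.20:\quad \mathrm{NPV} &= \frac{0.995 \times 0.80}{0.995 \times 0.80 + 0.438 \times 0.20} = \frac{0.796}{0.8836} \approx 0.901
\end{aligned}
```

The false negative term (1 − Se)π grows with prevalence. As a result, the NPV of a test with sensitivity of about 56% falls noticeably between 5% and 20% prevalence, whereas with sensitivity of about 95% the same term stays small.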

We saw a similar pattern of results when applying summary results for individual tests with wide variations in sensitivity and only small differences in specificities (summary of findings Table 1).

Strengths and weaknesses of the review

Our review used a broad search, screening all articles concerning COVID‐19. We undertook all screening and eligibility assessments, QUADAS‐2 assessments (Whiting 2011), and data extraction of study findings independently and in duplicate. Whilst we have reasonable confidence in the completeness and accuracy of the findings up until the search date, should errors be noted, please inform the review authors so that we can check and correct them in our next update.

We identified one other systematic review of point‐of‐care tests for detection of SARS‐CoV‐2 that is currently available only as a preprint (Subsoontorn 2020). The review did not consider antigen tests or RT‐PCR‐based tests (such as Xpert Xpress), instead focusing on molecular tests that do not require the use of a thermal cycler. We undertook a careful assessment of test complexity to ensure that included tests were suitable for use at the point of care. This assessment included explicit consideration of sample preparation and biosafety requirements as well as time to test result. The application of these index test criteria led to the exclusion of the majority of the 31 RT‐LAMP or CRISPR assay evaluations that were included in Subsoontorn 2020. Evaluations of alternative laboratory‐based molecular technologies are under consideration for inclusion in another review in our series of Cochrane COVID‐19 DTA reviews. An additional seven studies included in Subsoontorn 2020 became available after our search cut‐off and are already under consideration for inclusion in the review update.

Weaknesses of the review primarily reflect the weaknesses in the primary studies and their reporting. Many studies omitted descriptions of participants and of key aspects of study design and execution. In order to include data for all tests in pooled analyses, we have had to include some samples multiple times. We have been explicit about these issues where they arose. It is possible that our search strategy has missed eligible studies; however, we believe the risk to be very low given our broad approach to identification of literature.

Around a quarter (5/18) of the studies we have included are currently available only as preprints and have not yet undergone peer review. As published versions of these studies are identified, we will double‐check study descriptions, methods and findings, and update the review as required.

Applicability of findings to the review question

We have concerns about the applicability of the evidence that we have identified for point‐of‐care tests.

Due to lack of reporting, we do not know whether tests perform in the same way or differently according to whether those being tested have symptoms of COVID‐19 (and, if so, for how long they have experienced those symptoms) or are asymptomatic. Studies appeared to use remnant or residual samples for testing, and many selectively included high percentages of samples with RT‐PCR‐confirmed SARS‐CoV‐2. In reality, point‐of‐care tests will be considered for use in much lower‐prevalence settings. Methodological work on diagnostic test evaluations has shown that sensitivity and specificity are not necessarily independent of prevalence: tests do not always exhibit the same sensitivity and specificity in different prevalence settings (Usher‐Smith 2016). This can be because of differences in the case mix or ‘spectrum’ of disease (e.g. viral load). However, the mechanisms in action can be complex and difficult to identify clearly (Leeflang 2013).

We also had concerns that many of the tests evaluated were performed outside of the manufacturers’ instructions for use, and not in fact at the point of care.

Great caution should be taken in applying these results outside of the individual study contexts.

Figure 1

Study flow diagram

Figure 2

Risk of bias and applicability concerns graph: review authors' judgements about each domain presented as percentages across included studies. Numbers in the bars indicate the number of studies

Figure 3

Figure 4

Forest plot of studies evaluating rapid molecular tests. Studies grouped by test and sample type
(NP: nasopharyngeal; OP: oropharyngeal; RUO: research use only)

Figure 5

Forest plot of studies evaluating antigen tests according to viral load: high (Ct ≤ 25; Ct ≤ 30 in Diao 2020) versus low viral load. Studies grouped by test

Figure 6

Forest plot of studies evaluating rapid molecular tests according to viral load: high (≤ 30 Ct) versus low viral load. Studies grouped by test

Figure 7

Forest plot of studies of molecular tests before and after discrepant analysis. Studies grouped by test
(DRW: Diagnostics for the Real World; RUO: research use only)

Figure 8

Risk of bias and applicability concerns summary: review authors' judgements about each domain for each included study

Test 1

Antigen tests ‐ All

Test 2

Antigen tests ‐ high viral load

Test 3

Antigen tests ‐ low viral load

Test 4

Molecular tests ‐ all

Test 5

Molecular tests ‐ all (before discrepant analysis)

Test 6

Molecular tests ‐ all (after discrepant analysis)

Test 7

Molecular tests ‐ high viral load

Test 8

Molecular tests ‐ low viral load

Summary of findings 1. Diagnostic accuracy of point‐of‐care antigen and molecular‐based tests for the diagnosis of SARS‐CoV‐2 infection

Question	What is the diagnostic accuracy of rapid point‐of‐care antigen and molecular‐based tests for the diagnosis of SARS‐CoV‐2 infection?
Population	Adults or children suspected of current SARS‐CoV‐2 infection, or populations undergoing screening for SARS‐CoV‐2 infection (including asymptomatic contacts of confirmed COVID‐19 cases, and community screening)
Index test	Any rapid antigen or molecular‐based test for diagnosis of SARS‐CoV‐2 meeting the following criteria: portable or mains‐powered device; minimal sample preparation requirements; minimal biosafety requirements; no requirement for a temperature‐controlled environment; test results available within 2 hours of sample collection
Target condition	Detection of current SARS‐CoV‐2 infection
Reference standard	For COVID‐19 cases: positive RT‐PCR alone, or clinical diagnosis of COVID‐19 based on established guidelines or combinations of clinical features. For non‐COVID‐19 cases: repeated negative RT‐PCR, or pre‐pandemic sources of samples
Action	False negative results mean missed cases of COVID‐19 infection, with either delayed or no confirmed diagnosis and increased risk of community transmission due to a false sense of security. False positive results lead to unnecessary self‐isolation or quarantine, with the potential for new infection to be acquired
Quantity of evidence	Number of studies		Total samples		Total samples with confirmed SARS‐CoV‐2
Quantity of evidence	18		3198		1775
Limitations in the evidence
Risk of bias	Participants: high or unclear risk in 16 studies (89%); Index test: high or unclear risk in 14 studies (78%); Reference standard: unclear risk in 10 studies (56%); Flow and timing: high or unclear risk in 15 studies (83%)
Concerns about applicability	Participants: high concerns in 13 studies (72%); Index test: high concerns in 13 studies (72%); Reference standard: high concerns in 17 studies (94%)
Findings
Antigen tests
Evaluations (studies)	Samples		Confirmed SARS‐CoV‐2 samples		Average sensitivity (95% CI) [Range]	Average specificity (95% CI) [Range]
8 (5)	943		596		56.2 (29.5 to 79.8) [0% to 94%]^a	99.5 (98.1 to 99.9) [90% to 100%]
Average sensitivity and specificity applied to a hypothetical cohort of 1000 patients^a
Prevalence of COVID‐19	TP	FP	FN	TN	PPV^b (95% CI)	NPV^c (95% CI)
5%	28^a	5	22^a	945	85% (68% to 95%)^a	98% (97% to 99%)
10%	56^a	5	44^a	896	92% (82% to 97%)^a	95% (94% to 97%)^a
20%	112^a	4	88^a	796	97% (91% to 99%)^a	90% (88% to 92%)^a
Rapid molecular tests
Evaluations (studies)	Samples		Confirmed SARS‐CoV‐2 samples		Average sensitivity (95% CI) [Range]	Average specificity (95% CI) [Range]
13 (11)	2255		1179		95.2 (86.7 to 98.3) [68% to 100%]	98.9 (97.3 to 99.5) [92% to 100%]
Average sensitivity and specificity applied to a hypothetical cohort of 1000 patients
Prevalence of COVID‐19	TP	FP	FN	TN	PPV^b (95% CI)	NPV^c (95% CI)
5%	48	10	2	940	83% (71% to 91%)	100% (99% to 100%)
10%	95	10	5	890	90% (83% to 95%)	99% (99% to 100%)
20%	190	9	10	791	95% (92% to 98%)	99% (98% to 99%)
Pooled results for individual tests
Tests	Evaluations		Samples	SARS‐CoV‐2 cases	Sensitivity (95% CI)	Specificity (95% CI)
Shenzhen Bioeasy Ag assay	2		238	162	89.5 (83.7 to 93.8)	100 (95.3 to 100)
ID NOW	5		1003	496	76.8 (72.9 to 80.3)	99.6 (98.4 to 99.9)
Xpert Xpress	6		919	479	99.4 (98.0 to 99.8)	96.8 (90.6 to 99.0)
Average sensitivity and specificity applied to a hypothetical cohort of 1000 patients where 100 have COVID‐19 infection (10% prevalence)
Tests	TP	FP	FN	TN	PPV^b (95% CI)	NPV^c (95% CI)
Shenzhen Bioeasy Ag assay	90	0	11	900	100% (96% to 100%)	99% (98% to 99%)
ID NOW	77	4	23	896	96% (89% to 99%)	97% (96% to 98%)
Xpert Xpress	99	29	1	871	77% (69% to 84%)	100% (99% to 100%)
Ag: antigen; CI: confidence interval; FN: false negative; FP: false positive; NPV: negative predictive value; PPV: positive predictive value; RT‐PCR: reverse transcription polymerase chain reaction; TN: true negative; TP: true positive
^aAs there is high heterogeneity in the estimates of sensitivity, the values observed in practice could vary considerably from these figures. ^bPPV (positive predictive value) defined as the percentage of positive rapid test results that are truly positive according to the reference standard diagnosis. ^cNPV (negative predictive value) defined as the percentage of negative rapid test results that are truly negative according to the reference standard diagnosis.

Table 1. Description of studies

		Antigen tests	Molecular tests
Number of studies		5	13
Participants
Overall sample size (all studies)	Median (IQR)	112 (96 to 198)
	Range	26 to 524
Overall number of SARS‐CoV‐2 positive samples (all studies)	Median (IQR)	85 (50 to 119)
	Range	13 to 220
Sample size	Median (IQR)	138 (127 to 239)	103 (88 to 172)
	Range	111 to 328	26 to 524
Number of SARS‐CoV‐2 positive samples	Median (IQR)	94 (82 to 132)	58 (46 to 96)
	Range	80 to 208	13 to 220
Setting	Hospital A & E	2 (40%)	1 (8%)
	Mixed	0 (0%)	3 (23%)
	Unclear	3 (60%)	9 (69%)
Patient group	Acute (A&E presentation)	2 (40%)	1 (8%)
	Unclear	3 (60%)	12 (92%)
Study design
Recruitment structure	Single group ‐ sensitivity and specificity	3 (60%)	6 (46%)
	Single group ‐ sensitivity only	0 (0%)	2 (15%)
	Two or more groups ‐ sensitivity and specificity	2 (40%)	5 (38%)
Reference standard for presence of SARS‐CoV‐2	All RT‐PCR positive	5 (100%)	13 (100%)
Reference standard for absence of SARS‐CoV‐2	COVID suspects (double RT‐PCR negative)	1 (20%)	0 (0%)
	COVID suspects (single RT‐PCR negative)	4 (80%)	11 (85%)
	Not applicable	0 (0%)	2 (15%)
Tests
Number of tests per study	1	4 (80%)	11 (85%)
	2	0 (0%)	2 (15%)
	4	1 (20%)	0 (0%)
Test technology, antigen tests only^a	Colloidal‐gold immunoassay	4 (50%)	N/A
	Fluorescent immunoassay	4 (50%)	N/A
Sample type	Nasal only	0 (0%)	1 (8%)
	Nasopharyngeal only	3 (60%)	6 (46%)
	Nasopharyngeal + oropharyngeal combined	2 (40%)	1 (8%)
	Nasopharyngeal or nasal	0 (0%)	3 (23%)
	Nasopharyngeal or oropharyngeal	0 (0%)	1 (8%)
	Mixed (3 or more types)	0 (0%)	1 (8%)
IQR: interquartile range; RT‐PCR: reverse transcriptase polymerase chain reaction
^aAs a % of antigen test evaluations (n = 8).

Table 2. Summary of analyses of test accuracy

Test	Evaluations (studies)	Samples	Cases	Average sensitivity, % (95% CI)	Average specificity, % (95% CI)
Antigen tests
All	8 (5)	1180	762	56.2 (29.5 to 79.8)	99.5 (98.1 to 99.9)
Subgroup analysis by viral load
High viral load	7 (5)	400	400	93.2 (63.6 to 99.1)	N/A
Low viral load	7 (5)	341	341	32.6 (17.5 to 52.6)	N/A
Difference (95% CI)				−60.6 (−83.0 to −38.2), P < 0.001	N/A
Subgroup analysis by test^a
Beijing Savant FIA	1	109	78	16.7 (9.2 to 26.8)	100 (88.8 to 100)
Coris Bioconcept CGIA^b	2	466	226	54.4 (47.9 to 60.8)	99.6 (97.7 to 99.9)
Liming CGIA	1	19	9	0 (0 to 33.6)	90.0 (55.5 to 99.7)
RapiGEN CGIA	1	109	79	62.0 (50.4 to 72.7)	100 (88.4 to 100)
Shenzhen Bioeasy FIA^b	2	238	162	89.5 (83.8 to 93.3)	100 (95.2 to 100)
In‐house FIA	1	239	208	67.8 (61.0 to 74.1)	100 (88.8 to 100)
Subgroup analysis by sample type
Nasopharyngeal only	3	705	434	59.4 (50.7 to 67.5)	99.6 (97.4 to 99.9)
Molecular tests
All studies with 2x2 data	13 (11)	2194	1113	95.2 (86.7 to 98.3)	98.9 (97.3 to 99.5)
All studies^c,d	15 (13)	1244	1244	95.5 (88.5 to 98.4)	N/A
Sensitivity analysis before and after discrepant analysis
Before	4	1280	536	98.2 (87.0 to 99.8)	97.8 (94.8 to 99.1)
After	4	1280	547	99.5 (79.9 to 100)	99.6 (98.7 to 99.9)
Subgroup analyses by viral load
High viral load^b	5 (4)	151	151	100 (97.5 to 100)	N/A
Low viral load	5 (4)	142	142	93.3 (46.7 to 99.6)	N/A
Subgroup analyses by test^a
Abbott – ID NOW	5	1003	496	76.8 (72.9 to 80.3)	99.6 (98.4 to 99.9)
Cepheid – Xpert Xpress	6	919	479	99.4 (98.0 to 99.8)	96.8 (90.6 to 99.0)
Difference (95% CI)				22.6 (18.8 to 26.3), P < 0.001	−2.8 (−6.4 to 0.8), P = 0.13
Mesa Biotech – Accula	1	100	50	68.0 (53.3 to 80.5)	100 (92.9 to 100)
DRW – SAMBA II	1	172	88	98.9 (93.8 to 100)	96.4 (89.9 to 99.3)
Direct comparisons by test
Abbott – ID NOW^a,b	2	220	145	79.3 (71.8 to 85.6)	100 (95.2 to 100)
Cepheid – Xpert Xpress^a,b	2	221	146	98.6 (95.1 to 99.8)	97.3 (90.7 to 99.7)
Difference (95% CI)^e				19.3 (12.5 to 26.2), P < 0.001	−2.7 (−6.3 to 1.0), P = 0.15
Sample type
Nasopharyngeal only^c	6 (5)	600	343	87.1 (71.6 to 94.7)	100 (98.6 to 100)
CGIA: colloidal gold immunoassay; CI: confidence intervals; FIA: fluorescent immunoassay; DRW: Diagnostics for the Real World
^aSee Appendix 9 for details of product codes, where available (these were not necessarily reported in studies but we obtained them from manufacturer instructions for use documents). ^b2x2 tables combined prior to calculating estimates. ^cSeparate pooling of sensitivity and/or specificity. ^dThis includes two studies that only include COVID‐19 positive cases. ^eTwo‐sample test of proportions.

Table 2. Summary of analyses of test accuracy

Table 3. Effect of sample re‐testing

Study	Index test (target genes)	First RT‐PCR	Target gene	Second RT‐PCR	Target gene	False positives	False negatives	Index test re‐test	Reference standard re‐test
Discrepant analysis
Assennato 2020	SAMBA II (ORF1ab, N2)	PHE Cambridge (Wuhan) assay	RdRp, E gene	PHE Colindale RT‐PCR assay	RdRp 'different region'	3 → 0	1 → 1	Yes; same results obtained	Yes; 3 FPs reclassified as TP, all borderline positive for ≥ 1 target gene on either RT‐PCR test; 1 FN remained FN, positive on both RT‐PCR assays
Harrington 2020	ID NOW (RdRp)	Abbott RealTime	Not stated	Same RT‐PCR	Same	2 → 0	47 (not re‐tested)	1 FP reclassified as TN with repeat sampling; 1 FP not re‐tested	1 FP reclassified as TP; 1 FP reclassified as TN (both with repeat sampling)
Loeffelholz 2020	Xpert Xpress (RUO) (E, N2)	RT‐PCR varied by site: New York RT‐PCR assay; Quest rRT‐PCR; Altona RealStar; GeneFinder; Seegene Allplex; Charité Virologie; Abbott RealTime; DiaSorin Simplexa	By assay: N (N1, N2); N (N1, N3); S, E; RdRp, E, N; RdRp, E, N; RdRp; RdRp, N; ORF1ab, S; ORF1ab, S	One of: Hologic Panther Fusion; Roche Tib‐Molbiol LightMix; CDC assay	By assay: ORF1ab; E; N1, N2	11 → 3	1 → 0	None reported	1 FN reclassified as TN (inconclusive positive on Quest assay; negative on CDC assay); 3 FPs remained FP (2 negative on NY assay, 1 negative on Charité Virologie assay; all confirmed negative with Hologic Panther Fusion); 8 FPs reclassified as TP (all negative on Charité Virologie assay; positive on re‐test with Roche Tib‐Molbiol assay)
Moran 2020	Xpert Xpress (E, N2)	Roche cobas 6800	ORF1, E	Same RT‐PCR	Same	1 → 0	0	1 FP reclassified as TN (initially E gene negative and low positive for N2; negative for both targets on re‐test)	1 FP 'repeatedly negative' on RT‐PCR re‐test (reclassified as TN based on index test re‐test)
Additional studies reporting sample re‐testing (not discrepant analysis)
Broder 2020	Xpert Xpress (E, N2)	Roche cobas 6800	ORF1a, E	Modified CDC protocol	NR	0	1	None reported; no presumptive positive results reported	Yes; 1 FN became TN
Hogan 2020	Accula (N)	In‐house SHC assay	E gene	N/A	N/A	0	16	Yes; 1 TP remained TP (faint positive Accula test line reproduced on re‐test)	None reported
Lieberman 2020	Xpert Xpress (E, N2)	CDC EUA‐based in‐house test (positive if 1 of 2 targets detected)	N1, N2	N/A	N/A	0	0	Yes; 1 presumptive positive (E gene only positive) became positive (N gene only positive) on re‐test	None reported
Moore 2020	ID NOW (RdRp)	Modified CDC RT‐PCR	N1, N2	Abbott RealTime	N, RdRp	0 → 0	25 → 31	None reported	All samples tested with both RT‐PCR assays; 25 FN remained FN (2 were inconclusive but considered positive on CDC assay, confirmed positive with Abbott RealTime assay); 6 TN reclassified as FN (negative on CDC assay, confirmed positive with Abbott RealTime assay); all 8 discordant results between the two RT‐PCRs were confirmed SARS‐CoV‐2 positive based on record review
Wolters 2020	Xpert Xpress (E, N2)	In‐house assays at three laboratories	By laboratory: (1) E, RdRp; (2) E, RdRp, then E only; (3) E, RdRp, then E, N1	Same RT‐PCR per laboratory	Same	0 → 2	0	None reported; 1 presumptive positive considered TP by review team	2 TP samples (each positive on only one target: 1 presumptive positive (E positive) and 1 positive (N2 positive)) reclassified as FP; both considered SARS‐CoV‐2 negative on RT‐PCR re‐test. *The authors note that viral loads were at the limit of detection for Xpert Xpress and that multiple freeze‐thaw cycles could have significantly affected detection.
CDC: Centers for Disease Control and Prevention; EUA: emergency use authorisation; FN: false negative; FP: false positive; N/A: not applicable; NR: not reported; PHE: Public Health England; RT‐PCR: reverse transcriptase polymerase chain reaction; RUO: research use only; TN: true negative; TP: true positive
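Table 3 reports the effect of re‐testing as changes in false positive and false negative counts (for example, 11 → 3). These changes can be read as samples moving between the cells of the 2×2 table. The sketch below illustrates that bookkeeping under stated assumptions; the function name is hypothetical. Table 3 does not report TP or TN totals, so any cell not supplied starts at zero, and the value returned for it is a net change rather than a total.

```python
from collections import Counter

# The four cells of a 2x2 table comparing an index test with the reference standard.
CELLS = ("TP", "FP", "FN", "TN")


def reclassify(cells, moves):
    """Return a copy of `cells` with each (source, target, n) move applied.

    A positive reference-standard re-test moves an FP to TP; a negative one
    moves an FN to TN. An index-test re-test moves FP to TN or FN to TP.
    Cells absent from `cells` start at zero (net change, not a total).
    """
    out = Counter({cell: cells.get(cell, 0) for cell in CELLS})
    for source, target, n in moves:
        if source not in CELLS or target not in CELLS:
            raise ValueError(f"unknown cell: {source!r} -> {target!r}")
        if n < 0 or out[source] < n:
            raise ValueError(f"cannot move {n} samples out of {source}")
        out[source] -= n
        out[target] += n
    return out


# Loeffelholz 2020: 8 FPs reclassified as TP, 1 FN reclassified as TN.
loeffelholz = reclassify({"FP": 11, "FN": 1}, [("FP", "TP", 8), ("FN", "TN", 1)])

# Harrington 2020: of 2 FPs, 1 reclassified as TP and 1 as TN.
harrington = reclassify({"FP": 2}, [("FP", "TP", 1), ("FP", "TN", 1)])

print(loeffelholz["FP"], loeffelholz["FN"])  # 3 0
print(harrington["FP"])  # 0
```

In discrepant analysis, only samples where the tests disagree are re‐tested. Every move therefore goes from FP or FN into TP or TN, so sensitivity, TP/(TP + FN), and specificity, TN/(TN + FP), cannot fall. By contrast, Moore 2020 tested all samples with both RT‐PCR assays, and there the move ran the other way: 6 TN reclassified as FN.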

Table Tests. Data tables by test

Test	No. of studies	No. of participants
1 Antigen tests ‐ all	8	1180
2 Antigen tests ‐ high viral load	7	400
3 Antigen tests ‐ low viral load	7	341
4 Molecular tests ‐ all	15	2325
5 Molecular tests ‐ all (before discrepant analysis)	4	1280
6 Molecular tests ‐ all (after discrepant analysis)	4	1280
7 Molecular tests ‐ high viral load	5	151
8 Molecular tests ‐ low viral load	5	142
