Rapid antigen detection test for group A streptococcus in children with pharyngitis

Summary of findings

Review questions	What is the diagnostic accuracy of rapid antigen detection tests (RADT) for detecting group A streptococcus (GAS)? What is the relative diagnostic accuracy of the two major types of RADTs (enzyme immunoassays (EIA) and optical immunoassays (OIA))?
Patients/population	Children with acute pharyngitis
Prior testing	Physical examination establishing the diagnosis of pharyngitis, with or without evaluating the likelihood of a streptococcal origin
Settings	Ambulatory care settings: mainly private offices, emergency departments and walk‐in clinics
Index tests	EIA and OIA test for GAS
Reference standard	Throat culture on a blood agar plate
Importance	Compared with culture, RADTs offer diagnosis at the point of care. Whether negative RADTs should be backed up by throat culture depends mainly on the reported sensitivity of the test
Studies	Cross‐sectional studies
Quality concerns	Methodological quality was generally poor, but quality appraisal was impeded by suboptimal reporting. Patient selection and reference standard methods were common risk of bias concerns (in 73% and 43% of test evaluations, respectively)
Heterogeneity	There was substantial heterogeneity in the results of the individual studies, especially for sensitivity, which could not be explained by our investigations of heterogeneity
	Quantity of evidence		Average diagnostic accuracy		Consequences in a cohort of 1000 patients…
	Studies (n)	Participants (n)	Sensitivity (95% CI)	Specificity (95% CI)	…given 20% prevalence of GAS cases?	…given 30% prevalence of GAS cases?	…given 40% prevalence of GAS cases?
RADT for the diagnosis of GAS pharyngitis in children (EIA and OIA tests)	105	58,244	85.6% (83.3 to 87.6)	95.4% (94.5 to 96.2)	200 children will have a positive culture for GAS. Of these, 171 will be identified (TP); 29 will be missed (FN). Of the 800 children without GAS, 763 will not be treated (TN); 37 may receive unnecessary antibiotics (FP)	300 children will have a positive culture for GAS. Of these, 257 will be identified (TP); 43 will be missed (FN). Of the 700 children without GAS, 668 will not be treated (TN); 32 may receive unnecessary antibiotics (FP)	400 children will have a positive culture for GAS. Of these, 342 will be identified (TP); 58 will be missed (FN). Of the 600 children without GAS, 572 will not be treated (TN); 28 may receive unnecessary antibiotics (FP)
Comparison of EIA versus OIA tests
EIA tests	86	48,808	85.4% (82.7 to 87.8)	95.8% (94.8 to 96.6)	Interpretation: EIA and OIA tests seem to have comparable accuracy (P value = 0.23)
OIA tests	19	9436	86.2% (82.7 to 89.2)	93.7% (91.5 to 95.4)
CI: confidence interval EIA: enzyme immunoassay FN: false negative FP: false positive GAS: group A streptococcus OIA: optical immunoassay RADT: rapid antigen detection test TN: true negative TP: true positive
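
The cohort figures in the table follow directly from the summary sensitivity and specificity. The minimal Python sketch below reproduces them. The helper name `cohort_consequences` is hypothetical. The table does not state its rounding convention, so we assume that true positives and true negatives were rounded to the nearest child and that false negatives and false positives were obtained by subtraction.

```python
def cohort_consequences(sens, spec, prevalence, cohort=1000):
    """Expected TP, FN, TN, FP in a cohort, given summary accuracy and prevalence.

    Assumption: TP and TN are rounded to the nearest child; FN and FP are the
    remainders, so each group adds up to the number of children with/without GAS.
    """
    with_gas = round(cohort * prevalence)   # children with a positive culture
    without_gas = cohort - with_gas         # children with a negative culture
    tp = round(with_gas * sens)             # identified by the RADT
    tn = round(without_gas * spec)          # correctly not treated
    return {"TP": tp, "FN": with_gas - tp, "TN": tn, "FP": without_gas - tn}


if __name__ == "__main__":
    # Summary estimates for EIA and OIA tests combined (105 studies)
    for prevalence in (0.20, 0.30, 0.40):
        print(f"{prevalence:.0%}", cohort_consequences(0.856, 0.954, prevalence))
```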

Background

Target condition being diagnosed

Pharyngitis is defined as an acute inflammation of the pharynx, tonsils or both. A sore throat is the most common symptom of pharyngitis. The terms 'pharyngitis', 'tonsillitis' and 'sore throat' are often used interchangeably; in this review, we use the more general term 'pharyngitis'. Viruses are the most common cause of pharyngitis, but the bacterium most frequently identified during acute pharyngitis is Streptococcus pyogenes (S. pyogenes), also known as group A β‐haemolytic streptococcus (GAS). GAS is estimated to account for 20% to 40% of cases of pharyngitis in children and 5% to 15% in adults (Shaikh 2010; Wessels 2011). The estimated number of cases of GAS pharyngitis in children is 450 million per year worldwide (Carapetis 2005a). Most cases are benign and self‐limiting within a week, but suppurative complications (cervical lymphadenitis, retropharyngeal abscess, peritonsillar cellulitis or abscess (quinsy), sinusitis, acute otitis media and mastoiditis) or non‐suppurative post‐streptococcal diseases (acute rheumatic fever and rheumatic heart disease, acute glomerulonephritis, Sydenham’s chorea, scarlet fever, streptococcal toxic shock syndrome and paediatric autoimmune neuropsychiatric disorder associated with group A streptococci) can occur (Gerber 2005; Shulman 2009).

Acute rheumatic fever is an autoimmune disorder resulting from infection with group A streptococcus, in which heart valves may be severely damaged (rheumatic heart disease). In low‐income countries, rheumatic heart disease remains the most common acquired heart disease in children, adolescents and young adults; deaths from rheumatic heart disease have recently been estimated at 233,000 per year worldwide (Carapetis 2005a). In high‐income countries, acute rheumatic fever and rheumatic heart disease are rare (e.g., ≤ 10 cases/year/100,000 children for acute rheumatic fever) (Carapetis 2005b; Seckeler 2011) because of improvements in living conditions and hygiene, increased antibiotic usage, increased access to primary care providers and changes in GAS epidemiology (Carapetis 2007). Meanwhile, in the US, about 50% to 70% of visits by children with pharyngitis result in an antibiotic prescription (Linder 2005). Given the rarity of these complications and this level of prescribing, the public health goal is shifting from preventing rare GAS complications to minimising inappropriate use of antibiotics.

Index test(s)

Simple rapid antigen detection tests (RADTs) were developed in the 1980s to give clinicians an immediate indication of the presence or absence of GAS in children with pharyngitis. RADTs do not require any special equipment and can be performed at the point of care on a throat swab (Gerber 2004). They provide immediate results and are calibrated to produce a binary result (positive or negative).

All available RADTs involve the detection of the Lancefield group A carbohydrate, a GAS‐specific cell‐wall antigen. Different immunologic techniques are available for carbohydrate detection (Gerber 2004); from oldest to most recent:

Latex agglutination (LA) assay: the sample is placed in the presence of latex beads coupled with GAS‐specific antibodies; the result is read by observing whether the beads agglutinate, which occurs when they bind the specific antigen in the sample. These first‐generation tests are no longer used in clinical practice and were not considered in this review.
Enzyme immunoassay (EIA): the sample is placed at the end of a nitrocellulose strip and migrates to an area where it forms an antigen‐antibody complex. These second‐generation tests are also known as immunochromatographic, sandwich or lateral‐flow assays. They are the most widely used RADTs in clinical practice.
Optical immunoassay (OIA): the sample is placed on a silicon membrane in the presence of the reagent. The result is based on the change in optical properties of the inert membrane in the presence of an antigen‐antibody complex. These third‐generation tests seem to be more sensitive than EIAs but their use is limited because of their high cost.

Clinical pathway

Many experts recommend prescribing antibiotics for children with suspected or proven GAS pharyngitis (Matthys 2007). The goal of antibiotic treatment is to reduce the individual risk of suppurative or non‐suppurative complications, the duration of symptoms and the spread of the condition (Spinks 2013). Correctly identifying GAS guards against missing GAS‐positive cases that can lead to complications. Correctly excluding GAS guards against unnecessary use of antibiotics (thus reducing the incidence of adverse drug reactions, antibiotic resistance and associated costs).

There is a lack of consensus on the most suitable diagnostic method for GAS in children with pharyngitis, and 'standard' diagnostic practice varies greatly amongst countries. Because the signs and symptoms of GAS and viral pharyngitis overlap broadly (Shaikh 2011), most guidelines that recommend antibiotic treatment of GAS also recommend confirming the presence of GAS with a throat swab (Matthys 2007). However, throat swabs are explicitly not recommended in some countries (e.g., the United Kingdom, Belgium and the Netherlands) (Matthys 2007). International discrepancies might be explained by academic reasons and 'clinical traditions', different targets for sensitivity and specificity because of local epidemiological differences (i.e., rheumatic fever and rheumatic heart disease prevalence), international differences in health systems and policies, and the sparseness of recent data on the incidence of GAS complications and the efficacy of antibiotic treatment for their prevention.

The reference standard for the diagnosis of GAS in children with pharyngitis is throat culture on a blood agar plate in a microbiology laboratory (AAP 2012). The major advantage of laboratory throat culture is its ability to detect GAS in swabs containing very few bacteria; its major limitation is the 48‐hour delay in obtaining results. In addition, throat culture cannot distinguish true GAS infection from GAS carriage with intercurrent viral pharyngitis. Asymptomatic pharyngeal GAS carriage is usually defined as a positive throat culture for GAS without a GAS‐specific immune response (anti‐streptolysin O and anti‐DNase B antibodies) (Tanz 2007). Asymptomatic GAS carriage occurs in 10% to 15% of healthy children (Shaikh 2010) and does not require antibiotic treatment (Tanz 2007). Agreement is lacking on the most suitable culture technique for diagnosing GAS in children with pharyngitis. Several parameters are likely to affect the sensitivity of culture (culture medium, atmosphere of incubation, duration of incubation, group A identification technique and number of plates inoculated) (Kellogg 1990; Tanz 1997). These variables affect the diagnostic accuracy of throat culture and, therefore, the estimated diagnostic accuracy of RADTs when compared with throat culture.

RADTs are widely used for diagnosing GAS pharyngitis at the point of care. In children, the reported sensitivity of RADTs is about 85% (Gerber 2004) but varies greatly amongst studies (from 66% (Van Limbergen 2006) to 99% (Harbeck 1993)), whereas specificity is high and stable, at about 95% (Gerber 2004). Because of this high specificity, most experts agree on prescribing antibiotics for children with a positive RADT result, even though RADTs cannot differentiate true GAS infection from GAS carriage. However, the management of a negative RADT result depends on national guidelines. North American guidelines recommend backing up negative RADT results with throat culture so that children with false‐negative RADT results are not left untreated (Gerber 2009; Shulman 2012), whereas most recent European guidelines recommend relying on negative RADT results without culture confirmation (Pelucchi 2012). In low‐income countries, the clinical consequences of RADT results might be the same as in high‐income countries (treating RADT‐positive cases only), but resources for testing might be limited, and practices vary from generalised empiric antibiotic treatment to selective antibiotic treatment or selective rapid testing based on clinical scoring systems (Joachim 2010; Steinhoff 2005; WHO 1995).
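
Whether a negative RADT needs culture back‐up depends on how many RADT‐negative children still have GAS, and that share depends on prevalence as well as on sensitivity. As a rough illustration only, the Python sketch below applies Bayes' theorem to the summary estimates reported in this review (sensitivity 85.6%, specificity 95.4%) over the 20% to 40% prevalence range quoted above. It treats throat culture as a perfect reference, which it is not, and the function name is hypothetical.

```python
def predictive_values(sens, spec, prevalence):
    """Positive and negative predictive values of a binary test (Bayes' theorem)."""
    tp = sens * prevalence               # true positives per child tested
    fn = (1 - sens) * prevalence         # false negatives
    tn = spec * (1 - prevalence)         # true negatives
    fp = (1 - spec) * (1 - prevalence)   # false positives
    return tp / (tp + fp), tn / (tn + fn)


if __name__ == "__main__":
    for prevalence in (0.20, 0.30, 0.40):
        ppv, npv = predictive_values(0.856, 0.954, prevalence)
        # 1 - NPV: share of RADT-negative children with a positive culture
        print(f"prevalence {prevalence:.0%}: PPV {ppv:.3f}, NPV {npv:.3f}, "
              f"missed among negatives {1 - npv:.3f}")
```

Under these assumptions, the share of RADT‐negative children with a positive culture rises from about 3.6% at 20% prevalence to about 9.1% at 40% prevalence.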

Alternative test(s)

Office culture

Another test for the diagnosis of GAS in children with pharyngitis is throat culture performed in the physician's office (office culture). Office culture shares the main disadvantage of laboratory culture (a 48‐hour delay in obtaining results), and its major limitation is insufficient sensitivity (50% to 85%) (Battle 1971; Mondzac 1967; Rosenstein 1970; Tanz 2009; Wegner 1992). Office culture has been almost completely abandoned and was not considered in this review.

Streptococcal antibody tests

Assessment of GAS‐specific antibodies is the traditional reference test for differentiating true GAS infection from GAS carriage. The most commonly used GAS‐specific antibody assays measure anti‐streptolysin O and anti‐DNase B antibodies. Demonstrating a rise in antibody titre diagnoses true GAS infection better than a single absolute titre measurement (Gerber 1986b; Johnson 2010). Streptococcal antibody tests are not used for the diagnosis of GAS in children with pharyngitis because they require repeat blood samples. Moreover, information about the kinetics of the immune response to GAS in children with pharyngitis is very limited, and the most recent data show that streptococcal antibody test results are not straightforward to interpret (Johnson 2010). Therefore, their use is usually limited to documenting recent GAS infection in patients suspected of having non‐suppurative GAS complications, or to epidemiological studies (Gerber 1986b; Johnson 2010).

Clinical scoring systems

Clinical scoring systems have been developed to diagnose GAS on clinical grounds. The most popular of these scores are the Centor score (Centor 1981) and the McIsaac score (McIsaac 1998). The scores are based on assessing simple clinical criteria (history of fever, cough, tonsillar swelling or exudate, tender cervical adenopathy and age). Their use is recommended in adults but might be inappropriate in children; several authors have reported a lack of diagnostic accuracy in this population (Cohen 2012; Cohen 2015; Fischer Walker 2006; Shaikh 2011). Clinical scoring systems were not considered in this review.

Rapid molecular biology assays

Rapid molecular biology assays for GAS in children with pharyngitis have been recently developed (Group A Streptococcus Direct Test; GenProbe Inc., San Diego, CA; and LightCycler Strep‐A assay; Roche Applied Science, Indianapolis, IN) (Chapin 2002; Heelan 1996; Pokorski 1994; Uhl 2003). These techniques, based on DNA‐rRNA hybridisation or polymerase chain reaction (PCR), are highly sensitive but are not currently used widely because of their cost, the need for highly specialised equipment and personnel, and the two‐hour delay in results (Gerber 2004). Molecular assays are not antigen‐detection tests and were not considered in this review.

Rationale

Childhood pharyngitis is a significant public health problem with, on the one hand, suppurative and non‐suppurative complications of GAS pharyngitis (especially acute rheumatic fever and rheumatic heart disease) and, on the other, costly diagnostic tests and unnecessary antibiotics. RADTs for GAS are now widely available and their use in children with pharyngitis might increase accurate diagnosis and reduce antibiotic consumption.

Depending on local clinical guidelines, RADTs may be used as stand‐alone diagnostic tests that replace throat culture (e.g., in contexts where throat culture is unavailable or not used), or as triage tests, with negative results backed up by throat culture. These international discrepancies might be explained in part by persistent gaps in knowledge regarding the diagnostic accuracy of RADTs:

What is the accuracy of RADTs for GAS in children with pharyngitis compared with the most widely accepted reference standard (throat culture on a blood agar plate)?

Are there significant differences in diagnostic accuracy between EIAs and OIAs?

Which study‐level factors could explain variations in diagnostic accuracy across clinical studies?

In this review, we did not address whether RADTs should be performed in all patients presenting with signs and symptoms of pharyngitis or only in patients selected on the basis of a clinical score (selective testing strategies), nor whether clinical protocols that incorporate RADTs are sufficient to reduce antibiotic prescription. We aimed to provide information to help clinicians and public health decision makers better define the precise role of RADTs in the diagnosis of GAS in children with pharyngitis on the basis of unbiased evidence.

Objectives

To assess the diagnostic accuracy of RADTs for detecting GAS in children with pharyngitis.

Secondary objectives

To assess the relative diagnostic accuracy of EIA and OIA tests by indirect and direct comparison.

Methods

Criteria for considering studies for this review

Types of studies

We included reports of cross‐sectional studies reporting the diagnostic accuracy of one or more RADTs for the diagnosis of GAS in children with pharyngitis, with laboratory throat culture as the reference standard. Reports of randomised controlled trials (RCTs) were also eligible if we could extract 2 x 2 tables for children. Reports of studies in which throat culture was selectively performed in participants with a positive or negative RADT result were included in the review but excluded from the meta‐analysis of sensitivity and specificity estimates.

Participants

We included reports of studies of children (age ≤ 21 years, according to the upper limit used by the American Academy of Pediatrics) seeking ambulatory medical care because of a sore throat or with a diagnosis of pharyngitis, who provided a throat swab for a RADT and laboratory throat culture. In this review, ambulatory care settings included private physicians' offices (general practitioners and paediatricians), walk‐in clinics, hospital outpatient clinics, emergency departments and family medicine centres; we excluded studies performed by specialised physicians (e.g., ear, nose and throat specialists).

We also included reports of studies in which only a subgroup of participants was eligible for the review, provided that we could extract relevant data specific to that subgroup. We did not exclude studies on the basis of whether they were performed in high‐income or low‐income countries, because no data support variation in the accuracy of RADTs according to this criterion.

Index tests

We included only studies of EIA or OIA RADTs for GAS in children with pharyngitis, including those no longer marketed.

Target conditions

GAS in children with pharyngitis (dichotomous).

Reference standards

Studies were required to use throat culture on a blood agar plate in a microbiology laboratory as the reference standard for diagnosing GAS. Several parameters may affect the accuracy of throat culture. To avoid a data‐driven approach, for studies involving more than one throat culture technique (different medium, duration or atmosphere of incubation), we chose a priori to extract the data related to the culture technique recommended by a panel of North American content experts (Shulman 2000), i.e., a simple blood agar plate (versus selective or enriched media), 48 hours of incubation in total (versus 18 to 24 hours only) and an aerobic atmosphere (versus other).
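
The a priori preference can be expressed as a simple ranking rule. The Python sketch below is illustrative only: the field names (`medium`, `hours`, `atmosphere`) are hypothetical, and the priority order applied when no technique meets all three criteria (medium first, then duration, then atmosphere) is our assumption, because the text does not specify it.

```python
def preferred_culture(techniques):
    """Return the culture technique closest to the a priori preferred method.

    Preferred: simple blood agar plate, 48 hours of incubation in total, aerobic
    atmosphere. Partial matches are ranked in the (assumed) order
    medium > duration > atmosphere.
    """
    def score(t):
        return (
            t["medium"] == "blood agar",   # simple blood agar vs selective/enriched
            t["hours"] >= 48,              # 48 hours vs 18 to 24 hours only
            t["atmosphere"] == "aerobic",  # aerobic vs other atmospheres
        )
    return max(techniques, key=score)


if __name__ == "__main__":
    reported = [
        {"medium": "selective", "hours": 48, "atmosphere": "aerobic"},
        {"medium": "blood agar", "hours": 24, "atmosphere": "anaerobic"},
        {"medium": "blood agar", "hours": 48, "atmosphere": "aerobic"},
    ]
    print(preferred_culture(reported))
```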

Search methods for identification of studies

Electronic searches

We searched MEDLINE via Ovid (1980 to May week 5, 2013) using the search strategy described in Appendix 1. The search strategy was developed in consultation with a medical librarian and the Trials Search Co‐ordinator for the Acute Respiratory Infections Group and was adapted to search EMBASE via Elsevier (1980 to June 2013) (Appendix 2) and Web of Science (1980 to June 2013) (Appendix 3). We did not use any filter related to age because many RADT studies enrol adults and children and could provide extractable data for children. We did not use methodological filters to identify diagnostic studies because such filters may result in omission of relevant studies (Leeflang 2006; Whiting 2011b). The searches were run from 1980 onwards because RADTs were not available prior to this date. We searched the Cochrane Central Register of Controlled Trials (CENTRAL) for relevant studies.

We searched the following databases to identify potentially relevant studies referenced in reviews and guidelines:

the Cochrane Database of Systematic Reviews (2013, Issue 5);
DARE (Database of Abstracts of Reviews of Effects) (2013, Issue 2 of 4);
the MEDION database (for Systematic Reviews of Diagnostic Studies) (23 May 2013); and
TRIP (Turning Research Into Practice) (23 May 2013).

We also searched Conference Proceedings Citation Index (CPCI) and SCI‐Expanded for conference proceedings and abstracts. The literature search was updated by the Trials Search Co‐ordinator for the Acute Respiratory Infections Group on 7 July 2015.

Searching other resources

We handsearched the reference lists of included articles and of relevant review articles identified through the search, and used the ‘related articles’ function in PubMed (first 20 related articles for each included article) to identify eligible articles. We used Google Scholar to search for reports that cited included articles. We contacted manufacturers of the most common RADTs to seek additional or unpublished studies; these included Abbott, Beckman Coulter, Becton Dickinson, Genzyme, Inverness Medical, Polymedco and Quidel.

Data collection and analysis

Selection of studies

We considered studies published in any language. Two review authors (JFC, NB) independently screened the titles and abstracts identified by the search strategy and excluded studies that were not related to pharyngitis or RADTs. The same two review authors retrieved the full text of relevant articles and independently evaluated them for inclusion, using a pro forma as a guide. They discussed any disagreements about inclusion, and a third review author (MC) acted as arbiter when needed.

We selected the most recent or most complete report in cases of multiple reports for a given study or when we could not exclude the possibility of overlapping populations. We produced a flowchart to report the search process. We reported reasons for excluding studies but we did not report their references.

Data extraction and management

We extracted the numbers of true positives, true negatives, false positives and false negatives for each index test evaluated in each study to construct 2 x 2 tables. If study authors did not provide these numbers, we calculated them from the reported summary estimates of sensitivity and specificity of the index test, when available. For studies in which only a subgroup of participants was eligible for the review, we extracted, analysed and presented data for that subgroup only. If some data were unclear or missing, we attempted to contact study authors to obtain additional data.
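
Back‐calculating a 2 x 2 table from reported sensitivity and specificity also requires the numbers of culture‐positive and culture‐negative children (or, equivalently, the sample size and GAS prevalence). A minimal Python sketch follows, assuming those numbers are reported; the function name and example figures are hypothetical, and because published percentages are rounded, the recovered counts are rounded to the nearest child.

```python
def reconstruct_2x2(sens, spec, n_culture_pos, n_culture_neg):
    """Recover (TP, FN, TN, FP) from sensitivity, specificity and culture counts."""
    tp = round(sens * n_culture_pos)   # RADT-positive among culture-positive
    tn = round(spec * n_culture_neg)   # RADT-negative among culture-negative
    return tp, n_culture_pos - tp, tn, n_culture_neg - tn


if __name__ == "__main__":
    # Hypothetical study: 120 culture-positive and 280 culture-negative children,
    # reported sensitivity 87.5% and specificity 96.1%
    print(reconstruct_2x2(0.875, 0.961, 120, 280))  # (105, 15, 269, 11)
```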

Two authors (JFC, NB) independently extracted the data used for study quality assessment and statistical analysis (data from 2 x 2 tables and covariates used for investigations of heterogeneity) and resolved discrepancies by discussion until a consensus was reached; other descriptive data were extracted by one review author (JFC). See Table 1 for a description of which data were extracted for each study. Non‐English language reports were not translated: for reports in French, Italian, Spanish and German, members of our team extracted data; for other languages, the Cochrane Acute Respiratory Infections Group identified collaborators who kindly agreed to extract the data.

Table 1. Data extracted from each study

Study ID	First author, year of publication
Type of study	Journal article or conference abstract
Clinical features and settings	Presenting signs and symptoms
	Clinical selection of patients (none, clinical score, explicit criteria but not a score, implicit criteria)
	Exclusion if antibiotics use before inclusion (yes/no)
	Clinical setting (office‐based, emergency department, walk‐in clinic, mixed, other)
	Single‐ or multi‐centre study
	Age range for inclusion
Participants	Sample size (n)
	Age (distribution)
	GAS prevalence according to culture (with 95% confidence interval)
	Country of study
	Sex (% of girls)
	Clinical severity assessment (Centor score, McIsaac score, other, none)
Study design	Cross‐sectional study or RCT
	Retrospective or prospective design
	Sample (consecutive, random or unclear)
	Direct comparison of different RADTs (yes/no)
	Direct comparison of several throat culture techniques (yes/no)
	Throat swab (1 single, 1 double, 2 different)
	Person performing the throat sample (physician, nurse, laboratory personnel, other)
Reference standard(s)	Throat culture medium (standard, enrichment, inhibitory)
	Atmosphere of incubation (aerobic, aerobic with CO₂ enrichment, anaerobic)
	Duration of incubation (≤ 24, 24 to 48, ≥ 48 hours)
	GAS confirmation (bacitracin disk, latex test, other, none)
	Number of plates inoculated (n)
	Assessment of GAS antibody response (yes/no)
	Relevant details
Index tests	Commercial name of the RADT
	Type of RADT (EIA, OIA)
Data	Number of true positives, false positives, true negatives, false negatives and undetermined/uninterpretable results
Notes	Source of funding (whether any of the authors is affiliated with the manufacturer of the RADT, the study was directly funded by the manufacturer, authors reported conflicts of interests related to the manufacturer or other funding sources)
	Anything else of relevance

RADT: rapid antigen detection test
EIA: enzyme immunoassay
OIA: optical immunoassay
CO₂: carbon dioxide

Assessment of methodological quality

Methodological quality assessment involved use of a four‐domain tool adapted from QUADAS‐2 (Whiting 2011a). Two review authors (JFC, NB) independently collected the information needed to assess the methodological quality of each study using signalling questions (yes/no/unclear). We resolved disagreements on the signalling questions by discussion with a third author (MC) until a consensus was reached. One author (JFC) used this information to judge the risk of bias and concerns about applicability using pre‐defined rules. We tailored the quality assessment tool to our review question. We developed review‐specific guidance on how to assess each signalling question and how to use this information to judge the risk of bias and applicability. We refined the tool until satisfactory inter‐rater agreement on signalling questions was achieved. We summarised the methodological quality assessment in tables. See Table 2.

Table 2. Methodological quality assessment table for each study

Domain 1: Patient selection
Was a consecutive or random sample of patients enrolled?	Yes, No or Unclear
Was it a cross‐sectional study or an RCT?	Yes, No or Unclear
Were selection criteria clearly described (at least presenting signs and symptoms and age limits for inclusion)?	Yes, No or Unclear
Were patients seen in an ambulatory care setting?	Yes, No or Unclear
Was clinical selection of patients avoided?	Yes, No or Unclear
Could the selection of patients have introduced bias?	Risk: Low, High or Unclear
Is there concern that the included patients do not match the review question?	Concern: Low, High or Unclear
Domain 2: RADT (index test)
Were RADTs conducted during consultation time?	Yes, No or Unclear
Were the RADT results interpreted with blinding of the results of culture?	Yes, No or Unclear
Was the type of the RADT mentioned (EIA or OIA)?	Yes, No or Unclear
Could the conduct or interpretation of the RADT have introduced bias?	Risk: Low, High or Unclear
Is there concern that the RADT, its conduct or interpretation differ from the review question?	Concern: Low, High or Unclear
Domain 3: Throat culture (reference standard)
Were culture results interpreted with blinding of the results of the RADT?	Yes, No or Unclear
Is the throat culture method likely to correctly identify GAS (laboratory culture on a blood agar plate for ≥ 48 hours)?	Yes, No or Unclear
Were the culture medium, atmosphere, duration of incubation and GAS‐confirmation technique described?	Yes, No or Unclear
Could the throat culture, its conduct or its interpretation have introduced bias?	Risk: Low, High or Unclear
Is there concern that the target condition as defined by the reference standard does not match the review question?	Concern: Low, High or Unclear
Domain 4: Flow and timing
Was the delay between the performance of the RADT and throat culture plating ≤ 48 hours?	Yes, No or Unclear
Did all patients receive a throat culture?	Yes, No or Unclear
Did patients receive the same throat culture method?	Yes, No or Unclear
Were undetermined/uninterpretable results reported?	Yes, No or Unclear
Were withdrawals from the study explained?	Yes, No or Unclear
Could the patient flow have introduced bias?	Risk: Low, High or Unclear
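
QUADAS‐2 leaves the final risk‐of‐bias judgement to reviewers, and the pre‐defined rules used in this review are not reproduced here. As a purely hypothetical illustration of mapping signalling‐question answers to a domain judgement, the Python sketch below uses a deliberately simple rule: all 'yes' gives low risk, any 'no' gives high risk, and anything else is unclear.

```python
def judge_domain_risk(answers):
    """Map signalling-question answers for one domain to 'low', 'high' or 'unclear'.

    Hypothetical rule for illustration; it is not the review's actual rule set.
    """
    answers = [a.strip().lower() for a in answers]
    if not answers or any(a not in {"yes", "no", "unclear"} for a in answers):
        raise ValueError("expected a non-empty list of 'yes', 'no' or 'unclear'")
    if all(a == "yes" for a in answers):
        return "low"
    if "no" in answers:
        return "high"
    return "unclear"


if __name__ == "__main__":
    # Domain 1 (patient selection): five signalling questions
    print(judge_domain_risk(["Yes", "Yes", "Unclear", "Yes", "Yes"]))  # unclear
```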

Statistical analysis and data synthesis

We entered data for the 2 x 2 tables into RevMan 2014 and plotted estimates of sensitivity and specificity on forest plots and in the receiver‐operating characteristic (ROC) space to represent the variability in diagnostic test accuracy within and between studies.
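
Each row of a forest plot is a study‐level sensitivity and specificity with its confidence interval. The Python sketch below computes these from a 2 x 2 table using the Wilson score interval. This interval method is our choice for illustration and may differ from the one RevMan uses, and the counts in the example are hypothetical.

```python
import math

Z95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wilson_ci(successes, n, z=Z95):
    """Wilson score confidence interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def study_accuracy(tp, fn, tn, fp):
    """Sensitivity and specificity, each as (estimate, lower, upper)."""
    return {
        "sensitivity": (tp / (tp + fn), *wilson_ci(tp, tp + fn)),
        "specificity": (tn / (tn + fp), *wilson_ci(tn, tn + fp)),
    }


if __name__ == "__main__":
    print(study_accuracy(tp=85, fn=15, tn=190, fp=10))
```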

We fitted the hierarchical bivariate model described by Reitsma 2005 in Stata/SE version 13 (using the user‐written program 'metandi'), which allowed us to calculate summary estimates of sensitivity and specificity and the associated 95% confidence intervals (CIs). We also reported the estimated correlation between sensitivity and specificity (rho). We entered the results from the bivariate model into RevMan 2014 to provide plots of the estimated summary points and confidence regions, superimposed on the study‐specific estimates of sensitivity and specificity in the ROC space.
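
In the bivariate model, each study's logit sensitivity and logit specificity are assumed to follow a bivariate normal distribution across studies, with means, between‐study variances and a correlation (rho) estimated from the data; summary estimates and their CIs are obtained on the logit scale and then back‐transformed. The Python sketch below shows only that final back‐transformation step, not the model fit itself. The input values are hypothetical and would in practice come from the fitted model output.

```python
import math

Z95 = 1.959964  # two-sided 95% normal quantile


def expit(x):
    """Inverse logit."""
    return 1 / (1 + math.exp(-x))


def logit(p):
    return math.log(p / (1 - p))


def back_transform(mean_logit, se_logit, z=Z95):
    """Summary proportion and 95% CI from a logit-scale estimate and its SE."""
    return (expit(mean_logit),
            expit(mean_logit - z * se_logit),
            expit(mean_logit + z * se_logit))


if __name__ == "__main__":
    # Hypothetical logit-scale summary for sensitivity: mean logit(0.85), SE 0.08
    print(back_transform(logit(0.85), 0.08))
```

Because the interval is symmetric on the logit scale, it becomes asymmetric around the summary estimate once back‐transformed.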

A study could contribute more than one 2 x 2 table to the same meta‐analysis if it evaluated more than one index test. We presented results grouped by commercial test name.

Investigations of heterogeneity

We first visually inspected the forest plots and ROC space to check for heterogeneity between study results. To investigate sources of heterogeneity, we incorporated covariates in the bivariate model, i.e., meta‐regression (using the built‐in Stata command 'xtmelogit' and routines available at http://methods.cochrane.org/sdt/software‐meta‐analysis‐dta‐studies). We assessed the statistical significance of each covariate with a likelihood ratio test comparing the bivariate model with and without the covariate, using a P value of less than 0.05 to denote statistical significance. When the test was significant, we assessed the effects of the covariate on sensitivity and specificity separately by testing the significance of the change in ‐2 log‐likelihood (i.e., the change in model deviance) with and without the corresponding terms. We addressed the following five sources of heterogeneity by adding variables to the meta‐analysis model:
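
The likelihood ratio test compares the maximised log‐likelihoods of nested models: twice their difference (the change in deviance) is referred to a chi‐square distribution with degrees of freedom equal to the number of added parameters (typically two when a two‐level covariate is allowed to affect both logit sensitivity and logit specificity). The self‐contained Python sketch below computes the P value from the chi‐square upper tail using a standard recurrence; the log‐likelihood values in the example are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1.

    Uses Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1) for the regularized
    upper incomplete gamma function, with a = df / 2 and y = x / 2.
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2
    if df % 2:                       # odd df: start from df = 1
        q, a = math.erfc(math.sqrt(y)), 0.5
    else:                            # even df: start from df = 2
        q, a = math.exp(-y), 1.0
    while a < df / 2:
        q += y**a * math.exp(-y) / math.gamma(a + 1)
        a += 1
    return q


def likelihood_ratio_test(loglik_without, loglik_with, added_params):
    """Return (change in -2 log-likelihood, P value) for nested models."""
    lr = 2 * (loglik_with - loglik_without)
    return lr, chi2_sf(lr, added_params)


if __name__ == "__main__":
    # Hypothetical log-likelihoods of the bivariate model without/with a covariate
    print(likelihood_ratio_test(-250.4, -247.9, added_params=2))
```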

a. Effect of test type

Some authors have suggested that OIA may be more sensitive than EIA tests (Gerber 2004). Therefore, we tried to indirectly compare the RADT tests by using test type as a categorical covariate in the models (EIA versus OIA); in indirect comparisons, data originate from different studies in which participants underwent either the EIA or the OIA test. We also tried to perform direct comparisons of EIA versus OIA by restricting the analysis to studies in which all patients underwent both EIA and OIA tests.
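
For the direct comparison, the analysis has to be restricted to studies that evaluated both test types. A minimal Python sketch follows, assuming one record per 2 x 2 table holding a study identifier and test type (the record layout and study labels are hypothetical). It identifies studies reporting both types; whether every participant received both tests still has to be checked in each report.

```python
from collections import defaultdict


def studies_with_both(records):
    """Return sorted IDs of studies contributing both an EIA and an OIA 2 x 2 table.

    `records` is an iterable of (study_id, test_type) pairs.
    """
    types_by_study = defaultdict(set)
    for study_id, test_type in records:
        types_by_study[study_id].add(test_type.strip().upper())
    return sorted(s for s, types in types_by_study.items() if {"EIA", "OIA"} <= types)


if __name__ == "__main__":
    tables = [("Study A", "EIA"), ("Study A", "OIA"), ("Study B", "EIA"),
              ("Study C", "EIA"), ("Study C", "EIA"), ("Study C", "oia")]
    print(studies_with_both(tables))  # ['Study A', 'Study C']
```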

b. Effect of the reference standard

In this review, the reference standard was throat culture on a blood agar plate. However, several parameters may affect the accuracy of throat culture on blood agar, including whether an enrichment broth was used before plating. We added this variable as a categorical covariate (yes/no) in the model.

c. Effect of age

The sensitivity of RADTs is known to be higher in younger children than in older ones (Cohen 2012; Edmonson 2005). This might be explained by higher GAS prevalence in school‐age children with pharyngitis than in older children. Therefore, we explored age as a potential source of heterogeneity by using the mean age of patients in the study as a categorised covariate in the model (i.e., below or above median of mean age across studies).

d. Effect of disease severity

A spectrum effect has been demonstrated for RADTs, with sensitivity increasing with disease severity, usually assessed by the McIsaac score (Cohen 2012; Edmonson 2005; Hall 2004; Tanz 2009). Disease severity might therefore be a relevant source of heterogeneity to explore. We used the proportion of patients with a McIsaac score greater than two as a categorical covariate in the model, comparing studies in which less than 70% of patients had a McIsaac score greater than two with studies in which more than 70% did (an arbitrary threshold).
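
For illustration, the Python sketch below computes the McIsaac score from its usual criteria (temperature above 38 °C, absence of cough, swollen tender anterior cervical nodes, tonsillar swelling or exudate, and age: +1 for 3 to 14 years, 0 for 15 to 44 years, −1 for 45 years or older) and then classifies a study by the proportion of patients scoring above two. The function names are hypothetical. Scoring children under three as 0 for age, and classing a study at exactly 70% as 'low', are our assumptions, because the text does not cover these cases. When a study reports only the proportion, the threshold comparison applies to that proportion directly.

```python
def mcisaac_score(temp_c, cough, tender_anterior_nodes, tonsil_swelling_or_exudate, age_years):
    """McIsaac score for one patient (criteria as usually described)."""
    score = int(temp_c > 38.0)                  # temperature > 38 °C
    score += int(not cough)                     # absence of cough
    score += int(tender_anterior_nodes)         # swollen, tender anterior cervical nodes
    score += int(tonsil_swelling_or_exudate)    # tonsillar swelling or exudate
    if 3 <= age_years <= 14:
        score += 1
    elif age_years >= 45:
        score -= 1                              # ages 15 to 44 (and, here, < 3) add 0
    return score


def severity_group(scores, threshold=0.70):
    """'high' if more than `threshold` of patients have a score > 2, else 'low'."""
    share = sum(s > 2 for s in scores) / len(scores)
    return "high" if share > threshold else "low"


if __name__ == "__main__":
    print(mcisaac_score(38.6, cough=False, tender_anterior_nodes=True,
                        tonsil_swelling_or_exudate=True, age_years=7))  # prints: 5
    print(severity_group([5, 4, 3, 2]))  # 3 of 4 (75%) score > 2; prints: high
```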

e. Effect of GAS prevalence

Diagnostic accuracy may vary with disease prevalence (Leeflang 2009; Leeflang 2013), usually with better performance in populations with higher disease prevalence. We considered GAS prevalence as a dichotomised covariate to define low‐risk versus high‐risk study populations (i.e., below or above the median GAS prevalence across studies).

Sensitivity analyses

We carried out the following sensitivity analyses to explore the robustness of the results:

include only studies judged at low risk of bias in each QUADAS‐2 domain;
include only studies judged at low risk of bias in at least three of four QUADAS‐2 domains (an arbitrary threshold);
include only studies judged to have low concerns about applicability in each QUADAS‐2 domain.

Additional analyses

We performed a univariate logit‐normal random‐effects meta‐analysis of the negative predictive value of RADTs (using the user‐written Stata command 'metan'), combining studies with complete verification and studies in which only RADT‐negative participants were verified by throat culture.
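Because every RADT‐negative child in the partially verified studies was cultured, the negative predictive value, TN / (TN + FN), can still be estimated from them even though sensitivity and specificity cannot. The sketch below pools study‐level NPVs on the logit scale with a DerSimonian‐Laird random‐effects estimator. This is an illustration under stated assumptions, not the review's code: we assume the DerSimonian‐Laird estimator (our understanding of what metan's random‐effects option uses), a 0.5 continuity correction for studies with an NPV of 0% or 100%, and a normal‐approximation 95% CI. The function name `pool_logit_dl` and the counts in the example are hypothetical.

```python
import math


def expit(z: float) -> float:
    """Inverse logit: map a log-odds value back to a proportion."""
    return 1.0 / (1.0 + math.exp(-z))


def pool_logit_dl(true_negatives, test_negatives, z=1.96):
    """Hypothetical helper: DerSimonian-Laird random-effects pooling of NPVs on the logit scale.

    true_negatives[i]: culture-negative children among the RADT-negative children of study i
    test_negatives[i]: all RADT-negative children of study i (all verified by culture)
    Returns (pooled NPV, lower limit, upper limit, tau^2 on the logit scale).
    """
    logits, variances = [], []
    for tn, neg in zip(true_negatives, test_negatives):
        tn, fn = float(tn), float(neg - tn)
        if tn == 0 or fn == 0:
            # Assumed continuity correction: add 0.5 to both cells when NPV is 0% or 100%
            tn, fn = tn + 0.5, fn + 0.5
        logits.append(math.log(tn / fn))
        variances.append(1.0 / tn + 1.0 / fn)

    # Fixed-effect (inverse-variance) estimate and Cochran's Q
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, logits)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, logits))

    # DerSimonian-Laird moment estimator of between-study variance, truncated at zero
    k = len(logits)
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 else 0.0

    # Random-effects weights, pooled logit and its standard error
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, logits)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return expit(pooled), expit(pooled - z * se), expit(pooled + z * se), tau2


# Hypothetical counts (not review data): culture-negative and RADT-negative children per study
npv, lower, upper, tau2 = pool_logit_dl([180, 95, 410, 60], [190, 100, 440, 60])
print(f"pooled NPV {npv:.3f} (95% CI {lower:.3f} to {upper:.3f}), tau^2 = {tau2:.3f}")
```

Working on the logit scale keeps the back‐transformed estimate and its confidence limits inside 0 to 1, which matters here because several studies report NPVs close to 100%.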

Assessment of reporting bias

We did not try to assess reporting bias (Macaskill 2010).

Results

Results of the search

The electronic search was performed on 7 July 2015. It identified 5576 titles, of which 82 were duplicates. We excluded a further 5166 titles on the basis of their title, abstract or both (Figure 1). After assessing the full text of 328 articles, we excluded 235. Using the 'PubMed related articles' function and Google Scholar, and checking the references of included studies and of reviews on the same topic (Gerber 2004; Lean 2014; Ruiz‐Aragon 2010; Stewart 2014), we identified five additional studies for inclusion (Nitsch‐Osuch 2010; Pauchard 2012; Sedki 2010; Tellechea 2012; Wong 1989). When possible, we contacted by email and postal mail the authors of studies that included both children and adults or in which the age of participants was unclear; eight study authors shared or clarified paediatric data for 10 studies (Arribas Blanco 1988; Drulak 1991; Llor 2008; Mezghani Maleej 2010; Mlejnek 2014; Pauchard 2012; Pauchard 2013; Schwabe 1987; Schwabe 1991; Toepfner 2013). Manufacturers of RADTs did not respond. Thus, this review includes a total of 98 unique study reports, all of which were cross‐sectional.

Figure 1

Flow diagram of studies in the review. *Studies awaiting classification (n = 14)

Included studies

Some studies were subdivided for the purpose of the review. One multi‐centre study conducted in four different countries was subdivided into four study cohorts (Rimoin 2010a). Some studies were also subdivided because they evaluated more than one RADT: nine studies compared two tests (Donatelli 1992a; Egger 1990a; Gieseker 2002a; Kaufhold 1991a; Mayes 2001a; Mirza 2007a; Roe 1995a; Schwartz 1997a; Wright 2007a), one compared three tests (Rogo 2010a), and one compared five tests (Chiadmi 2004a). Thus, this review includes a total of 116 test evaluations reporting a total of 101,121 test results. We performed descriptive analysis, methodological quality assessment and meta‐analysis at the test evaluation level.
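The total of 116 test evaluations follows from these subdivisions, each extra cohort or extra test adding one evaluation to the 98 unique studies:

```latex
\begin{aligned}
\text{test evaluations}
  &= \underbrace{98}_{\text{unique studies}}
   + \underbrace{(4-1)}_{\text{Rimoin 2010a}}
   + \underbrace{9 \times (2-1)}_{\text{two-test studies}}
   + \underbrace{(3-1)}_{\text{Rogo 2010a}}
   + \underbrace{(5-1)}_{\text{Chiadmi 2004a}} \\
  &= 98 + 3 + 9 + 2 + 4 = 116
\end{aligned}
```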

Included studies came from 25 countries; 53 (46%) test evaluations were conducted in the US. Forty‐two different commercial RADT kits were evaluated, and three studies reported evaluating an EIA test without giving its commercial name (hereafter “EIA (no name)”). Six commercial kits were evaluated in at least five paediatric cohorts: OSOM Strep A, QuickVue In‐Line Strep A, Strep A OIA, Strep A OIA Max, TestPack Strep A and TestPack Plus.

Excluded studies

Amongst the 328 full‐text articles assessed, we excluded 235. Thirty‐five assessed RADTs relying on technologies other than EIA or OIA. We excluded 38 studies because they included children and adults but did not report separate data for children, and we could not obtain additional data by contacting the study authors. The status of 10 studies is uncertain because we were unable to obtain the full text, and that of four articles because they have not yet been translated (two in Turkish, one in Polish and one in Czech).

Methodological quality of included studies

The overall methodological quality of the included study cohorts is summarised in Figure 2, and the quality assessment results for individual studies are shown in Figure 3. The median sample size per study cohort was 297 participants (interquartile range (IQR) 196 to 539). The median of the mean age of participants, as reported in 32 studies, was 6.6 years (IQR 5.8 to 7.7). Most study cohorts (82 of 116, 71%) did not clearly report whether participants formed a consecutive, random or convenience series. Fifty‐eight study cohorts (50%) avoided clinical selection of participants and therefore included a representative spectrum of patients.

Figure 2

Risk of bias and applicability concerns graph: review authors' judgements about each domain across all included study cohorts (n = 116).

Figure 3

Risk of bias and applicability concerns summary: review authors' judgements about each domain for each included study cohort (n = 116).

RADT results were interpreted blind to the throat culture result in 84 of 116 cases (72%). An appropriate reference standard (i.e., laboratory throat culture on a blood agar plate incubated for at least 48 hours) was used in 72 study cohorts (62%). Throat culture results were interpreted blind to the RADT result in 23 of 116 cases (20%).

Partial verification was avoided in most study cohorts (105 of 116, 91%). In 10 study cohorts (42,319 participants), RADT results were verified by throat culture only in RADT‐negative participants (Ayanruoh 2009; Cohen 2004; Edmonson 2005; Hall 2004; Mayes 2001a; Mirza 2007a; Mlejnek 2014; Van Limbergen 2006); in one study (558 participants), RADT results were verified only in RADT‐positive participants (Cohen 1998).
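As a consistency check, the three verification patterns account for every cohort and every participant in the review (58,244 participants in the cohorts with complete verification, as reported in the Findings):

```latex
\begin{aligned}
\text{cohorts:}\quad & \underbrace{105}_{\text{complete verification}}
  + \underbrace{10}_{\text{RADT-negatives only}}
  + \underbrace{1}_{\text{RADT-positives only}} = 116 \\
\text{participants:}\quad & 58\,244 + 42\,319 + 558 = 101\,121
\end{aligned}
```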

Findings

Across the 116 study cohorts included in the review, the sensitivity of rapid antigen detection tests (RADTs) ranged from 38.6% to 100% and the specificity from 54.1% to 100% (Figure 4). We excluded from the meta‐analysis of sensitivity and specificity the 11 study cohorts in which partial verification was not avoided, leaving a final dataset of 105 pairs of sensitivity and specificity (58,244 participants).

Figure 4

Forest plots of RADT sensitivity and specificity for GAS detection, ordered by commercial kit. TP = True Positive; FP = False Positive; FN = False Negative; TN = True Negative.

Summary estimates of sensitivity and specificity

Amongst the 105 test evaluations included in the meta‐analysis (58,244 participants), the summary estimates of sensitivity and specificity were 85.6% (95% confidence interval (CI) 83.3 to 87.6) and 95.4% (95% CI 94.5 to 96.2), respectively (Figure 5). There was no statistical evidence of a correlation between sensitivity and specificity (correlation coefficient −0.17; 95% CI −0.39 to 0.07).

Figure 5

Summary ROC plot of RADT sensitivity and specificity for GAS detection (n = 105). Each individual study cohort is represented by an empty circle. The filled circle is the pooled summary estimate for sensitivity and specificity. The solid curve represents the 95% confidence region around the summary estimate; the dashed curve represents the 95% prediction region.

Enzyme immunoassay (EIA) tests

We included 86 evaluations of EIA RADTs (48,808 participants). The median sample size was 263 (IQR 178 to 454) and the median prevalence of group A streptococcus (GAS) on throat culture was 29.5% (IQR 23.8% to 34.9%). Sensitivity of EIA RADTs ranged from 38.6% to 100%, and specificity from 54.1% to 100%. The summary estimates of sensitivity and specificity for EIA tests were 85.4% (82.7 to 87.8) and 95.8% (94.8 to 96.6), respectively (Figure 6).

Figure 6

Summary ROC plot of RADT sensitivity and specificity for GAS detection: EIA (n = 86) versus OIA (n = 19). The filled black circle is the pooled summary estimate for sensitivity and specificity of EIA tests; the filled red circle is the pooled summary estimate for sensitivity and specificity of OIA tests. The solid curves represent the 95% confidence region around the summary estimate; the dashed curves represent the 95% prediction region.

Optical immunoassay (OIA) tests

We included 19 evaluations of OIA RADTs (9436 participants). The median sample size was 302 (IQR 233 to 519), and the median prevalence of GAS on throat culture was 29.5% (IQR 23.7% to 36.4%). Sensitivity of OIA RADTs ranged from 72.4% to 96.7%, and specificity from 61.0% to 97.1%. The summary estimates of sensitivity and specificity for OIA tests were 86.2% (82.7 to 89.2) and 93.7% (91.5 to 95.4), respectively (Figure 6).

Investigations of heterogeneity

Visual inspection of the forest plots and ROC space suggested substantial heterogeneity in accuracy estimates, especially for sensitivity, as reflected by the wide prediction regions around the summary estimates. The results of the investigations of heterogeneity are summarised in Table 3.

Table 3. Results of investigations of heterogeneity

Study‐level covariate		Studies (n)	Sensitivity (95% CI)	Specificity (95% CI)	Interpretation
Test type^a
	Enzyme immuno‐assay	86	85.4 (82.7 to 87.8)	95.8 (94.8 to 96.6)	Accuracy does not seem influenced by test type (P value = 0.23)
	Optical immuno‐assay	19	86.2 (82.7 to 89.2)	93.7 (91.5 to 95.4)
Throat culture
	Without enrichment broth	88	85.5 (82.8 to 87.8)	95.6 (94.8 to 96.3)	Accuracy does not seem influenced by whether an enrichment broth was used (P value = 0.15)
	With enrichment broth	10	86.3 (83.3 to 88.7)	92.7 (87.9 to 95.7)
Mean age of participants^b
	Below the median	16	87.1 (81.7 to 91.1)	93.2 (90.5 to 95.2)	No evidence of association with age (P value = 0.39)
	Above the median	13	83.7 (78.5 to 87.9)	95.0 (92.7 to 96.6)
% of patients with McIsaac score > 2
	≤ 70%	4	81.3 (69.8 to 89.1)	94.9 (91.1 to 97.2)	No evidence of association with clinical severity (P value = 0.35)
	> 70%	8	88.8 (82.9 to 92.9)	94.2 (89.4 to 96.9)
Prevalence of group A streptococcus^c
	Below the median	54	84.9 (81.1 to 88.1)	95.5 (94.2 to 96.4)	Accuracy does not seem influenced by the prevalence of group A streptococcus (P value = 0.70)
	Above the median	51	86.2 (83.5 to 88.5)	95.4 (94.0 to 96.5)

^aResults based on indirect comparisons; ^bthe median of mean age was 6.6 years; ^cthe median of group A streptococcus prevalence using throat culture as the reference standard was 29.5%.

CI: confidence interval

a. Effect of test type

There were 86 evaluations of EIA tests (48,808 participants) and 19 evaluations of OIA tests (9436 participants). Based on analysis of all available data, there was no statistical evidence that sensitivity and/or specificity differed between EIA and OIA tests (sensitivity 85.4% versus 86.2%, respectively; specificity 95.8% versus 93.7%, respectively; change in model deviance = 2.90; P value = 0.23) (Figure 6).

Two studies directly compared EIA with OIA tests by applying both tests to each participant (802 participants; Figure 7) (Gieseker 2002a; Roe 1995a); data were too limited for further statistical analysis. In Gieseker 2002a, the EIA and OIA tests had comparable specificity (92% (87 to 95) versus 95% (91 to 97), respectively), but the EIA test had higher sensitivity (97% (90 to 99) versus 79% (69 to 87), respectively). By contrast, Roe 1995a found that the EIA and OIA tests had comparable sensitivity (82% (75 to 88) versus 83% (77 to 89), respectively), with higher specificity for the EIA test than for the OIA test under evaluation (96% (93 to 98) versus 89% (85 to 92), respectively).

Figure 7

Summary ROC plot of RADT sensitivity and specificity for GAS detection: direct comparison of EIA versus OIA (n = 2). Each individual study cohort is represented by an empty black circle (EIA) and an empty red diamond (OIA), connected by a dotted line.

b. Effect of the reference standard

An enrichment broth was used before plating in 10 test evaluations, was not used in 88, and its use was unclear in seven. Using an enrichment broth before plating was not associated with significantly different estimates of sensitivity and/or specificity (with versus without enrichment broth: sensitivity 86.3% versus 85.5%; specificity 92.7% versus 95.6%; change in model deviance = 3.79; P value = 0.15).

c. Effect of age

Twenty‐nine studies reported the mean age of participants; the median of the mean age was 6.6 years (IQR 5.8 to 7.4). Mean age was not associated with significantly different estimates of sensitivity and/or specificity (studies below versus above the median: sensitivity 87.1% versus 83.7%; specificity 93.2% versus 95.0%; change in model deviance = 1.87; P value = 0.39).

d. Effect of disease severity

Twelve studies assessed clinical severity using the McIsaac score. The median proportion of patients with severe disease (McIsaac score greater than two) was 85% (IQR 63% to 91%); this proportion was 70% or less in four study cohorts. Meta‐regression did not show evidence of a significant association between clinical severity and sensitivity and/or specificity (change in model deviance = 2.10; P value = 0.35).

e. Effect of GAS prevalence

Based on the proportion of throat culture results positive for GAS, the median prevalence of participants with streptococcal pharyngitis was 29.5% (IQR 23.8% to 34.9%). There was no significant effect of GAS prevalence on sensitivity and/or specificity when GAS prevalence was tested as a covariate in the bivariate model (change in model deviance = 0.71; P value = 0.70).
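The P values reported above are consistent with referring each change in model deviance to a chi‐square distribution on 2 degrees of freedom, which is what one would expect if each binary covariate adds one term for logit sensitivity and one for logit specificity. The degrees of freedom are our inference from the reported numbers, not something stated in the text. For an even number of degrees of freedom the chi‐square upper tail has a closed form, so a standard‐library Python sketch is enough to reproduce the P values (the function name `chi2_sf_even_df` is hypothetical):

```python
import math


def chi2_sf_even_df(x: float, df: int = 2) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with even df.

    For even df the survival function is
    exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)**k / k!
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form only holds for positive, even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


# Changes in model deviance reported in the investigations of heterogeneity
deviance_changes = {
    "test type": 2.90,
    "enrichment broth": 3.79,
    "mean age": 1.87,
    "clinical severity": 2.10,
    "GAS prevalence": 0.71,
}
for covariate, change in deviance_changes.items():
    print(f"{covariate}: P = {chi2_sf_even_df(change):.2f}")
```

Run as written, this prints P values of 0.23, 0.15, 0.39, 0.35 and 0.70, matching Table 3; with 2 degrees of freedom the test reduces to P = exp(−D/2), where D is the change in deviance.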

Sensitivity analysis

Compared with the overall results (summary sensitivity 85.6%), sensitivity was lower in the 20 studies at low risk of bias for the reference standard (81.0%), higher in the 33 studies with low concerns about applicability in the index test domain (89.1%), but stable in the 20 studies at low risk of bias in at least three QUADAS‐2 domains (84.0%) (Table 4). Summary estimates of specificity were robust across subgroups, at around 95%.

Table 4. Results of sensitivity analyses

Concerns	Domain	Studies at low risk (n)	Sensitivity (95% CI)	Specificity (95% CI)
Risk of bias	Patient selection	25	85.7 (82.1 to 88.6)	93.0 (91.1 to 94.5)
	Index test	65	86.6 (84.0 to 88.8)	95.2 (94.1 to 96.1)
	Reference standard	20	81.0 (74.1 to 86.5)	95.5 (93.4 to 96.9)
	Flow and timing	98	85.4 (83.0 to 87.5)	95.3 (94.4 to 96.1)
	≥ 3 domains with low risk of bias	20	84.0 (79.4 to 87.8)	95.0 (93.1 to 96.4)
Applicability
	Patient selection	41	83.1 (79.7 to 86.0)	94.9 (93.4 to 96.0)
	Index test	33	89.1 (85.7 to 91.8)	95.0 (93.2 to 96.4)
	Reference standard	60	84.9 (81.6 to 87.6)	94.7 (93.5 to 95.7)

CI: confidence interval

Additional analysis

We excluded 10 studies from the main meta‐analysis of sensitivity and specificity because RADT results were verified by throat culture only in RADT‐negative participants (partial verification); four of these were very large studies (more than 3000 participants) (Ayanruoh 2009; Mayes 2001a; Mirza 2007a; Mlejnek 2014). We performed a meta‐analysis of the negative predictive value of RADTs that included these 10 additional studies. Across 115 test evaluations, the median prevalence of participants with streptococcal pharyngitis was 29.4%. Negative predictive value ranged from 70.2% to 100%; the summary estimate of negative predictive value was 93.9% (93.1 to 94.6).

Discussion

Summary of main results

In this systematic review, we included 116 cohorts (98 unique studies; 101,121 participants) that evaluated rapid antigen detection tests (RADTs) for the detection of group A streptococcus (GAS) in children with pharyngitis. The overall methodological quality of included studies was poor. Across 105 study cohorts (58,244 participants) in which all participants underwent both RADT and throat culture, the summary estimates of sensitivity and specificity were 85.6% (83.3 to 87.6) and 95.4% (94.5 to 96.2), respectively. Sensitivity varied substantially across studies, whereas specificity was more stable; there was no statistical evidence of a trade‐off between sensitivity and specificity. Heterogeneity in accuracy was not explained by study‐level characteristics such as test type (enzyme immunoassay (EIA) versus optical immunoassay (OIA)), use of an enrichment broth before plating, mean age and clinical severity of participants, or GAS prevalence. Summary estimates of sensitivity and specificity were similar in studies at low risk of bias in at least three QUADAS‐2 domains (84.0% and 95.0%, respectively). Across 115 test evaluations in which all negative RADT results were verified by throat culture, the negative predictive value of RADTs was 93.9% (93.1 to 94.6).

Summary of findings

The Summary of findings table applies the results of the review to a hypothetical cohort of 1000 children with pharyngitis under three scenarios, with GAS prevalence of 20%, 30% and 40%. The consequence of a false negative result is that the child may not receive antibiotic treatment, and thus may experience symptoms for longer and be at higher risk of developing suppurative and non‐suppurative complications of GAS infection (Spinks 2013). The consequence of a false positive result is that the child may receive unnecessary antibiotics, with the risk of adverse reactions and of contributing to the selection of antibiotic‐resistant bacteria.
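The cohort figures in the Summary of findings table follow from the summary sensitivity (85.6%) and specificity (95.4%) by simple arithmetic. A minimal Python sketch, assuming counts are rounded to the nearest whole child (our assumption about how the table was built; the function name `cohort_consequences` is hypothetical), reproduces them:

```python
def cohort_consequences(sens: float, spec: float, prevalence: float, n: int = 1000):
    """Expected (TP, FN, TN, FP) counts in a hypothetical cohort, rounded to whole children."""
    diseased = round(n * prevalence)  # children with a positive culture for GAS
    healthy = n - diseased            # children without GAS
    tp = round(diseased * sens)       # GAS cases identified by the RADT
    fn = diseased - tp                # GAS cases missed
    tn = round(healthy * spec)        # children without GAS who will not be treated
    fp = healthy - tn                 # children who may receive unnecessary antibiotics
    return tp, fn, tn, fp


# Summary estimates from the meta-analysis of 105 test evaluations
for prev in (0.20, 0.30, 0.40):
    print(f"prevalence {prev:.0%}: TP, FN, TN, FP =", cohort_consequences(0.856, 0.954, prev))
```

With these rounding rules the sketch returns (171, 29, 763, 37), (257, 43, 668, 32) and (342, 58, 572, 28) for 20%, 30% and 40% prevalence, matching the table. Deriving FN and FP by subtraction guarantees that each scenario still sums to 1000 children.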

Comparison with previous findings

Our findings are in line with those from three published systematic reviews about the accuracy of RADTs for the diagnosis of streptococcal pharyngitis (Table 5) (Lean 2014; Ruiz‐Aragon 2010; Stewart 2014). Summary estimates of sensitivity and specificity were comparable across reviews, at around 85% and 95%, respectively.

Table 5. Comparison between previous systematic reviews on the diagnostic accuracy of RADTs for streptococcal pharyngitis and the present one

	Ruiz‐Aragon 2010^a	Lean 2014	Stewart 2014	Present review
Study participants	Adults and children	Adults and children	Adults and children	Children
Timeframe for searches	2000 to 2009	1996 to 2013	2000 to 2012	1980 to 2015
Number of studies included	24	60^b	58^c	105^b
Number of participants included	14,936	29,934	55,766	58,244
Summary estimate of sensitivity (95% CI)	85% (84 to 87)	86% (83 to 88)	84% (83 to 85)^d	86% (83 to 88)
Summary estimate of specificity (95% CI)	96% (96 to 97)	96% (94 to 97)	95% (94 to 95)^d	95% (95 to 96)
Investigations of heterogeneity	None performed	No evidence of significant variation in accuracy by test type (EIA versus OIA), and by age (children versus adults)	Did not identify sources of variability^d	Did not identify sources of variability

^aIn Spanish; ^bpairs of sensitivity and specificity; ^c59 study cohorts; ^damongst high‐quality studies.

CI: confidence interval

Strengths and weaknesses of the review

We believe this dataset is a fair representation of diagnostic accuracy studies evaluating RADTs in children with pharyngitis. However, studies of diagnostic test accuracy tend to be poorly indexed in electronic databases, so we may have missed some eligible studies. Moreover, although our literature search was extensive, we did not systematically search conference abstracts, and it has been estimated that at least one‐quarter of abstracts of diagnostic accuracy studies presented at conferences are not published (Brazzelli 2009). Thirty‐eight of the studies we identified included both adults and children without reporting paediatric data separately, so their eligible subsets of data could not be included in the review.

The overall methodological quality of the studies included in the review was poor: fewer than one‐fifth (17%) were judged at low risk of bias in at least three of four QUADAS‐2 domains, and only half (50%) of the estimates of diagnostic accuracy were obtained from unselected groups of children presenting with signs and symptoms of pharyngitis. Poor quality mainly arose from a high risk of bias in patient selection and in the reference standard (in 73% and 43% of test evaluations, respectively). Poor reporting frequently impeded quality appraisal: whether participants formed a consecutive or random series was reported in only 29% of cases, inclusion criteria in 46%, and whether readers of the reference standard were blinded to the RADT result in 28%. We used QUADAS‐2 to assess the quality of included studies but did not use GRADE to rate the overall quality of the body of evidence; we will undertake a GRADE assessment in future updates of this review.

We included enough studies and participants to obtain precise summary estimates. However, meta‐regression did not identify sources of heterogeneity in accuracy. The sensitivity of RADTs is likely to vary across patient subgroups within a study; several studies, for example, found that sensitivity increased with increasing Centor or McIsaac scores (Cohen 2012; Edmonson 2005; Hall 2004; Tanz 2009). Because of aggregation bias, relationships across studies may not reflect relationships within studies, and the association between accuracy and patient characteristics such as age and disease severity may be adequately estimated only with individual patient data; we strongly recommend such future work. We dichotomised variables such as age and clinical severity when investigating heterogeneity, mainly because we lacked routines for bivariate meta‐regression with continuous covariates in Stata, but this may have come at the cost of lost information and statistical power. Study setting could also be a relevant source of heterogeneity to explore in future research.

Other well‐described sources of variability in RADT sensitivity could not be explored in this review. For example, several studies reported that sensitivity increased with the amount of GAS found on culture (Cohen 2012; Kuhn 1999; Kurtz 2000), but we could not evaluate and compare this effect across studies because there is no standard method for measuring bacterial inoculum size. The level of expertise of the person obtaining the throat swab also seems to affect the sensitivity of RADTs; several studies have shown improved sensitivity after dedicated training sessions (Fox 2006; Toepfner 2013).

The analysis was carried out at the test evaluation level, so studies that evaluated more than one RADT contributed more than once to the meta‐analysis. The summary estimates are therefore partly based on duplicate use of the same individuals, which is likely to have introduced some bias. However, we expect the implications to be marginal, because such studies represent only a minority of the included studies (11 of 98).

Applicability of findings to the review question

Included studies came from 25 countries and a range of ambulatory care settings (private offices, walk‐in clinics, emergency departments). However, only half of the studies avoided clinical selection of participants; investigators often used clinical criteria, such as the McIsaac score, as inclusion criteria. The included studies may therefore give a distorted picture of the diagnostic performance of RADTs in unselected children with pharyngitis seen in ambulatory care. In the 41 studies judged to be of low concern regarding applicability for patient selection, the summary estimate of sensitivity was slightly lower than the overall estimate (83.1% versus 85.6%, respectively).

We evaluated 42 different commercial kits in this review. All of them are binary tests giving either a positive or negative result, but the different commercial kits may not share a common positivity threshold (Charlier‐Bret 2004; Lasseter 2009). The absence of evidence for a significant correlation between sensitivity and specificity suggests that threshold effects may be negligible when evaluating the accuracy of RADTs. Recently, molecular rapid tests relying on DNA probes, polymerase chain reaction (PCR) and fluorescence in situ methods have been commercialised (Chapin 2002; Ding 2011; Slinger 2011). Their accuracy seems promising but they have rarely been evaluated in children and require specialised equipment and personnel.

Amongst 105 test evaluations included in the meta‐analysis of sensitivity and specificity estimates, we judged about one‐third (31%) to be of low concern regarding applicability in the index test domain because the RADT was processed and interpreted during consultation time. In this subgroup of studies, the summary estimate of sensitivity was higher than that from the overall analysis (89.1% versus 85.6%, respectively).

An appropriate reference standard (laboratory throat culture on a blood agar plate incubated for at least 48 hours) was used in about two‐thirds (62%) of test evaluations. An enrichment broth was used to improve recovery of GAS on culture in about 10% of test evaluations; on meta‐regression, its use was not associated with a significant difference in RADT sensitivity or specificity.


Test 1

All studies (n = 116).

Test 2

Complete verification (n = 105).

Test 3

EIA (direct comparison).

Test 4

OIA (direct comparison).

Test 5

Acceava Strep A (Biostar).

Test 6

ACON Strep A Rapid Test Strip.

Test 7

BioNexia Strep A (BioMerieux).

Test 8

CARDS QS Strep A (Quidel).

Test 9

Clearview Exact Strep A.

Test 10

Clearview Strep A.

Test 11

Diaquick Strep A Test (Dialab).

Test 12

Directgen 1‐2‐3 Group A Strep (Becton Dickinson).

Test 13

Direct Strep A EIA.

Test 14

EIA (no name).

Test 15

Group A Strep Test (Quidel).

Test 16

IM Strep A (International Microbio).

Test 17

Meridian Bioscience.

Test 18

OSOM Strep A (Genzyme).

Test 19

OSOM Ultra Strep A (Genzyme).

Test 20

QuickVue Dipstick Strep A (Quidel).

Test 21

QuickVue Flex Strep A (Quidel).

Test 22

QuickVue In‐Line Strep A (Quidel).

Test 23

QuickVue+ Strep A (Quidel).

Test 24

Sacks Biological Farms.

Test 25

SD Bioline Strep A.

Test 26

Signify Strep A (Abbott).

Test 27

SMART Group A Strep (New Horizons).

Test 28

Strep A Abon kit.

Test 29

Strep A OIA (Biostar).

Test 30

Strep A OIA Max (Biostar).

Test 31

Strep A Rapid Test Device.

Test 32

Strep A Sign.

Test 33

Strep A test II (INTEX Diagnostica).

Test 34

StreptAtest (Dectrapharm).

Test 35

Streptavit.

Test 36

Streptop A (ALL‐Diag).

Test 37

SUDS Group A Strep.

Test 38

SureScreen Test Strep A.

Test 39

TestPack Strep A (Abbott).

Test 40

TestPack Plus (Abbott).

Test 41

TestPack Plus Strep A with OBC II (Abbott).

Test 42

Ventrescreen Strep A (Ventrex Lab).

Test 43

Visuwell Strep A (ADI).

Test 44

Icon Strep A.

Test 45

Qtest (Becton Dickinson).

Test 46

Link 2 Strep A Rapid Test (Becton Dickinson).

Test 47

Event Test Strip Strep A.

Summary of findings Summary of findings table

Review questions	What is the diagnostic accuracy of rapid antigen detection tests (RADT) for detecting group A streptococcus (GAS)? What is the relative diagnostic accuracy of the two major types of RADTs (enzyme immunoassays (EIA) and optical immunoassays (OIA))?
Patients/population	Children with acute pharyngitis
Prior testing	Physical examination establishing the diagnosis of pharyngitis, with or without evaluating the likelihood of a streptococcal origin
Settings	Ambulatory care settings: mainly private offices, emergency departments and walk‐in clinics
Index tests	EIA and OIA test for GAS
Reference standard	Throat culture on a blood agar plate
Importance	Compared with culture, RADTs offer diagnosis at the point of care. Whether negative RADTs should be backed up by throat culture depends mainly on the reported sensitivity of the test
Studies	Cross‐sectional studies
Quality concerns	Methodological quality was generally poor, but quality appraisal was impeded by suboptimal reporting. Patient selection and reference standard methods were common risk of bias concerns (in 73% and 43% of test evaluations, respectively)
Heterogeneity	There was substantial heterogeneity in the results of the individual studies, especially for sensitivity, which could not be explained by the investigations
	Quantity of evidence		Average diagnostic accuracy		Consequences in a cohort of 1000 patients…
	Studies (n)	Participants (n)	Sensitivity (95% CI)	Specificity (95% CI)	…given 20% prevalence of GAS cases?	…given 30% prevalence of GAS cases?	…given 40% prevalence of GAS cases?
RADT for the diagnosis of GAS pharyngitis in children (EIA and OIA tests)	105	58,244	85.6% (83.3 to 87.6)	95.4% (94.5 to 96.2)	200 children will have a positive culture for GAS. Of these, 171 will be identified (TP); 29 will be missed (FN). Of the 800 children without GAS, 763 will not be treated (TN); 37 may receive unnecessary antibiotics (FP)	300 children will have a positive culture for GAS. Of these, 257 will be identified (TP); 43 will be missed (FN). Of the 700 children without GAS, 668 will not be treated (TN); 32 may receive unnecessary antibiotics (FP)	400 children will have a positive culture for GAS. Of these, 342 will be identified (TP); 58 will be missed (FN). Of the 600 children without GAS, 572 will not be treated (TN); 28 may receive unnecessary antibiotics (FP)
Comparison of EIA versus OIA tests
EIA tests	86	48,808	85.4% (82.7 to 87.8)	95.8% (94.8 to 96.6)	Interpretation: EIA and OIA tests seem to have comparable accuracy (P value = 0.23)
OIA tests	19	9436	86.2% (82.7 to 89.2)	93.7% (91.5 to 95.4)
CI: confidence interval EIA: enzyme immunoassay FN: false negative FP: false positive GAS: group A streptococcus OIA: optical immunoassay RADT: rapid antigen detection test TN: true negative TP: true positive


Table 1. Data extracted from each study

Study ID	First author, year of publication
Type of study	Journal article or conference abstract
Clinical features and settings	Presenting signs and symptoms
	Clinical selection of patients (none, clinical score, explicit criteria but not a score, implicit criteria)
	Exclusion if antibiotics use before inclusion (yes/no)
	Clinical setting (office‐based, emergency department, walk‐in clinic, mixed, other)
	Single‐ or multi‐centre study
	Age range for inclusion
Participants	Sample size (n)
	Age (distribution)
	GAS prevalence according to culture (with 95% confidence interval)
	Country of study
	Sex (% of girls)
	Clinical severity assessment (Centor score, McIsaac score, other, none)
Study design	Cross‐sectional study or RCT
	Retrospective or prospective design
	Sample (consecutive, random or unclear)
	Direct comparison of different RADTs (yes/no)
	Direct comparison of several throat culture techniques (yes/no)
	Throat swab (1 single, 1 double, 2 different)
	Person performing the throat sample (physician, nurse, laboratory personnel, other)
Reference standard(s)	Throat culture medium (standard, enrichment, inhibitory)
	Atmosphere of incubation (aerobic, aerobic with CO₂ enrichment, anaerobic)
	Duration of incubation (≤ 24, 24 to 48, ≥ 48 hours)
	GAS confirmation (bacitracin disk, latex test, other, none)
	Number of plates inoculated (n)
	Assessment of GAS antibody response (yes/no)
	Relevant details
Index tests	Commercial name of the RADT
	Type of RADT (EIA, OIA)
Data	Number of true positives, false positives, true negatives, false negatives and undetermined/uninterpretable results
Notes	Source of funding (author affiliation with the RADT manufacturer, direct funding of the study by the manufacturer, author-reported conflicts of interest related to the manufacturer, or other funding sources)
	Anything else of relevance
RADT: rapid antigen detection test EIA: enzyme immunoassay OIA: optical immunoassay GAS: group A streptococcus RCT: randomised controlled trial CO₂: carbon dioxide
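Each test evaluation in the Data row reduces to a 2 × 2 table against throat culture. As a sketch only, the hypothetical `TwoByTwo` record below stores those counts and derives per-study sensitivity, specificity and culture-based GAS prevalence. Excluding undetermined results from these calculations is our assumption, and the counts in the example are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Hypothetical record of one RADT evaluation against throat culture."""
    study_id: str
    test_type: str          # "EIA" or "OIA"
    tp: int                 # RADT positive, culture positive
    fp: int                 # RADT positive, culture negative
    tn: int                 # RADT negative, culture negative
    fn: int                 # RADT negative, culture positive
    undetermined: int = 0   # uninterpretable RADT results, excluded below (assumption)

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def culture_prevalence(self) -> float:
        # GAS prevalence according to culture, among interpretable results
        return (self.tp + self.fn) / (self.tp + self.fp + self.tn + self.fn)


# Invented counts for illustration only
example = TwoByTwo("Example 2000", "EIA", tp=85, fp=10, tn=190, fn=15)
print(example.sensitivity, example.specificity)  # 0.85 0.95
```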


Table 2. Methodological quality assessment table for each study

Domain 1: Patient selection
Was a consecutive or random sample of patients enrolled?	Yes, No or Unclear
Was it a cross‐sectional study or an RCT?	Yes, No or Unclear
Were selection criteria clearly described (at least presenting signs and symptoms and age limits for inclusion)?	Yes, No or Unclear
Were patients seen in an ambulatory care setting?	Yes, No or Unclear
Was clinical selection of patients avoided?	Yes, No or Unclear
Could the selection of patients have introduced bias?	Risk: Low, High or Unclear
Is there concern that the included patients do not match the review question?	Concern: Low, High or Unclear
Domain 2: RADT (index test)
Were RADTs conducted during consultation time?	Yes, No or Unclear
Were the RADT results interpreted with blinding of the results of culture?	Yes, No or Unclear
Was the type of RADT reported (EIA or OIA)?	Yes, No or Unclear
Could the conduct or interpretation of the RADT have introduced bias?	Risk: Low, High or Unclear
Is there concern that the RADT, its conduct or interpretation differ from the review question?	Concern: Low, High or Unclear
Domain 3: Throat culture (reference standard)
Were culture results interpreted with blinding of the results of the RADT?	Yes, No or Unclear
Is the throat culture method likely to correctly identify GAS (laboratory culture on a blood agar plate incubated for ≥ 48 hours)?	Yes, No or Unclear
Were the culture medium, atmosphere, duration of incubation and GAS‐confirmation technique described?	Yes, No or Unclear
Could the throat culture, its conduct or its interpretation have introduced bias?	Risk: Low, High or Unclear
Is there concern that the target condition as defined by the reference standard does not match the review question?	Concern: Low, High or Unclear
Domain 4: Flow and timing
Was the delay between the performance of the RADT and throat culture plating ≤ 48 hours?	Yes, No or Unclear
Did all patients receive a throat culture?	Yes, No or Unclear
Did patients receive the same throat culture method?	Yes, No or Unclear
Were undetermined/uninterpretable results reported?	Yes, No or Unclear
Were withdrawals from the study explained?	Yes, No or Unclear
Could the patient flow have introduced bias?	Risk: Low, High or Unclear


Table 3. Results of investigations of heterogeneity

Study‐level covariate		Studies (n)	Sensitivity, % (95% CI)	Specificity, % (95% CI)	Interpretation
Test type^a
	Enzyme immunoassay	86	85.4 (82.7 to 87.8)	95.8 (94.8 to 96.6)	Accuracy does not seem influenced by test type (P value = 0.23)
	Optical immunoassay	19	86.2 (82.7 to 89.2)	93.7 (91.5 to 95.4)
Throat culture
	Without enrichment broth	88	85.5 (82.8 to 87.8)	95.6 (94.8 to 96.3)	Accuracy does not seem influenced by whether an enrichment broth was used (P value = 0.15)
	With enrichment broth	10	86.3 (83.3 to 88.7)	92.7 (87.9 to 95.7)
Mean age of participants^b
	Below the median	16	87.1 (81.7 to 91.1)	93.2 (90.5 to 95.2)	No evidence of association with age (P value = 0.39)
	Above the median	13	83.7 (78.5 to 87.9)	95.0 (92.7 to 96.6)
% of patients with McIsaac score > 2
	≤ 70%	4	81.3 (69.8 to 89.1)	94.9 (91.1 to 97.2)	No evidence of association with clinical severity (P value = 0.35)
	> 70%	8	88.8 (82.9 to 92.9)	94.2 (89.4 to 96.9)
Prevalence of group A streptococcus^c
	Below the median	54	84.9 (81.1 to 88.1)	95.5 (94.2 to 96.4)	Accuracy does not seem influenced by the prevalence of group A streptococcus (P value = 0.70)
	Above the median	51	86.2 (83.5 to 88.5)	95.4 (94.0 to 96.5)
^aResults based on indirect comparisons; ^bthe median of the studies' mean ages was 6.6 years; ^cthe median group A streptococcus prevalence across studies, using throat culture as the reference standard, was 29.5%. CI: confidence interval
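Footnotes b and c describe dichotomising a study-level covariate at its median across studies. A minimal sketch, assuming studies that did not report the covariate are dropped and studies exactly at the median go to the "below" group; neither rule is stated in the table. The function `split_at_median` and the example ages are hypothetical.

```python
from statistics import median
from typing import Optional


def split_at_median(covariate: dict[str, Optional[float]]) -> tuple[list[str], list[str], float]:
    """Split study IDs into below/above the median of a study-level covariate.

    Studies with a missing value (None) are dropped; ties with the median
    are assigned to the 'below' group (both are assumptions).
    """
    reported = {study: value for study, value in covariate.items() if value is not None}
    cut = median(reported.values())
    below = [s for s, v in reported.items() if v <= cut]
    above = [s for s, v in reported.items() if v > cut]
    return below, above, cut


# Hypothetical mean ages (years); study D did not report age
mean_age = {"A": 4.5, "B": 6.0, "C": 7.0, "D": None, "E": 9.5}
below, above, cut = split_at_median(mean_age)
print(cut, below, above)  # 6.5 ['A', 'B'] ['C', 'E']
```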


Table 4. Results of sensitivity analyses

Concerns	Domain	Studies at low risk or low concern (n)	Sensitivity, % (95% CI)	Specificity, % (95% CI)
Risk of bias	Patient selection	25	85.7 (82.1 to 88.6)	93.0 (91.1 to 94.5)
	Index test	65	86.6 (84.0 to 88.8)	95.2 (94.1 to 96.1)
	Reference standard	20	81.0 (74.1 to 86.5)	95.5 (93.4 to 96.9)
	Flow and timing	98	85.4 (83.0 to 87.5)	95.3 (94.4 to 96.1)
	≥ 3 domains with low risk of bias	20	84.0 (79.4 to 87.8)	95.0 (93.1 to 96.4)
Applicability
	Patient selection	41	83.1 (79.7 to 86.0)	94.9 (93.4 to 96.0)
	Index test	33	89.1 (85.7 to 91.8)	95.0 (93.2 to 96.4)
	Reference standard	60	84.9 (81.6 to 87.6)	94.7 (93.5 to 95.7)
CI: confidence interval
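The "≥ 3 domains with low risk of bias" row keeps only studies rated at low risk in at least three of the four risk-of-bias domains of Table 2. The sketch below expresses that selection rule; the study IDs and judgements are hypothetical, and counting "Unclear" as not low is our assumption.

```python
DOMAINS = ("patient_selection", "index_test", "reference_standard", "flow_and_timing")


def low_risk_domains(judgements: dict[str, str]) -> int:
    """Count domains judged 'Low'; 'High', 'Unclear' and missing all count as not low."""
    return sum(judgements.get(domain) == "Low" for domain in DOMAINS)


def select_low_risk(studies: dict[str, dict[str, str]], minimum: int = 3) -> list[str]:
    """Return IDs of studies with at least `minimum` low-risk domains."""
    return [sid for sid, j in studies.items() if low_risk_domains(j) >= minimum]


# Hypothetical judgements for three studies
studies = {
    "S1": {"patient_selection": "Low", "index_test": "Low",
           "reference_standard": "Unclear", "flow_and_timing": "Low"},
    "S2": {"patient_selection": "High", "index_test": "Low",
           "reference_standard": "Unclear", "flow_and_timing": "Low"},
    "S3": {"patient_selection": "Low", "index_test": "Low",
           "reference_standard": "Low", "flow_and_timing": "Low"},
}
print(select_low_risk(studies))  # ['S1', 'S3']
```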


Table 5. Comparison between previous systematic reviews on the diagnostic accuracy of RADTs for streptococcal pharyngitis and the present one

	Ruiz‐Aragon 2010^a	Lean 2014	Stewart 2014	Present review
Study participants	Adults and children	Adults and children	Adults and children	Children
Timeframe for searches	2000 to 2009	1996 to 2013	2000 to 2012	1980 to 2015
Number of studies included	24	60^b	58^c	105^b
Number of participants included	14,936	29,934	55,766	58,244
Summary estimate of sensitivity (95% CI)	85% (84 to 87)	86% (83 to 88)	84% (83 to 85)^d	86% (83 to 88)
Summary estimate of specificity (95% CI)	96% (96 to 97)	96% (94 to 97)	95% (94 to 95)^d	95% (95 to 96)
Investigations of heterogeneity	None performed	No evidence of significant variation in accuracy by test type (EIA versus OIA) or by age (children versus adults)	Did not identify sources of variability^d	Did not identify sources of variability
^aIn Spanish; ^bpairs of sensitivity and specificity; ^c59 study cohorts; ^damongst high‐quality studies. CI: confidence interval EIA: enzyme immunoassay OIA: optical immunoassay


Table Tests. Data tables by test

Test	No. of studies	No. of participants
1 All studies (n = 116)	116	101,121
2 Complete verification (n = 105)	105	58,244
3 EIA (direct comparison)	2	802
4 OIA (direct comparison)	2	802
5 Acceava Strep A (Biostar)	2	789
6 ACON Strep A Rapid Test Strip	1	5505
7 BioNexia Strep A (BioMerieux)	1	183
8 CARDS QS Strep A (Quidel)	1	1184
9 Clearview Exact Strep A	1	630
10 Clearview Strep A	1	75
11 Diaquick Strep A Test (Dialab)	1	496
12 Directgen 1‐2‐3 Group A Strep (Becton Dickinson)	4	1189
13 Direct Strep A EIA	1	293
14 EIA (no name)	3	7228
15 Group A Strep Test (Quidel)	2	184
16 IM Strep A (International Microbio)	2	291
17 Meridian Bioscience	1	114
18 OSOM Strep A (Genzyme)	7	1349
19 OSOM Ultra Strep A (Genzyme)	4	1888
20 QuickVue Dipstick Strep A (Quidel)	2	2071
21 QuickVue Flex Strep A (Quidel)	2	1178
22 QuickVue In‐Line Strep A (Quidel)	6	4122
23 QuickVue+ Strep A (Quidel)	4	845
24 Sacks Biological Farms	1	6557
25 SD Bioline Strep A	2	404
26 Signify Strep A (Abbott)	1	6865
27 SMART Group A Strep (New Horizons)	1	1035
28 Strep A Abon kit	1	1243
29 Strep A OIA (Biostar)	13	6476
30 Strep A OIA Max (Biostar)	6	2960
31 Strep A Rapid Test Device	1	490
32 Strep A Sign	1	75
33 Strep A test II (INTEX Diagnostica)	1	1248
34 StreptAtest (Dectrapharm)	4	1640
35 Streptavit	1	75
36 Streptop A (ALL‐Diag)	1	292
37 SUDS Group A Strep	1	341
38 SureScreen Test Strep A	1	188
39 TestPack Strep A (Abbott)	10	14,766
40 TestPack Plus (Abbott)	8	2883
41 TestPack Plus Strep A with OBC II (Abbott)	1	454
42 Ventrescreen Strep A (Ventrex Lab)	3	714
43 Visuwell Strep A (ADI)	3	926
44 Icon Strep A	4	865
45 Qtest (Becton Dickinson)	3	16,645
46 Link 2 Strep A Rapid Test (Becton Dickinson)	1	432
47 Event Test Strip Strep A	1	510
