Clinical and experimental factors that affect the reported performance characteristics of rapid testing for SARS-CoV-2

ABSTRACT Tests that detect the presence of SARS-CoV-2 antigen in clinical specimens from the upper respiratory tract can provide a rapid means of COVID-19 diagnosis and help identify individuals who may be infectious and should isolate to prevent SARS-CoV-2 transmission. This systematic review assesses the diagnostic accuracy of SARS-CoV-2 antigen detection in COVID-19 symptomatic and asymptomatic individuals compared to RT-qPCR, and summarizes antigen test sensitivity using meta-regression. In total, 83 studies were included that compared SARS-CoV-2 rapid antigen lateral flow testing (RALFT) to RT-qPCR for SARS-CoV-2. Although the quality of the evaluated studies was inconsistent, the overall sensitivity for RALFT was determined to be 75.0% (95% confidence interval [CI]: 71.0-78.0). Additionally, RALFT sensitivity was found to be higher for symptomatic versus asymptomatic individuals and was higher for a symptomatic population within 7 days from symptom onset (DSO) compared to a population with extended days of symptoms. Viral load was found to be the most important factor for determining SARS-CoV-2 antigen test sensitivity. Other design factors, such as specimen storage and anatomical collection type, also affect the performance of RALFT. RALFT and RT-qPCR testing both achieve high sensitivity when compared to SARS-CoV-2 viral culture.


INTRODUCTION
Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2) is the highly transmissible viral agent responsible for development of Coronavirus Disease 2019 (COVID-19) (Carlos et al., 2020; Chang et al., 2020; Li et al., 2020; Wang et al., 2020). Based on measurements from [...] onset [DSO], etc.), the anatomical collection site (e.g. nasopharyngeal versus anterior nares), and specimen storage conditions (Lijmer et al., 1999; Griffith et al., 2020; Accorsi et al., 2021).

As others have noted previously, a wide range of sensitivities has been reported for rapid antigen testing (Dinnes et al., 2020; Brümmer et al., 2021). The main objective of this meta-analysis was to explore possible causes of the high degree of heterogeneity of assay sensitivity estimates across different studies. Data were summarized and analyzed from over 80 articles and manufacturer IFUs to provide results on sensitivity for SARS-CoV-2 antigen testing from more than 25 individual assays.

[...] Specified whether the specimen was frozen prior to reference and index testing; (S6) Analytical limit of detection information was available for the reference assay; (S7) The index test manufacturer information was available. The exclusion criteria included: (1) Article/source from a non-credible source; (2) Article/source contains an unclear or indistinct research question; (3) Does not contain performance data specific to SARS-CoV-2; (4) Does not identify or does not involve standard upper respiratory SARS-CoV-2 specimens (e.g. contains other specimen types such as serological or saliva); (5) Contains no RT-qPCR reference results for comparison; (6) Data were collected in an unethical manner; (7) The index test involves a mechanism other than SARS-CoV-2 antigen detection involving a lateral flow (or similar) design; (8) Data not conducive for extraction required for analysis; (9) No data regarding true positive and false negative rates for the index test relative to the reference test. Additional, secondary exclusion criteria included (S1) Article/source not in the English language; and (S2) Study did not involve humans.

Full text reviews of the articles that passed initial screening were performed to identify those that met inclusion/exclusion criteria involving study methodologies, specimen collection, SARS-[...] domains: detection (measurement of test result), reporting (failure to adequately control confounding, failure to measure all known prognostic factors), and spectrum (eligibility criteria, forming the cohort, selection of participants). Risk of bias summary assessments for individual studies were categorized as "high", "moderate", or "low". The overall quality of evidence for the risk estimate outcomes (all included studies) was obtained using a modified Grading of Recommendations, Assessment, Development and Evaluation (GRADE) (Schunemann et al., 2013) methodology for observational diagnostic studies.
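
Exclusion criterion (9) exists because every sensitivity estimate in the review is built from a data set's true positive and false negative counts against the RT-qPCR reference. A minimal sketch of that per-data-set calculation follows. The function name and the use of a Wilson score interval are assumptions made for illustration; the source does not state which interval method was applied to individual data sets.

```python
import math


def sensitivity_with_wilson_ci(true_pos, false_neg, z=1.96):
    """Return (sensitivity, ci_low, ci_high) for one data set.

    true_pos / false_neg are index (antigen) test results among
    reference-positive (RT-qPCR-positive) specimens.
    The Wilson score interval is an illustrative choice, not the
    method documented in the source.
    """
    n = true_pos + false_neg
    if true_pos < 0 or false_neg < 0 or n == 0:
        raise ValueError("need non-negative counts and at least one reference positive")
    p = true_pos / n                      # sensitivity = TP / (TP + FN)
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return p, centre - half, centre + half


# Hypothetical counts: 75 antigen-positive out of 100 reference positives.
estimate, low, high = sensitivity_with_wilson_ci(75, 25)
print(f"sensitivity={estimate:.3f}, 95% CI=({low:.3f}, {high:.3f})")
```

A data set that reports neither count cannot contribute an estimate at all, which is why it is excluded rather than imputed.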

The seven domains used to ascertain the overall study quality and strength across the six independent variables were (1) Confounder effect; (2) Consistency; (3) Directness; (4) Magnitude of effect; (5) Precision; (6) Publication bias; and (7) Risk of bias (ascertained from individual studies). Study sub-groups were considered high quality when ≥4 of seven domains received a green rating, with no red ratings and <3 unclear ratings; otherwise, they were considered moderate quality. Study sub-groups were considered moderate quality when three domains were green with <3 red domains; or when two domains were green and <3 domains were red with <4 domains unclear; or when 1 domain was green with <2 red domains and <3 domains were unclear; or when no domains were green, no domains were red and <2 domains were unclear.

[...] collection type for specimens used for both index and reference testing (anterior nares/mid-turbinate versus nasopharyngeal/oropharyngeal); (5) Specimen storage conditions (fresh versus frozen); (6) Analytical sensitivity of the reference RT-qPCR test (detection cutoff <500 genomic copies/mL [cpm] versus ≥500 cpm); and (7) Assay manufacturer.

Data analysis
Data extraction was accomplished by two reviewers/authors with any discrepancies adjudicated by a third reviewer/author. An independent author performed all statistical methods. All analyses [...] load information was available, subgroup meta-analysis by viral load (measured either by PCR Ct of 25 or 30, or by a viral load of 1×10⁵ cpm) and symptomatic status was performed. The minimum number of studies required for synthesis is n=3.
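
The sub-group quality rules stated earlier in this section (counts of green, red, and unclear ratings across the seven domains) can be written as a small decision function. This is a sketch under stated assumptions: the function name and return labels are hypothetical, the "otherwise" clause is read as applying to sub-groups with ≥4 green domains that miss the other two conditions, and the label for sub-groups matching none of the quoted rules is not given in the text, so the sketch returns "unclassified" for them.

```python
def rate_subgroup_quality(green, red, unclear, total_domains=7):
    """Classify a study sub-group from its domain rating counts.

    Encodes only the rules quoted in the text; anything not covered
    by those rules is returned as "unclassified" (an assumption).
    """
    if min(green, red, unclear) < 0 or green + red + unclear > total_domains:
        raise ValueError("rating counts must be non-negative and sum to at most total_domains")

    if green >= 4:
        # High quality needs no red ratings and fewer than 3 unclear ratings;
        # "otherwise" is read here as falling back to moderate.
        return "high" if red == 0 and unclear < 3 else "moderate"

    moderate = (
        (green == 3 and red < 3)
        or (green == 2 and red < 3 and unclear < 4)
        or (green == 1 and red < 2 and unclear < 3)
        or (green == 0 and red == 0 and unclear < 2)
    )
    return "moderate" if moderate else "unclassified"


# Hypothetical rating counts for three sub-groups.
for counts in [(5, 0, 1), (3, 2, 1), (2, 3, 0)]:
    print(counts, rate_subgroup_quality(*counts))
```

Writing the rules this way makes the boundary cases explicit, for example that two green domains with three red domains fall outside the moderate definition.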

This version posted May 22, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

[...] [95% CI: 60.7, 79.2] for the group including both ≤7 DSO and >7 DSO (reference positive n=2,649) (Figure 4 and [...]

Within a given viral load category, there were no statistical differences between studies performed with symptomatic subjects versus studies performed with asymptomatic subjects; this was true regardless of the exact definition of viral load category: Ct of 25, Ct of 30, or genome copies per mL of 1×10⁵ (Figure 5 and Table 3).

True positives and false negatives by anatomic collection site were obtained from 97 data sets that included reference nasopharyngeal specimens and from 25 data sets that included nasal reference specimens. Antigen testing was usually paired from the same specimen type, with only six data sets being non-paired (antigen nasal, reference nasopharyngeal) specimens. When analysis was performed on data stratified by anatomic collection site of the reference specimen, antigen test sensitivity was higher with a nasal specimen (82.7% [95% CI: 74.7, 88.5] [...] p=0.037 (Table 3).

Analytical sensitivity of the reference method (RT-qPCR) was determined using the manufacturer's IFU when it was identified in the source documents and used to stratify true positive and false negative results associated with SARS-CoV-2 antigen testing. The LOD threshold for low and high analytical sensitivity was 500 cpm, which was the median (mean=582) LOD value for the analytical sensitivity from all of the reference methods included in this sub-analysis. Sensitivity values for antigen testing when stratified by high (reference [...] (Table 3).

Manufacturer (see Table S1) and study spectrum bias were also significant factors in subgroup meta-analyses; higher sensitivity was reported in studies with large/moderate spectrum bias (Table 3). A mixed-effects meta-regression model with moderators including symptom status, anatomical collection site, study selection/spectrum bias, and manufacturer was fit to the studies. All factors remained significant in the multivariate analysis, except study spectrum bias (multivariate p = 0.757). The moderators accounted for 72% of study heterogeneity (model R² = 0.722). Visual inspection of unadjusted and multivariate-adjusted funnel plots for effect estimates from individual sources against study size was performed (Figure 3). The funnel plot asymmetry revealed possible reporting/publication bias reflecting fewer studies than expected that could be characterized by a small group number and a low sensitivity estimate for the index. Overall, study heterogeneity could largely be accounted for by the independent variables identified through sub-group analysis in this study.

Culture as the reference
Sensitivity for SARS-CoV-2 antigen and RT-qPCR assays was determined as compared with SARS-CoV-2 viral culture as the reference method. There were five data sets that contained RT-qPCR (reference positive n=154) and antigen test (reference positive n=167) results. The overall [...] antigen testing compared to RT-qPCR as the reference.
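
To make the pooled sensitivity and the "proportion of heterogeneity explained" statistic concrete, the sketch below pools logit-transformed sensitivities with a random-effects model and computes a pseudo-R² from two between-study variance estimates. Several things here are assumptions, not the source's documented procedure: the DerSimonian-Laird estimator for the between-study variance, the 0.5 continuity correction, and the function names. The n=3 minimum mirrors the synthesis threshold stated in the Data analysis text.

```python
import math


def pool_sensitivity(datasets, z=1.96):
    """Random-effects pooled sensitivity from (true_pos, false_neg) pairs.

    Returns (pooled, ci_low, ci_high, tau2). DerSimonian-Laird and the
    0.5 continuity correction are illustrative assumptions.
    """
    if len(datasets) < 3:
        raise ValueError("at least 3 data sets are required for synthesis")
    effects, variances = [], []
    for tp, fn in datasets:
        tp_c, fn_c = tp + 0.5, fn + 0.5             # avoid log(0) for 0% / 100%
        effects.append(math.log(tp_c / fn_c))       # logit of sensitivity
        variances.append(1 / tp_c + 1 / fn_c)       # approximate logit variance
    k = len(effects)
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)              # between-study variance
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))

    def inv_logit(x):
        return 1 / (1 + math.exp(-x))

    return inv_logit(pooled), inv_logit(pooled - z * se), inv_logit(pooled + z * se), tau2


def heterogeneity_explained(tau2_without_moderators, tau2_with_moderators):
    """Pseudo-R^2: share of between-study variance removed by the moderators."""
    if tau2_without_moderators <= 0:
        return 0.0
    return max(0.0, (tau2_without_moderators - tau2_with_moderators) / tau2_without_moderators)


# Hypothetical data sets (true positives, false negatives).
print(pool_sensitivity([(80, 20), (45, 30), (120, 25), (18, 12)]))
```

Under this definition, a reported R² of 0.722 means the residual between-study variance after adding the moderators is about 28% of the variance in the model without them.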

DISCUSSION
The PPA (sensitivity) point estimate for antigen testing, spanning the entire 135 data sets included here, was 75.0% (95% CI: 71.0, 79.0). We found that factors including specimen viral load, symptom presence, days from symptom onset, anatomical collection site, and the storage conditions for specimen collection could all affect the measured performance of SARS-CoV-2 antigen tests (Figure S1). In addition, our meta-analysis revealed that antigen test sensitivity [...] between different RT-qPCR assays and platforms (Ransom et al., 2020; Rhoads et al., 2020). RT-qPCR assays have different analytical sensitivities; a universal Ct value reference has not been established that can be used to define the optimal sensitivity/specificity characteristics for antigen testing. In addition to stratification by Ct value, analysis was also performed for SARS-CoV-2 antigen testing sensitivity by absolute viral load (using 1×10⁵ cpm as the cutoff). When data were analyzed using this strategy, results were similar to those obtained with stratification by Ct value. The viral load threshold utilized here was determined by a consensus value that appeared with regular frequency in the source articles and represented a viral threshold that consistently delineated a zone across which the false positive rate increased for most antigen tests. It is generally accepted that viral loads of less than 1×10⁵ cpm correlate with non-culture positive
levels. However, whether 1×10⁵ cpm is the most accurate threshold by which to measure antigen test performance is still a topic for debate. Some studies suggest that viral loads closer to 1×10⁶ cpm might be a more appropriate threshold, which would act to minimize false positive rates. [...] the presence of symptoms probably overlaps with higher specimen viral load, which subsequently affects the antigen test sensitivity. Anatomical collection type of the index and/or reference test method can also affect the measured sensitivity estimates of antigen testing during a clinical trial, through a mechanism that involves increased/decreased viral load on the specimen swab. Evidence suggests that viral loads may be higher with nasopharyngeal than with
nasal collection (Pinninti et al., 2020). This difference may explain why measured antigen assay performance appears to be higher in studies that use a nasal RT-qPCR reference method.

Another factor identified here as potentially influencing measured antigen assay sensitivity was specimen storage, particularly with regard to the use of fresh vs frozen (i.e. "banked") specimens. Protein antigen may, as a result of freeze/thawing, experience some degree of structural damage, potentially leading to loss of epitope availability or a reduction in the affinity of epitope/paratope binding. Ninety-six (96) data sets involved fresh specimens for antigen testing and 23 data sets included freeze/thawed specimens for antigen testing. Although no statistically significant difference was detected between sensitivities for antigen tests conducted on fresh versus frozen specimens, possibly due to the low number of data sets in the frozen antigen group, a trend toward lower sensitivity was observed for tests performed on frozen specimens ([...] frozen). Additional results from in-house (i.e., a BD-IDS laboratory) testing with two different EUA authorized antigen assays demonstrate reduced immunoassay band intensity following freeze-thaw cycles, thus further supporting the findings from the meta-analysis that a freeze-thaw cycle could reduce analytical sensitivity for SARS-CoV-2 antigen testing (Figure S2).

The analytical sensitivity associated with the reference RT-qPCR assay was also investigated here as a possible variable that could affect the false negative rate of SARS-CoV-2 antigen testing. We hypothesized that relatively high analytical sensitivity for the reference RT-qPCR assay would impose a detection bias and result in decreased antigen test clinical sensitivity due to increased false negatives occurring near the RT-qPCR limit of detection. However, stratification by reference analytical sensitivity resulted in no difference in SARS-CoV-2 antigen test clinical sensitivity. It is likely that the analytical sensitivity of RT-qPCR, regardless of the manufacturer, is high enough that even RT-qPCR assays with relatively low analytical sensitivity have limits of detection well below the corresponding limit of detection for antigen testing. On the other hand, some manufacturers assess antigen test performance in a manner that involves sensitivity above and below a set Ct value. It is possible that analysis involving stratification by RT-qPCR analytical sensitivity could reveal differences in antigen test performance if all antigen test performances were determined in a similar manner that involves predetermined high/low viral load categories.

Several population and study design-specific factors were identified to be associated with higher measured assay sensitivity, likely due to the association with higher viral loads. This meta-analysis demonstrates that these factors exist in various combinations across studies in an inconsistent way, thus making comparisons of assay performance across these studies impossible. The lack of consistency across study designs makes it very difficult to compare point estimates between antigen tests to judge their relative clinical efficacy. The introduction of different forms of bias into study design, and during study conduct, could explain why
discrepancies have been noted, for example, between sensitivity values listed in manufacturers' IFUs and those obtained during independent evaluation of the same antigen test. Ultimately, direct comparison between antigen tests should be the most reliable approach for obtaining relative performance characteristics with any certainty. Here, we stratified SARS-CoV-2 antigen test sensitivity by spectrum bias associated with each of the data sources. We found that those studies rated with higher spectrum bias also had higher antigen test sensitivities. In addition, the funnel plot analysis that was performed for this meta-analysis shows obvious publication bias, which implies a lack of publication of studies with a small study group number and low sensitivity.
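
The source reports only visual inspection of funnel plots. As a hedged illustration of what such a plot encodes, the sketch below computes funnel plot coordinates (logit sensitivity against its standard error) for each data set and, as an addition the source does not report using, an Egger-style regression intercept as a rough numeric check of asymmetry. The function names, the 0.5 continuity correction, and the example counts are all hypothetical.

```python
import math


def funnel_points(datasets):
    """Return (logit_sensitivity, standard_error) per (true_pos, false_neg) pair."""
    points = []
    for tp, fn in datasets:
        tp_c, fn_c = tp + 0.5, fn + 0.5             # continuity correction (assumption)
        effect = math.log(tp_c / fn_c)
        std_err = math.sqrt(1 / tp_c + 1 / fn_c)    # small studies -> large std_err
        points.append((effect, std_err))
    return points


def egger_intercept(points):
    """Intercept of the regression of effect/SE on 1/SE.

    An intercept far from zero suggests small-study asymmetry.
    This is a swapped-in technique, not the authors' method.
    """
    xs = [1 / se for _, se in points]               # precision
    zs = [effect / se for effect, se in points]     # standardized effect
    n = len(points)
    mean_x, mean_z = sum(xs) / n, sum(zs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all data sets have the same precision; regression is undefined")
    slope = sum((x - mean_x) * (zv - mean_z) for x, zv in zip(xs, zs)) / sxx
    return mean_z - slope * mean_x


# Hypothetical data sets: small studies report higher sensitivity than large ones.
example = [(19, 1), (28, 4), (150, 50), (400, 160)]
print(egger_intercept(funnel_points(example)))
```

If small studies with low sensitivity are missing from the literature, the small-study points cluster on the high-sensitivity side of the plot and the intercept moves away from zero.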

Clinical trials and studies involving diagnostics are vulnerable to the introduction of bias, which can alter test performance results and obstruct an accurate interpretation of clinical efficacy or safety. For example, antigen testing appears to have a higher sensitivity when compared to SARS-CoV-2 viral culture as the reference than when compared to RT-qPCR. However, these two reference methods measure different targets: RNA only, versus infectious virus. Therefore, their use as a reference method should be intended to answer different scientific questions, rather than to artificially inflate apparent sensitivity point estimates. If the intent of a diagnostic test is determining increased risk of infectiousness through the presence of infectious virus, the high analytical sensitivity of RT-qPCR, which cannot distinguish RNA fragments from infectious virus, renders this diagnostic approach vulnerable to the generation of false positive results, particularly at later time points following symptom onset. At time points beyond one
week from symptom onset, a positive RT-qPCR result more likely indicates that an individual has been infected, but is no longer contagious and cannot spread infectious virus. This is especially true for those with a SARS-CoV-2-negative cell culture result. Previous reports have shown that performance values for rapid antigen tests and SARS-CoV-2 viral culture exhibit better agreement than do results from RT-qPCR compared to viral culture (Pekosz et al., 2021). This finding further supports the use of antigen testing for SARS-CoV-2 for the identification of individuals with infectious virus who are therefore at greater risk of virus transmission.

Limitations
This study has some limitations. First, it was difficult to obtain reliable information across the sources, in a consistent manner, about disease severity, in order to perform meta-analysis on this aspect of COVID-19 diagnostics. Additionally, the studies included in this meta-analysis did not contain sufficient information to explore the potential effect of factors previously demonstrated to be associated with higher viral loads such as disease severity and community prevalence.

Conclusion
In addition to viral load, several factors including symptom status, anatomical collection site, and spectrum bias all influenced the sensitivity for SARS-CoV-2 detection by antigen-based testing. This heterogeneity of factors found to influence measured assay sensitivity, across studies, precludes comparison of assay sensitivity from one study to another. Future consideration [...]

[...] SARS-CoV-2 point-of-care test performance compared to PCR-based testing and versus the Sofia 2 SARS Antigen point-of-care test. J Clin Microbiol.

[...] Performance bias, and Spectrum bias associated with each data source included. The frequency [...]

Confounder effect characterizes the degree to which all plausible confounders would tend to increase confidence in the estimated effect [...]

Abbreviations: CI, confidence interval; DSO, days from symptom onset; Ct, cycle threshold; cpm, genomic copies per mL; NPS, nasopharyngeal swab
a Q-test p-value for heterogeneity among subgroups calculated from random effects meta-analysis

doi: https://doi.org/10.1101/2021.05.20.21257181 (medRxiv preprint)

SUPPLEMENTAL FIGURES
FIGURE S1
