Summary
Background With the emergence of SARS-CoV-2 and the associated Coronavirus disease 2019 (COVID-19), there is an imperative need for diagnostic tests that can identify the infection. Although the Nucleic Acid Test (NAT) is considered to be the gold standard, serological tests based on antibodies could be very helpful. However, individual studies measuring the accuracy of the various tests are usually underpowered and inconsistent; thus, a comparison of different tests is needed.
Methods We performed a systematic review and meta-analysis following the PRISMA guidelines. We conducted the literature search in PubMed, medRxiv and bioRxiv. For the statistical analysis we used the bivariate method for meta-analysis of diagnostic tests pooling sensitivities and specificities. We evaluated IgM and IgG tests based on Enzyme-linked immunosorbent assay (ELISA), Chemiluminescence Enzyme Immunoassays (CLIA), Fluorescence Immunoassays (FIA) and the point-of-care (POC) Lateral Flow Immunoassays (LFIA) that are based on immunochromatography.
Findings In total, we identified 38 eligible studies that include data from 7,848 individuals. The analyses showed that tests using the S antigen are more sensitive than N antigen-based tests. IgG tests perform better compared to IgM ones, and show better sensitivity when the samples were taken longer after the onset of symptoms. Moreover, irrespective of the method, a combined IgG/IgM test seems to be a better choice in terms of sensitivity than measuring either antibody type alone. All methods yielded high specificity with some of them (ELISA and LFIA) reaching levels around 99%. ELISA- and CLIA-based methods performed better in terms of sensitivity (90-94%) followed by LFIA and FIA with sensitivities ranging from 80% to 86%.
Interpretation ELISA tests could be a safer choice at this stage of the pandemic. POC tests (LFIA), which are more attractive for large seroprevalence studies, show high specificity but lower sensitivity, and this should be taken into account when designing and performing seroprevalence studies.
Funding None
Introduction
In December 2019, a pneumonia outbreak occurred in Wuhan, China, due to a new coronavirus that was later officially named SARS-CoV-2 by the World Health Organization (WHO) 1, 2. The disease rapidly spread worldwide and on March 11, WHO declared COVID-19 (coronavirus disease 2019) a pandemic 3. SARS-CoV-2 shares pathogenicity features with the human coronaviruses SARS-CoV and MERS-CoV 4 but the incubation period is longer (up to 14 days) 3. Most patients exhibit mild symptoms and only a few cases progress to severe or critical disease. Risk factors for severe disease include older age 5 and comorbidities such as hypertension, diabetes, chronic obstructive pulmonary disease (COPD), and cardiovascular disease 6; a higher incidence in males has also been reported 7.
The genome of SARS-CoV-2 is predicted to encode 4 structural proteins (including Spike (S) and Nucleocapsid (N)), 8 accessory, and 15 non-structural proteins 8. The S protein includes the S1 subunit, which is responsible for binding to the ACE2 membrane receptor of the host cell 9-12. The N protein is the structural helical nucleocapsid protein of the virus and is important for transcription, viral replication, and packaging 13, 14. The S and N proteins show high antigenicity 15-17.
Although rigorous public health measures have been taken globally, including mass quarantine, COVID-19 incidence is rising, leading to 2,402,980 laboratory-confirmed cases and over 165,641 deaths worldwide by April 20. Due to the ongoing COVID-19 outbreak, there is an urgent global need for diagnostic tests. WHO suggests that detection of SARS-CoV-2 nucleic acid (E gene followed by the RdRp gene) is performed in respiratory samples 18-20, while the United States Centers for Disease Control (CDC) recommends the nucleocapsid protein targets N1 and N2 21. However, the global shortage of diagnostic tests and especially of swabs for collecting respiratory samples, the frequency of false negative results, and the inability of these tests to be performed in the bulk and quick manner that is often required at hospital admission highlight the necessity of developing additional testing methods.
COVID-19 serological tests for IgG and IgM have been developed by many laboratories and companies and can be useful in various ways: a) they can confirm Nucleic Acid Test (NAT) results or detect infected people who were negative according to NATs 22, b) they are cheap, quick, and amenable to rapid broad screening at points of care (POC), c) the blood/serum samples that are used show reduced heterogeneity compared to respiratory specimens, and d) blood/serum sampling carries a lower risk for health care workers compared to respiratory sampling, where patients are more likely to disperse the virus. Additionally, serological assays can help determine the immune status of individuals 15 and support efforts to estimate herd immunity.
Since all the above serological tests have been developed rapidly and under urgent market demands, they are poorly validated with clinical samples in everyday practice. Across several studies, these tests show divergent sensitivity and specificity that may deviate from what the manufacturers report. Given the importance of serological tests in combating COVID-19, this systematic review and meta-analysis aims to summarize the available evidence on the performance of all available antibody tests for SARS-CoV-2.
Methods
Search strategy and selection criteria
To conduct the systematic review and the meta-analysis we followed the Preferred Reporting Items for Systematic reviews and Meta-analyses (PRISMA) guidelines 23 and the recommendations for best practices 24. We conducted the literature search using PubMed (https://www.ncbi.nlm.nih.gov/pubmed/), medRxiv (https://medrxiv.org/) and bioRxiv (https://www.biorxiv.org/). The search terms used were: (SARS-CoV-2 OR “Coronavirus disease 2019” OR COVID-19) AND (IgM OR IgG OR antibodies OR antibody OR ELISA OR “rapid test”). The references of selected articles were also searched. The searches were concluded by April 17, 2020, and four different researchers independently evaluated the search results. Disagreements in the initial evaluation were resolved by consensus. We did not impose language criteria and included studies written in English and Chinese. Eligible articles were required to meet the following criteria: a) they reported COVID-19 cases confirmed either by NAT, such as RT-PCR or sequencing documenting SARS-CoV-2 infection, or by a combination of NAT and clinical findings, and b) they reported results concerning IgM and/or IgG antibodies using a variety of methods. We considered as eligible studies reporting the comparison of COVID-19 cases against non-COVID-19 individuals, as well as case series reporting data only from COVID-19 patients.
Data extracted for each study included (if available): first author’s last name, percentage of male patients, mean age of COVID-19 patients, mean number of days from onset and percentage of severe or critically-ill COVID-19 patients. In addition, the different methods used for the determination of IgG and IgM were also recorded, along with their details. In order to construct the 2×2 contingency table and obtain estimates for sensitivity and specificity, we obtained the numbers of true positive (TP), false positive (FP), true negative (TN), and false negative (FN). For studies reporting only COVID-19 patients we recorded only TP and FN.
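The way sensitivity and specificity follow from the extracted 2×2 counts can be illustrated with a minimal Python sketch. The class name `StudyCounts` and the counts below are hypothetical and are not taken from any included study; the sketch only assumes that case-only series contribute TP and FN, as described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StudyCounts:
    """2x2 counts for one study; fp and tn stay None for case-only series."""
    tp: int
    fn: int
    fp: Optional[int] = None
    tn: Optional[int] = None

    def sensitivity(self) -> float:
        # Proportion of confirmed COVID-19 cases that test positive.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> Optional[float]:
        # Proportion of non-COVID-19 individuals that test negative;
        # undefined when the study enrolled cases only.
        if self.fp is None or self.tn is None:
            return None
        return self.tn / (self.tn + self.fp)


# Hypothetical counts, for illustration only.
case_control = StudyCounts(tp=90, fn=10, fp=2, tn=98)
case_series = StudyCounts(tp=45, fn=15)

print(case_control.sensitivity(), case_control.specificity())
print(case_series.sensitivity(), case_series.specificity())
```

A case series therefore informs the pooled sensitivity only, which is why such studies need special handling in the statistical model.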
The immunoassay methods used for COVID-19 antibody (Ab) detection in all studies included in the present meta-analysis include Enzyme-linked immunosorbent assay (ELISA), Chemiluminescence Enzyme Immunoassays (CLIA), Fluorescence Immunoassays (FIA), and the point-of-care (POC) lateral flow immunoassays (LFIA) that are based on immunochromatography 25-29.
All methods were designed to detect IgG and/or IgM antibodies (or even total antibodies) 30-32 against the S (mainly RBD) and/or N viral proteins in human sera/blood samples. The ELISA variations include the μ-chain capture principle for IgM, the indirect format for IgG, and the double-antigen sandwich for total antibody detection. ELISA gives quantitative data on antibodies by measuring absorbance values (A450), with cut-off values determined for each test plate. LFIA is an immunochromatography-based assay using colloidal gold-conjugated COVID-19 antigens. The test is rapid, is performed on nitrocellulose test strips, and gives qualitative results that are judged by visual inspection, usually 15 minutes after sample application. For some commercial LFIAs, the specific antigen on which the assay was based was not reported. Because most companies provide N- and S-based LFIAs, we assumed that in unspecified cases the LFIAs were N and S based. CLIA is a chemiluminescence-based assay, mainly developed by companies, that gives quantitative results with the use of an analyzer. The analyzer can be batch and random access, with the possibility of giving results within half an hour at best 33, 34. Because in most cases CLIA detected both anti-N and anti-S IgG and IgM antibodies (with only one study detecting anti-N 33, 34), we assumed N- and S-based IgG and IgM CLIAs in studies without relevant information. With FIA we denote fluorescence immunoassays that can be performed on multitest cover slides 35 or be based on fluorescence immunochromatography (AIE/Quantum dot-based fluorescence immunochromatographic assay, AFIA) 36, 37. The latter can be rapid, but all need analyzers.
Data analysis
We performed a quality assessment of the included studies using the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool, offered by the Review Manager Software (RevMan 5.2.3). QUADAS-2 is a quality assessment tool specifically developed for systematic reviews of diagnostic accuracy studies and consists of four key domains: patient selection, index test, reference standard, and flow and timing; each domain is rated as low risk, high risk, or unclear risk.
We used the bivariate meta-analytic method modified for the meta-analysis of diagnostic tests 38. The method has been shown to be equivalent to the so-called HSROC method 39, 40, and uses logit transforms of the TPR (true positive rate) and FPR (false positive rate) in order to model sensitivity and specificity, as well as to account for the between-studies variability (heterogeneity). Studies that provide information only for logit(TPR) are included under the missing-at-random assumption in order to maximize the sample and allow for modelling the between-studies variability and correlation. Begg’s rank correlation test 41 and Egger’s regression test 42 were used on logit(TPR) to evaluate possible publication bias. The analysis was performed using Stata 13 (Stata Corporation, College Station, Texas, USA) and the command mvmeta with the method of moments for multivariate meta-analysis and meta-regression 43. Statistical significance was set at p<0.05. Meta-analysis was performed in cases where two or more studies were available, whereas meta-regression and tests for publication bias were performed when 5 or more studies were available.
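To make the logit transform and the method-of-moments idea concrete, the following Python sketch pools logit(TPR) across studies with a univariate DerSimonian-Laird random-effects estimator. This is a deliberate simplification and not the bivariate model fitted with mvmeta: it ignores specificity and the between-study correlation. The function names, the 0.5 continuity correction, and the study counts are assumptions made for illustration only.

```python
import math
from typing import List, Sequence, Tuple


def logit_with_variance(events: float, total: float) -> Tuple[float, float]:
    """Logit of a proportion and its approximate within-study variance."""
    non_events = total - events
    # Assumed 0.5 continuity correction when a cell is zero.
    if events == 0 or non_events == 0:
        events, non_events = events + 0.5, non_events + 0.5
    return math.log(events / non_events), 1.0 / events + 1.0 / non_events


def dersimonian_laird(ys: Sequence[float],
                      variances: Sequence[float]) -> Tuple[float, float, float]:
    """Univariate method-of-moments pooling; needs two or more studies."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    # Cochran's Q and the moment estimator of between-study variance tau^2.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, ys))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(ys) - 1)) / c)
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, ys)) / sum(w_star)
    return pooled, math.sqrt(1.0 / sum(w_star)), tau2


def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical (TP, TP + FN) pairs, for illustration only.
studies: List[Tuple[int, int]] = [(90, 100), (45, 60), (170, 200)]
pairs = [logit_with_variance(tp, n) for tp, n in studies]
pooled_logit, se, tau2 = dersimonian_laird([p[0] for p in pairs],
                                           [p[1] for p in pairs])

# Back-transform the estimate and its 95% CI to the probability scale.
pooled_sens = inv_logit(pooled_logit)
ci = (inv_logit(pooled_logit - 1.96 * se), inv_logit(pooled_logit + 1.96 * se))
print(pooled_sens, ci, tau2)
```

Working on the logit scale keeps the back-transformed confidence limits inside the interval from 0 to 1; the bivariate model applies the same idea jointly to logit(TPR) and logit(FPR).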
Results
The electronic search revealed 115 articles from PubMed, 72 from medRxiv and 12 from bioRxiv, from which we identified 38 eligible studies after scrutiny 25-37, 44-68 (Figure 1). These include in total 7,848 individuals (3,522 COVID-19 cases and 4,326 healthy, or non-COVID-19, individuals). 21 studies reported data for both COVID-19 cases and controls, whereas 17 studies reported data only for COVID-19 cases. 13 studies used RT-PCR or other nucleic acid-based tests (NATs) as the gold standard for case ascertainment, whereas 25 studies ascertained COVID-19 cases using a combination of molecular and clinical features. The summary information of the included studies is presented in Table 1. We did not consider the results of different kits separately; instead, we grouped the tests according to the method and the specific antigen used. In total we identified kits from 25 different companies, plus the various in-house tests produced for research purposes, so a separate analysis would be impossible. Several studies reported the results of multiple tests on the same individuals; however, these were not included in the same meta-analysis, since we analyzed each test separately. In one study that compared several different LFIA tests, we used the results of the one with the median performance (even though the differences were small). Other studies reported samples from multiple populations, and in such cases the populations were considered distinct.
14 studies in total reported results from ELISA-based tests (detecting anti-N or anti-S IgG, IgM antibodies or both). S-based ELISAs, in general, perform better than those based on the N antigen. IgG and IgM seem to perform similarly, but the combination of IgG and IgM seems to be superior, leading to a sensitivity of 0.935 (95% CI: 0.900, 0.971). All methods seem to have rather high specificities (ranging from 0.961 to 0.995). Meta-regression analysis showed that the mean number of days from disease onset and the proportion of severe/critical patients have an influence on the overall sensitivity of the IgG tests. Neither Egger’s nor Begg’s test detected publication bias or other small-study effects.
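For readers unfamiliar with Egger’s regression test, its core can be sketched in a few lines of Python: the standardized effect (here logit(TPR) divided by its standard error) is regressed on precision (one over the standard error), and an intercept far from zero suggests small-study effects. The function name and the input values are hypothetical, and the p-value (a t-test on the intercept with n−2 degrees of freedom) is omitted to keep the sketch free of external libraries.

```python
import math
from typing import Sequence, Tuple


def egger_regression(effects: Sequence[float],
                     std_errors: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least squares of effect/SE on 1/SE.

    Returns (intercept, slope, standard error of the intercept).
    Needs at least three studies with non-identical standard errors.
    """
    z = [e / s for e, s in zip(effects, std_errors)]  # standardized effects
    x = [1.0 / s for s in std_errors]                 # precisions
    n = len(z)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z)) / sxx
    intercept = mz - slope * mx
    # Residual variance with n - 2 degrees of freedom.
    rss = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    sigma2 = rss / (n - 2)
    se_intercept = math.sqrt(sigma2 * (1.0 / n + mx ** 2 / sxx))
    return intercept, slope, se_intercept


# Hypothetical logit(TPR) values and standard errors, for illustration only.
logit_tpr = [2.3, 1.9, 2.6, 1.4, 2.1]
se_logit = [0.20, 0.35, 0.50, 0.30, 0.25]
intercept, slope, se_intercept = egger_regression(logit_tpr, se_logit)
print(intercept, slope, se_intercept)
```

The ratio of the intercept to its standard error would then be compared against a t distribution to obtain the test's p-value.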
CLIA-based tests were used in 13 studies. In all cases anti-N and anti-S IgGs and IgMs were investigated. In this analysis we also pooled together the studies that considered NS antigens with the studies that used the S antigen. The sensitivity for detecting IgG seems to be better than that for IgM (0.944 vs. 0.810). Combining IgM and IgG yields a slightly worse sensitivity (0.907, 95% CI: 0.753, 1.000), but this estimate arises from only two studies (970 patients) and thus has large uncertainty. Specificities range from 0.971 to 0.984. Meta-regression analysis revealed that the mean number of days from disease onset has an influence on the overall outcome in the IgG tests. Begg’s test provided some evidence for publication bias in the IgG analysis.
13 studies reported results from LFIA-based tests. The majority of the tests identified antibodies against both N and S antigens, and results were obtained for both IgG and IgM. In this analysis we also pooled together the studies that considered NS antigens with the studies that used the S antigen. IgG and IgM seem to perform comparably but rather poorly, with sensitivities ranging from 0.53 to 0.66. Combining IgG and IgM yields better estimates (0.78-0.83), but still with lower sensitivity compared to ELISA-based tests. Specificity in all cases ranged from 0.914 to 0.994. In the largest overall analysis, pooling together the 11 studies that used N, S, or NS antigens, the combination of IgG and IgM antibodies yields a sensitivity of 0.800 (95% CI: 0.663, 0.935) and a specificity of 0.984 (95% CI: 0.969, 0.999). Meta-regression analysis revealed that the mean number of days from disease onset influences the overall outcome in the IgG and IgG/IgM tests. Neither Egger’s nor Begg’s test found evidence for publication bias or other small-study effects.
Lastly, FIA-based tests were found in three studies using a combination of N and S antigens. IgG and IgM show similar sensitivities (∼0.86) and specificities (0.95); however, the sample is small (3 studies, 327 patients). Due to the small number of studies, tests for publication bias and meta-regression could not be applied.
Discussion
Non-pharmaceutical interventions including increased testing rates, contact tracing, school closures, bans on mass gatherings, physical distancing, restriction of movement, and cordon sanitaire were effective in reducing transmission rates of SARS-CoV-2 in Wuhan, China and other settings 69. However, this type of intervention has tremendous societal and economic consequences, potentially resulting in social disorganization and a great recession. One approach to de-escalating public health measures and returning to a state of normalcy, while maintaining epidemiological vigilance and the ability to respond fast to viral resurgence, is to identify people with immunity to SARS-CoV-2 and estimate their proportion in the entire population. This approach would identify immune people, including health-care workers, who can go back to work without risking their health or that of others, help reopen borders, and help monitor the development of herd immunity. Unfortunately, the human immune response to the new pathogen is not well studied yet. The serological tests that have recently been developed employ different methods and target either IgG or IgM or both. In an attempt to fill the knowledge gap, this systematic review summarized evidence from 38 studies involving 7,848 individuals. The meta-analysis showed that all methods yielded high specificity, with some of the methods (ELISA and LFIA) reaching levels higher than 99%. ELISA- and CLIA-based methods performed better in terms of sensitivity (90-94%), followed by LFIA and FIA with sensitivities ranging from 80% to 86%.
Sample quality, low antibody concentrations and especially the timing of the test (too soon after a person is infected, when antibodies have not been developed yet, or too late, when IgM antibodies have decreased or disappeared) could potentially explain the low ability of the antibody tests to identify people with COVID-19. According to kinetic measurements in some of the included studies 22, 49, 54, IgM peaks between days 5 and 12 and then drops slowly. IgGs reach peak concentrations after day 20 or so, as IgM antibodies disappear. This meta-analysis showed, through meta-regression, that IgG tests did have better sensitivity when the samples were taken longer after the onset of symptoms. This is further corroborated by the lower specificity of IgM antibodies compared to IgG 15. Only a few of the included studies provided data stratified by the time from onset of symptoms, so a separate stratified analysis was not feasible, but this should be a goal for future studies. Moreover, irrespective of the method, a combined IgG/IgM test seems to be a better choice in terms of sensitivity than measuring either antibody type alone. The analyses also showed that tests that use the S antigen are more sensitive than N antigen-based tests, probably due to the higher sensitivity and earlier immune response to the S antigen 52. They are also more specific, perhaps due to less cross-reactivity with the less conserved regions of spike proteins existing in other coronaviruses (SARS-CoV) 17, 55, 64. Combining N and S antigens further improves sensitivity. Finally, despite the suboptimal sensitivity, antibody tests could certainly supplement NATs in the diagnosis of people with suspected SARS-CoV-2 infection 65. In any case, a direct comparison of antibody tests against NATs is also needed in future studies (in the current review only a handful of studies performed this, and only in COVID-19 patients).
Antibody tests for SARS-CoV-2 have other accuracy issues that deserve attention and further assessment. For instance, cross-reaction with human endemic coronaviruses could make antibody tests less specific and produce false positive results 30, 33, 55, 63. A low specificity may have important consequences both in terms of diagnosis and population surveillance. On the individual level, false positive results pose risks, as people who have never been infected may be allowed to work or travel because they are considered immune. On a population level and regarding epidemiological studies, given the low prevalence of SARS-CoV-2 in most settings at the moment, false positives may inflate prevalence estimates and give a distorted picture of a lower mortality rate and higher population immunity than is really the case. On the other hand, low sensitivity may result in falsely assuming that a person is not infected and consequently jeopardizing measures to prevent the spread of the epidemic. Based on the results of this meta-analysis, ELISA tests, which achieved specificity higher than 99% and sensitivity ∼93%, could be the safer choice at this stage of the pandemic. CLIA tests show comparable sensitivity (∼90%) but slightly decreased specificity (95-98%). LFIA tests, on the other hand, are particularly attractive for large seroprevalence studies and can be used as POC tests. They show high specificity, comparable to ELISA (∼99%), but lower sensitivity (∼80%), and these estimates should be taken into account when designing and performing seroprevalence studies, for instance, by properly adjusting the obtained positive and negative findings. On the individual level, perhaps mixed strategies could be adopted (for instance, re-testing a negative finding).
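One common way to adjust an observed seroprevalence for imperfect test accuracy is the Rogan-Gladen estimator, sketched below in Python together with the positive predictive value at a given true prevalence. This is an illustrative sketch and not a procedure applied in this meta-analysis; the function names and the 5% apparent and 2% true prevalence values are hypothetical, while the sensitivity (0.800) and specificity (0.984) are the pooled LFIA IgG/IgM estimates reported above.

```python
def rogan_gladen(apparent_prevalence: float,
                 sensitivity: float,
                 specificity: float) -> float:
    """Correct an observed positive fraction for test sensitivity/specificity."""
    denom = sensitivity + specificity - 1.0
    if denom <= 0.0:
        raise ValueError("test must perform better than chance")
    estimate = (apparent_prevalence + specificity - 1.0) / denom
    # Clamp to the valid range; sampling noise can push the raw value outside.
    return min(1.0, max(0.0, estimate))


def positive_predictive_value(prevalence: float,
                              sensitivity: float,
                              specificity: float) -> float:
    """Probability that a positive result comes from a truly seropositive person."""
    true_pos = prevalence * sensitivity
    false_pos = (1.0 - prevalence) * (1.0 - specificity)
    return true_pos / (true_pos + false_pos)


# Pooled LFIA IgG/IgM estimates from this meta-analysis.
LFIA_SENS, LFIA_SPEC = 0.800, 0.984

# Hypothetical survey in which 5% of participants test positive.
adjusted = rogan_gladen(0.05, LFIA_SENS, LFIA_SPEC)

# Hypothetical setting with a true seroprevalence of 2%.
ppv = positive_predictive_value(0.02, LFIA_SENS, LFIA_SPEC)
print(adjusted, ppv)
```

At low prevalence, even a specificity near 98% leaves a substantial share of positive results false, which is the concern raised above for both individual decisions and seroprevalence estimates.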
Of note, even if tests are highly accurate, much about protective immunity is unknown, and the true presence of binding antibodies might not mean that people have indeed developed high titers of neutralizing antibodies and are thus immune to re-infection 70. Research on rhesus macaques infected with SARS-CoV-2 was promising, though, showing that reinfection did not occur following rechallenge with the same dose of the SARS-CoV-2 strain 71. Finally, viral load does not decline rapidly after seroconversion, and people may remain infectious despite being truly positive in antibody tests 35.
Data Availability
This is a meta-analysis of publicly available data.
Contributors
PG conceived the study, participated in data collection and performed the analysis. PK, GB, ND and GN participated in data collection and in the interpretation of the results. All authors participated in drafting the manuscript. All authors read and approved the final version of the manuscript.
Declaration of interests
The authors declare that they have no competing interests.
Acknowledgments