## Abstract

The recent emergence of the COVID-19 pandemic has posed an unprecedented healthcare challenge and catastrophic economic and social consequences to the countries across the world. The situation is even worse for emerging economies like India. WHO recommends mass scale testing as one of the most effective ways to contain its spread and fight the pandemic. But, due to the high cost and shortage of test kits, specifically in India, the testing is restricted to only those who are symptomatic. In this context, pooled testing is recommended by some experts as a partial solution to overcome this problem. In this article, we explain the basic statistical theory behind the pooled testing procedure for screening as well as prevalence estimation. In real world situations, the tests are imperfect, and lead to false positive and false negative results. We provide theoretical explanation of the impact of these diagnostic errors on the performances of individual testing and pooled testing procedures. Finally, we study the effect of misspecification of sensitivity and specificity of tests on the estimate of prevalence, an issue, which is debated a lot among the scientists in the context of COVID-19. Our theoretical investigations lead to some interesting and precise understanding of some of these issues.

## 1 Introduction

Since its detection in the Wuhan province of China at the end of December 2019, the COVID-19 pandemic, caused by the virus SARS-CoV-2, has created catastrophic social, economic and health consequences for countries around the world. For an emerging economy like India, it poses a formidable challenge to the policy makers to contain its spread at a level so that the health care infrastructure is not overwhelmed. At the time of writing this article, in India, the number infected is around 150,000 and more than 4300 deaths have occurred due to COVID-19. A large part of the country is under ‘lockdown’. Given its unprecedented economic and social consequences, the policy makers are now desperately looking for an exit route to bring the economy back to normalcy. Effective implementation of social distancing, mass testing, contact tracing and quarantining is essential to contain the spread of the virus. Given that a large percentage of COVID-19 cases are asymptomatic, and are responsible for spreading the virus^{1-3}, it has been advised by the World Health Organization (WHO) that the most effective way to control the spread of the disease is to test as many people as possible. This was found to be effective in containing the spread of the virus in countries like South Korea, Singapore and China^{4-6}. However, the high cost and the short supply of testing kits^{7} are the main impediments for a country like India in conducting mass testing. In order to ramp up the testing efforts, some experts^{8-11} are recommending the use of pooled testing technique, a technique originally proposed by Dorfman^{12} to increase the speed, and reduce the cost, of screening the US army recruits for syphilis during the Second World War. Recently the Indian Council for Medical Research (ICMR) has given its approval^{13} and issued detailed guidelines^{14} for carrying out pooled testing for screening purposes.

It is interesting to observe that most experts recommend use of pooled testing only for the screening purpose. However, the data collected by pooled testing can also be effectively utilized for the estimation of prevalence of the virus, which is a later development in pooled testing re-search^{15}. This fact is not as widely known as the use of pooled testing for screening purpose in the general scientific community. As observed by many^{16,17}, the estimation of prevalence of the SARS-CoV-2 virus is crucial for an accurate estimation of fatality rate. The initial estimate of fatality rate given by WHO ^{18} was scandalously high, because it was based only on deaths among symptomatic patients. Later, it was revised downward after taking into consideration the fact that a significant percentage of the infected population is asymptomatic, and hence remain undetected. If 80% of the cases are asymptomatic, then the fatality rate computed only from the symptomatic cases would be five times higher than the true fatality rate. Such misinformation may create an unnecessary panic in the minds of the policy makers, which may lead to wrong decisions. Recently, a few planned serology surveys were carried out for the estimation of the prevalence of the SARS-CoV-2 virus (e.g., Sempos and Tian^{19}), and the results suggest that the overestimation of fatality rate in the population may be between fifty to eighty five times of the original fatality rate. These results created waves in the scientific community. We will revisit these results in Section 4.

This article mainly serves two purposes. First, we present the statistical theory of the pooled testing procedure for screening, and also for the estimation of prevalence using basic probability theory^{20}. Next, we discuss some practical issues arising out of the fact that these tests are imperfect. In other words, the tests may yield false positive and/or false negative results. Naturally, it raises an important practical question: what is the impact of these diagnostic errors on screening as well as on prevalence estimation? We study these effects using some hypothetical scenarios and discuss its implications in the context of COVID-19 pandemic.

The rest of the article is divided as follows. In Section 2 we present the statistical theory of pooled testing. In Section 3 we discuss the consequences when the test is imperfect. In Section 4, we discuss the estimation of prevalence using pooled data, and compare it with the estimation procedure with individual data. Finally, we conclude with some remarks in Section 5.

## 2 Dorfman pooled testing technique

Pooled testing was originally introduced by Dorfman^{12} in infectious disease studies to reduce the cost and increase the speed of data collection. Typically, for an infectious disease, a sample of blood or urine is tested for presence or absence of the disease. Rarer the disease, more effective is the pooled testing technique. Dorfman’s technique runs as follows. Suppose *n* samples of blood are to be tested for presence or absence of a disease. A sample is either positive (disease present) or negative (disease absent). We attach a Bernoulli variable *Y* to each sample. We assume *Y* =1 if the sample is positive, and *Y* = 0 if it is negative. Instead of testing the samples individually, the samples are pooled into *J* groups of sizes *n*_{1}, …,*n _{J}*. Let

*Y*denote the

_{ij}*Y*-value for the

*i*-th sample in the

*j*-th group. The samples in each group are pooled and tested. Instead of observing

*Y*, in the pooled testing set-up, we observe . It is further assumed that, , if and only if all samples in the

_{ij}*j*-th group are negative, i.e.,

*Y*= 0,

_{ij}*i*= 1, …,

*n*and , if and only if at least one sample in the

_{j}*j*-th group is positive, i.e.,

*Y*= 1, for at least one

_{ij}*i*= 1,

*..,n*. If for a group, the test outcome is positive, then the samples in the group are individually tested to detect the positive samples, and hence the diseased individuals. If the result is negative for a group then no further testing is required. Thus, for a positive test outcome, the number of tests required is one more than the group size, and for a negative test outcome, a single test is enough. If the chance of a positive test outcome for a group is small, in other words, the prevalence of the disease is low, then the average number of tests required would be much smaller in a group testing set-up than individual testing. Thus, it leads to considerable savings in the cost and time of testing. The process diagram is presented in Figure 1 for a group of size

_{j}*k*.

Now, suppose that the prevalence of a disease is *p*, and assume without loss of any generality *n _{j}* =

*k*, i.e., all groups are of equal size, and consequently

*n*=

*Jk*. Notice that, and . Further, suppose

*N*denotes the number of tests for the

_{j}*j*-th group. Clearly, it is equal to

*k*+ 1 if , and 1 if . Thus the expected number of tests for the

*j*-th group is

If *p* is small then *E*(*N _{j}*) is close to 1 even for a sufficiently large group size

*k*, and thus leading to a substantial saving in time and cost than individual testing. Notice that the technique is mainly useful when the prevalence is low.

The simplicity of the idea, and its easy implementation, has led to its wide applicability in different fields (see Roy and Banerjee^{15} and the references therein). Its use in the screening of sexually transmitted disease like chlamydia, gonorrhoea^{21} and HIV^{22,23} is worth mentioning. American Red Cross and the European Blood Alliance routinely use it to screen donated blood for infectious diseases^{24,25}. Given the huge shortage of test kits around the world, researchers worldwide are recommending the use of pooled testing for COVID-19^{8,26-28}.

## 3 Pooled testing for screening when the test is imperfect

Dorfman^{12} proposed the pooled testing technique assuming the test to be perfect, i.e., the chance of a false positive or a false negative test result to be zero. However, in reality, tests are far from perfect. In testing for the SARS-CoV-2 virus, two types of tests are mainly in use. “The first is a reverse-transcription polymerase chain reaction test, or RT-PCR. This is the most common diagnostic test used to identify people currently infected with SARS-CoV-2. It works by detecting viral RNA in a person’s cells – most often collected from their nose. The second test being used is called a serological or antibody test. This test looks at a person’s blood to see if they have produced antibodies for the SARS-CoV-2 virus. If a test finds these antibodies, it means a person was infected and made antibodies in response” ^{29}. How accurate are these tests? The accuracy of a test is measured by two numbers: sensitivity and specificity attached to it. If a test has 5% chance of a false negative (positive) outcome, its sensitivity (specificity) is 95%. Although in laboratory setting it is observed that RT-PCR test has high sensitivity and specificity, in the real world testing condition the sensitivity and the specificity are usually much lower because the testing conditions, the method of sample collection, and the sample preservation technique are far from perfect. For example, in real world testing condition the sensitivity of RT-PCR test ranges between 66% to 80%^{29}. Thus, out of three infected people on the average one may test negative. The sensitivity and specificity of the antibody tests are also found to be very high in laboratory setting, but, then again, in real world condition the accuracy is bound to suffer.

Using the same notations as above, we now find *E*(*N _{j}*) assuming that the sensitivity and specificity of the test are

*S*and

_{e}*S*, respectively. Notice that for the

_{p}*j*-th group, we now observe instead of , a surrogate (or proxy) for , where and . A simple probability calculation then yields , and hence,

The expected number of tests for *n* people would then be *J* × *E*(*N _{j}*), and hence compared to the individual testing, the reduction in the number of tests achieved by using group testing is

*J*(

*k*-

*E*(

*N*)). This reduction is a measures of the

_{j}*efficiency*of the pooled testing procedure. Also, notice that for

*S*=

_{p}*S*= 1, Equation (2) reduces to (1).

_{e}For a perfect test, there is no chance of misclassification of an infected as non-infected or vice versa, and hence the efficiency of pooled testing procedure is measured only by *E*(*N _{j}*). However, for an imperfect test,

*E*(

*N*) is not enough to judge the efficiency of the pooled testing procedure. One also needs to measure the sensitivity and specificity of the pooled testing procedure, and most importantly, the probabilities of the diagnostic errors.

_{j}Let *T*^{+} and *T*^{−} denote the events that an individual is tested positive and negative, respectively, by the pooled testing procedure. Further, let *I* (*I ^{c}*) denote the events that an individual is infected (uninfected). Then one can easily check that the sensitivity, say, , and specificity, say, , of the Dorfman’s pooled tesiting procedure, are given by
and
respectively. The proofs of (3)-(4) are given in the appendix.

Notice that depends only on *S _{e}* while depends on all three parameters

*S*,

_{e}*S*and

_{p}*p*. As stated above, and provide useful information about the performance of the pooled testing procedure. However, the posterior or inverse probabilities

*P*(

*I*|

^{c}*T*+) and

*P*(

*I*|

*T*

^{−}), called the False Positive Predictive Value (FPPV) and the False Negative Predictive Value (FNPV) respectively, are often critical for assessing the performance of the test in the real life situation. These are measures of diagnostic errors. The

*FPPV*and the

*FNPV*represent the proportion of misclassified individuals among those who are tested positive, and among those who are tested negative, respectively. In the context of testing for sera samples of HIV virus, Litvak (

^{22}) proposed that the five key elements to be used to capture the overall performance of a pooled testing procedure should be:

*E*(

*N*), , ,

_{j}*FPPV*and

*FNPV*.

Now, using Bayes’ Theorem (^{20}), one can easily show that

Notice that, for individual testing *FPPV* and *FNPV*, denoted henceforth *FPPV*_{I} and *FNPV*_{I} respectively, are obtained by substituting and by *S _{e}* and

*S*, respectively, in (5).

_{p}In Tables 1a and 1b, we furnish the values of *E*(*N _{j}*), , ,

*FPPV*,

*FNPV*,

*FPPV*

_{I}and

*FNPV*

_{I}for some appropriately chosen values of the sensitivity (

*S*), the specificity (

_{e}*S*) and the prevalence (

_{p}*p*). The “Eff” (efficiency) column gives the percentage reduction in the expected number of tests achieved by using the pooled testing than the individual testing. Evidently, with increase in

*p*, and decrease in

*S*and

_{e}*S*, the efficiency reduces. The value of is always less than, or equal to,

_{p}*S*. However, for a given set of values of

_{e}*p*,

*S*and

_{e}*S*, the value of is always more than

_{p}*S*, and for smaller values of

_{p}*S*this effect is substantial. Thus, the pooled testing procedure may lead to a significant improvement of the specificity, especially when the specificity of the individual test is low. Further, it is interesting to observe that even a slight decrease in

_{p}*S*leads to a substantial increase in the value of

_{p}*FPPV*for any given values of

*p*and

*S*. On the other hand, a change in the value of

_{e}*S*has negligible effect on the value of

_{e}*FPPV*for any given values of

*p*and

*S*. Also, with increase in

_{p}*p*,

*FPPV*decreases for any given values of

*S*and

_{e}*S*. Most importantly, compared to the individual testing, the pooled testing leads to a substantial reduction in the value of

_{p}*FPPV*, which is extremely important from the point of view of its application in practice. Still,

*FPPV*can take very high values even for pooled testing for a low prevalence disease with low specificity (

*S*). While this is undesirable, this compares favourably to individual testing. In contrast, for the range of values considered in the tables for

_{p}*p*,

*S*and

_{e}*S*, the impact of changes in the values of these parameters has little effect on

_{p}*FNPV*.

Notice that in case both the sensitivity and the specificity of the test are low, say, for example, *S _{e}* = 0.8 and

*S*= 0.7, even with 10% prevalence, 68% (77%) of the pooled (individual) testing results would be false positive, which is extremely high. A recent meta-study by a group of medical professionals from the Johns Hopkins University

_{p}^{30}demonstrated that over a different varieties of RT-PCR tests, which are the most commonly used tests for SARS-CoV-2, a best case scenario is an

*S*≈ 0.8. Another meta-study

_{e}^{31}has looked at thirty seven different external quality assessment studies of different medical assays, and found that in some cases, the specificity was as low as 0.83.

For testing for the SARS-CoV-2 virus, the above observations suggest that at the initial phase of the pandemic when the prevalence is low, (less than 10%,) pooled testing may lead to a substantially lower rate of false positive cases than individual testing, especially if a low specificity test is used. Admittedly, pooled testing leads to a slightly higher rate of false negative cases compared to individual testing, but as observed from Tables 1a and 1b, that effect is usually negligible.

Finally, it may be noted here that the impact of the two diagnostic errors are asymmetric. In case of more false positive cases, more people are to be quarantined leading to a lot of economic, social and emotional turmoil. On the other hand, in case of more false negative results, more infected people will be released in the population causing the disease to spread faster. Considering the costs of the errors, the policy planner has to make a trade-off.

## 4 Estimation of prevalence using pooled testing data

In this section, first we present the theory of estimation of prevalence of a disease from the test data using basic probability theory^{20}. Next, we study the impact of misspecification of sensitivity and specificity on the estimate of prevalence.

An accurate estimation of the prevalence of SARS-CoV-2-virus, being critical for assessing the lethality of the pandemic, is crucial for the policy makers to make strategic decisions. Recently, Ioannidis^{32}, a Stanford medicine professor, lamented in a highly critical opinion piece on COVID-19: “Three months after the outbreak emerged, most countries, including the U.S., lack the ability to test a large number of people and no countries have reliable data on the prevalence of the virus in a representative random sample of the general population.” Ioannidis argued that in the absence of such data, the estimate of fatality rate is bound to be a substantial overestimate of the true fatality rate if a significant percentage of COVID-19 cases are undetected. It is now well known that a significant percentage of COVID-19 cases are asymptomatic. Raman R Gangakhedkar, chief epidemiologist, ICMR, reported “Of 100 people with infection, 80 do not have symptoms,” ^{33}. A lab in Iceland^{34} has suggested that 50% of the infected are asymptomatic. In a Boston homeless shelter, out of 400 guests staying there, 146 tested positive for COVID-19, but all were reported to be asymptomatic^{35}. If, for example, 80% of the cases are really asymptomatic, then the estimate of fatality rate would naturally be five times the true fatality rate. While the reported percentage of asymptomatic cases vary from place to place, all agree that it is significant. Considering the possibility of substantial underestimation of infected cases in the absence of test data, Ioannidis has contended that the virus could be less deadly than people think, and destroying the economy in the effort to fight with the virus could be a “once-in-a-century-evidence-fiasco”.

A recent research paper published by a team of Stanford medical scientists^{16} lends support to Ioannidis’ contention. By estimating the prevalence of SARS-CoV-2 virus among the residents of Santa Clara County of California using antibody test data of a “properly selected” sample of 3300 residents, it predicted that between 50 to 85 times more residents of the county were actually infected than what appeared in the official tallies^{36}. The authors claimed that “their data helps prove … if undetected infections are as widespread as they think, then the death rate in the county may be less than 0.2%, about a fifth to a tenth other estimates.” The publication of this research paper immediately created waves in social media, in press, and in policy circles; if indeed, this number is not far from the truth, then the COVID-19 fatality rate is not very different from common flu. Realizing the findings’ serious policy implications, especially its lending support to the view of lifting the lock down, some scientists immediately issued a caveat about the accuracy of the estimate. They expressed concerns about the flaws in the sample selection method (through Facebook advertisements), the statistical analysis carried out, and the reliability of the antibody test^{37,38}.

We now discuss the methodology used by the Stanford group^{16} for prevalence estimation based on individual serological testing data. Let us denote the probability of a positive test result for an arbitrarily chosen individual by π. Clearly, for a sample of size *n*, the estimate of π is simply the proportion of people being identified as positive out of the sample. But we know from the previous section that

Plugging in the proportion of positive test results obtained from the sample, say , for *π* in (6), we obtain

However, for estimating prevalence using Dorfman’s algorithm, we simply need to replace *S _{e}* and

*S*in (6) by and respectively (cf. (Equations 3)-(4)). Thus, we have or, equivalently,

_{p}Hence, plugging in an estimate of *π* obtained from the test data to (8) and then solving it numerically for *p* we get an estimate of *p*, say . This estimate is consistent for *p* (i.e., close to *p* as the sample size *n* gets very large).

As noted in Section 3, both RT-PCR and antibody tests have high sensitivity and specificity in lab settings, but in a real world situation, these values may be substantially lower than those obtained in lab settings. Suppose and are respectively the sensitivity and the specificity values as perceived, and used by the scientist for estimating the prevalence from Equation (7) or (8), whereas and are respectively the true sensitivity and specificity values in the field. In real-life situations, it is often the case that is substantially less than , because the scientist’s perceived values are usually influenced by the values under the lab setting. Thus, given the true value of *π*, the solution to Equation (7) or (8) for *p* is equal to the true prevalence *p _{T}* if

*S*and

_{e}*S*are replaced by and . On the other hand, given the true value of

_{p}*π*, replacing

*S*and

_{e}*S*by and in (7) and (8), would yield a solution

_{p}*p*which is expected to deviate from the true prevalence

_{P}*p*. We study the effect of the misspecification of sensitivity and specificity values on the estimate of prevalence by evaluating the bias (

_{T}*p*-

_{T}*p*).

_{P}In Tables 3a-3b, we report the values of the bias (*p _{T}* -

*p*) resulting from the individual testing as well as Dorfman’s pooled testing, denoted by

_{P}*bias*- and

_{I}*bias*, respectively, for different values of

_{D}*p*, , , and . The following patterns are visible from the table:

With increase in the prevalence

*p*, the bias is decreasing. Initially, it reduces from positive to zero and then becomes negative. We report the bias for four different values of*p*: 0.01, 0.02, 0.05 and 0.1.The effect of misspecification of

*S*and_{e}*S*are asymmetric in nature. Misspecification of_{p}*S*has a little effect on the bias. However, the misspecification of_{e}*S*has a significantly large effect on the bias. Also, more is the deviation from the true values more is the effect on bias._{p}The most interesting observation is, compared to the individual testing, pooled testing reduces the bias due to misspecification of

*S*and_{e}*S*substantially. This is extremely important to know given the fact that misspecification of_{p}*S*and_{e}*S*are quite common._{p}

In Table 2, we report the values of *π*, the probability of testing positive, for the individual testing as well as for the Dorfman testing procedure, for different values of *p*, *S _{e}* and

*S*. Let

_{p}*π*(cf. Equation (6)) and

_{I}*π*(cf. Equation (8)) denote the

_{D}*π*value corresponding to the individual and Dorfman testing procedure, respectively. Notice from Table 2 that, like the observation (ii) made above, the value of

*π*is significantly influenced by

*S*, but not so by

_{p}*S*. Also, it is important to note that, for a given value of

_{e}*S*(say, 0.9) reduction in

_{p}*S*(say, from 1 to 0.7) results in reduction of

_{e}*π*, though not a significant reduction. Thus, we may conclude that misspecification of

*S*(

_{p}*S*) has a large (negligible) effect on the value of

_{e}*π*, and consequently, on the estimates of prevalence obtained by solving Equations (6) or (8), as the case may be. So, for accurate estimation of prevalence it is very important to specify the value of

*S*correctly, while misspecification of

_{p}*S*has little effect.

_{e}Finally, we explain a subtle connection between the numbers reported in Tables 3a-3b and Table 2 with a specific example. Consider the case when prevalence *p* is equal to 0.01, and the true sensitivity and specificity are both equal to 0.9. From Table 2, we observe that the corresponding value of *π* (cf. Equation (6)) for individual testing is equal to 0.11. Suppose now the perceived sensitivity and specificity are equal to 1. To obtain the biased prevalence *p* corresponding to the perceived sensitivity and specificity, we plug in 0.11 for *π* on the left hand side of Equation (6), and *S _{e}* =

*S*= 1 on the right hand side. The solution can be obtained from Table 2 by observing the value of

_{p}*p*corresponding to

*S*=

_{e}*S*= 1 and

_{p}*π*= 0.11, which is clearly 0.11. Thus the resulting bias due to misspecification is 0.11 − 0.01 = 0.10, which is reported in Table 3a. By using a similar argument we can find the bias for pooled testing which is 0.01 using Table 2.

_{I}## 5 Concluding Remarks

In this article, we have discussed the statistical theory behind Dorfman’s pooled testing technique used for screening. We have explained the method of estimation of prevalence from individual testing and pooled testing data. Most importantly, we have provided theoretical insights into the practical issues that are being discussed in the scientific community, arising out of the fact that the tests for the SARS-CoV-2 virus are imperfect. The theoretical results show that pooled testing is not only preferable for reducing time and cost of screening, it also helps in significant reduction of misclassification among those who are tested positive. Our results also show that for prevalence estimation, pooled testing is always preferable to individual testing, especially when the specificity of the test is low. It helps in reducing the bias of the prevalence estimate significantly. Finally, it is worth mentioning that our observations are valid for low values of prevalence, possibly less than or equal to 10%.

In this connection, as a final remark, we may conclude that for estimating the prevalence of SARS-CoV-2 of Santa Clara County, California^{16}, the Stanford scientists collected data by conducting individual antibody tests on the 3300 subjects. Instead, had they used pooled testing technique, it would have been possible to collect the test data from a much larger sample for a similar cost and time. Also, as mentioned in the paper, given that the prevalence is low (estimated as 2.8%), adopting pooled testing technique would have made all the more sense in the context of their study.

## Data Availability

No data used.

## Appendix

### A Computation of and

In the context of the Dorfman test, , the probability that a positive person can come positive is the the probability that both the pooled test as well as the individual test comes correct, which is .

To compute the misclassification probability that a person is non-diseased and is found to be diseased, we first note that a non-diseased person, say Person *j*, can either be a part of a pool constituted entirely of non-diseased individuals (Case A) or a pool containing some diseased individuals (Case B). The probability of Case A is (1 − *p*)* ^{k}*, and the probability of Case B is (1 −

*p*) − (1 −

*p*)

*.*

^{k}Now, for Person *j* to be found to be diseased in Case A, the pooled test, and subsequently the individual’s test, will both erroneously give positive result, the probability of which is (1 − *S _{p}*)

^{2}for the original Dorfman algorithm. On the other hand, in Case B, the pooled test will be positive, but this time correctly, while the individual’s test will still be erroneously positive, the probability of which is

*S*(1 −

_{e}*S*). Putting all together, the probability that a non-diseased person is found to be diseased is [(1 −

_{p}*S*)

_{p}^{2}(1 −

*p*)

*+*

^{k}*S*(1 −

_{e}*S*)((1 −

_{p}*p*) − (1 −

*p*)

*))]/(1 −*

^{k}*p*) = (1 −

*S*)

_{p}*S*+ (1 −

_{e}*Sp*)(1 −

*S*−

_{p}*S*)(1 −

_{e}*p*)

^{k}^{−1}. Hence, is 1 − (1 −

*S*)

_{p}*S*+ (1 −

_{e}*S*)(1 −

_{p}*S*−

_{p}*S*)(1 −

_{e}*p*)

^{k}^{−1}.

## Footnotes

Email: pritha{at}xlri.ac.in

Email: tathagata{at}iima.ac.in