Correcting COVID-19 PCR Prevalence for False Positives in the Presence of Vaccination Immunity

Many public health authority reports on COVID-19 cases confound positive test results with population prevalence. As the population prevalence approaches the PCR test false positive rate (FPR), for example during a vaccination campaign, it is necessary to adjust the raw test results for the false positive rate. This paper provides a technique for estimating the test false positive rate and making the correction to test population prevalence in the absence of an accurate and definitive specificity. Using the most recent complete data provided by Public Health England (PHE), a false positive rate of 1.16% (95% CI 1.09 - 1.23%) was found for the PHE PCR test for the period 1 January through 29 March 2021. During this period, the test population prevalence was decreasing, starting at a decay rate estimated as 3.0% per day (CI 2.79 - 3.14%). This rate of decay increased to an estimated 14.7% per day by the end of the period (CI 13.30 - 16.16%). Finally, mean test population prevalence was estimated at 14.3% (CI 13.75 - 14.87%) on 1 January and is estimated to have declined significantly to 0.06% (CI 0.00 - 0.13%) by the end of the period. If PCR test positivity is used without applying the false positive rate, the percentage of positive PCR tests will eventually "flatline" at the false positive rate and produce a false positive bias even if test population prevalence should fall to zero.


Introduction
Many public health authority reports on COVID-19 cases confound positive test results with population prevalence. As the population prevalence approaches the PCR (polymerase chain reaction) test false positive rate (FPR), for example during a vaccination campaign, it is necessary to adjust the raw test results for the false positive rate. This paper provides a technique for estimating the test false positive rate and making the correction to test population prevalence in the absence of an accurate and definitive specificity.

Methods: Data
For the analysis, it is necessary to have a relatively clean data stream with biases removed. In most public reporting, the PCR and LFD (lateral flow device) test results are either combined to give case counts or, if kept separate, the PCR test results may be biased by reported LFD positives without adjustment in the denominator for the total LFD tests conducted. Fortunately, Public Health England (PHE) has recently provided separate daily data fields for PCR tests and LFD tests, including for LFD tests that have been confirmed by PCR. Therefore, it is relatively simple to construct an unbiased PCR test positivity from their data field "newCasesPCROnlyBySpecimenDateRollingSum", which specifically excludes PCR positives that followed a prior positive LFD test. Further, the denominator of total PCR tests can be adjusted by subtracting the total positive LFD tests that have been submitted to PCR for testing, which, while a relatively small number (about 800 compared to 240,000 tests per day), adds a small amount of accuracy.
Correspondence: michael.halem@becare net. This preprint (which was not certified by peer review) was posted to medRxiv on April 9, 2021 (doi: https://doi.org/10.1101/2021.04.06.21255029) and is made available under a CC-BY 4.0 International license. The copyright holder is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
Theoretically the technique could be used for a combined LFD and PCR testing stream. However, this is only true if the ratio of LFD to PCR tests remained constant such that an overall FPR could be computed. A more sophisticated method (not presented) allows these to be computed together. However, it is the author's belief that the PHE LFD test data is biased due to non-reporting of negative tests from home testing of students and staff in the England education system. Analysis of such potentially biased data is beyond the scope of this paper.
Seven day rolling totals of PCR, LFD, and vaccination data are downloaded from PHE [1,2]. It should be noted that PHE provides the LFD and PCR test numerators (the new positive "cases") only by specimen date. The PCR denominator is generated from the "newPCRTestsByPublishDateRollingSum" field, aligned to the specimen date. For this field, it is estimated that publish date data represents specimens that were taken on average 2 days previously. Informal spot checks of other PHE data, using linear regression of total cases by specimen date against publish date shifted between 0 and 7 days, confirm that this assumption is reasonable. Further, the total PCR test rolling sum is relatively stable, with less than a 4 percent day to day change. Rolling sum data is divided by 7 to give a (7 day) daily moving average. The percentage positivity is found by dividing the PCR positive tests unrelated to a prior LFD test (field 2) by the total PCR tests less the positive LFD tests, which are assumed to have been submitted to PCR for confirmation. Without excluding confirmed LFD tests from the PCR numerator, the PCR positivity would have a large bias towards greater positivity because positive LFD tests have pre-screened the PCR tests. Without removing the LFD tests that have been submitted to PCR for confirmation, the denominator would be slightly overstated.
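The positivity arithmetic described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's code: the function name `pcr_positivity` and the sample rolling sums are hypothetical, and only the rough magnitudes (about 240,000 PCR tests and about 800 LFD positives per day) are taken from the text. The publish date series is assumed to have already been shifted by 2 days onto the specimen date.

```python
def pcr_positivity(pcr_only_cases_roll7, pcr_tests_roll7, lfd_positive_roll7):
    """Daily PCR positivity from 7 day rolling sums (hypothetical helper).

    pcr_only_cases_roll7: positives found by PCR with no prior LFD positive
    pcr_tests_roll7:      total PCR tests, assumed already aligned to specimen date
    lfd_positive_roll7:   LFD positives assumed to have been sent to PCR
    """
    # Rolling sums divided by 7 give 7 day daily moving averages.
    daily_cases = pcr_only_cases_roll7 / 7.0
    daily_tests = pcr_tests_roll7 / 7.0
    daily_lfd_positive = lfd_positive_roll7 / 7.0

    # Remove LFD positives that were re-tested by PCR from the denominator.
    denominator = daily_tests - daily_lfd_positive
    if denominator <= 0:
        raise ValueError("adjusted PCR test count must be positive")
    return daily_cases / denominator


# Illustrative values only: 2,800 PCR-only positives per day is made up.
positivity = pcr_positivity(7 * 2800, 7 * 240_000, 7 * 800)
print(f"PCR positivity: {positivity:.4%}")
```

With these inputs the adjustment changes the denominator from 240,000 to 239,200 tests per day, which shows why the text describes the correction as small.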
Vaccination population immunity is estimated from items 5 and 6 using the population of England as the denominator, assuming an empirically determined 7 day delay between vaccination and immunity, a reasonable 80% immunity from dose one [3], and an additional 20% immunity from dose two.
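A minimal Python sketch of this immunity estimate is given below. It assumes that the two vaccination items are cumulative first and second dose counts by day, that immunity is zero until 7 days after the first day of the series, and that the population figure is an illustrative round number; the function name and sample data are hypothetical.

```python
def vaccine_immunity_fraction(cum_dose1, cum_dose2, population,
                              delay_days=7, eff_dose1=0.80, eff_dose2=0.20):
    """Fraction of the population with vaccine immunity on each day.

    cum_dose1, cum_dose2: cumulative dose counts by day (assumed inputs).
    delay_days: lag between vaccination and immunity (7 days in the text).
    eff_dose1:  immunity from dose one (80% in the text).
    eff_dose2:  additional immunity from dose two (20% in the text).
    """
    immunity = []
    for day in range(len(cum_dose1)):
        lagged = day - delay_days
        if lagged < 0:
            # Assumption: no vaccine immunity before the series start plus delay.
            immunity.append(0.0)
            continue
        immune_people = eff_dose1 * cum_dose1[lagged] + eff_dose2 * cum_dose2[lagged]
        immunity.append(immune_people / population)
    return immunity


# Hypothetical series: 1,000,000 first doses and 100,000 second doses per day.
population = 56_000_000  # illustrative round figure for England
cum_dose1 = [1_000_000 * d for d in range(10)]
cum_dose2 = [100_000 * d for d in range(10)]
immunity = vaccine_immunity_fraction(cum_dose1, cum_dose2, population)
print([round(v, 4) for v in immunity])
```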

Methods: Modelling
An inspection of the current test data shows a decline similar to exponential decay as a large portion of the English population became vaccinated. The precise decay dynamics are not needed, as the exponential model is sufficiently parsimonious and consistent with solutions to the SIR family of models when the susceptible population has been depleted sufficiently for the reproduction number R to be significantly below 1 over short time periods. Generally, a population's prevalence exponential decay (or growth) can be written as:

$$P(t) = P_0 e^{rt} \qquad (1)$$

where $P(t)$ is the percentage prevalence (i.e. infectious, or previously infected, depending on the authority's definition and testing strategy), $P_0$ is the initial prevalence at the start of the measurement period, $t$ is the days from the start, and $r$ is the daily rate of change (a decay rate if negative). The exponential solution follows directly from the differential equation for the change of infection with respect to time in the classic SIR model and similar models [4]:

$$\frac{dI}{dt} = \left(\beta \frac{S}{N} - \gamma\right) I \qquad (2)$$

where $I$ is the time varying infection, $t$ is time (by convention in days), $\beta$ is the transmissibility (assumed constant), $S/N$ is the percentage of the population that is susceptible, and $\gamma$ is the recovery rate from the infected group to the recovered group. Using the exponential distribution, $\gamma = 1/\tau$, where $\tau$ is the mean time of infection: for COVID-19 somewhere between 5 and 10 days. It can be seen by substitution that Eq. 1, the exponential increase or decay, is the solution to Eq. 2 when all parameters in the parentheses are constant. Further, it can be seen that the rate of exponential decay is $r = \beta S/N - \gamma$. From inspection, the decay rate $r$ is a linear function of the susceptible percentage of the total population $S/N$, such that over shorter time periods (assuming transmissibility is constant), and where new natural infections are small relative to vaccinations, the decay rate is a linear function of the vaccinations that have removed susceptible people from the population.
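As a numerical check of the substitution argument, the following Python sketch integrates the SIR infection equation with a constant susceptible fraction and compares the result with the closed form exponential. All parameter values are hypothetical and chosen only so that the resulting rate is about -3% per day; the function names are illustrative.

```python
import math


def decay_rate(beta, susceptible_fraction, mean_infection_days):
    """Rate r = beta * S/N - gamma, with gamma = 1 / mean infection time."""
    gamma = 1.0 / mean_infection_days
    return beta * susceptible_fraction - gamma


def closed_form(initial, r, t):
    """Exponential solution: initial * exp(r * t)."""
    return initial * math.exp(r * t)


def euler_integrate(initial, r, days, steps_per_day=1000):
    """Forward Euler integration of dI/dt = r * I with r held constant."""
    dt = 1.0 / steps_per_day
    value = initial
    for _ in range(days * steps_per_day):
        value += r * value * dt
    return value


# Hypothetical parameters: beta = 0.1 per day, 70% susceptible, 10 day infection.
r = decay_rate(beta=0.1, susceptible_fraction=0.7, mean_infection_days=10)
closed = closed_form(0.143, r, 30)
numeric = euler_integrate(0.143, r, 30)
print(f"r = {r:.4f} per day")
print(f"closed form: {closed:.6f}, Euler: {numeric:.6f}")
```

The two values agree closely, which is what the substitution argument predicts when the term in parentheses is constant.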
The decay model including vaccination is then:

$$P(t) = P_0 \exp\left\{\left[r_0 + k_{vac}\left(V_{mean} - V_0\right)\right] t\right\} \qquad (3)$$

where $r_0$ is the initial decay rate at time $t_0$ (i.e. $t = 0$), $V_{mean}$ is the mean percentage with vaccination immunity over the time interval between $t_0$ and $t$, $V_0$ is the percentage with vaccination immunity at $t_0$, and $k_{vac}$ is a regression solved constant showing the change in decay rate per excess vaccination days over the initial vaccination rate. The algebra for computing test positivity from population prevalence is well known. For a complete derivation please see [5]:

$$P_T = S_e P + FPR\,(1 - P) \qquad (4)$$

where $P$ is the population prevalence, $P_T$ is the test positivity, and $S_e$ the test sensitivity. With no reference standard, the sensitivity is not estimated and is set arbitrarily to 1 (i.e. perfect):

$$P_T = P + FPR\,(1 - P) \qquad (5)$$

The model fitted to the data is therefore:

$$\ln P_T(t) = \ln\left[P(t) + FPR\,\left(1 - P(t)\right)\right] + \varepsilon \qquad (6)$$

where $\varepsilon$ is the residual error that the non-linear least squares model is minimizing.
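The relation between prevalence and positivity, and the resulting "flatline", can be illustrated with a short Python sketch. It assumes the standard relation positivity = sensitivity x prevalence + FPR x (1 - prevalence), with sensitivity set to 1 as in the text. The FPR of 1.16% and the starting prevalence of 14.3% are the values reported in the abstract; the constant decay rate of 10% per day is a hypothetical simplification of the time varying rate, and the function names are illustrative.

```python
import math

FPR = 0.0116  # false positive rate reported in the abstract (1.16%)


def positivity_from_prevalence(prevalence, fpr=FPR, sensitivity=1.0):
    """Expected test positivity for a given population prevalence."""
    return sensitivity * prevalence + fpr * (1.0 - prevalence)


def log_residual(observed_positivity, prevalence, fpr=FPR):
    """Residual on the log scale, the quantity a log-scale least squares fit minimizes."""
    return math.log(observed_positivity) - math.log(
        positivity_from_prevalence(prevalence, fpr))


P0 = 0.143      # starting prevalence reported in the abstract
RATE = -0.10    # hypothetical constant decay rate per day

curve = []
for day in range(0, 121, 30):
    prevalence = P0 * math.exp(RATE * day)
    curve.append((day, prevalence, positivity_from_prevalence(prevalence)))

for day, prevalence, positivity in curve:
    print(f"day {day:3d}: prevalence {prevalence:.6%}, positivity {positivity:.4%}")
```

As prevalence decays towards zero, the computed positivity settles at the false positive rate rather than at zero.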

Results
The result is summarized below:

Table 2 - Model Fit Result
The estimates and their confidence intervals (+/- 2 standard errors) have been extracted to 6 decimal places of precision, with an ad hoc $R^2$ (percent of the variance of the dependent variable explained) so as to be roughly comparable to an ordinary least squares linear regression.
The regression estimates $\ln(P_T)$ via the R predict.nls() function (part of the base R stats package). $P_T$ is estimated as $\exp(\widehat{\ln P_T})$, which produces the test population prevalence by rearranging Eq. 5:

$$P = \frac{P_T - FPR}{1 - FPR} \qquad (7)$$

Confidence intervals, where directly estimated by the R nls() function, were obtained by doubling the nls() summary standard errors. The geometric average decay rate ($\bar{r}$) and the ending decay rate ($r_{end}$) were estimated assuming that the variables were random, normally distributed, and uncorrelated, using the simplifying formula for the variance of the addition of two such random variables, i.e.

$$\sigma^2_{X+Y} = \sigma^2_X + \sigma^2_Y \qquad (8)$$
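The variance addition rule can be sketched in Python as follows. The sketch assumes the ending decay rate is the initial rate plus a vaccination coefficient multiplied by the excess vaccination immunity, and that the two terms are uncorrelated; how the author applied the rule to each term is not stated, so this is one plausible reading. All numbers and names are hypothetical.

```python
import math


def sum_standard_error(se_x, se_y):
    """Standard error of X + Y for uncorrelated X and Y."""
    return math.sqrt(se_x ** 2 + se_y ** 2)


def ending_decay_rate(r0, se_r0, k_vac, se_k_vac, excess_immunity):
    """Ending rate and its +/- 2 standard error interval (assumed form)."""
    r_end = r0 + k_vac * excess_immunity
    # Scaling a random variable by a constant scales its standard error.
    se_end = sum_standard_error(se_r0, abs(excess_immunity) * se_k_vac)
    return r_end, (r_end - 2.0 * se_end, r_end + 2.0 * se_end)


# Hypothetical inputs, not the fitted values from Table 2.
r_end, ci = ending_decay_rate(r0=-0.03, se_r0=0.001,
                              k_vac=-0.3, se_k_vac=0.015,
                              excess_immunity=0.4)
print(f"ending decay rate {r_end:.4f}, CI {ci[0]:.4f} to {ci[1]:.4f}")
```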
In the case of the end of period population prevalence estimate $P_{end}$, the regression's estimate of $\ln(P_T)$ has a residual standard error, in this case 0.054. The CI for $P_T$ is thus obtained from $\exp(\widehat{\ln P_T})$ and this residual standard error. The confidence interval for $P_{end}$ was calculated by substituting into Eq. 7 the respective worst case standard errors for $P_T$ and $FPR$ (i.e. flip the sign for the plus or minus).
Of note, because the PCR sensitivity is set to 1, it may well find positive "cases" that are not infectious, i.e. viral fragments from prior infections.
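The worst case substitution for the end of period prevalence interval can be sketched in Python as below. The sketch assumes prevalence = (positivity - FPR) / (1 - FPR) with sensitivity 1, pairs the low positivity bound with the high FPR bound (and vice versa), and clips negative results to zero. The FPR of 1.16% and its half-width of 0.07% come from the abstract; the positivity estimate, its margin, and the function names are hypothetical.

```python
def prevalence_from_positivity(positivity, fpr):
    """Invert positivity = prevalence + fpr * (1 - prevalence); clip at zero."""
    return max(0.0, (positivity - fpr) / (1.0 - fpr))


def worst_case_interval(positivity, positivity_margin, fpr, fpr_margin):
    """Point estimate and worst case bounds for prevalence.

    The margins are interval half-widths (assumed inputs).
    """
    point = prevalence_from_positivity(positivity, fpr)
    # Lowest prevalence: low positivity combined with high false positive rate.
    lower = prevalence_from_positivity(positivity - positivity_margin,
                                       fpr + fpr_margin)
    # Highest prevalence: high positivity combined with low false positive rate.
    upper = prevalence_from_positivity(positivity + positivity_margin,
                                       fpr - fpr_margin)
    return point, lower, upper


# Positivity of 1.22% with a 0.01% margin is hypothetical.
point, lower, upper = worst_case_interval(0.0122, 0.0001, 0.0116, 0.0007)
print(f"prevalence {point:.4%} (worst case {lower:.4%} to {upper:.4%})")
```

Clipping at zero is a design choice made here because a negative prevalence has no meaning.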
The results are summarized in the graph below, which shows the data, the fit, and a short projection for both test positivity and test population prevalence. The exponential-like decay to the false positive rate can be clearly seen, as can the large difference between test population prevalence (the red line) and test positivity (the black X's and the black line). (Of note is that a similar graph can be obtained using Israel Ministry of Health data [8], but the Israel data lacks the documentation needed to separate PCR and LFD.)

Discussion and Conclusion
A technique is presented to extract PCR false positive rates (i.e. one minus the specificity). The exponential-like decay to a false positive rate floor visually speaks for itself. Using this data, test population prevalence may be extracted. For this PHE data (2021 to date), the population prevalence is well below the estimated false positive rate and is rapidly decaying. Without the application of the false positive rate to the current PCR test data, at this stage of the epidemic, the PCR test positivity will "flatline" with time and continue to produce false positives even if the population prevalence has fallen to zero.

Limitations
This paper is not peer reviewed. While the author exercised reasonable care in presenting the results, hidden mistakes may be contained therein. The statistical techniques used within this paper may not be statistically robust.
While the Public Health England data is relatively clean, it is an amalgamation of multiple PCR testing sites, each of which may change its test parameters at any time, resulting in a different false positive rate. The technique performs best when the testing parameters and the tests used are consistent and homogeneous. For example, some public health authorities may mix lateral flow test results with PCR test results, or may bias test results by screening first or by dropping negative results. Such mixed, changed, or biased data makes the technique less reliable at discriminating trends.
A constant transmissibility (i.e. social distancing) is assumed. The change in naturally acquired immunity (but not the absolute level) over the period is assumed to be relatively insignificant.
The technique works during an epidemic curve period when the total infections are falling rapidly in a consistent manner, so that the false positive rate floor can be detected. During other periods, the false positive floor may not be discernible.
The reported validation of many COVID-19 PCR tests indicates that the specificity is 100% (i.e. the false positive rate is zero) [9,10]. The results presented here contradict those reports. The author suggests that real-world testing of large populations has a different specificity than laboratory and small scale validations.

Code and Data Availability
All code is available online via medRxiv or by request to the author. All data is downloaded from Public Health England using the function fetchEngland() contained within the code, or alternatively can be downloaded manually using the URLs contained within the code's comments. The most recent data download files are included for reference.