SARS-CoV-2 reinfection trends in South Africa: analysis of routine surveillance data

Objective: To examine whether SARS-CoV-2 reinfection risk has changed through time in South Africa, in the context of the emergence of the Beta and Delta variants.
Design: Retrospective analysis of routine epidemiological surveillance data.
Setting: Line list data on SARS-CoV-2 with specimen receipt dates between 04 March 2020 and 30 June 2021, collected through South Africa's National Notifiable Medical Conditions Surveillance System.
Participants: 1,551,655 individuals with laboratory-confirmed SARS-CoV-2 who had a positive test result at least 90 days prior to 30 June 2021. Individuals having sequential positive tests at least 90 days apart were considered to have suspected reinfections.
Main outcome measures: Incidence of suspected reinfections through time; comparison of reinfection rates to the expectation under a null model (approach 1); empirical estimates of the time-varying hazards of infection and reinfection throughout the epidemic (approach 2).
Results: 16,029 suspected reinfections were identified. The number of reinfections observed through the end of June 2021 is consistent with the null model of no change in reinfection risk (approach 1). Although increases in the hazard of primary infection were observed following the introduction of both the Beta and Delta variants, no corresponding increase was observed in the reinfection hazard (approach 2). Contrary to expectation, the estimated hazard ratio for reinfection versus primary infection was lower during waves driven by the Beta and Delta variants than for the first wave (relative hazard ratio for wave 2 versus wave 1: 0.75 (95% CI: 0.59-0.97); for wave 3 versus wave 1: 0.70 (95% CI: 0.55-0.90)). Although this finding may be partially explained by changes in testing availability, it is also consistent with a scenario in which variants have increased transmissibility but little or no evasion of immunity.
Conclusion: We conclude there is no population-wide epidemiological evidence of immune escape and recommend ongoing monitoring of these trends.


Introduction
As of 30 June 2021, South Africa had more than two million cumulative laboratory-confirmed cases of SARS-CoV-2, concentrated in three waves of infection. The first case was detected in early March 2020 and was followed by a wave that peaked in July 2020 and officially ended in September. The second wave, which peaked in January 2021 and ended in February, was driven by the Beta (B.1.351 / 501Y.V2 / 20H) variant, which was first detected in South Africa in October 2020 (7). The third wave, which peaked in July and ended in September 2021, was dominated by the Delta (B.1.617.2 / 478K.V1 / 21A) variant (8).
Following the emergence of the Beta and Delta variants of SARS-CoV-2 in South Africa, a key question is whether there is epidemiologic evidence of increased risk of SARS-CoV-2 reinfection with these variants (i.e., immune escape). Laboratory-based studies suggest that convalescent serum has a reduced neutralizing effect on these variants compared to wild type virus in vitro (3)(4)(5)(6); however, this finding does not necessarily translate into immune escape at the population level.
To examine whether reinfection risk has changed through time, it is essential to account for potential confounding factors affecting the incidence of reinfection: namely, the changing force of infection experienced by all individuals in the population and the growing number of individuals eligible for reinfection through time. These factors are tightly linked to the timing of epidemic waves. We examine reinfection trends in South Africa using two approaches that account for these factors to address the question of whether circulation of the Beta or Delta variants was associated with increased reinfection risk, as would be expected if their emergence was driven by immune escape.
medRxiv preprint doi: https://doi.org/10.1101/2021.11.11.21266068; this version posted November 11, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.

Data sources
Data analysed in this study come from two sources maintained by the National Institute for Communicable Diseases (NICD) [...]
Civil unrest during July 2021 severely disrupted testing in Gauteng and KwaZulu-Natal, the two most populous provinces in the country. As a result, case data became unreliable and a key assumption of our models (that the force of infection is proportional to the number of positive tests) was violated. Increasing vaccination rates from August 2021 could also introduce bias. We therefore limited the analysis to data with specimen receipt dates between 04 March 2020 and 30 June 2021.
A combination of deterministic (national ID number, names, dates of birth) and probabilistic linkage methods was used to identify repeated tests conducted on the same person. In addition, provincial COVID-19 contact tracing teams identify and report repeated SARS-CoV-2 positive tests to the NICD, whether detected via PCR or antigen tests. The unique COVID-19 case identifier, which links all tests from the same person, was used to merge the two datasets. Irreversibly hashed case IDs were generated for each individual in the merged data set.
Primary infections and suspected repeat infections were identified using the merged data set. Repeated case IDs in the line list were identified and used to calculate the time between consecutive positive tests for each individual, using specimen receipt dates. If the time between sequential positive tests was at least 90 days, the more recent positive test was considered to indicate a suspected new infection. We present a descriptive analysis of suspected third infections, although only suspected second infections (which we refer to as "reinfections") were considered in the analyses of temporal trends. Incidence time series for primary infections and reinfections are calculated by specimen receipt date of the first positive test associated with the infection, and total observed incidence is calculated as the sum of first infections and reinfections. The specimen receipt date was chosen as the reference point for analysis because it is complete within the data set.
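The 90-day rule described above can be illustrated with a short Python sketch. This is not the authors' code: the function name `group_infections` and the example dates are hypothetical, and the sketch assumes the gap is measured between consecutive positive tests for one person, as the text states.

```python
from datetime import date

WINDOW_DAYS = 90  # minimum gap between positive tests for a suspected new infection


def group_infections(test_dates):
    """Group one person's positive-test dates into suspected infections.

    A test at least WINDOW_DAYS after the preceding positive test starts a
    new suspected infection; otherwise it belongs to the current one.
    """
    infections = []
    previous = None
    for d in sorted(test_dates):
        if previous is None or (d - previous).days >= WINDOW_DAYS:
            infections.append([d])      # first test of a new suspected infection
        else:
            infections[-1].append(d)    # repeat test within the same infection
        previous = d
    return infections


# Hypothetical person with four positive tests (illustrative dates only).
tests = [date(2020, 7, 1), date(2020, 7, 20), date(2021, 1, 10), date(2021, 1, 15)]
episodes = group_infections(tests)

# First positive test of each episode: the date used for incidence time series.
first_dates = [episode[0] for episode in episodes]

# Days from the last test of the first infection to the first test of the second.
gap_days = (episodes[1][0] - episodes[0][-1]).days
```

In this example the four tests collapse into two suspected infections separated by 174 days, so the second episode would be counted as a suspected reinfection.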

Data validation
To assess validity of the data linkage procedure and thus verify whether individuals identified as having suspected reinfections did in fact have positive test results at least 90 days apart, we conducted a manual review of a random sample of suspected second infections occurring on or before 20 January 2021 (n=585 of 6017; 9.7%). This review compared fields not used for linkages (address, cell-phone numbers, email addresses, facility, and health-care providers) between records in the NMC-SS and positive test line lists. Where uncertainty remained and contact details were available, patients or next-of-kin were contacted telephonically to verify whether the individual had received multiple positive test results.

Descriptive analysis
We calculated the time between successive positive tests as the number of days between the last positive test associated with an individual's first identified infection (i.e., within 90 days of a previous positive test, if any) and the first positive test associated with their suspected second infection (i.e., at least 90 days after the most recent positive test). We also compared the age, gender, and province of individuals with suspected reinfections to individuals eligible for reinfection (i.e., who had a positive test result at least 90 days prior to 30 June 2021).
We did not calculate overall incidence rates by wave because the force of infection is highly variable in space and time, and the period incidence rate is also influenced by the temporal pattern of when people become eligible for reinfection. Incidence rate estimates would therefore be strongly dependent on the time frame of the analysis and not comparable to studies from other locations or time periods.

Statistical analysis of reinfection trends
We analysed the NICD national SARS-CoV-2 routine surveillance data to evaluate whether reinfection risk has changed since emergence of the Beta or Delta variants. We evaluated the daily numbers of suspected reinfections using two approaches. First, we constructed a simple null model based on the assumption that the reinfection hazard experienced by previously diagnosed individuals is proportional to the incidence of detected cases and fit [...]

Approach 1: Catalytic model assuming a constant reinfection hazard coefficient

Model description
For a case testing positive on day $t$ (by specimen receipt date), we assumed the reinfection hazard is 0 for each day from $t + 1$ to $t + 90$ and $\lambda \hat{I}_x$ for each day $x > t + 90$, where $\lambda$ is the reinfection hazard coefficient and $\hat{I}_x$ is the 7-day moving average of the detected case incidence (first infections and reinfections) for day $x$. The probability of a case testing positive on day [...]
We ran 4 MCMC chains with random starting values for a total of 100,000 iterations per chain, discarding the first 2,000 iterations (burn-in). Convergence was assessed using the Gelman-Rubin diagnostic (9).
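As a rough illustration of the hazard assumption in the model description, the Python sketch below accumulates a daily hazard that is zero for the first 90 days and proportional to smoothed incidence afterwards. Several pieces are assumptions rather than details from the text: the trailing (rather than centred) 7-day window, the standard exponential survival form used to turn a cumulative hazard into a probability, the coefficient value `lam`, and the constant case series. The fitting step (MCMC) is not shown.

```python
import math


def moving_average(series, window=7):
    """Trailing moving average; early days average over the days available."""
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def reinfection_probability(smoothed, t, x, lam, delay=90):
    """Probability that a case testing positive on day t is reinfected by day x.

    The daily hazard is 0 for days t+1..t+delay and lam * smoothed[i] for
    each day i > t + delay. The exponential survival form is an assumption.
    """
    cumulative_hazard = sum(lam * smoothed[i] for i in range(t + delay + 1, x + 1))
    return 1.0 - math.exp(-cumulative_hazard)


# Hypothetical inputs: constant detected incidence and an arbitrary coefficient.
daily_cases = [1000] * 200
smoothed = moving_average(daily_cases)
lam = 1e-7

p_early = reinfection_probability(smoothed, t=0, x=90, lam=lam)   # inside the 90-day window
p_later = reinfection_probability(smoothed, t=0, x=190, lam=lam)  # 100 days at risk
```

With these made-up inputs, the probability is zero through day 90 and roughly 1% after 100 days at risk, which shows how the expected number of reinfections tracks both incidence and the time since first infection.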

Model-based projection
We used 1,500 samples from the joint posterior distribution of fitted model parameters to simulate possible reinfection time series under the null model, generating 100 stochastic realizations per parameter set. We then calculated projection intervals as the middle 95% of daily reinfection numbers across these simulations.
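The projection step can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the Poisson observation noise is a stand-in (the text does not state the distribution used for the stochastic realizations), and the function names and expected values are hypothetical. Only the overall structure, several realizations per parameter set followed by the middle 95% of daily values, comes from the text.

```python
import math
import random


def poisson_draw(rng, mean):
    """Knuth's method; adequate for the small means used in this sketch."""
    threshold = math.exp(-mean)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def projection_interval(expected_by_sample, n_realizations, rng, level=0.95):
    """Per-day central interval across all simulated reinfection series.

    expected_by_sample holds one list of expected daily reinfections per
    posterior parameter sample.
    """
    sims = [
        [poisson_draw(rng, mu) for mu in expected]
        for expected in expected_by_sample
        for _ in range(n_realizations)
    ]
    tail = (1.0 - level) / 2.0
    lower, upper = [], []
    for day in range(len(sims[0])):
        values = sorted(sim[day] for sim in sims)
        lower.append(values[math.floor(tail * (len(values) - 1))])
        upper.append(values[math.ceil((1.0 - tail) * (len(values) - 1))])
    return lower, upper


rng = random.Random(1)
# Hypothetical expected daily reinfections for three posterior samples.
expected_by_sample = [[2.0, 4.0, 8.0], [2.5, 5.0, 10.0], [1.5, 3.0, 6.0]]
low, high = projection_interval(expected_by_sample, n_realizations=100, rng=rng)
```

Pooling realizations across parameter sets means the interval reflects both parameter uncertainty and day-to-day stochastic variation.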
We applied this approach at the national level, as well as to Gauteng, KwaZulu-Natal, and Western Cape Provinces, which were the only provinces with a sufficient number of reinfections during the fitting period to permit estimation of the reinfection hazard coefficient.

Approach 2: Empirical estimation of time-varying infection and reinfection hazards
We [...] Individuals in $S_2'$ and $S_2$ are assumed to experience the same daily hazard of reinfection, estimated as $h_2(t) =$ [...]. The daily hazard of infection for previously uninfected individuals is then estimated as $h_1(t) =$ [...].
If we assume that the hazard of infection is proportional to incidence ($\hat{I}_t$), $h_1(t) = \lambda_1(t) \hat{I}_t$ and $h_2(t) = \lambda_2(t) \hat{I}_t$, we can then examine the infectiousness of the virus through time as: [...]
We also used this approach to construct a data set with the daily numbers of individuals eligible to have a suspected second infection ($S_2(t)$) and not eligible for suspected second infection ($S_1(t) + S_2'(t)$) by wave. Wave periods were defined as the time surrounding the wave peak for which the 7-day moving average of case numbers was above 15% of the wave peak. We then analyzed these data using a generalized linear mixed model:
groupinc ∼ group * wave + offset(log(groupsize)) + (day)
The outcome variable (groupinc) was the daily number of observed infections in the two groups. Our main interest for this analysis was in whether the relative hazard was higher in the second and third waves, thus potentially indicating immune escape. This effect is measured by the interaction term between group and wave. The offset term is used to ensure that the estimated coefficients can be appropriately interpreted as per capita rates.
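The wave-period definition above (days around a peak on which the 7-day moving average exceeds 15% of the peak value) can be sketched in Python. The function name, the treatment of the period as a single contiguous run around the peak, and the example series are assumptions made for illustration.

```python
def wave_period(smoothed, peak_index, fraction=0.15):
    """Return (start, end) indices of the contiguous run around the peak
    where the smoothed series stays above fraction * peak value."""
    threshold = fraction * smoothed[peak_index]
    start = peak_index
    while start > 0 and smoothed[start - 1] > threshold:
        start -= 1
    end = peak_index
    while end < len(smoothed) - 1 and smoothed[end + 1] > threshold:
        end += 1
    return start, end


# Hypothetical smoothed daily case counts with a single peak at index 5.
smoothed = [10, 20, 100, 400, 900, 1000, 700, 300, 140, 60, 15]
start, end = wave_period(smoothed, peak_index=5)
```

Here the threshold is 150 cases, so the wave period runs from index 3 to index 7. Days outside every wave period would not be assigned to a wave under this definition.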
We used day as a proxy for force of infection and reporting patterns and examined models where day was represented as a random effect (to reflect that observed days can be thought of as samples from a theoretical population) and as a fixed effect (to better match the Poisson assumptions). As focal estimates from the two models were indistinguishable, we present only the results based on the random effect assumption. Both versions of the model are included in the code repository.
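To show what the group-by-wave interaction measures, the Python sketch below computes a relative hazard ratio directly from aggregated counts. It is a simplification: in a Poisson model with a log link and a log(group size) offset, and ignoring the day term, the exponentiated interaction coefficient corresponds to this ratio of rate ratios. The dictionary keys and all counts are hypothetical, chosen for easy arithmetic, and are not the study's data.

```python
def hazard_ratio(counts):
    """Infection rate in the group eligible for reinfection divided by the
    rate in the group not eligible, for one wave."""
    eligible_rate = counts["eligible_cases"] / counts["eligible_person_days"]
    other_rate = counts["other_cases"] / counts["other_person_days"]
    return eligible_rate / other_rate


def relative_hazard_ratio(wave_counts, reference_counts):
    """Hazard ratio in a wave relative to the hazard ratio in the reference wave."""
    return hazard_ratio(wave_counts) / hazard_ratio(reference_counts)


# Hypothetical aggregated counts (illustrative only).
wave1 = {
    "eligible_cases": 20, "eligible_person_days": 1_000_000,
    "other_cases": 8_000, "other_person_days": 50_000_000,
}
wave2 = {
    "eligible_cases": 100, "eligible_person_days": 10_000_000,
    "other_cases": 8_000, "other_person_days": 50_000_000,
}

rhr = relative_hazard_ratio(wave2, wave1)
```

With these numbers the hazard ratio falls from 0.125 in the reference wave to 0.0625 in the later wave, giving a relative hazard ratio of 0.5. A value above 1 is what would point toward immune escape.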

Results
We identified 16,029 individuals with at least two suspected infections (through 30 June 2021) and 80 individuals with suspected third infections (Figure 1).

Time between successive positive tests
The time between successive positive tests for individuals with suspected reinfections was bimodally distributed with peaks near 180 and 360 days (Figure 2A). The shape of the distribution was strongly influenced by the timing of South Africa's epidemic waves. The first peak corresponds to individuals initially infected in wave 1 and reinfected in wave 2 or initially infected in wave 2 and reinfected in wave 3, while the second peak corresponds to individuals initially infected in wave 1 and reinfected in wave 3. [...] Table S1. [...] Table S2. [...] (Table S1, Table S2, Figure S1).

Reinfection trends
The first individual became eligible for reinfection on 2020-06-02 (i.e., 90 days after the first [...] increase substantially during the second and third waves, peaking at a similar time to incident primary infections. The observed time series of suspected reinfections closely follows this pattern (Figure 3), although it falls slightly below the prediction interval toward the end of the time series. Provincial-level analyses suggest that this deviation is driven primarily by the Western Cape, where the observed time series of suspected reinfections falls below the prediction interval near the peak of both waves two and three (Figure S3). In contrast, the observed time series of suspected reinfections consistently falls within the prediction interval for Gauteng and KwaZulu-Natal (Figure S3). This pattern may result from policies implemented only in the Western Cape that limited testing during the wave peaks.

Approach 2: Empirical estimation of time-varying infection and reinfection hazards
[...] primary infection hazard has decreased slightly with each subsequent wave, from 0.15 in wave 1 to 0.12 in wave 2 and 0.1 in wave 3. The absolute values of the hazard coefficients and hazard ratio are sensitive to assumed observation probabilities for primary infections and reinfections; however, temporal trends are robust (Figure S5).
These findings are consistent with the estimates from the generalized linear mixed model based on the reconstructed data set. In this analysis, the relative hazard ratio for wave 2 versus wave 1 was 0.75 (95% CI: 0.59-0.97) and for wave 3 versus wave 1 was 0.70 (95% CI: 0.55-0.90).
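One way to read the reported intervals is on the log scale, where ratio estimates from log-link models are usually summarized. The Python sketch below assumes symmetric (Wald-type) intervals on the log scale, which the text does not confirm. Under that assumption it recovers the geometric midpoint and the implied standard error from each reported interval. The function name is hypothetical.

```python
import math


def log_scale_summary(lower, upper, z=1.96):
    """Geometric midpoint and implied log-scale standard error of a ratio CI,
    assuming the interval is symmetric on the log scale."""
    midpoint = math.exp((math.log(lower) + math.log(upper)) / 2.0)
    std_error = (math.log(upper) - math.log(lower)) / (2.0 * z)
    return midpoint, std_error


# Reported intervals for the relative hazard ratios.
mid_w2, se_w2 = log_scale_summary(0.59, 0.97)  # wave 2 versus wave 1 (reported 0.75)
mid_w3, se_w3 = log_scale_summary(0.55, 0.90)  # wave 3 versus wave 1 (reported 0.70)
```

The geometric midpoints land within 0.01 of the reported point estimates, which is what one would expect if the intervals were constructed symmetrically on the log scale. Both upper bounds lie below 1.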

Discussion
[...] primary infection increased without a corresponding increase in reinfection risk. Based on these analyses, we conclude there is no population-level evidence of immune escape at this time. We recommend ongoing monitoring of these trends.
Differences in the time-varying force of infection, original and subsequent circulating lineages, testing strategies, and vaccine coverage limit the usefulness of direct comparisons of rates of reinfections across countries or studies. Reinfection does, however, appear to be relatively uncommon. The PCR-confirmed reinfection rate ranged from 0% to 1.1% across eleven studies included in a systematic review (10). While none of the studies included in the systematic review reported increasing risk of reinfection over time, the duration of follow-up was less than a year and most studies were completed prior to the identification of the Beta and Delta variants of concern. Our findings are consistent with results from the PHIRST-C community cohort study conducted in two locations in South Africa, which found that infection prior to the second wave provided 84% protection against reinfection during the second (Beta) wave (11), comparable to estimates of the level of protection against reinfection for wild type virus from the SIREN study in the UK (1).
A preliminary analysis of reinfection trends in England suggested that the Delta variant may have a higher risk of reinfection compared to the Alpha variant (12); however, this analysis did not take into account the temporal trend in the population at risk for reinfection, which may have biased the findings.
Our findings are somewhat at odds with in vitro neutralization studies. Both the Beta and Delta variants are associated with decreased neutralization by some anti-receptor binding-domain (anti-RBD) and anti-N-terminal domain (anti-NTD) monoclonal antibodies, although Beta and Delta each remain responsive to at least one anti-RBD antibody (4,5,13). In addition, Beta and Delta are relatively poorly neutralized by convalescent sera obtained from unvaccinated individuals infected with non-VOC virus (3)(4)(5)(13). Lastly, sera obtained from individuals after both one and two doses of the BNT162b2 (Pfizer) or ChAdOx1 (AstraZeneca) vaccines displayed lower neutralization of the Beta and Delta variants when compared to non-VOC and Alpha variant (5); although this does not have direct bearing on reinfection risk, it is an important consideration for evaluating immune escape more broadly.
Non-neutralizing antibodies and T-cell responses could explain the apparent disjuncture between our findings and the in vitro immune escape demonstrated by both Beta and Delta.

Strengths of this study
Our study has two major strengths. Firstly, we analyzed a large routine national data set comprising all confirmed cases in the country, allowing a comprehensive analysis of suspected reinfections in the country. Secondly, we found consistent results using two different analytical methods, both of which accounted for the changing force of infection and increasing numbers of individuals at risk for reinfection.

Limitations of this study
The primary limitation of this study is that changes in testing practices, health-seeking behavior, or access to care have not been accounted for in these analyses. Estimates based on serological data from blood donors suggest substantial geographic variability in detection rates (14), which may contribute to the observed differences in reinfection patterns by province. Detection rates likely also vary through time and by other factors affecting access to testing, which may include occupation, age, and socioeconomic status.
In particular, rapid antigen tests, which were introduced in South Africa in late 2020, may be under-reported despite mandatory reporting requirements. If under-reporting of antigen tests was substantial and time-varying, it could influence our findings. However, comparing temporal trends in infection risk among those eligible for reinfection with the rest of the population, as in approach 2, mitigates the potential failure to detect a substantial increase in risk.
Reinfections were not confirmed by sequencing or by requiring a negative test between putative infections. Nevertheless, the 90-day window period between consecutive positive tests reduces the possibility that suspected reinfections were predominantly the result of prolonged viral shedding. Furthermore, due to data limitations, we were unable to examine whether symptoms and severity in primary episodes correlate with protection against subsequent reinfection.
Lastly, while vaccination may increase protection in previously infected individuals (15)(16)(17)(18), vaccination coverage in South Africa was very low during the time of the study (e.g., <3% of the population was fully vaccinated by 30 June 2021 (19)). Vaccination is therefore unlikely to have substantially influenced our findings. Increased vaccination uptake may reduce the risks of both primary infection and reinfection moving forward and would be an important consideration for application of our approach to other locations with higher vaccine coverage.

Conclusion
To date, we find no evidence that reinfection risk is higher as a result of the emergence of Beta or Delta variants of concern, suggesting the selective advantage that allowed these variants to spread derived primarily from increased transmissibility, rather than immune escape. The discrepancy between the population-level evidence presented here and expectations based on laboratory-based neutralization assays highlights the need to identify better correlates of immunity for assessing immune escape in vitro.

Individuals with multiple suspected reinfections

Timing of primary infections and reinfections by province

Province-level comparison of data to projections from the null model

Approach 1: Convergence diagnostics

Approach 2: Sensitivity analysis
Figure S5. Sensitivity analysis of empirical hazard ratio estimates to assumed observation probabilities for primary infections and reinfections. Estimates are shown for the full range of probabilities for which the overall mean relative hazard is between 0 and 1. The white polygon encloses the most plausible estimates (i.e. consistent with relative reinfection risk observed in the SIREN study (1) and observation probabilities for primary infection consistent with estimates based on seroprevalence data (14)). Top [...]