## Abstract

As severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spreads, the susceptible subpopulation declines, causing the rate at which new infections occur to slow down. Variation in individual susceptibility or exposure to infection exacerbates this effect. Individuals that are more susceptible or more exposed tend to be infected and removed from the susceptible subpopulation earlier. This selective depletion of susceptibles intensifies the deceleration in incidence. Eventually, susceptible numbers become low enough to prevent epidemic growth or, in other words, the herd immunity threshold is reached. Here we fit epidemiological models with inbuilt distributions of susceptibility or exposure to SARS-CoV-2 outbreaks to estimate basic reproduction numbers (*R*_{0}) alongside coefficients of individual variation (CV) and the effects of containment strategies. Herd immunity thresholds are then calculated as $1 - (1/R_0)^{1/(1+CV^2)}$ or $1 - (1/R_0)^{1/(1+2CV^2)}$, depending on whether variation is in susceptibility or exposure. Our inferences result in herd immunity thresholds around 10-20%, considerably lower than the minimum coverage needed to interrupt transmission by random vaccination, which for *R*_{0} higher than 2.5 is estimated above 60%. We emphasize that the classical formula, 1 − 1/*R*_{0}, remains applicable to describe herd immunity thresholds for random vaccination, but not for immunity induced by infection which is naturally selective. These findings have profound consequences for the governance of the current pandemic given that some populations may be close to achieving herd immunity despite being under more or less strict social distancing measures.

Scientists throughout the world have engaged with governments, health agencies, and with each other, to address the ongoing pandemic of coronavirus disease (COVID-19). Mathematical models have been central to important decisions concerning contact tracing, quarantine, and social distancing, to mitigate or suppress the initial pandemic spread^{1}. Successful suppression, however, may leave populations at risk to resurgent waves due to insufficient acquisition of immunity. Models have thus also addressed longer term SARS-CoV-2 transmission scenarios and the requirements for continued adequate response^{2}. This is especially timely as countries relax lockdown measures that have been in place over recent months with varying levels of success in tackling national outbreaks.

Here we demonstrate that individual variation in susceptibility or exposure (connectivity) accelerates the acquisition of immunity in populations. More susceptible and more connected individuals have a higher propensity to be infected and thus are likely to become immune earlier. Due to this selective immunization by natural infection, heterogeneous populations require fewer infections to cross their herd immunity threshold (HIT) than suggested by models that do not fully account for variation. We integrate continuous distributions of susceptibility or connectivity in otherwise basic epidemic models for COVID-19 which account for realistic intervention effects and show that as coefficients of variation (CV) increase from 0 to 5, HIT declines from over 60%^{3,4} to less than 10%. We then fit these models to series of daily new cases to estimate CV alongside basic reproduction numbers (*R*_{0}) and derive the corresponding HITs.

## Effects of individual variation on SARS-CoV-2 transmission

SARS-CoV-2 is transmitted primarily by respiratory droplets and modelled as a susceptible-exposed-infectious-recovered (SEIR) process.

### Variation in susceptibility to infection

Individual variation in susceptibility is integrated as a continuously distributed factor that multiplies the force of infection upon individuals^{5} as

$$
\begin{aligned}
\frac{dS(x)}{dt} &= -\lambda(x)\,x\,S(x), &&\text{(1)}\\
\frac{dE(x)}{dt} &= \lambda(x)\,x\,[S(x) + \sigma R(x)] - \delta E(x), &&\text{(2)}\\
\frac{dI(x)}{dt} &= \delta E(x) - \gamma I(x), &&\text{(3)}\\
\frac{dR(x)}{dt} &= (1-\phi)\,\gamma I(x) - \sigma\,\lambda(x)\,x\,R(x), &&\text{(4)}
\end{aligned}
$$

where *S*(*x*) is the number of individuals with susceptibility *x*, *E*(*x*) and *I*(*x*) are the numbers of individuals who originally had susceptibility *x* and became exposed and infectious, while *R*(*x*) counts those who have recovered and have their susceptibility reduced to a reinfection factor *σ* due to acquired immunity. *δ* is the rate of progression from exposed to infectious, *γ* is the rate of recovery or death, *ϕ* is the proportion of individuals who die as a result of infection and *λ*(*x*) = (*β*/*N*) ∫[*ρE*(*y*) + *I*(*y*)] *dy* is the average force of infection upon susceptible individuals in a population of size *N* and transmission coefficient *β*. Standardizing so that susceptibility distributions have mean $\int x\,g(x)\,dx = 1$, given a probability density function *g*(*x*), the basic reproduction number is

$$
R_0 = \beta\left(\frac{\rho}{\delta} + \frac{1}{\gamma}\right),
$$

where *ρ* is a factor measuring the infectivity of individuals in compartment *E* in relation to those in *I*. The coefficient of variation in individual susceptibility is explored as a parameter. Non-pharmaceutical interventions (NPIs) designed to control transmission typically reduce *β* and hence *R*_{0}. We denote the resulting controlled reproduction number by *R*_{c}. The effective reproduction number *R*_{eff} is another useful indicator obtained by multiplying *R*_{c} by the susceptibility of the population, in this case written as *R*_{eff} (*t*) = *R*_{c}(*t*) ∫ *xS*(*x, t*) *dx* to emphasize its time dependence.
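To make the selection effect concrete, the following is a minimal sketch of an unmitigated epidemic with gamma-distributed susceptibility and no reinfection (*σ* = 0), written in population fractions. It is not the fitting code used in this study. It assumes *R*_{0} = *β*(*ρ*/*δ* + 1/*γ*) and relies on a reduction chosen here for brevity: for a mean-one gamma distribution, the susceptible fraction after a cumulative force of infection Λ is (1 + CV²Λ)^(−1/CV²), so the continuum of susceptibility classes need not be tracked explicitly. The function name, the Euler step and the seed size are illustrative assumptions.

```python
import math


def simulate_gamma_susceptibility(r0, cv, days=600.0, dt=0.05,
                                  rho=0.5, delta=0.25, gamma=0.25, seed=1e-6):
    """Euler-integrate an unmitigated epidemic (sigma = 0) in population fractions.

    Returns (final_attack_fraction, infected_fraction_when_r_eff_reaches_1).
    Assumes R0 = beta * (rho/delta + 1/gamma) and mean-one gamma susceptibility.
    """
    beta = r0 / (rho / delta + 1.0 / gamma)
    cum_force = 0.0              # time integral of the force of infection
    exposed, infectious = seed, 0.0
    infected_at_threshold = None

    def susceptible_state(cum):
        # Laplace transform of the gamma density gives both quantities.
        if cv > 0.0:
            s = (1.0 + cv ** 2 * cum) ** (-1.0 / cv ** 2)
            return s, s ** (1.0 + cv ** 2)   # (integral of S, integral of x*S)
        s = math.exp(-cum)                   # homogeneous limit
        return s, s

    for _ in range(int(days / dt)):
        s, mean_x_s = susceptible_state(cum_force)
        if infected_at_threshold is None and r0 * mean_x_s <= 1.0:
            infected_at_threshold = 1.0 - s  # R_eff has just dropped to 1
        force = beta * (rho * exposed + infectious)
        d_exposed = force * mean_x_s - delta * exposed
        d_infectious = delta * exposed - gamma * infectious
        cum_force += dt * force
        exposed += dt * d_exposed
        infectious += dt * d_infectious

    s_final, _ = susceptible_state(cum_force)
    return 1.0 - s_final, infected_at_threshold


if __name__ == "__main__":
    for cv in (0.0, 1.0, 2.0, 4.0):
        final, hit = simulate_gamma_susceptibility(3.0, cv)
        print(f"CV={cv:.0f}: infected at threshold={hit:.3f}, final size={final:.3f}")
```

Running the loop for increasing CV at a fixed *R*_{0} shows both the infected fraction at which *R*_{eff} reaches 1 and the final epidemic size shrinking as variation grows.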

Figure 1 depicts model trajectories fitted to suppressed epidemics (orange) in 4 European countries (Belgium, England, Portugal and Spain) assuming gamma distributed susceptibility and no reinfection (*σ* = *0*). We estimate: *R*_{0} of approximately 5 (Belgium), 2.9 (England), 4.3 (Portugal) and 4.1 (Spain); individual susceptibility CV reaching 3.9 (Belgium), 1.9 (England), 4.3 (Portugal) and 3.2 (Spain); and overall intervention efficacy at maximum (typically during lockdown) being 60% (Belgium), 48% (England), 69% (Portugal) and 63% (Spain). Another estimated parameter is the day when NPIs begin to affect transmission, after which we assume a linear intensification from baseline over 21 days, remaining at maximum intensity for 30 days and linearly lifting back to baseline over a period of 120 days (although we have confirmed that the results do not change significantly if measures are lifted over slightly longer time frames, such as 150 or 180 days). Denoting by *d*(*t*) the proportional reduction in average risk of infection due to interventions, in this case we obtain *R*_{c}(*t*) = [*1* − *d*(*t*)]*R*_{0} which is depicted for each country, alongside *R*_{eff}(*t*), underneath the respective epidemic trajectories. Overlaid on the *R*_{c} plots are mobility data from Google^{6}, showing excellent agreement with our independently chosen framework and estimate for the time *R*_{eff} starts declining. To assess the potential for case numbers to overshoot if NPIs had not been applied, we rerun the model with *d*(*t*) = *0* and obtain the unmitigated epidemics (black). Further details and sensitivity analyses are described in Methods.

### Variation in connectivity

In a directly transmitted infectious disease, such as COVID-19, variation in exposure to infection is primarily governed by patterns of connectivity among individuals. We incorporate this in the system (Equations 1–4) assuming that individuals mix at random (but see Methods for more general formulations that enable other mixing patterns). Under random mixing and heterogeneous connectivity, the force of infection^{7} is written as *λ*(*x*) = (*β*/*N*)(∫ *y*[*ρE*(*y*) + *I*(*y*)] *dy*/∫ *yg*(*y*) *dy*), the basic reproduction number is

$$
R_0 = \frac{\int y^2 g(y)\,dy}{\int y\,g(y)\,dy}\,\beta\left(\frac{\rho}{\delta} + \frac{1}{\gamma}\right),
$$

*R*_{c}(*t*) is as above and *R*_{eff}(*t*) is derived by a more general expression given in Methods. Applying this model to the same epidemics as before we estimate: *R*_{0} of approximately 7.1 (Belgium), 3.8 (England), 7.9 (Portugal) and 6.6 (Spain); individual connectivity CV reaching 2.9 (Belgium), 1.6 (England), 4.0 (Portugal) and 2.7 (Spain); and intervention efficacy during lockdown being 73% (Belgium), 58% (England), 80% (Portugal) and 72% (Spain).

Comparing the two models, variation in connectivity systematically leads to estimates that are higher for *R*_{0}, lower for CV, and higher for the efficacy of non-pharmaceutical interventions. Nevertheless, the percentage of the population required to be immune to curb the epidemic and prevent future waves when interventions are lifted appears remarkably conserved across models: 9.6 vs 11% (Belgium); 20 vs 21% (England); 7.3 vs 6.0% (Portugal); and 12 vs 11% (Spain). This property is further explored below.

## Herd immunity thresholds and their conservation across models

Individual variation in risk of acquiring infection is under selection by the force of infection, whether individual differences are due to biological susceptibility, exposure, or both. The most susceptible or exposed individuals are selectively removed from the susceptible pool as they become infected and eventually recover (some die), resulting in decelerated epidemic growth and accelerated induction of immunity in the population. In essence, the *herd immunity threshold* defines the percentage of the population that needs to be immune to reverse epidemic growth and prevent future waves. When individual susceptibility or connectivity is gamma-distributed and mixing is random, HIT curves can be derived analytically^{8} from the model systems (Equations 1–4, with the respective forces of infection). In the case of variation in susceptibility to infection we obtain

$$
\mathrm{HIT} = 1 - \left(\frac{1}{R_0}\right)^{\frac{1}{1 + CV^2}},
$$

while variable connectivity results in

$$
\mathrm{HIT} = 1 - \left(\frac{1}{R_0}\right)^{\frac{1}{1 + 2CV^2}}.
$$

In more complex cases HIT curves can be approximated numerically. Figure 3 shows the expected downward trends in HIT and the sizes of the respective unmitigated epidemics for SARS-CoV-2 without reinfection (*σ* = *0*) as the coefficients of variation are increased (gamma distribution shapes adopted here are illustrated in Extended Data Figure 1; for robustness of the trends to other distributions see Gomes et al^{9}). Values of *R*_{0} and CV estimated for our study countries are overlaid to mark the respective HIT and final epidemic sizes. While herd immunity is expected to require 60-80% of a homogeneous population to have been infected, at the cost of infecting almost the entire population if left unmitigated, given an *R*_{0} between 2.5 and 5, these percentages drop to the range 10-20% or lower when CV is roughly between 2 and 5.
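As a quick numerical check of these trends, the sketch below evaluates the closed-form thresholds for gamma-distributed variation, taken here to be 1 − (1/*R*_{0})^(1/(1+CV²)) for susceptibility and 1 − (1/*R*_{0})^(1/(1+2CV²)) for connectivity. The function names are illustrative, and the inputs are the rounded country-level estimates quoted in the text, so the printed values can differ slightly from the reported thresholds.

```python
def hit_susceptibility(r0, cv):
    """Herd immunity threshold with gamma-distributed susceptibility."""
    return 1.0 - (1.0 / r0) ** (1.0 / (1.0 + cv ** 2))


def hit_connectivity(r0, cv):
    """Herd immunity threshold with gamma-distributed connectivity, random mixing."""
    return 1.0 - (1.0 / r0) ** (1.0 / (1.0 + 2.0 * cv ** 2))


# Rounded (R0, CV) estimates quoted in the text for each model.
ESTIMATES = {
    "Belgium": {"susceptibility": (5.0, 3.9), "connectivity": (7.1, 2.9)},
    "England": {"susceptibility": (2.9, 1.9), "connectivity": (3.8, 1.6)},
    "Portugal": {"susceptibility": (4.3, 4.3), "connectivity": (7.9, 4.0)},
    "Spain": {"susceptibility": (4.1, 3.2), "connectivity": (6.6, 2.7)},
}

if __name__ == "__main__":
    for country, models in ESTIMATES.items():
        hs = hit_susceptibility(*models["susceptibility"])
        hc = hit_connectivity(*models["connectivity"])
        print(f"{country}: {100 * hs:.1f}% (susceptibility) vs {100 * hc:.1f}% (connectivity)")
    # With CV = 0 both expressions reduce to the classical 1 - 1/R0.
    print(f"Homogeneous, R0 = 2.5: {100 * hit_susceptibility(2.5, 0.0):.0f}%")
```

Setting CV to zero recovers the homogeneous threshold, which makes the two functions easy to compare against the classical formula.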

When acquired immunity is not 100% effective (*σ* > *0*) HITs are relatively higher (Extended Data Figure 2). However, there is an upper bound for how much it is reasonable to increase *σ* before the system enters a qualitatively different regime. Above *σ* = *1/R*_{0} – the *reinfection threshold*^{10,11}– infection becomes stably endemic and the HIT concept no longer applies. Respiratory viruses are typically associated with epidemic dynamics below the reinfection threshold, characterized by seasonal epidemics intertwined with periods of undetection.

Individual variation in exposure, in contrast with susceptibility, accrues from complex patterns of human behaviour which have been simplified in our model. To explore the scope of our results we generalise our models (Methods) by relaxing some key assumptions. First, we enable mixing to be assortative in the sense that individuals contact predominantly with those of similar connectivity. Formally, an individual with connectivity *x*, rather than being exposed uniformly to individuals of all connectivities *y*, has contact preferences described by a normal distribution on the difference *y* − *x*. We find this modification to have negligible effect on HIT (Extended Data Figure 3). Second, we allow connectivity distributions to change in shape (not only scale) when subject to social distancing. In particular we modify the model so that CV reduces in proportion to the intensity of social distancing (Extended Data Figure 4) and replicate the fittings to epidemics in our study countries (Extended Data Figure 5). We find a general tendency for this model to estimate higher values for *R*_{0} and CV while HIT remains again remarkably robust to the change in model assumptions.

## Herd immunity thresholds and seroprevalence at sub-national levels

As countries are conducting immunological surveys to assess the extent of exposure to SARS-CoV-2 in populations it is of practical importance to understand how HIT may vary across regions. We have redesigned our analyses to address this question. Series of daily new cases were stratified by region. Fitting the models simultaneously to the multiple series enabled the estimation of local parameters (*R*_{0} and CV) while the effects of NPIs were estimated at country level. Extended Data Figures 6–9 show how the modelled epidemics fit the regional data and include an additional metric to describe the cumulative infected percentage. These model projections are comparable to data from seroprevalence studies such as that conducted in Spain^{12}. We emphasise that seroprevalence estimates generally lie slightly below our cumulative infection curves (Extended Data Figure 9), consistent with recent findings that a substantial fraction of infected individuals do not exhibit detectable antibodies^{13}. In addition to their practical utility these results begin to unpack some of the variation in HIT within countries: Belgium (9.4-11%), England (16-26%), Portugal (7.1-9.9%) and Spain (7.5-21%).

## Discussion

The concept of *herd immunity* was developed in the context of vaccination programs^{14,15}. Defining the percentage of the population that must be immune to cause infection incidences to decline, HITs constitute useful targets for vaccination coverage. In idealized scenarios of vaccines delivered at random and individuals mixing at random, HITs are given by a simple formula (*1* − *1/R*_{0}) which, in the case of SARS-CoV-2, suggests that 60-80% of randomly chosen subjects of the population would need to be immunized to halt spread considering estimates of *R*_{0} between 2.5 and 5. This formula does not apply to infection-induced immunity because natural infection does not occur at random. Individuals who are more susceptible or more exposed are more prone to be infected and become immune, providing greater community protection than random vaccination^{16}. In our model, the HIT declines sharply when coefficients of variation increase from 0 to 2 and remains below 20% for more variable populations. The magnitude of the decline depends on what property is heterogeneous and how it is distributed among individuals, but the downward trend is robust as long as susceptibility or exposure to infection is variable (Figure 3 and Extended Data Figure 3) and acquired immunity is efficacious enough to keep transmission below the reinfection threshold (Extended Data Figure 2).

Several candidate vaccines against SARS-CoV-2 are showing promising safety and immunogenicity in early-phase clinical trials^{17,18}, although it is not yet known how this will translate into effective protection. We note that the reinfection threshold^{10,11} informs not only the requirements on naturally acquired immunity but, similarly, it sets a target for how efficacious a vaccine needs to be in order to effectively interrupt transmission. Specifically, given an estimated value of *R*_{0} we should aim for a vaccine efficacy of *1* − *1/R*_{0} (60% or 80% if *R*_{0} is 2.5 or 5, respectively). A vaccine whose efficacy is insufficient to bring the system below the reinfection threshold will not interrupt transmission.

Studies of heterogeneity in the transmission of respiratory infections have traditionally focused on variation in exposure summarized into age-structured contact matrices. Besides overlooking differences in susceptibility given exposure, the aggregation of individuals into age groups reduces coefficients of variation. We calculated CV for the landmark POLYMOD matrices^{19,20} and obtained values between *0*.*3* and *0*.*5*. Recent studies of COVID-19 integrated contact matrices with age-specific susceptibility to infection (structured in three levels)^{21} or with social activity (three levels also)^{22} which, again, resulted in coefficients of variation less than unity. We show that models with coefficients of variation of this magnitude would appear to differ only moderately from homogeneous approximations when compared with our estimates, which are consistently above 1 in England and above 2 in Belgium, Portugal and Spain. In contrast with reductionistic procedures that aim to reconstruct variation from correlate markers left on individuals (such as antibody or reactive T cells for susceptibility, or contact frequencies for exposure), we have embarked on a holistic approach designed to infer the whole extent of individual variation from the imprint it leaves on epidemic trajectories. Our estimates are therefore expected to be higher and should ultimately be confronted with more direct measurements as these become available. Adam et al^{23} conducted a contact tracing study in Hong Kong and estimated a coefficient of variation of 2.5 for the number of secondary infections caused by individuals, attributing 80% of transmission to 20% of cases. This statistical dispersion has been interpreted as reflecting a common pattern of contact heterogeneity which has been corroborated by studies that specifically measure mobility^{24}. According to our inferences, 20% of individuals may be responsible for 47-94% of infections depending on model and country. In parallel, there is accumulating evidence of individual variation in the immune system’s ability to control SARS-CoV-2 infection following exposure^{25,26}. While our inferences serve their purpose of improving accuracy in model predictions, diverse studies such as these are necessary for developing interventions targeting individuals who may be at higher risk of being infected and propagating infection in the community.

Country-level estimates of *R*_{0} reported here are in the range 3-5 when individual variation in susceptibility is factored and 4-8 when accounting for variation in connectivity. The homogeneous version of our models would have estimated *R*_{0} between 2.4 and 3.3, in line with other studies^{27}. Estimates for England suggest lower baseline *R*_{0} and lower CV in comparison with the other study countries (Belgium, Portugal and Spain). The net effect is a slightly higher HIT in England which nevertheless we estimate around 20%. The lowest HIT, at less than 10%, is estimated in Portugal, with higher *R*_{0} and higher CV. NPIs reveal less impact under variable susceptibility (48-69%), followed by variable connectivity (58-80%), and finally appear to inflate and agree with Flaxman et al^{27} when homogeneity assumptions are made (65-89%), although this does not affect the HIT which relates to pre-pandemic societies.

More informative than reading these numbers, however, is to look at simulated projections for daily new cases over future months (Figures 1 and 2). In all four countries considered here we foresee HIT being achieved between July and October and the COVID-19 epidemic being mostly resolved by the end of 2020. Looking back, we conclude that NPIs had a crucial role in halting the growth of the initial wave between February and April. Although the most extreme lockdown strategies may not be sustainable for longer than a month or two, they proved effective at preventing overshoot, keeping cases within health system capacities, and may have done so without impairing the development of herd immunity.

## METHODS

### Model structure and underlying assumptions

The model presented here is a differential equation SEIR model, where susceptible individuals become exposed at a rate that depends on their susceptibility, the number of potentially infectious contacts they engage in, and the total number of infectious people in the population per time unit. Upon exposure, individuals enter an asymptomatic incubation phase, during which they slowly become infectious^{29–32}. Thus, infectivity of exposed individuals is made to be *1/2* of that of infectious ones (*ρ* = *0*.*5*). After a few days, individuals develop symptoms – on average 4 days after the exposure to the virus (*δ* = *1/4*) – and thus become fully infectious^{33–35}. They recover, i.e., they are no longer infectious 4 days after that (*γ* = *1/4*), on average^{36}.

### Efficacy of acquired immunity

We conducted the core of our analysis under the assumption that no reinfection occurs after recovery due to acquired immunity (*σ* = *0*). To analyse the sensitivity of these results to leakage in immune response (*σ* > *0*) we calculated herd immunity thresholds (HIT) as a function of coefficients of variation (CV) for different values of *σ*. The results displayed in Extended Data Figure 2 confirm the expectation that as the efficacy of acquired immunity decreases (*σ* increases) larger percentages of the population are infected before herd immunity is reached. Less intuitive is that there is an upper bound for how much it is reasonable to increase *σ* before the system enters a qualitatively different regime: the reinfection threshold^{10,11} (*σ* = *1/R*_{0}), above which infection becomes stably endemic and the notion of herd immunity threshold no longer applies. Respiratory viruses are typically associated with epidemic dynamics below the reinfection threshold.

### Effective reproduction number

The effective reproduction number (*R*_{eff}, also denoted by *R*_{e} or *R*_{t} by other authors) is a time-dependent quantity which we calculate as the incidence of new infections divided by the total number of active infections (weighted by *ρ* for individuals in *E*) multiplied by the average duration of infection (also weighted by *ρ* for individuals in *E*)

$$
R_{\mathrm{eff}}(t) = \frac{\int \lambda(x)\,x\,[S(x) + \sigma R(x)]\,dx}{\int [\rho E(x) + I(x)]\,dx}\left(\frac{\rho}{\delta} + \frac{1}{\gamma}\right).
$$
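The verbal definition above translates directly into a few lines of code. The sketch below assumes that the incidence and the compartment totals have already been summed over the heterogeneity classes; the function name and default parameter values (*ρ* = 0.5, *δ* = *γ* = 1/4, as in the model description) are illustrative choices rather than the study's implementation.

```python
def effective_reproduction_number(incidence, exposed, infectious,
                                  rho=0.5, delta=0.25, gamma=0.25):
    """Incidence per active infection times the mean infectious-equivalent duration.

    incidence  : new infections per day, summed over all classes
    exposed    : total number of individuals in E
    infectious : total number of individuals in I
    """
    active = rho * exposed + infectious       # E contributes with relative infectivity rho
    if active <= 0.0:
        raise ValueError("no active infections: R_eff is undefined")
    mean_duration = rho / delta + 1.0 / gamma  # days, weighted by rho for the E stage
    return incidence / active * mean_duration


if __name__ == "__main__":
    # Fully susceptible homogeneous population with beta = 0.5 per day:
    # incidence = beta * (rho * E + I), so R_eff should equal beta * 6 = 3.
    beta, e_total, i_total = 0.5, 10.0, 20.0
    incidence = beta * (0.5 * e_total + i_total)
    print(effective_reproduction_number(incidence, e_total, i_total))
```

In a fully susceptible population the value coincides with *R*_{0}, which is a convenient sanity check before applying the function to simulated trajectories.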

### Assortative mixing

In the main text we assumed random mixing among individuals, but human connectivity patterns are assortative due to societal structures and human behaviours. To explore the sensitivity of our results to deviations from random mixing, we develop an extended formalism that allows individuals to connect preferentially with those with similar connectivity, formally *λ*(*x*) = (*β*/*N*)(∫ *y h*(*y* − *x*)[*ρE*(*y*) + *I*(*y*)] *dy*/∫ *yg*(*y*) *dy*), where *h*(*y* − *x*) is a normal distribution on the difference between connectivity factors (Extended Data Figure 3).

### Dynamic coefficients of variation

The formulation of the variable connectivity model in the main text assumes that coefficients of variation are constant irrespective of interventions. Social distancing has been assumed to reduce connectivity of every individual by the same factor (from *x* to [*1* − *d*]*x*) leaving the coefficient of variation unchanged. The possibility that CV might reduce with social distancing (*d*), causing a drop in the intensity of selection, might affect our results. To study sensitivity to this type of CV dynamics, we formulate an extended model where connectivity is reformulated as (*1* − *d*)[*1* + (*1* − *d*)(*x* − *1*)], and whose CV decreases with social distancing (Extended Data Figure 4). This does not change the way the model is written but special care is needed in analysis and interpretation to account for the new dynamics. The basic reproduction number, in particular, depends explicitly on a CV which is now dependent on social distancing

$$
R_c(t) = [1 - d(t)]\left\{1 + CV(t)^2\right\}\beta\left(\frac{\rho}{\delta} + \frac{1}{\gamma}\right), \qquad CV(t) = [1 - d(t)]\,CV,
$$

which is noticeable in the curvilinear shape of the controlled *R*_{0} (*R*_{c}) trajectories (Extended Data Figure 5).
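The effect of the two distancing rules on the coefficient of variation can be checked on any sample of connectivity values with mean one. The sketch below is illustrative only: the sample values and function names are assumptions, and it uses the population standard deviation from the Python standard library. Uniform scaling leaves CV unchanged, whereas the reformulated connectivity shrinks CV by the factor (1 − *d*), because the transformation is linear in *x* with slope (1 − *d*)² and mean (1 − *d*).

```python
import statistics


def scaled_connectivity(x, d):
    """Constant-CV rule: every individual's connectivity is reduced by the same factor."""
    return (1.0 - d) * x


def compressed_connectivity(x, d):
    """Dynamic-CV rule from the text: (1 - d) * [1 + (1 - d) * (x - 1)]."""
    return (1.0 - d) * (1.0 + (1.0 - d) * (x - 1.0))


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    return statistics.pstdev(values) / statistics.fmean(values)


if __name__ == "__main__":
    baseline = [0.1, 0.4, 0.5, 3.0]          # hypothetical sample with mean 1
    d = 0.6                                   # hypothetical distancing intensity
    cv0 = coefficient_of_variation(baseline)
    cv_scaled = coefficient_of_variation([scaled_connectivity(x, d) for x in baseline])
    cv_compressed = coefficient_of_variation([compressed_connectivity(x, d) for x in baseline])
    print(f"baseline CV={cv0:.3f}, scaled CV={cv_scaled:.3f}, compressed CV={cv_compressed:.3f}")
```

Both rules reduce mean connectivity to (1 − *d*); only the second also narrows the distribution relative to its mean.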

### Non-pharmaceutical interventions

We implemented non-pharmaceutical interventions (NPI) as a gradual decrease in viral transmissibility in the population and thus a lowering of the controlled and effective reproduction numbers (*R*_{c} and *R*_{eff}). Once containment measures are put in place in each country, we postulate it takes 21 days until the maximum effectiveness of social distancing measures is reached. In the simulations presented throughout we have held this condition (maximum “lockdown” efficacy) for 30 days, after which period, social distancing measures are progressively relaxed, slowly returning to pre-pandemic conditions. Both the implementation and relaxing of the social distancing measures are imposed to be linear in this scheme.
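The intervention profile described above is piecewise linear and can be sketched as follows. The 21-day ramp and 30-day hold come from the text; the 120-day relaxation period is the value quoted in the main text, and the function and argument names are illustrative assumptions.

```python
def distancing(t, t_start, d_max, ramp_up=21.0, hold=30.0, ramp_down=120.0):
    """Proportional reduction in transmission d(t) at day t.

    t_start : day on which interventions begin to affect transmission
    d_max   : maximum efficacy, reached after `ramp_up` days and held for `hold` days
    """
    t_full = t_start + ramp_up       # maximum efficacy reached
    t_release = t_full + hold        # relaxation begins
    t_end = t_release + ramp_down    # back to baseline
    if t <= t_start or t >= t_end:
        return 0.0
    if t < t_full:
        return d_max * (t - t_start) / ramp_up
    if t <= t_release:
        return d_max
    return d_max * (t_end - t) / ramp_down


def controlled_reproduction_number(t, r0, t_start, d_max):
    """R_c(t) = [1 - d(t)] * R0, valid for the constant-CV models."""
    return (1.0 - distancing(t, t_start, d_max)) * r0


if __name__ == "__main__":
    for day in (0, 20, 40, 100, 200):
        print(day, round(controlled_reproduction_number(day, 3.0, 10.0, 0.6), 3))
```

Lengthening `ramp_down` to 150 or 180 days reproduces the slower relaxation scenarios mentioned as sensitivity checks.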

### Bayesian Inference

The model laid out above is amenable to theoretical exploration as presented in the main manuscript and provides a suitable framework for inference. Fundamentally, to be able to reproduce the inception of any epidemic, we would need to estimate when local transmission started to occur (*t*_{0}), and the pace at which individuals infected each other in the very early stages of the epidemic (*R*_{0}). All countries, to different extents and at different timepoints of the epidemic, enforced some combination of social distancing measures. To fully understand the interplay between herd immunity and the impact of NPIs, we then set out to estimate the time at which social distancing measures started to have an impact on daily incidence, what their maximum effectiveness (*d*_{max}) is, the basic reproduction number (*R*_{0}) and what the underlying variance in heterogeneity is for both susceptibility to infection and number of infectious contacts.

In order to preserve identifiability, we made two simplifying assumptions: (*i*) the fraction of infectious individuals reported as COVID-19 cases (reporting fraction) is constant throughout the study period and is comparable between countries proportionally to the number of tests performed per person; (*ii*) local transmission starts (*t*_{0}) when countries/regions report 1 case per 5 million population in one day. To calculate the reporting rates, we used the Spanish national serological survey^{12} as a reference and divided the total number of reported cases up to May 11^{th} by the estimated number of people that had been exposed to the virus. This gives us a reporting rate for Spain around 6%. Unfortunately, there are no other national serological surveys that could inform the proportion of the population infected in other countries, so we had to extrapolate the reporting rate for those. Assuming the reporting rate is highly dependent on the testing effort employed in each country, reflected in the number of tests per individual, we estimate the reporting rate by scaling the reporting rate recorded in Spain according to the ratio of PCR tests per person in other countries relative to the Spanish reference of 0.9 tests per thousand people (https://ourworldindata.org/coronavirus-testing). This produced estimated case reporting rates (ratio of reported cases to infections) of 9% for Portugal, 6% for Belgium (and Spain) and 2.4% for England.
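The extrapolation of reporting rates amounts to a proportional scaling, sketched below. The reference values (a 6% reporting rate at 0.9 tests per thousand people) are those quoted for Spain; the function names are illustrative, and the testing intensity passed in for any other country is a placeholder input, since those figures are not listed in the text.

```python
def scaled_reporting_rate(tests_per_thousand, ref_rate=0.06, ref_tests_per_thousand=0.9):
    """Reporting rate assumed proportional to testing effort relative to Spain."""
    if tests_per_thousand < 0 or ref_tests_per_thousand <= 0:
        raise ValueError("testing intensities must be non-negative (reference positive)")
    return ref_rate * tests_per_thousand / ref_tests_per_thousand


def estimated_infections(reported_cases, reporting_rate):
    """Convert reported cases into estimated infections under a constant reporting rate."""
    if not 0.0 < reporting_rate <= 1.0:
        raise ValueError("reporting rate must lie in (0, 1]")
    return reported_cases / reporting_rate


if __name__ == "__main__":
    # Hypothetical testing intensity, chosen only to illustrate the scaling.
    rate = scaled_reporting_rate(1.35)
    print(f"reporting rate: {rate:.3f}")
    print(f"infections behind 900 reported cases: {estimated_infections(900, rate):.0f}")
```

Because the reporting rate is held constant over the study period, the same factor converts the whole series of simulated infections into expected reported cases.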

Whilst national case and mortality data are easily available for most countries, more spatially resolved data are difficult to find in the public domain. Thus, we restricted our analysis to countries for which disaggregated regional case data were easily available. We collected the data at two time points. First, we compiled all available data from the day the countries started reporting COVID-19 cases to the initial collection date (May 20^{th}) and later collated available data from May 21^{st} to July 10^{th}.

Parameter estimation was performed in MATLAB, using PESTO (Parameter EStimation Toolbox)^{37}, and assuming the reported case data can be accurately described by a Poisson process. We first fixed the beginning of local transmission (parameter *t*_{0}) in each data series as the day in which reported cases surpassed 1 in 5 million individuals. Next, we optimized the model for the set of parameters by maximizing the logarithm of the likelihood (*LL*) (Equation 11) of observing the daily reported number of cases in each country

$$
LL(\theta) = \sum_{k=1}^{n}\left[\tilde{y}(k)\,\ln y(k,\theta) - y(k,\theta) - \ln\big(\tilde{y}(k)!\big)\right], \qquad (11)
$$

in which $\tilde{y}(k)$ is the reported number of COVID-19 cases at day *k*, *y*(*k, θ*) is the simulated model output number of COVID-19 cases at day *k* (with respect to *t*_{0}), and *n* is the total number of days included in the analysis for each country.
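For readers who want to reproduce the objective outside MATLAB, the sketch below evaluates a standard Poisson log-likelihood of daily reported counts given simulated expected counts. It is a generic implementation under the Poisson assumption stated above, not the PESTO code used in the study, and the function name and example numbers are illustrative.

```python
import math


def poisson_log_likelihood(observed, simulated):
    """Sum over days of log Poisson(observed_k | mean = simulated_k).

    observed  : daily reported case counts (non-negative integers)
    simulated : model-predicted expected reported cases per day (strictly positive)
    """
    if len(observed) != len(simulated):
        raise ValueError("observed and simulated series must have the same length")
    total = 0.0
    for y_obs, y_sim in zip(observed, simulated):
        if y_sim <= 0.0:
            raise ValueError("simulated expected counts must be positive")
        # lgamma(y + 1) equals log(y!) and avoids overflow for large counts.
        total += y_obs * math.log(y_sim) - y_sim - math.lgamma(y_obs + 1)
    return total


if __name__ == "__main__":
    reported = [3, 7, 12, 20]             # hypothetical daily counts
    good_fit = [3.5, 6.5, 12.0, 19.0]     # hypothetical model outputs
    poor_fit = [10.0, 10.0, 10.0, 10.0]
    print(poisson_log_likelihood(reported, good_fit))
    print(poisson_log_likelihood(reported, poor_fit))
```

An optimizer would maximize this quantity over the parameter vector that generates the simulated series; the factorial term does not depend on the parameters but is kept so that values are comparable across models.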

When fitting the model to disaggregated data, we follow the procedure outlined above and estimate region-specific *R*_{0} and CV, with a common intervention start time and a common *d*_{max}. To ensure that the estimated maximum is a global maximum, we performed 50 multi-start optimizations, and selected the combination of parameters resulting in the maximal log-likelihood as a starting point for *10*^{2} Markov chain Monte Carlo iterations. From the resulting posterior distributions, we extract the median estimates for each parameter and the respective 95% credible intervals. We used uniformly distributed priors with ranges {1-9, 0.0025-8, 1-60, 0-0.7}.

This fitting procedure was applied to 4 countries (Belgium, England, Portugal and Spain) for both the national and disaggregated case data series and repeated for each of the 4 model variants considered here (homogeneous, heterogeneous susceptibility, heterogeneous connectivity with constant CV, and heterogeneous connectivity with CV reducing in proportion to social distancing). In the fitting procedures using sub-national data, we assumed regions had the same start date for interventions that mitigate transmission, and that these measures produced the same maximum impact on transmission (*d*_{max}) everywhere. Thus, the only region-specific parameters to be estimated are *R*_{0} and CV. Parameter estimates obtained from each of the model variants are displayed in Extended Data Table 1 (heterogeneity in susceptibility), Extended Data Table 2 (heterogeneity in connectivity with constant CV), Extended Data Table 3 (heterogeneity in connectivity with dynamic CV) and Extended Data Table 4 (homogeneous model), and are comparable to those obtained in other studies^{27,38–43}. Finally, we apply the Akaike information criterion (AIC) for each estimation procedure to inform on the quality of each model’s fit to the datasets of reported cases (Extended Data Table 5). In all cases, heterogeneous models are preferred over the homogeneous approximation. Homogeneous models systematically fail to fit the maintenance of low numbers of cases after the relaxation of social distancing measures in many countries and regions (images not shown). The three heterogeneous models are roughly equally well supported by the data used in this study. Further research should complement this with discriminatory data types and hybrid models to enable the integration of different forms of individual variation.
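The model comparison step can be illustrated with the standard definition AIC = 2*k* − 2 ln *L*, where *k* is the number of estimated parameters and ln *L* the maximized log-likelihood. The sketch below uses made-up log-likelihood values purely to show the mechanics; the parameter counts (three for a homogeneous model without CV, four for a heterogeneous one) are an assumption about a single national fit, and the function names are illustrative.

```python
def akaike_information_criterion(log_likelihood, n_parameters):
    """AIC = 2k - 2 ln L; lower values indicate better support."""
    return 2.0 * n_parameters - 2.0 * log_likelihood


def rank_models(fits):
    """Sort {name: (log_likelihood, n_parameters)} by increasing AIC."""
    scored = {name: akaike_information_criterion(ll, k) for name, (ll, k) in fits.items()}
    return sorted(scored.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical maximized log-likelihoods, for illustration only.
    fits = {
        "homogeneous": (-1250.0, 3),
        "heterogeneous susceptibility": (-1100.0, 4),
        "heterogeneous connectivity": (-1102.0, 4),
    }
    for name, aic in rank_models(fits):
        print(f"{name}: AIC = {aic:.1f}")
```

The extra parameter of the heterogeneous models is penalized by two AIC units, so they are preferred only when the gain in log-likelihood exceeds one.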

### Data availability

Datasets are publicly available at the respective national ministry of health websites (44-48).

## Author contributions

M.G.M.G. conceived the study. R.A., R.M.C. and M.G.M.G. performed the analyses. All authors interpreted the data and wrote the paper.

## Competing interests

The authors declare no competing interests.

## Acknowledgements

We thank Jan Hasenauer and Antonio Montalbán for helpful discussions concerning statistical inference and mathematics, respectively. R.M.C. and M.U.F. receive scholarships from the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Brazil.