Abstract
Serological studies are critical for understanding pathogen-specific immune responses and informing public health measures1,2. Here, we evaluate tandem IgM, IgG and IgA responses in a cohort of individuals PCR+ for SARS-CoV-2 RNA (n=105) representing different categories of disease severity, including mild and asymptomatic infections. All PCR+ individuals surveyed were IgG-positive against the virus spike (S) glycoprotein. Elevated Ab levels were associated with hospitalization, with IgA titers, increased circulating IL-6 and strong neutralizing responses indicative of intensive care status. Additional studies of healthy blood donors (n=1,000) and pregnant women (n=900), sampled weekly during the initial outbreak in Stockholm, Sweden (weeks 14-25, 2020), demonstrated that anti-viral IgG titers differed over 1,000-fold between seroconverters, highlighting the need for careful evaluation of assay cut-offs for individual measurements and accurate estimates of seroprevalence (SP). To address this, we developed probabilistic machine learning approaches that assign a likelihood of past infection without setting an assay cut-off, allowing for more quantitative individual and population-level Ab measures. Using these tools, which considered responses against both S and RBD, we report SARS-CoV-2 S-specific IgG in 6.8% of blood donors and pregnant women two months after the peak of spring COVID-19 deaths, with the SP curve and country death rate following similar trajectories.
Introduction
As SARS-CoV-2 only recently crossed the species barrier3, the characterization of the humoral response to nascent outbreaks is central to optimizing approaches to tackle COVID-19 and further our understanding of human immunology4-6.
Numerous SARS-CoV-2 studies have reported seroprevalence (SP) and disease-associated antibody (Ab) phenotypes7-9, isolated virus-neutralizing monoclonal antibodies10-13 and used convalescent individuals to define metrics for plasma therapy14. However, consensus on several key issues remains outstanding. For instance, the majority of serology data are derived from commercial kits utilizing spike derivatives or the nucleocapsid to detect pathogen-specific antibodies7,15,16. Several of these assays suffer from epitope loss17, cross-reactivity7,18 and suboptimal sensitivity19-22. Here, we develop highly sensitive and specific ELISA protocols based on native-like prefusion-stabilized spike (S) trimers23 and the receptor-binding domain (RBD), both produced in-house, to accurately evaluate anti-viral Ab levels in COVID-19 patients and key community groups. We carefully analysed individuals with low anti-viral IgG responses and examined anti-viral Ab levels alongside in vitro virus neutralisation capacity and a descriptive set of clinical features to identify disease-associated Ab phenotypes.
Aside from the target antigen, a major consideration for Ab testing concerns setting the cut-off for positivity24-26, which significantly affects seroprevalence (SP) estimates and individual clinical management. Currently employed approaches to the problem are severely limited by their high dependency on the representative nature of negative controls and the dichotomization of a continuous variable27,28; i.e. a sample is either positive or negative, but in reality, a wide spectrum of responses exists within the population. Therefore, to obtain more accurate SP estimates in blood donors and pregnant women, we strictly controlled our assay by the repeat sampling of historical controls (i.e. SARS-CoV-2 negative, n=295), and used data from SARS-CoV-2 PCR+ individuals to train probabilistic algorithms to handle ELISA measurements. We sampled blood donors (BDs) and pregnant women (PW) weekly throughout the first outbreak (late March-late June 2020). Blood donors are an important clinical resource, including for COVID-19 plasma therapy, while pregnant women require close clinical monitoring with respect to fetal-maternal health and are known to employ unique, yet poorly characterized immunological mechanisms that impact infectious pathology29-31.
As Sweden did not impose a strict lockdown in response to the pandemic and has reported relatively high per-capita morbidity and mortality (ECDC, EU), understanding SARS-CoV-2 seroprevalence in these cohorts helps plan clinical need and understand the development of immunity in the population.
Results
Antibody test development
We developed ELISA protocols to profile IgM, IgG and IgA specific for a stabilized spike (S) glycoprotein trimer23, its RBD, and the nucleocapsid (N). The two S antigens were produced in mammalian cells in-house (Fig. S1A) and trimer conformation was confirmed by cryo-EM32. A representative subset of the study samples (Table 1) was used for assay development (Fig. S1B). No reactivity was recorded amongst the negative control samples during test development, except for two endemic coronavirus-positive individuals that displayed reproducible IgM reactivity to SARS-CoV-2 N and S, and two 2019 blood donors with low anti-S IgM reactivity (Fig. S1C). Further investigation is, therefore, required to establish whether cross-reactive memory or primordial B lymphocyte lineages contribute to SARS-CoV-2 responses a priori in a subset of individuals.
Our assay revealed over a 1,000-fold difference in anti-viral IgG titers between Ab-positive individuals when examining serially diluted sera. In SARS-CoV-2 PCR+ individuals, anti-viral IgG titers were comparable for S (EC50=3,064; 95% CI [1,197 – 3,626]) and N (EC50=2,945; 95% CI [543 – 3,936]) and lower for RBD (EC50=1,751; 95% CI [966 – 1,595]). A subset (ca. 10%) of the SARS-CoV-2-confirmed individuals did not have detectable IgG responses against N (Fig. S1D), suggesting that tests using this antigen may miss an important fraction of Ab-positive individuals, as seen previously7.
Elevated anti-viral Ab responses are associated with disease severity
We next screened all SARS-CoV-2 PCR+ individuals in the study (n=105) and detected potent IgG responses against S in all individuals and against RBD in 97% of the persons (Fig. 1A and S2A). In healthy blood donors and pregnant women, titers were generally lower (Fig. 1A and S2A) and varied greatly, with some persons displaying low but likely real titers similar to those observed at the higher end of the negative control range. The need to identify low titer individuals, which comprise a relatively large fraction of seropositive healthy individuals, highlighted the challenge of using a pre-determined assay cut-off.
The IgM and IgA responses against S and RBD were generally less potent and more variable between individuals than the IgG response (Fig. 1B and S2B). Therefore, we sought to investigate whether isotype responses segregated with clinical features. To do this, the SARS-CoV-2 PCR+ individuals were classified according to clinical disease severity: Category 1 – non-hospitalized (mild and asymptomatic); Category 2 – hospitalized; Category 3 – intensive care (on mechanical ventilation) (Table 1).
In all PCR+ individuals, anti-S and anti-RBD responses were highly correlated (Fig. S2C). Furthermore, multivariate analyses revealed increased anti-viral IgM, IgG and IgA to be associated with disease severity (Fig 1C and S Table 1), in line with the lower titers observed in blood donors and pregnant women. This was most pronounced for pathogen-specific IgA, suggestive of advancing mucosal disease33. A more severe clinical picture was also strongly associated with elevated serum IL-6 (Fig. 1C), a cytokine that feeds Ab production34-37. IL-6 is dysregulated in polygenic metabolic diseases38-40 and acute respiratory distress syndrome (ARDS)41, risk factors for COVID-19-associated mortality42,43. Interestingly, we observed a lack of association between serum IL-6 and anti-viral IgM levels, supporting that levels of the cytokine and IgA mark a protracted, more severe clinical course. Female IgA RBD responses were lower in non-hospitalized and hospitalized groups, compared to males (Fig. S3A)44.
In our studies, anti-viral IgG levels were maintained two months post-disease onset/positive PCR test, while IgM and IgA decreased (Fig. S3B), in agreement with their circulating t1/2. Serum from SARS-CoV-2 PCR+ individuals was collected 6-61 days post-PCR, with the median time from symptom onset to PCR test being 5 days. In longitudinal patient samples where we observed seroconversion, IgM, IgG and IgA peaked with similar kinetics when all three isotypes developed, although anti-viral IgA was not always generated in Category 1 and 2 individuals (Fig 1D and S3C). Anti-viral IgM could be maintained two months post-PCR+/symptom onset at the individual level (Fig S3C). Overall, disease severity showed the most consistent relationship with any measure and was the primary predictor of Ab levels (Fig. S3D). After accounting for the effect of age, sex and days since PCR+ test, anti-viral IgA titers were approximately 3-fold higher in intensive care vs. non-hospitalized individuals, while IL-6 levels were ca. 10-fold increased (Fig. S3E).
We next characterized virus neutralizing Ab response, a key parameter for understanding the potential for protective humoral immune responses and the selection of plasma therapy donors. Using an in vitro pseudotype virus assay to measure block of viral entry into target cells, we detected neutralizing antibodies in the serum of all SARS-CoV-2 PCR+ individuals, and in all except two healthy Ab-positive donors (from n=56). Neutralizing responses were not seen in samples before seroconversion. A range of neutralizing ID50 titers were apparent (Fig 1E), with binding and neutralization titers being highly correlated (Fig. S3D). The strongest neutralizing responses were observed in samples from patients in intensive care (Category 3) (geometric mean ID50=5,058; 95% CI [2,422 – 10,564]), in keeping with their elevated and presumably more developed Ab response (Fig 1F). Sera from healthy blood donors and pregnant women also displayed neutralizing responses, but consistent with the binding data, these were less potent than those observed in individuals with severe disease (ID50=600; 95% CI [357 – 1,010] and ID50=350; 95% CI [228 – 538], respectively, Fig. 1G). Across the two antigens and three isotypes, anti-RBD IgG was most strongly correlated with neutralization.
Seroprevalence estimates in blood donors and pregnant women
As the Stockholm region is a busy urban area and Sweden did not impose strict lockdown in response to the emergence of SARS-CoV-2, we sought to better understand the frequency and nature of anti-SARS-CoV-2 Ab responses in healthy blood donors and pregnant women sampled during weeks 14-25 (March 30 to June 22 2020) (Fig. 2A). We surveyed a large number of historical controls, blood donors from the spring of 2019 (n=295), to inform decisions about how to set the assay cut-off for positivity. We considered this critical since the use of a small set of control samples can lead to an incorrectly set threshold and errors in the measurement. To illustrate this, we randomly sampled six groups of 20 negative controls from 890 measurements in 295 historical donors and we calculated SP in blood donors and pregnant women from weeks 17-19 based on a 6SD cut-off. Depending on which 20 negative control samples were used from the pool, the SP ranged from 5.7 to 8.7%, a 35% difference (Fig. 2B). Weak responder status is likely influenced by many factors, including genetic background, health status, total serum Ab and protein levels, and assay variability. Critically, test samples with low anti-viral titers may also fall into this range, highlighting the need to better understand the assay boundary. Taking a one-dimensional 6SD cut-off for anti-S IgG responses based on all 890 values from 295 negative donors, 8% of healthy donors sampled in Stockholm tested positive two months after the peak of deaths in the country (Fig. 2C and S4A and B).
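The sensitivity of a standard-deviation cut-off to the choice of negative controls, described above, can be illustrated with a minimal Python sketch. It assumes the cut-off is the mean plus 6 SD of the negative-control readings and that each reading is a single number. The function names, the synthetic readings and the scale used (raw rather than log-transformed OD) are illustrative assumptions and not the study's code.

```python
import random
import statistics


def six_sd_cutoff(negative_ods, n_sd=6.0):
    """Cut-off = mean + n_sd * sample SD of the negative-control readings."""
    return statistics.mean(negative_ods) + n_sd * statistics.stdev(negative_ods)


def seroprevalence(test_ods, cutoff):
    """Fraction of test samples whose reading exceeds the cut-off."""
    return sum(od > cutoff for od in test_ods) / len(test_ods)


def cutoff_sensitivity(negative_pool, test_ods, n_groups=6, group_size=20, seed=0):
    """Draw several small control groups and report (cut-off, SP) for each."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_groups):
        subset = rng.sample(negative_pool, group_size)
        cutoff = six_sd_cutoff(subset)
        results.append((cutoff, seroprevalence(test_ods, cutoff)))
    return results


if __name__ == "__main__":
    # Synthetic data only: 890 negative-control readings and 200 test samples,
    # of which a handful are weak or strong responders.
    rng = random.Random(42)
    negative_pool = [rng.gauss(0.10, 0.03) for _ in range(890)]
    test_ods = [rng.gauss(0.10, 0.03) for _ in range(185)]
    test_ods += [rng.uniform(0.2, 0.4) for _ in range(8)]   # weak responders
    test_ods += [rng.uniform(1.0, 3.0) for _ in range(7)]   # strong responders
    for cutoff, sp in cutoff_sensitivity(negative_pool, test_ods):
        print(f"cut-off={cutoff:.3f}  SP={100 * sp:.1f}%")
```

Because the SD estimated from 20 samples is noisy, the cut-off moves between draws and weak responders are counted as positive in some draws but not in others, which is the effect the subsampling experiment is designed to expose.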
To exploit individual antibody titers and improve our estimates, using an independent approach we modelled the probability that a sample is IgG positive using machine learning approaches that considered anti-S and -RBD responses together. After comparing different algorithms for this purpose, we generated an ensemble SVM-LDA learner (ENS) to maximize sensitivity, specificity and consistency across different cross-validation strategies and trained it using our data from PCR+ individuals (please see: Materials and Methods). The ENS learner identified 6.5 and 6.6% IgG positivity in blood donors and pregnant women, respectively, two months after the peak of outbreak deaths (S Table 2). Importantly, ENS identified several individuals to have a 40-50% chance of being antibody positive, who could be targeted for further testing (Fig. 2D). The accuracy of these tools will improve with larger training sets and will facilitate future assays in providing multiplex antigen measurements in a single test, as well as information on the strength of response.
Finally, to model population changes in SP over time, we developed and validated a cut-off-independent Bayesian framework able to share information between weeks45 (Fig. S4C and Materials and Methods). Using this model, we found the steep increase in positivity at the start of the pandemic to slow between weeks 17-25, suggesting that humoral immunity to the virus developed slowly in these healthy cohorts despite considerable virus spread in the community. Strikingly, seroprevalence inferred using our Bayesian approach exhibited the same trajectory as Stockholm County deaths when lagged by two and a half weeks (Fig. S4D).
Discussion
Serology remains the gold standard for estimating previous exposure to pathogens and benefits from a large historical literature46. It is a strong predictor of an anamnestic response and quicker recovery upon re-infection. Although the concept of herd immunity is based upon the study of antibodies, worryingly, there is no standardization for the many SARS-CoV-2 antibody tests currently available. Globally, hospital staff and health authorities are struggling with test choices, negatively impacting individual outcomes and efforts to contain SARS-CoV-2. Benefitting from a robust antibody test developed alongside a diagnostic clinical laboratory responsible for monitoring sero-reactivity during the pandemic, we profiled SARS-CoV-2 Ab responses in three cohorts of clinical interest. COVID-19 patients receiving intensive care showed the highest anti-viral titers, developing augmented serum IgA and IL-6 with worsening disease, likely a consequence of age, sex and other risk factors, as well as prolonged immune stimulation and a higher infectious dose. Therefore, isotype-level measures may assist COVID-19 clinical management and determine, for example, whether all critically ill patients develop class-switched mucosal responses to SARS-CoV-2, potentially informing lung therapeutic delivery47,48. Our neutralization data showed that nearly all SARS-CoV-2 PCR+ individuals and healthy donors who seroconvert develop Ab titers capable of preventing S-mediated cell entry in vitro, demonstrating the critical role of Abs in fighting SARS-CoV-2.
Outside of the severe disease setting, it is important to determine how many people have seroconverted. Blood donors and pregnant women are both good sentinels for population health, although they are not enriched for high-risk groups, such as healthcare workers and public transportation employees, where SP may be higher. Blood donors are generally working age, active and mobile members of society with a good understanding of health, and the majority of pregnant women in Sweden would have been advised to take precautions against infectious diseases. Interestingly, in our study, both groups showed a similar SP during the time period analyzed.
By tracking these cohorts over time, we modelled SP changes at the population level with two independent tools that we make freely available. We found that the steep climb in Ab positivity at the start of the pandemic slowed during subsequent weeks, plateauing at ca. 7%, well below the levels required for herd immunity. We noted that when lagged by two and a half weeks, the trajectory of the SP curve coincided with the decreasing caseload and number of fatalities in Sweden49. Indeed, ICU occupancy and deaths are likely a better proxy for viral spread than PCR+ diagnoses, which are highly dependent on the number of tests carried out. Given the uniqueness of the approach in Sweden50, the data we present here may inform the management of this and future pandemics, especially given the high inter-individual variability in anti-viral responses observed against SARS-CoV-2.
Materials and methods
Human samples and ethical declaration
Samples from PCR+ individuals and admitted COVID-19 patients (n=105) were collected by the attending clinicians and processed through the Departments of Medicine and Clinical Microbiology at the Karolinska University Hospital. Samples were handled and analyzed in accordance with approval by the Swedish Ethical Review Authority (registration no. 2020-02811). All personal identifiers were pseudo-anonymized, and clinical feature data were blinded to the researchers carrying out experiments until data generation was complete. PCR testing for SARS-CoV-2 RNA was by nasopharyngeal swab or upper respiratory tract sampling at Karolinska University Hospital. As viral RNA CT values were determined using different qPCR platforms between patients, we did not analyze these alongside other available features. PCR+ individuals were questioned about the date of symptom onset at their initial consultation and followed up for serology during their care, up to 2 months post-diagnosis. In addition, longitudinal samples from 10 of these patients were collected to monitor seroconversion and isotype persistence.
Anonymized samples from blood donors (100/week) and pregnant women (100/week) were randomly selected and obtained from the Department of Clinical Microbiology, Karolinska University Hospital. No metadata, such as age or sex, were available for these samples. Pregnant women were sampled as part of routine infectious disease screening during the first trimester of pregnancy. Samples from blood donors (n=295) collected through the same channels a year previously were randomly selected for use as negative controls. Serum samples from individuals testing PCR+ for endemic coronaviruses, 229E, HKU1, NL63, OC43 (n=20, ECV+) in the prior 2-6 months, were used as additional negative controls. The use of these anonymized samples was approved by the Swedish Ethical Review Authority (registration no. 2020-01807). Stockholm County death and Swedish mortality data were sourced from the ECDC and the Swedish Public Health Agency, respectively. Study samples are defined in Table 1.
Serum sample processing
Blood samples were collected by the attending clinical team and serum isolated by the department of Clinical Microbiology. Samples were barcoded and stored at −20°C until use. Serum samples were not heat-inactivated for ELISA protocols but were heat-inactivated at 56°C for 60 min for neutralization experiments.
SARS-CoV-2 antigen generation
The plasmid for expression of the SARS-CoV-2 prefusion-stabilized spike ectodomain with a C-terminal T4 fibritin trimerization motif was obtained from23. The plasmid was used to transiently transfect FreeStyle 293F cells using FreeStyle MAX reagent (Thermo Fisher Scientific). The ectodomain was purified from filtered supernatant on Streptactin XT resin (IBA Lifesciences), followed by size-exclusion chromatography on a Superdex 200 in 5 mM Tris pH 8, 200 mM NaCl.
The RBD domain (RVQ – QFG) of SARS-CoV-2 was cloned upstream of a Sortase A recognition site (LPETG) and a 6xHIS tag, and expressed in 293F cells as described above. RBD-HIS was purified from filtered supernatant on His-Pur Ni-NTA resin (Thermo Fisher Scientific), followed by size-exclusion chromatography on a Superdex 200. The nucleocapsid was purchased from Sino Biological.
Anti-SARS-CoV2 ELISA
96-well ELISA plates (Nunc MaxiSorp) were coated with SARS-CoV-2 S, RBD or nucleocapsid (100 μl of 1 ng/μl) in PBS overnight at 4°C. Plates were washed six times with 300 μl PBS-Tween-20 (0.05%) and blocked using PBS-5% non-fat milk (Sigma). Human serum samples were thawed at room temperature, diluted (1:100 unless otherwise indicated), vortexed and incubated in blocking buffer for 1 hour before plating. Serum samples were incubated overnight at 4°C before washing, as before. Secondary HRP-conjugated anti-human antibodies were diluted in blocking buffer and incubated with samples for 1 hour at room temperature. Plates were washed a final time before development with TMB Stabilized Chromogen (Invitrogen). The reaction was stopped using 1M sulphuric acid and OD values were measured at 450 nm using an Asys Expert 96 ELISA reader (Biochrom Ltd.). Secondary antibodies (all from Southern Biotech) and dilutions used: goat anti-human IgG (2014-05) at 1:10,000; goat anti-human IgM (2020-05) at 1:1,000; goat anti-human IgA (2050-05) at 1:6,000. All assays of the same antigen and isotype were developed for the same fixed time, and samples were randomized and run together on the same day when comparing binding between PCR+ individuals. All data were log transformed for statistical analyses.
In vitro virus neutralisation assay
Pseudotyped viruses were generated by the co-transfection of HEK293T cells with a plasmid encoding the SARS-CoV-2 spike protein harboring an 18 amino acid truncation of the cytoplasmic tail23, a plasmid encoding firefly luciferase and a lentiviral packaging plasmid (Addgene 8455), using Lipofectamine 3000 (Invitrogen). Media was changed 12-16 hours post-transfection and pseudotyped viruses were harvested at 48 and 72 hours, filtered through a 0.45 μm filter and stored at -80°C until use. Pseudotyped neutralisation assays were adapted from protocols validated to characterize the neutralization of HIV, but with the use of HEK293T-ACE2 cells. Briefly, pseudotyped viruses sufficient to generate ~100,000 RLUs were incubated with serial dilutions of heat-inactivated serum for 60 min at 37°C. Approximately 15,000 HEK293T-ACE2 cells were then added to each well and the plates incubated at 37°C for 48 hours. Luminescence was measured using Bright-Glo (Promega) according to the manufacturer’s instructions on a GM-2000 luminometer (Promega) with an integration time of 0.3 s. The limit of detection was at a 1:45 serum dilution.
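How a neutralizing ID50 titer can be read off a serial dilution series is sketched below in Python. The text does not state how ID50 values were derived, so this is only one simple possibility: percent neutralization is computed relative to a virus-only control, and the 50% point is found by linear interpolation on log dilution between the two bracketing dilutions. The function names, the three-fold dilution series and the RLU values are hypothetical; only the 1:45 starting dilution comes from the text.

```python
import math


def percent_neutralization(rlu, virus_only_rlu, background_rlu=0.0):
    """Percent reduction in luminescence relative to the virus-only control."""
    signal = (rlu - background_rlu) / (virus_only_rlu - background_rlu)
    return 100.0 * (1.0 - signal)


def id50(reciprocal_dilutions, neutralization):
    """Interpolate the reciprocal dilution giving 50% neutralization.

    reciprocal_dilutions must be in increasing order (e.g. 45, 135, 405).
    Returns None when the 50% point is not bracketed by the series, i.e.
    the sample is below the first dilution or still above 50% at the last.
    """
    pairs = list(zip(reciprocal_dilutions, neutralization))
    for (d_lo, n_lo), (d_hi, n_hi) in zip(pairs, pairs[1:]):
        if n_lo >= 50.0 > n_hi:
            frac = (n_lo - 50.0) / (n_lo - n_hi)
            log_id50 = math.log(d_lo) + frac * (math.log(d_hi) - math.log(d_lo))
            return math.exp(log_id50)
    return None


if __name__ == "__main__":
    # Hypothetical RLU readings for one serum sample.
    dilutions = [45, 135, 405, 1215, 3645]
    rlus = [4_000, 15_000, 42_000, 71_000, 93_000]
    neut = [percent_neutralization(r, virus_only_rlu=100_000) for r in rlus]
    print(id50(dilutions, neut))
```

A fitted dose-response curve would normally be preferred over two-point interpolation when the series is noisy; the interpolation is used here only because it makes the definition of the titer explicit.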
IL-6 cytometric bead array
Serum IL-6 levels were measured in a subset of PCR+ serum samples (n=64) using an enhanced sensitivity cytometric bead array against human IL-6 from BD Biosciences (Cat # 561512). Protocols were carried out according to the manufacturer’s recommendations and data acquired using a BD Celesta flow cytometer.
Statistical analysis of SARS-CoV-2 PCR+ data
All univariate comparisons were performed using non-parametric analyses (Kruskal-Wallis, stratified Mann-Whitney, hypergeometric exact tests and Spearman rank correlation), as indicated, while multivariate comparisons were performed using linear regression of log transformed measures and Wald tests. For multivariate tests, all biochemical measures (IL-6, PSV ID50 neut., IgG, IgA, IgM) were log transformed to improve the symmetry of the distribution. As “days since first symptom” and “days since PCR+ test” are highly correlated, we cannot include both in any single analysis. Instead, we show results for one, then the other (Supplementary Table 1).
Probabilistic seroprevalence estimations
We employed two distinct probabilistic strategies for estimating seroprevalence without thresholds, each developed independently. Our machine learning approach consisted of evaluating different algorithms suited to ELISA data, which we compared through ten-fold cross validation (CV): logistic regression, linear discriminant analysis (LDA), and support vector machines (SVM) with a linear kernel. Logistic regression and linear discriminant analysis both model the log odds of a sample being a case as a linear equation, with a resulting linear decision boundary. The difference between the two methods is in how the coefficients for the linear models are estimated from the data. When applied to new data, the output of logistic regression and LDA is the probability of each new sample being a case. The support vector machine is an altogether different approach. We opted for a linear kernel, once again resulting in a linear boundary. SVM constructs a boundary that maximally separates the classes (i.e. the margin between the closest member of any class and the boundary is as wide as possible), hence points lying far away from their respective class boundaries do not play an important role in shaping it. SVM thus puts more weight on points closest to the class boundary, which in our case is far from being clear. Linear SVM has one tuning parameter C, a cost, with larger values resulting in narrower margins. We tuned C on a vector of values (0.001, 0.01, 0.5, 1, 2, 5, 10) via an internal 5-fold CV with 5 repeats (with the winning parameter used for the final model for the main CV iteration). We also note that the natural output of SVM is a class label rather than a class probability, so the latter is obtained via the method of Platt51.
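To illustrate how SVM outputs can be turned into class probabilities, the Python sketch below fits the sigmoid used in Platt scaling, p = 1 / (1 + exp(A*f + B)), to decision scores f. It is a simplified stand-in: the parameters are fitted by plain gradient descent rather than the optimiser in Platt's procedure, and the function names, toy scores, step size and iteration count are all assumptions made for illustration.

```python
import math


def _sigmoid(z):
    """Numerically stable evaluation of 1 / (1 + exp(z))."""
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def fit_platt(scores, labels, lr=0.05, n_iter=5000):
    """Fit A and B in p = 1 / (1 + exp(A*f + B)) by gradient descent.

    labels are 1 for cases and 0 for controls. Targets are shrunk away
    from 0 and 1, as in Platt's method, so the fit stays finite even
    when the two classes are perfectly separated.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    t_pos = (n_pos + 1.0) / (n_pos + 2.0)
    t_neg = 1.0 / (n_neg + 2.0)
    targets = [t_pos if y == 1 else t_neg for y in labels]
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(n_iter):
        grad_a = grad_b = 0.0
        for f, t in zip(scores, targets):
            p = _sigmoid(a * f + b)
            # Gradient of the negative log-likelihood w.r.t. (A*f + B) is t - p.
            grad_a += (t - p) * f
            grad_b += t - p
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def platt_probability(score, a, b):
    """Probability of being a case for a new decision score."""
    return _sigmoid(a * score + b)


if __name__ == "__main__":
    # Toy decision scores: negative for controls, positive for cases.
    scores = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
    labels = [0, 0, 0, 0, 1, 1, 1, 1]
    a, b = fit_platt(scores, labels)
    print(platt_probability(0.1, a, b), platt_probability(1.8, a, b))
```

Samples close to the boundary receive probabilities near 0.5, which is what allows borderline individuals to be flagged for follow-up rather than forced into a positive or negative class.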
We considered three strategies for cross-validation: i) random: individuals were sampled into folds at random, ii) stratified: individuals were sampled into folds at random, subject to ensuring the balance of cases:controls remained fixed and iii) unbalanced: individuals were sampled into folds such that each fold was deliberately skewed to under or over-represent cases compared to the total sample. We sought a method that worked equally well across all cross-validation schemes, as the true proportion of cases in the test data is unknown and so a good method should not be overly sensitive to the proportion of cases in the training data. We found most methods worked well, although logistic regression was sensitive to changes in the case proportion in the training data. We chose to create an ensemble method combining the method with the highest specificity (LDA) and the method with the highest sensitivity (SVM), defined as an unweighted average of the probabilities generated under SVM and LDA. The ensemble learner had average sensitivity > 99.1% and average specificity 99.8%. We then trained the ensemble learner on all 719 training samples and predicted the probability of anti-SARS-CoV-2 antibodies in blood donors and pregnant women sampled in 2020. We inferred the proportion of the sampled population with positive antibody status each week using multiple imputation. We repeatedly (1,000 times) imputed antibody status for each individual randomly according to the ensemble prediction, and then analyzed each of the 1,000 datasets in parallel, combining inference using Rubin’s rules, derived for the Wilson binomial proportion confidence interval52.
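The ensemble averaging and multiple-imputation steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the per-sample probabilities are taken as given, and the imputations are pooled with the standard form of Rubin's rules using a simple Wald within-imputation variance, not the Wilson-interval variant cited in the text. Function names and example probabilities are hypothetical.

```python
import random
import statistics


def ensemble_probability(p_svm, p_lda):
    """Unweighted average of the SVM and LDA case probabilities."""
    return [(a + b) / 2.0 for a, b in zip(p_svm, p_lda)]


def impute_prevalence(probs, n_imputations=1000, seed=0):
    """Estimate the proportion positive by multiple imputation.

    Each imputation draws a 0/1 antibody status per individual from the
    ensemble probability. Results are pooled with Rubin's rules:
    total variance = within + (1 + 1/m) * between.
    Requires n_imputations >= 2.
    """
    rng = random.Random(seed)
    n = len(probs)
    estimates, within_vars = [], []
    for _ in range(n_imputations):
        positives = sum(rng.random() < p for p in probs)
        q = positives / n
        estimates.append(q)
        within_vars.append(q * (1.0 - q) / n)  # Wald variance (assumption)
    pooled = statistics.mean(estimates)
    within = statistics.mean(within_vars)
    between = statistics.variance(estimates)
    total_var = within + (1.0 + 1.0 / n_imputations) * between
    return pooled, total_var


if __name__ == "__main__":
    # Hypothetical week: most samples clearly negative, a few clearly
    # positive, and a few borderline.
    p_svm = [0.01] * 90 + [0.45] * 4 + [0.99] * 6
    p_lda = [0.02] * 90 + [0.50] * 4 + [0.98] * 6
    probs = ensemble_probability(p_svm, p_lda)
    est, var = impute_prevalence(probs)
    print(f"estimated proportion positive: {est:.3f} (SE {var ** 0.5:.3f})")
```

The between-imputation term is what carries the uncertainty about borderline individuals into the weekly estimate; with a hard cut-off that uncertainty would be discarded.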
Our Bayesian approach is explained in detail in Christian et al.45. Briefly, we used a logistic regression over anti-RBD and -S training data to model the relationship between the ELISA measurements and the probability that a sample is antibody-positive. We adjusted for the training data class proportions and used these adjusted probabilities to inform the seroprevalence estimates for each time point. Given that the population seroprevalence cannot increase dramatically from one week to the next, we constructed a prior over seroprevalence trajectories using a transformed Gaussian Process, and combined this with the individual class-balance-adjusted infection probabilities for each donor to infer the posterior distribution over seroprevalence trajectories.
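The idea of adjusting classifier probabilities for the training class proportions and then inferring seroprevalence without a cut-off can be illustrated for a single week with the Python sketch below. It assumes the standard prior-shift correction, in which the classifier's odds are divided by the training-set odds to give a likelihood ratio, and it replaces the transformed Gaussian Process prior over trajectories with a flat prior on a grid for one time point. It is therefore a simplification of the framework described above, with hypothetical names and inputs.

```python
import math

EPS = 1e-9  # keeps probabilities away from exactly 0 or 1


def likelihood_ratio(p_train, train_case_fraction):
    """Classifier odds divided by the odds implied by the training classes."""
    p = min(max(p_train, EPS), 1.0 - EPS)
    odds = p / (1.0 - p)
    prior_odds = train_case_fraction / (1.0 - train_case_fraction)
    return odds / prior_odds


def adjusted_probability(p_train, train_case_fraction, prevalence):
    """Probability of being positive if the true prevalence were `prevalence`."""
    lr = likelihood_ratio(p_train, train_case_fraction)
    return prevalence * lr / (prevalence * lr + (1.0 - prevalence))


def prevalence_posterior(probs, train_case_fraction, grid_size=201):
    """Grid posterior over prevalence under a flat prior (assumption).

    The likelihood of prevalence t is the product over samples of
    t * LR_i + (1 - t), a two-component mixture for each sample.
    """
    lrs = [likelihood_ratio(p, train_case_fraction) for p in probs]
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    log_post = [sum(math.log(t * lr + (1.0 - t)) for lr in lrs) for t in grid]
    peak = max(log_post)
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    return grid, [w / total for w in weights]


if __name__ == "__main__":
    # Hypothetical classifier outputs for 100 samples from one week.
    probs = [0.001] * 93 + [0.4, 0.6] + [0.999] * 5
    grid, post = prevalence_posterior(probs, train_case_fraction=0.5)
    mean = sum(t * w for t, w in zip(grid, post))
    print(f"posterior mean prevalence: {mean:.3f}")
```

Sharing information between weeks, as the full model does, would amount to replacing the flat prior with one that penalises large week-to-week jumps in prevalence.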
Data Availability
Data generated as part of the study, along with custom code for statistical analyses, is openly available via our GitHub repositories: https://github.com/MurrellGroup/DiscriminativeSeroprevalence/ and https://github.com/chr1swallace/seroprevalence-paper.
Author contributions
GKH and XCD designed the study, analyzed the results and wrote the manuscript with input from co-authors. JA, TA, SM, GB and SA provided the study serum samples and clinical information. LH, LPV, AMM, DJS, KCI, BM and GM generated SARS-CoV-2 antigens and pseudotyped viruses. XCD and MF developed the ELISA protocols and XCD generated the data. DJS and BM developed and performed the neutralization assay. MCh and BM developed the Bayesian framework. CW and NFG assisted with patient data statistical analyses and executed machine learning approaches. MA, SK, PP, MM, JC, MCo and JR carried out wet lab experiments and assisted with data analysis.
Conflict of interest
The study authors declare no competing interests related to the work.
Acknowledgments
We would like to thank the study participants and attending clinical teams. Secondly, we extend our thanks to Björn Reinius, Marc Panas, Julian Stark, Remy M. Muts and Darío Solis Sayago for their input and discussion. Funding for this work was provided by a Distinguished Professor grant from the Swedish Research Council (agreement 2017-00968) and NIH (agreement SUM 1A44462-02). CW and NFG are funded by the Wellcome Trust (WT107881) and MRC (MC_UP_1302/5).