Antibody responses to SARS-CoV-2 train machine learning to assign likelihood of past infection during virus emergence in Sweden

Xaquín Castro Dopico, Leo Hanke, Daniel J. Sheward, Sandra Muschiol, Soo Aleman, Murray Christian, Nastasiya F. Grinberg, Monika Adori, Laura Perez Vidakovics, ChangIl Kim, Sharesta Khoenkhoen, Pradeepa Pushparaj, Ainhoa Moliner Morro, Marco Mandolesi, Marcus Ahl, Mattias Forsell, Jonathan Coquet, Martin Corcoran, Joanna Rorbach, Joakim Dillner, Gordana Bogdanovic, Gerald McInerney, Tobias Allander, Chris Wallace, Ben Murrell, Jan Albert, Gunilla B. Karlsson Hedestam


Severe disease was most associated with virus-specific IgA, suggestive of mucosal disease [36], as well as elevated serum IL-6 (Fig. 1C and S2D), a cytokine that feeds Ab production [37][38][39][40]. IL-6 is dysregulated in several common non-communicable diseases [41][42][43] and during acute respiratory distress syndrome [44], risk factors for COVID-19-associated mortality [45,46]. Interestingly, we observed a lack of association between IL-6 and IgM levels, supporting that levels of the cytokine and IgA mark a protracted, severe clinical course of COVID-19. RBD responses were lower in non-hospitalized and hospitalized females as compared to males, trending similarly for S (Fig. S3A) and in line with females developing less severe disease [47].

In our study, anti-viral IgG levels in PCR+ individuals were maintained two months post-disease onset/positive PCR test, while IgM and IgA decreased, in agreement with their circulating half-lives (t1/2) and viral clearance (Fig. S3B). In longitudinal patient samples (sequential serum sampling of 10 PCR+ individuals in the study) where we observed seroconversion, IgM, IgG and IgA peaked with similar kinetics when all three isotypes developed, although IgA was not always generated in Category 1 and 2 individuals (Fig. 1D). Overall, disease severity showed the most consistent relationship with any measure and was the primary predictor of Ab levels (Fig. S3C and D).

We next characterized the virus neutralizing Ab response, a key parameter for understanding the potential for protective humoral immune responses and the selection of plasma therapy donors. Benefitting from a robust in vitro pseudotype virus neutralization assay [48], we measured serum inhibition of viral cell entry and detected neutralizing antibodies in the serum of all SARS-CoV-2 PCR+ individuals (n=48), and in all except two healthy Ab-positive donors screened (n=56).
Neutralizing responses were not seen in samples before seroconversion (Fig. 1D) or in negative controls. A large range of neutralizing ID50 titers was apparent, with binding and neutralizing Ab levels being highly correlated (Fig. S3D). The strongest neutralizing responses were observed in samples from patients on mechanical ventilation in intensive care (Category 3, g.mean ID50=5,058; 95% CI [2,422-10,564]), in keeping with their elevated Ab response (Fig. 1E). Sera from healthy blood donors and pregnant women also displayed neutralizing responses but, consistent with the binding data, these were less potent than those observed in individuals with severe disease (ID50=600; 95% CI [357-1,010] and ID50=350; 95% CI [228-538], respectively, Fig. 1F). Across the two antigens and three isotypes, anti-RBD IgG levels were most strongly correlated with neutralization.

Probabilistic seroprevalence estimates in blood donors and pregnant women

As Stockholm is a busy urban area and Sweden did not impose strict lockdown in response to SARS-CoV-2 emergence, we sought to better understand the frequency and nature of anti-viral responses in healthy blood donors and pregnant women sampled throughout the first outbreak (March 30 to August 23, 2020) (Fig. 2A). However, critical to accurate individual measures and seroprevalence estimates is the decision about whether a sample is defined as positive or not. For example, current clinically approved tests use a ratio between a "representative" positive and negative serum calibrator to determine positivity, although we show here that these are highly variable.

medRxiv preprint doi: https://doi.org/10.1101/2020.07.17.20155937; this version posted October 19, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

To improve our understanding of the assay boundary, we repeatedly analyzed a large number of historical (SARS-CoV-2-negative) controls (blood donors from the spring of 2019, n=595) alongside test samples throughout the study. We considered the spread of negative values critical, since the use of a small and unrepresentative set of controls can lead to an incorrectly set threshold and errors in the seroprevalence measurement. This is illustrated by the random sub-sampling of non-overlapping groups of negative controls, resulting in a 40% difference in the seroprevalence estimate (Fig. S4A). Seroprevalence in the healthy cohorts according to conventional 3 and 6 SD cut-offs is shown in Fig. 2C.

The fact that many healthy donor test samples had optical densities between the 3 and 6 SD cut-offs for both or a single antigen (Fig. 2B [...]

Therefore, we generated an equal-weighted ensemble learner (ENS) from the output of LOG and LDA that maximized sensitivity, specificity and consistency across different cross-validation strategies (Fig. 2D and S4D). While weekly rates varied (S Table 2), the ENS learner identified 13.7% seroprevalence in healthy blood donors and pregnant women at the last sampling week (Supp. [...]
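The dependence of conventional SD cut-offs on the choice of negative controls, described above, can be illustrated with a minimal sketch. It assumes the cut-off is the mean plus 3 or 6 standard deviations of log-transformed control ODs; the function names (`sd_cutoff`, `fraction_positive`) and all data are hypothetical and synthetic, not the study's measurements or code.

```python
import math
import random
import statistics


def sd_cutoff(negative_ods, n_sd):
    """Cut-off = mean + n_sd * sample SD of log-transformed control ODs."""
    logs = [math.log(od) for od in negative_ods]
    return statistics.mean(logs) + n_sd * statistics.stdev(logs)


def fraction_positive(test_ods, cutoff):
    """Share of test samples whose log OD lies above the cut-off."""
    return sum(math.log(od) > cutoff for od in test_ods) / len(test_ods)


# Synthetic ODs only (assumed log-normal; parameters are arbitrary).
rng = random.Random(0)
negatives = [rng.lognormvariate(-2.0, 0.4) for _ in range(595)]
tests = ([rng.lognormvariate(-2.0, 0.4) for _ in range(180)]
         + [rng.lognormvariate(0.0, 0.5) for _ in range(20)])

# Cut-offs from the full control set.
full_3sd = sd_cutoff(negatives, 3)
full_6sd = sd_cutoff(negatives, 6)

# Small, non-overlapping control subsets each yield their own 3 SD cut-off,
# and so their own positivity estimate for the same test samples.
shuffled = negatives[:]
rng.shuffle(shuffled)
subset_cutoffs = [sd_cutoff(shuffled[i:i + 20], 3) for i in range(0, 100, 20)]
subset_estimates = [fraction_positive(tests, c) for c in subset_cutoffs]
```

Comparing `subset_estimates` with `fraction_positive(tests, full_3sd)` shows how much a threshold derived from a small control set can move the estimate, which is the motivation for using the full set of historical controls.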
Outside of the severe disease setting, it is critical to accurately determine who and how many people have seroconverted. This is complicated by low titer values, which in some cases, and increasingly with time since exposure, overlap outlier values among negative control samples. Test samples with true low anti-viral titers will, therefore, fall into this range of weak responders as the B lymphocyte response contracts following viral clearance, highlighting the need to better understand the assay boundary in multiple dimensions. As future tests begin to survey individual Ab responses to a multitude of antigens in parallel, the ML approaches presented here will enable the identification of disease sub-types and facilitate longitudinal measures.

We applied these tools to blood donors and pregnant women, two good sentinels for population health, although they are not enriched for groups with high risk for SARS-CoV-2 infection, such as healthcare workers and public transportation employees, where seroprevalence may be higher. Blood donors are generally working age, active and mobile members of society with a good understanding of health, and pregnant women in Sweden will have been advised to take precautions against infectious diseases through their practitioners. Interestingly, in our study, both groups showed a similar seroprevalence during the time period analyzed. Tracking these cohorts over time, we modelled seroprevalence changes at the population level. We found that the steep climb in Ab positivity at the start of the pandemic (as the virus emerged) slowed during subsequent weeks, with seroprevalence reaching nearly 14% by five months from the peak of spring 2020 COVID-19 deaths in the country. These data indicate that serological herd immunity to the initial outbreak was not [...]
In addition, longitudinal samples from 10 of these patients were collected to monitor seroconversion and isotype persistence.

Hospital workers at Karolinska University Hospital were invited to test for the presence of SARS-CoV-2 RNA in throat swabs in April 2020 and virus-specific IgG in serum in July 2020. We screened 33 PCR+ individuals to provide additional training data for ML approaches. All participants provided written informed consent. The study was approved by the National Ethical Review Agency of Sweden (2020-01620) and the work was performed accordingly.

Anonymized samples from blood donors (n=100/week) and pregnant women (n=100/week) were randomly selected from their respective pools by the department of Clinical Microbiology, Karolinska University Hospital. No metadata, such as age or sex information, were available for these samples in this study. Pregnant women were sampled as part of routine infectious disease screening during the first trimester of pregnancy. Blood donors (n=595) collected through the same channels a year previously were randomly selected for use as negative controls. Serum samples from individuals testing PCR+ for endemic coronaviruses, 229E, HKU1, NL63, OC43 (n=20, ECV+) in the prior 2-6 months, were used as additional negative controls. The use of study samples was approved by the Swedish Ethical Review Authority (registration no. 2020-01807). Stockholm County death and Swedish mortality data were sourced from the ECDC and the Swedish Public Health Agency, respectively. Study samples are defined in Table 1.
Serum sample processing

Blood samples were collected by the attending clinical team and serum isolated by the department of Clinical Microbiology. Samples were anonymized, barcoded and stored at -20 °C until use. Serum samples were not heat-inactivated for ELISA protocols but were heat-inactivated at 56 °C for 60 min for neutralization experiments.

SARS-CoV-2 antigen generation

The plasmid for expression of the SARS-CoV-2 prefusion-stabilized spike ectodomain with a C-terminal T4 fibritin trimerization motif was obtained from [26]. The plasmid was used to transiently transfect FreeStyle 293F cells using FreeStyle MAX reagent (Thermo Fisher Scientific). The ectodomain was purified from filtered supernatant on Streptactin XT resin (IBA Lifesciences), followed by size-exclusion chromatography on a Superdex 200 in 5 mM Tris pH 8, 200 mM NaCl.

The RBD domain (RVQ-QFG) of SARS-CoV-2 was cloned upstream of a Sortase A recognition site (LPETG) and a 6xHIS tag, and expressed in 293F cells as described above. RBD-HIS was purified from filtered supernatant on His-Pur Ni-NTA resin (Thermo Fisher Scientific), followed by size-exclusion chromatography on a Superdex 200. The nucleocapsid was purchased from Sino Biological.

Anti-SARS-CoV-2 ELISA

96-well ELISA plates (Nunc MaxiSorp) were coated with SARS-CoV-2 S trimers, RBD or nucleocapsid (100 μl of 1 ng/μl) in PBS overnight at 4 °C. Plates were washed six times with PBS-Tween-20 (0.05%) and blocked using PBS-5% no-fat milk. Human serum samples were thawed at room temperature, diluted (1:100 unless otherwise indicated), and incubated in blocking buffer for 1 h (with vortexing) before plating. Serum samples were incubated overnight at 4 °C before washing, as before. Secondary HRP-conjugated anti-human antibodies were diluted in blocking buffer and incubated with samples for 1 h at room temperature. Plates were washed a final time before development with TMB Stabilized Chromogen (Invitrogen).
The reaction was stopped using 1 M sulphuric acid and optical density (OD) values were measured at 450 nm using an Asys Expert 96 ELISA reader (Biochrom Ltd.). Secondary antibodies (all from Southern Biotech) and dilutions used: goat anti-human IgG (2014-05) at 1:10,000; goat anti-human IgM (2020-05) at 1:1,000; goat anti-human IgA (2050-05) at 1:6,000. All assays of the same antigen and isotype were developed for their fixed time and samples were randomized and run together on the same day when comparing binding between PCR+ individuals. Negative control samples were run alongside test samples in all assays and raw data were log transformed for statistical analyses.

In vitro virus neutralisation assay

Pseudotyped viruses were generated by the co-transfection of HEK293T cells with plasmids encoding the SARS-CoV-2 spike protein harboring an 18 amino acid truncation of the cytoplasmic tail [26]; a plasmid encoding firefly luciferase; a lentiviral packaging plasmid (Addgene 8455) using Lipofectamine 3000 (Invitrogen). Media was changed 12-16 hours post-transfection and pseudotyped viruses harvested at 48 and 72 hours, filtered through a 0.45 [...]
[...] machines is an altogether different approach. We opted for a linear kernel, once again resulting in a linear boundary. SVM constructs a boundary that maximally separates the classes (i.e. the margin between the closest member of any class and the boundary is as wide as possible), hence points lying far away from their respective class boundaries do not play an important role in shaping it. SVM thus puts more weight on points closest to the class boundary, which in our case is far from being clear. Linear SVM has one tuning parameter C, a cost, with larger values resulting in narrower margins. We tuned C on a vector of values (0.001, 0.01, 0.5, 1, 2, 5, 10) via an internal 5-fold CV with 5 repeats (with the winning parameter used for the final model for the main CV iteration). We also note that the natural output of SVM is class labels rather than class probabilities, so the latter are obtained via the method of Platt [54].

We considered three strategies for cross-validation: i) random: individuals were sampled into folds at random, ii) stratified: individuals were sampled into folds at random, subject to ensuring the balance of cases:controls remained fixed, and iii) unbalanced: individuals were sampled into folds such that each fold was deliberately skewed to under- or over-represent cases compared to the total sample. We sought a method that worked equally well across all cross-validation schemes, as the true proportion of cases in the test data is unknown and so a good method should not be overly sensitive to the proportion of cases in the training data.
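The three fold-assignment strategies described above could be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the class sizes are invented, and the skewing rule in `unbalanced_folds` (linearly increasing weights for cases, decreasing for controls) is one arbitrary choice, since the text does not specify how folds were skewed.

```python
import random


def random_folds(labels, k, rng):
    """i) random: shuffle all individuals and deal them into k folds."""
    idx = list(range(len(labels)))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def stratified_folds(labels, k, rng):
    """ii) stratified: deal cases and controls separately so that every
    fold keeps (nearly) the same case:control balance."""
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[pos % k].append(i)
    return folds


def unbalanced_folds(labels, k, rng):
    """iii) unbalanced: cases favour later folds, controls favour earlier
    ones, so folds under- or over-represent cases (assumed skew rule)."""
    up = list(range(1, k + 1))   # weights 1..k for cases
    down = up[::-1]              # weights k..1 for controls
    folds = [[] for _ in range(k)]
    for i, y in enumerate(labels):
        weights = up if y == 1 else down
        folds[rng.choices(range(k), weights=weights)[0]].append(i)
    return folds


def case_fraction(fold, labels):
    """Proportion of cases (label 1) within one fold."""
    return sum(labels[i] for i in fold) / len(fold)


# Hypothetical labels: 1 = case, 0 = control (sizes are invented).
labels = [1] * 130 + [0] * 600
rng = random.Random(1)
for make_folds in (random_folds, stratified_folds, unbalanced_folds):
    folds = make_folds(labels, 5, rng)
    print(make_folds.__name__,
          [round(case_fraction(f, labels), 2) for f in folds])
```

Evaluating each candidate classifier under all three schemes, and preferring one whose sensitivity and specificity change little between them, reflects the stated aim of not being overly sensitive to the proportion of cases in the training data.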
We found most methods worked well and chose to create an ensemble (ENS) method combining the method with the highest sensitivity (LOG) with the highest specificity (LDA), defined as an unweighted average of the probabilities generated under both.

We trained the ensemble learner on all 733 training samples and predicted the probability of anti-SARS-CoV-2 antibodies in blood donors and pregnant volunteers sampled in 2020. The ENS learner had average sensitivity >99.1% and average specificity >99.8%. We inferred the proportion of the sampled population with positive antibody status each week using multiple imputation. We repeatedly (1,000 times) imputed antibody status for each individual randomly according to the ensemble prediction, and then analyzed each of the 1,000 datasets in parallel, combining inference using Rubin's rules, derived for the Wilson binomial proportion confidence interval [55].

Our Bayesian approach is explained in detail in Christian et al. [49]. Briefly, we used a logistic regression over anti-RBD and -S training data to model the relationship between the ELISA measurements and the probability that a sample is antibody-positive. We adjusted for the training data class proportions and used these adjusted probabilities to inform the seroprevalence estimates for each time point. Given that the population seroprevalence cannot increase dramatically from one week to the next, we constructed a prior over seroprevalence trajectories using a transformed Gaussian Process, and combined this with the individual class-balance adjusted infection probabilities for each donor to infer the posterior distribution over seroprevalence trajectories.
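The ensemble averaging and multiple-imputation steps described above can be sketched in a few lines. This is a simplified illustration rather than the authors' implementation: the function names and input probabilities are hypothetical, and the pooling uses Rubin's rules with a plain Wald variance for each imputed proportion, whereas the text describes a derivation for the Wilson interval.

```python
import random
import statistics


def ensemble_probability(p_log, p_lda):
    """Unweighted average of two learners' positive-class probabilities."""
    return [(a + b) / 2 for a, b in zip(p_log, p_lda)]


def impute_prevalence(probs, n_imputations=1000, seed=0):
    """Draw antibody status per individual from its probability, repeat,
    and pool the per-dataset proportions with Rubin's rules.

    Returns (pooled proportion, total variance). The within-imputation
    variance is the Wald form p(1-p)/n, a simplifying assumption.
    """
    rng = random.Random(seed)
    n = len(probs)
    estimates, within = [], []
    for _ in range(n_imputations):
        status = [rng.random() < p for p in probs]  # one imputed dataset
        p_hat = sum(status) / n
        estimates.append(p_hat)
        within.append(p_hat * (1 - p_hat) / n)
    pooled = statistics.mean(estimates)
    between = statistics.variance(estimates)
    # Rubin's rules: total = mean within + (1 + 1/m) * between.
    total_var = statistics.mean(within) + (1 + 1 / n_imputations) * between
    return pooled, total_var


# Hypothetical per-sample probabilities from the two learners.
p_log = [0.02, 0.10, 0.95, 0.60, 0.01]
p_lda = [0.01, 0.30, 0.99, 0.40, 0.03]
p_ens = ensemble_probability(p_log, p_lda)
prevalence, variance = impute_prevalence(p_ens)
```

Because each individual contributes a probability rather than a hard call, samples near the assay boundary add to the uncertainty of the weekly estimate instead of being forced to one side of a cut-off.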
Figure 2: Probability-based seroprevalence estimates in Stockholm during the initial outbreak. (A) Study sample collection intervals are shown alongside the 14-day COVID-19 death rate per million inhabitants in Sweden and relevant countries for comparison. (B) Log-transformed un-normalized OD measurements from all BD and PW in the study. Conventional 3 (dotted red line) and 6 SD (solid red line) cut-offs are shown, calculated from n=595 historical controls; 100 random negative controls (C, with 95% CI of the median) are shown for each assay. (C) The percentage anti-S and -RBD IgG positive per sampling week in BD and PW, shown according to 3 or 6 SD cut-offs. (D) S and RBD responses from PCR+ individuals were used to train different machine learning algorithms to assign likelihood of past infection.
We created an ensemble learner (ENS) from the output of logistic regression and linear discriminant analysis, providing a highly sensitive, specific and consistent multi-dimensional solution to the problem of weak reactors, and assigning each data point a probability of being positive. Conventional 3 and 6 SD cut-offs are shown for each antigen, with probabilities assigned to selected points.

Historical blood donors n=595
Sample collection dates April-June 2019

ECV+ donors n=20
Sample collection dates July-December 2019

§ Under the care of Karolinska University Hospital
No additional metadata available for any samples