ABSTRACT
Many bacterial diseases are caused by organisms that ordinarily are harmless components of the human microbiome. Effective interventions against these conditions requires an understanding of the processes whereby symbiosis or commensalism breaks down. Here, we performed bacterial genome-wide association studies (GWAS) of Neisseria meningitidis, a common commensal of the human respiratory tract despite being a leading cause of meningitis and sepsis. GWAS discovered single nucleotide polymorphisms (SNPs) and other bacterial genetic variants associated with invasive meningococcal disease (IMD) versus carriage in several loci across the genome, revealing the polygenic nature of this phenotype. Of note, we detected a significant peak around fHbp, which encodes factor H binding protein (fHbp); fHbp promotes bacterial immune evasion of human complement by recruiting complement factor H (CFH) to the meningococcal surface. We confirmed the association around fHbp with IMD in a validation GWAS, and found that SNPs identified in the validation affecting the 5’ region of fHbp mRNA alter secondary RNA structures, increase fHbp expression, and enhance bacterial escape from complement-mediated killing. This finding mirrors the known link between complement deficiencies and CFH variation with human susceptibility to IMD, highlighting the central importance of human and bacterial genetic variation across the fHbp:CFH interface in IMD susceptibility, virulence, and the transition from carriage to disease.
Many members of the human microbiota are capable of causing disease in certain circumstances. Given the complexity of relationships between commensal and symbiotic bacteria and their hosts, there are many reasons why interactions become disrupted, including genetic polymorphisms and phenotypic changes in hosts and infecting microbes. These diseases present both an evolutionary puzzle, as disease is often a dead-end for transmission, and an epidemiological challenge, as the aetiological agent circulates widely undetected, striking seemingly at random. An important example of such an ‘accidental’ pathogen is Neisseria meningitidis, a common coloniser of the human oropharynx [1, 2], which occasionally causes devastating invasive meningococcal disease (IMD).
The molecular mechanisms involved in the transition of asymptomatic colonisation to IMD remain poorly defined despite extensive research. Invasive meningococci almost invariably express specific capsular polysaccharides (serogroups A, B, C, X or W) and belong to certain lineages (the ‘hypervirulent’ meningococci). On the other hand, complement deficiencies are a well-known human risk factor [3], but do not explain the overwhelming majority of cases; however, host susceptibility to IMD is linked to genetic variation in the locus encoding the negative regulator of the complement system, complement factor H (CFH) [4, 5]. N. meningitidis expresses factor H binding protein (fHbp), an important vaccine antigen, which binds CFH with high affinity, and promotes evasion of complement-mediated killing [6, 7].
We explored the relationship between meningococcal genetic variation and IMD in two genome wide association studies (GWAS) encompassing 1,556 genomes. Initially, we compared N. meningitidis isolates from an outbreak of 52 cases of IMD and 209 carriers in the Czech Republic [8, 9, 10], in which a preponderance of disease isolates belonged to the hypervirulent clonal complex ST-11 (cc; O.R. 3.4, 95% CI 1.7-7.1, Wald test p=10−3.90) [11, 12] (Figure 1A, Supplementary Figure 1). Heritability was substantial, with 36.5% (95% CI 15.9-57.0%) of the variability in case-control status attributable to bacterial genetics. GWAS identified 17 loci harbouring variants associated with carriage versus disease, after controlling for strain-to-strain differences using a linear mixed model [13]. In total, we tested for associations at 156,804 SNPs mapped to an ST-11 complex reference genome [14] and 7,806,583 31-nucleotide sequence fragments (kmers) that capture variation missed by reference-based SNP calling [15]. After Bonferroni correction for unique phylopatterns [16], we found significant associations in seven SNPs (p<10−6.20) and 465 kmers (p<10−6.79) across the genome (Figure 1B and C).
GWAS identified variation in several genes associated with known virulence factors including capsule and the meningococcal disease-associated (MDA) island (Figure 1D-F, Supplementary Text 1) [17, 18, 19, 20]. In summary, 21 kmers in the gene csb encoding the capsule polysialyltransferase were associated with elevated disease risk (p=10−8.58) and mapped to a poly-A tract in the coding region which mediates ON:OFF switching of capsule expression [21], capturing the capsule ON status (Supplementary Figure 2). Nine kmers which tagged the consensus haplotype across three SNPs upstream of ctrF (involved in capsule export), between the predicted -10 and -35 sequences of the ctrF promotor, were more frequent in carriage (P=10−6.93) and may affect transcription (Supplementary Figure 3). Within the MDA island, nine carriage-associated kmers tagged SNPs in the IgG binding domain of tspB (p=10−8.51; Supplementary Figure 4) [19, 20]. Several other loci contained significant hits (Supplementary Table 1), including a band of SNPs in perfect linkage disequilibrium in six genes (p=10−7.10; Figure 1 G-M).
Notably, our analysis identified novel signals in a region of elevated significance within the fba-fHbp operon (Figure 2A). fba encodes fructose-1, 6-bisphosphate aldolase (Fba) which functions in carbon metabolism and cell adhesion [22, 23] while fHbp encodes fHbp. The fba-fHbp signals were i) independent of the six other genome-wide significant SNPs, ii) physically localised in a single region (unlike other polymorphisms), and iii) displayed a significant decay of signal around a prominent peak, characteristic of an authentic association (Figure 1D-M, Supplementary Figure 5). The significant IMD-associated SNP (P=10−6.51) occurred at high frequency in the sample (58.7% invasive cases, 17.2% carriers), and explained 10% of sample heritability. Therefore, while several signals were detected across the genome, the association at fba-fHbp was of particular interest.
The significant SNP in the fba-fHbp locus occurred at nucleotide 900 of fba (fbaS900, P=10−6.51), near the 3’ end of fba, as did two kmers spanning this SNP commencing at nucleotides 898 and 899 of fba (fbaK898 and fbaK899; p=10−7.16). There was a genome-wide significant enrichment in the rest of the fba-fHbp operon as a whole (adjusted harmonic mean p=10−1.72). The two kmers spanned protein-coding nucleotides 898-929, tagging the fbaS900 SNP and two others: fbaS912 (p=10−2.25) and fbaS913 (p=10−2.25) (Figure 2A, Supplementary Figure 6). The peak signal of association therefore coincided with fbaS900, which was in tight linkage disequilibrium with a neighbouring SNP fbaS897 (P=10−5.77, r2 = 0.98). fbaS900 and fbaS897 both cause synonymous substitutions, located 323 and 326 bp upstream of the fHbp start codon, respectively.
A replication study was undertaken to test the association of fbaS900 with IMD using genomes of an extended set of 1,295 clonal complex ST-41/44 meningococci, comprising 1046 IMD and 249 carriage isolates (available at https://PubMLST.org/neisseria [24]) (Supplementary Figure 7). Clonal complex ST-41/44 strains are the leading cause of IMD world-wide [25], and are polymorphic at fbaS900. Analysing a single clonal complex mitigates confounding due to heterogeneous sampling across diverse lineages [26, 27]. After Bonferroni correction for two candidate kmers (p<10−1.60), the IMD-associated signal from kmers fbaK898 and fbaK899 was replicated in the ST-41/44 complex isolates (p=10−2.37), with the direction of the effect replicated (β=0.16). Moreover, the general enrichment in significance in fba-fHbp was replicated (adjusted harmonic mean p=10−1.96).
We explored possible effects of the synonymous SNPs in fba on the expression of fHbp, which can be translated from a bicistronic fba-fHbp mRNA or from a fHbp-specific promoter [28]. Expression of fHbp can be regulated by FNR binding to sequences 80 bp upstream of the start codon [28]. We noticed that the synonymous IMD-associated substitutions at fbaS897 and fbaS900 form a motif resembling an FNR box 314 bp upstream of fHbp (Figure 3A, Supplementary Figure 6 and Text 2). Electrophoresis mobility shift assays (EMSA) demonstrated binding of a constitutively active version of FNR to the known FNR site but not to sequences within fba, irrespective of the SNP sequence (Figure 3B, Supplementary Figure 8). Furthermore, there was no detectable difference in fHbp expression by four isogenic strains covering all combinations of the SNPs fbaS897C/T and fbaS900T/C in an ST-41/44 complex strain (Figure 3C-D, Supplementary Text 2).
Therefore, we considered whether other variants in the fba-fHbp region could be driving the signals of association as a total of 1,346 kmers in fba-fHbp were more significant in the replication study than the candidate kmers, fbaK898 and fbaK899 (Figure 2B-D). The most significant kmers above 1% minor allele frequency in the replication study were those starting at fHbp nt -20 and -14 (henceforth fHbpK-20 and fHbpK-14, respectively p<10−5.93), at nt 686 (fHbpK686, p<10−6.04), and nt 752 (fHbpK752, p<10−7.64) relative to the first base of the start codon. The SNPs tagged by these kmers were: i) fHbpS-7C/T in the 5’-untranslated region (5’-UTR) of fHbp adjacent to the ribosome binding site (RBS, p=10−6.13); ii) fHbpS13G/A encoding an Ala5Thr substitution in fHbp near a previously identified anti-RBS (α-RBS) site (p=10−6.03) [29]; iii) fHbpS781A/G which leads to an Arg261Gly substitution in fHbp adjacent to the CFH binding site (p=10−7.64); and iv) fHbpS700G/A causing a Gly234Ser substitution distant from the site of CFH (p=10−6.32) (Figure 3E, Supplementary Figures 9-11).
The IMD-associated SNP fHbpS781G removes a charged side chain (on Arg261) which could affect interactions with CFH (Figure 3E). We generated proteins with Arg261 or Gly261 in v2.24 fHbp, the allele most significantly associated with IMD (p=10−3.23, Supplementary Figure 12), and assessed fHbp:CFH binding by ELISA. However, we found no evidence that fHbpS781 altered binding to CFH (Figure 3F).
To explore an alternative mechanism, we examined the effect of the SNPs around the RBS and α-RBS sequence using SHAPE chemistry to probe the RNA secondary structure using a 183 nt RNA encompassing the RBS at position -8 to -12 (relative to the translation start site), and fHbpS-7T/C and fHbpS13A/G. The secondary structure model based on SHAPE reactivity data of fHbpS-7C/S13G at 37°C (Figure 4A) is consistent with the RBS being base-paired and masked through the formation of a relatively long imperfect helix of 11 base pairs that includes both anti-RBS sequences 1 (αRBS-1) and 2 (αRBS-1) [29]; the polymorphic sites in the carriage associated fHbpS-7C/S13G structure form a G:C base pair at the top of the helix. However, the local RNA structure of the IMD-associated fHbpS-7T/S13A shows significant differences (Figure 4B-C), with the 6 bp structure around the RBS being much more open and accessible (Supplementary Figures 13-14 for SHAPE analysis and predicted RNA structures at 30°C and 42°C). These data demonstrate that the RNA structure around the RBS is modulated by sequence variation, suggesting that the polymorphisms modulate initiation of protein synthesis.
In order to examine the impact of the polymorphisms on fHbp expression, we generated a series of isogenic ST-41/44 complex strains with combinations of the SNPs, fHbpS-7T/C and fHbpS13A/G, and examined surface expression of fHbp. Notably, the fHbpS-7T IMD risk allele conferred significantly higher fHbp expression, measured by flow cytometry, than fHbpS-7C, irrespective of fHbpS13A/G (p<0.05, Figure 4D-E). Further, when bacteria were incubated in normal human sera (NHS), strains with fHbpS-7T displayed increased survival compared with fHbpS-7C, but not in heat-inactivated human serum lacking functional complement, irrespective of the fHbpS13A/G allele (p<0.05, One way Anova, Figure 4F, Supplementary Figure 15). Taken together, these results are consistent with the IMD-associated alleles at the 5’ end of fHbp around the conferring enhanced resistance of bacteria against complement-mediated killing, a major component of immunity against N. meningitidis.
Our study exclusively used publicly available genome sequences and metadata stored in the pubMLST Neisseria database (https://pubmlst.org/neisseria/), using well-described datasets from the Czech Republic and of clonal complex ST-41/44 isolates for replication. GWAS studies of virulence are particularly suitable in recombinogenic organisms such as N. meningitidis, as recombination assists fine-mapping by breaking down clonal background [30, 31, 8, 10, 32]. Previous GWAS have not identified variants influencing IMD severity, including in fHbp, and adaptation has also not between identified between paired isolates sampled from blood and cerebrospinal fluid [33, 34]. We found that the genetic architecture of meningococcal virulence is polygenic, adding to the growing understanding on virulence factors influencing the risk of IMD [17, 18, 35, 36]. Although several loci across the genome were identified, the major signals associated with IMD versus carriage emanated from the fba-fHbp region. This is particularly significant as GWAS of human genetic susceptibility identified SNPs in CFH associated with IMD, although the mechanism remains unknown [4, 37]. Our results therefore highlight that the interaction of bacterial and human gene pools across a single molecular interface, involving fHbp and CFH, dictates host susceptibility and the propensity of strains to cause invasive disease.
MATERIALS AND METHODS
Sampling frames
The discovery sample collection comprised 261 Neisseria meningitidis isolates from the Czech Republic [8, 9, 10]. Carriage samples were collected from young adults with no association with patients with IMD over four months, while disease isolates were from cases of IMD submitted to the Czech National Reference Laboratory for Meningococcal Infections in 1993 [8, 10]. 252 isolates from this collection were described previously to identify the MDA island by PCR, but not by WGS. Illumina sequencing reads were downloaded from the European Nucleotide Archive (ENA, http://www.ebi.ac.uk/ena), and Velvet de novo assemblies from PubMLST (https://pubmlst.org/neisseria/) [24]. PubMLST IDs, ENA accession numbers and phenotypes can be found in Supplementary Table 6.
The replication sample collection comprised 1,295 ST-41/44 complex genomes downloaded from pubMLST. We downloaded all genomes with a non-empty disease or carriage status, with de novo assemblies ≥2Mb in length, excluding the 261 genomes from the discovery sample collection and excluding genomes that met any of the following criteria: i) annotated as non-Neisseria meningitidis, ii) annotated with the disease phenotype “other”, iii) non-ST-41/44 complex assignment (described below), iv) genomes with more than 700 contigs, v) genomes with only one or two contigs, and vi) genomes with a total assembly length greater than 2.5Mb. PubMLST IDs and phenotypes can be found in Supplementary Table 7.
SNP calling and kmer counting
For the discovery sample collection, sequence reads were mapped against the reference ST-11 complex, FAM18 genome (GenBank number: NC_008767.1) using Stampy [38]. Bases were called using previously described quality filters [39, 40, 41]. We identified 150,502 biallelic SNPs, 6,063 tri-allelic SNPs, and 239 tetra-allelic SNPs.
To capture non-SNP-based variation, and SNPs not in the reference genome, we pursued a kmer-based approach where all unique 31 bp haplotypes were counted from Velvet assemblies using dsk [42] in both sample collections. For both sample collections, a unique set of variably present kmers across each data set was created, with the presence or absence of each unique kmer determined per genome. Algorithms, coded in C++, can be downloaded from https://github.com/jessiewu/bacterialGWAS/blob/master/kmerGWAS/gwas_kmer_pattern [16]. We identified 7,806,583 variably present kmers in the discovery sample collection and 11,114,868 in the replication study collection.
Phylogenetic inference
A maximum likelihood phylogeny was estimated for the discovery study collection for visualisation of phylogenetic relationships in the sampling frame and for SNP imputation purposes using RaxML [43] with a general time reversible (GTR) model and no rate heterogeneity, using alignments from the mapped data based on biallelic sites, with non-biallelic sites being set to the reference allele.
Non-ST-41/44 complex assignment in the replication study collection was determined using the kmer counts. A UPGMA tree was built using a kmer presence/absence distance matrix and all descendants of the most recent common ancestor of the genomes annotated in pubMLST as ST-41/44 complex were kept in order to identify unlabelled members of the complex.
SNP imputation
For the SNP discovery analysis, missing sites due to sequencing ambiguity or strict SNP-calling thresholds were imputed using ancestral state reconstruction [44] implemented in ClonalFrameML [45]. This approach was previously shown to achieve high accuracy [16].
Correcting for multiple testing
Multiple testing was accounted for by applying Bonferroni correction to control the strong-sense family-wise error rate (ssFWER) [46]. The unit of correction for all studies of individual loci in the discovery GWAS was taken to be the number of unique “phylopatterns” i.e. the number of unique partitions of individuals according to allele membership. The locus effect of an individual variant was considered to be significant if its P value was smaller than α/np, where we took α = 0.05 to be the genome-wide false positive rate (i.e. family-wide error rate, FWER) and np to be the number of unique phylopatterns. In the discovery SNP-based analysis, np was taken to be the number of unique SNP phylopatterns (80,099) and in the kmer-based analyses, np was taken to be the number of unique kmer presence/absence phylopatterns (307,830).
In the replication GWAS, since we tested whether the two genome-wide significant, disease- associated kmers in the fba-fHbp operon replicated, we applied a Bonferroni correction to obtain a significance threshold of 0.05/2=10−1.60.
When testing the discovery GWAS for lineage effects by the Wald test for principal component-phenotype associations, a Bonferroni correction was applied for the number of non-redundant principal components, which equalled the sample size (261) since no two genomes were identical.
Testing for locus effects
We performed association testing using the R package bugwas (https://github.com/sgearle/bugwas) [16] which uses linear mixed model (LMM) analyses implemented in the software GEMMA [13] to control for population structure. We modified GEMMA version 0.93 to enable significance testing of non-biallelic variants [16]. GEMMA was run using no minor allele frequency cut-off to include all variants.
For the discovery GWAS, we computed the relatedness matrix from biallelic SNPs only using the option “-gk 1” in GEMMA to calculate the centred relatedness matrix. For the replication study, we computed the relatedness matrix from the kmer presence/absence matrix using Java code which also calculates the centred relatedness matrix.
Identifying lineage effects
We tested for associations between bacterial lineages and the phenotype in the discovery sample collection using the R package bugwas (https://github.com/sgearle/bugwas) [16]. Principal components were computed based on biallelic SNPs and interpreted in terms of bacterial lineages. To test the null hypothesis of no background effect of each principal component we employed a Wald test, where we compared the test statistic against a χ2 distribution with one degree of freedom to obtain a P value.
Estimating sample heritability
Heritability of the sample, the proportion of the phenotypic variation that can be explained by the bacterial genotype, was estimated using the LMM null model in GEMMA from post-imputation biallelic SNPs [13]. Estimating heritability in case control studies is dependent on the prevalence of cases and the sampling scheme [47]. The proportion of sample heritability explained by the kmers fbaK898 and fbaK899 in the discovery set was estimated by including the phylopattern of the two kmers as a covariate in the LMM null model in GEMMA, and calculating the difference in heritability compared to including no covariates.
Testing for independent SNP associations
To determine whether pairs of significant SNP associations in the discovery sample collection represented independent signals, the two unique significant SNP patterns were tested using LMM including both SNPs as fixed effects, thereby assuming additivity between the two loci.
Variant annotation
SNPs were annotated in R using scripts at http://github.com/jessiewu/bacterialGWAS. The reference FASTA and GenBank files were used in order to determine SNP type (synonymous, non-synonymous, nonsense, read-through and intergenic), codon, codon position, reference and non-reference amino acids, gene name and gene product, on the assumption of a single change to the reference sequence.
To annotate kmer sequences, we mapped kmers to the reference FAM18 genome using Bowtie2 [48] and the options “-r -D 24 -R 3 -N 0 -L 18 -i S,1,0.30” to identify a single best mapping position for each kmer. For kmers which did not map to the reference genome, BLAST [49] was used to identify the kmer position within FAM18. BLAST results of any sequence length were taken, and the number of mismatches along the whole length of the kmer was recalculated assuming the whole kmer aligned. Kmers with five or fewer mismatches to the reference were shown as aligned to the reference, all other kmers were shown as unaligned to the reference.
To understand the variation captured by the significant kmers in the gene csb, BLAST [49] was used to extract all copies of the MC58 (Genbank accession number NC_003112.2) allele of csb, the allele that the significant csb kmers mapped to.
As the reference FAM18 genome contains multiple copies of the gene tspB, to understand the variation captured by the significant kmers in tspB, BLAST [49] was used to identify all kmer alignments with just the FAM18 tspB gene NMC_RS00140.
Software
Software applied within these analyses can be found at http://github.com/jessiewu/bacterialGWAS and http://github.com/sgearle/bugwas.
Strain construction
The primers and strains used to test the effects of SNPs are listed in Supplementary Tables 2-4. The fbaS897/S900 SNPs were constructed by inserting a Kanamycin resistance cassette downstream of fHbp. First, the upstream fragment (starting 843 bp upstream of the fHbp start codon including the C terminus of fba, terminating 12 bp downstream of the fHbp stop codon) and downstream fragment (751 bp downstream of the fHbp stop codon) were amplified with primers ERS001/ERS004 and ERS007/ERS008 respectively from 0011/93 N. meningitidis gDNA. The kanamycin resistance cassette was amplified from pGEMTEasy-Kan using ERS005/ERS006 and the three fragments were cloned into pUC19 using NEB Builder HiFi DNA assembly kit (New England Biolabs). A second set of overlap primers were used to introduce SNPs into a second upstream fragment using primer combinations: ERS001/ERS002 and ERS003/ERS004, ERS001/ERS009 and ERS010/ERS004, and ERS001/ERS011 and ERS012/ERS004. The constructs were purified and transformed into 0011/93 N. meningitidis. For each strain, three independent single colonies were pooled and gDNA from the pooled stocks was checked by PCR and sequencing.
The fHbpS-7/S13 SNPs were constructed by inserting an erythromycin resistance cassette downstream of fHbp. First, a fragment corresponding to 496 bp upstream of the fHbp start codon and the fHbp ORF, and a fragment corresponding to 707 bp downstream of the fHbp stop codon) were amplified with primers ML428/ML429 and ML434/ML433 respectively from 0011/93 N. meningitidis gDNA. The erythromycin resistance cassette was amplified from pNMC2 [50] using ML430/ML435 and the three fragments were cloned into pUC19 using NEB Builder HiFi DNA assembly kit (New England Biolabs). The resulting vector was used as a template to generate fHbp with different SNPs by site directed mutagenesis using primer combinations: ML436/405 and ML437/406, ML438/ML405 and ML439/ML406, and ML440/ML405 and ML441/406. The constructs were purified and used to transform 0011/93 N. meningitidis. For each strain, three independent single transformants were pooled and gDNA from the pooled stocks was checked by PCR and sequencing.
Generation of plasmids and protein purification
V2.24 fHbp was amplified from N. meningitidis OX99.32412 and SNPs introduced by PCR, then ligated into pET21b using Quick-Stick Ligase (Bioline). Versions of fHbp were ligated into pET28a-His-MBP-TEV (in frame with sequence encoding a histidine tag and the Escherichia coli maltose-binding protein (MBP) with a C-terminal TEV cleavage site) linearised with XhoI, and constructs confirmed by sequencing.
v2.24 fHbps were expressed in E. coli B834 during growth at 22°C for 24 hrs with 1 mM IPTG (final concentration). Bacteria were harvested and resuspended in Buffer A (50 mM Na-phosphate pH 8.0, 300 mM NaCl, 30 mM imidazole) and the fHbp purified by Nickel affinity chromatography (Chelating Sepharose Fast Flow; GE Healthcare). Columns were washed with Buffer A, then with 80:20 Buffer A:Buffer B (50 mM Na-phosphate pH 8.0, 300 mM NaCl, 300 mM Imidazole), and proteins eluted in 40:60 Buffer A:Buffer B. Proteins were dialysed overnight at 4°C into PBS, 1mM DTT pH 8.0 with TEV protease prior to Nickel affinity chromatography to remove the HIS-GST-TEV. fHbp was eluted from Sepharose columns with Buffer B after washing with buffer C (50 mM Na-phosphate pH 6.0, 500 mM NaCl, 30 mM Imidazole), and dialysed overnight at 4°C into Tris pH 8.0. fHbp v1.1I311A expression and purification was performed as described previously [51].
Electrophoresis mobility shift assays
Gel retardation assays were carried out as previously using purified FNRD154A, which forms functional FNR dimers under aerobic conditions [52]. Sequences upstream of fHbp were amplified with primers ERS012/013, and the full length (420 bp) or HaeIII-digested (294 and 126 bp) fragments end-labelled with [γ-32P]-ATP with T4 polynucleotide kinase (New England BioLabs). Approximately 0.5 ng of each labelled fragment was incubated with varying amounts of FNRD154A in 10 mM potassium phosphate (pH 7.5), 100 mM potassium glutamate, 1 mM EDTA, 50 μM DTT, 5% glycerol and herring sperm DNA (25 μg ml−1). After incubation at 37°C for 20 min, samples were separated on 6% polyacrylamide gels containing 2% glycerol. Gels were analysed using a Bio-Rad Molecular Imager FX and Quantity One software (Bio-Rad).
CFH binding ELISA
To investigate CFH binding by ELISA, 96-well plates (F96 MaxiSorp; Nunc) were coated with recombinant fHbp (5 μg/well) overnight at 4°C prior to blocking with 3% bovine serum albumin (BSA) in PBS with 0.05% Tween 20 at 37°C. Plates were incubated with dilutions of CFH (Sigma). Binding was detected with anti-CFH mAb (OX24) and HRP-conjugated goat anti-mouse polyclonal antibody (Dako), and visualized with 3,3’,5,5’-tetramethylbenzidine (TMB) substrate reagent (Roche) and 2 N sulphuric acid stop solution (Roche) according to the manufacturer’s instructions, and the A450 measured (SpectraMax M5; Molecular Devices).
Serum assays
Pooled normal human serum (NHS) were used in serum assays, and heat inactivated (NHS-HI) by heating at 56°C for 30 min. N. meningitidis was grown overnight on Brain Heart Infusion (BHI) agar, and then 104 CFU were incubated in dilutions of NHS or NHS-HI for 30 min at 37°C in the presence of CO2. Bacterial survival was determined by plating onto BHI agar in triplicate. Percent survival was calculated by comparing bacterial recovery in serum with recovery from samples containing no serum. Significance was analysed by two-way ANOVA (GraphPad Prism v.8.0).
Flow cytometry
N. meningitidis was grown on BHI agar at 32°C or 37°C prior to fixation for two hours in 3% paraformaldehyde. Surface localisation of fHbp was detected using anti-fHbp pAbs and goat anti-mouse IgG-Alexa Fluor 647 conjugate (Molecular Probes, LifeTechnologies). Samples were run on a FACSCalibur (BD Biosciences), and at least 104 events recorded before results were analysed by calculating the geometric mean fluorescence intensity in FlowJo vX software (Tree Star).
SHAPE RNA secondary structure analysis
SHAPE experiments were performed using RNA transcribed in vitro from cDNA sequence [53]. The DNA templates contained a double-stranded T7 RNA polymerase promoter sequence (TTCTAATACGACTCACTATA) followed by the sequence of interest (Supplementary Table 5). RNA purification was done with an RNA clean kit (Zymo research); RNA concentrations were measured on a Nanodrop 100 spectrophotometer. RNA chemical modification was performed in volumes of 30µl with 1.5pmol of RNA within Folding buffer (50 mM HEPES pH 8.0, 16.5 mM MgCl2). RNA samples were pre-heated at 65°C for 3 mins and immediately incubated at 30°C, 37°C or 42°C water baths for 30 mins. The modification reagent N-methylisatoic anhydride (NMIA) was added at increasing concentrations between 0 and 13mM, with DMSO (no NMIA) as control. Modification reactions were incubated for another 45mins before ethanol precipitation [54, 55]. Reverse transcription was performed using Super Script III reverse transcriptase (Invitrogen). 32P-labeled reverse transcription primers (GV1-3) are listed in Supplementary Table 4. Electrophoresis on 8% (vol/vol) polyacrylamide gels was then performed to separate fragments. Band-intensities were quantified using SAFA, version 1.1 Semi-Automated Footprinting Analysis [56].
All structure calculations were performed using RNAstructure software [57]. ΔG°SHAPE free energy change values were added to the thermodynamic free energy parameters for each nucleotide [58, 59]. Pseudoknot-energy parameters were used in calculation of ΔG°(SHAPE), according to the equation, ΔG°SHAPE(i)= m ln[SHAPE reactivity(i)+ 1]+ b. In this analysis, parameters were optimized at m=0.3 and b=-1.2 kcal/mol for fHbps7T/s13A; m=0.4 and b=-2.0 kcal/mol for fHbps-7C/s13G; nucleotides with normalized SHAPE reactivities 0–0.40, 0.40– 0.85, and >0.85 correspond to low, medium, and highly reactive positions, respectively [58, 59]. Secondary structures were rendered using VARNA [60]. [1]
Harmonic mean P value
The harmonic mean p-value (HMP) method performs a combined test of the null hypothesis that no p-value is significant [61]. The HMP method controls for the ssFWER like the Bonferroni correction. We applied the HMP procedure to the fba-fHbp region in the discovery and replication studies, including all unique kmer phylopatterns that mapped to either of the two genes plus their upstream intergenic regions. We calculated the asymptotically exact p-values using the p.hmp function from the R package ‘harmonicmeanp’, giving equal weight to all kmer phylopatterns, and the total number of tests performed genome-wide was set to be the number of kmer phylopatterns (np) in order to control the genome-wide ssFWER despite analysing just the fba-fHbp region. We adjusted the p-value by dividing it by the sum of the weights of the kmer phylopatterns included in the fba-fHbp region so that it could be directly compared to alpha, the intended ssFWER, which we set to be 0.05.
Data Availability
Illumina sequencing reads were downloaded from the European Nucleotide Archive (ENA, http://www.ebi.ac.uk/ena), and Velvet de novo assemblies and phenotypes from PubMLST (https://pubmlst.org/neisseria/). PubMLST IDs, ENA accession numbers and phenotypes can be found in Supplementary Tables 6 and 7.
Data Availability
All data, code, and materials used in the analyses can be accessed at:https://github.com/SiyuChenOxf/COVID19SeroModel/tree/master
ACKNOWLEDGEMENTS
Work in CMT’s laboratory is funded by a Wellcome Trust Senior Investigator award. D.J.W. is supported by a Sir Henry Dale Fellowship (Grant 101237/Z/13/B) and a Big Data Institute Robertson Fellowship. Computation using the Oxford Biomedical Research Computing (BMRC) facility is supported by Health Data Research UK and the NIHR Oxford Biomedical Research Centre. Financial support was provided by the Wellcome Trust Core Award Grant Number 203141/Z/16/Z. Work at the University of Washington was supported by NIH NIGMS 1R35 GM 126942.
REFERENCES
- [1].↵
- [2].↵
- [3].↵
- [4].↵
- [5].↵
- [6].↵
- [7].↵
- [8].↵
- [9].↵
- [10].↵
- [11].↵
- [12].↵
- [13].↵
- [14].↵
- [15].↵
- [16].↵
- [17].↵
- [18].↵
- [19].↵
- [20].↵
- [21].↵
- [22].↵
- [23].↵
- [24].↵
- [25].↵
- [26].↵
- [27].↵
- [28].↵
- [29].↵
- [30].
- [31].
- [32].
- [33].↵
- [34].↵
- [35].↵
- [36].↵
- [37].↵
- [38].↵
- [39].↵
- [40].↵
- [41].↵
- [42].↵
- [43].↵
- [44].↵
- [45].↵
- [46].↵
- [47].↵
- [48].↵
- [49].↵
- [50].↵
- [51].↵
- [52].↵
- [53].↵
- [54].↵
- [55].↵
- [56].↵
- [57].↵
- [58].↵
- [59].↵
- [60].↵
- [61].↵