Summary
Background Pneumococcal conjugate vaccine (PCV), which targets up to 13 serotypes of Streptococcus pneumoniae, is very effective at reducing disease in young children; however, the rapid increase in non-PCV serotypes through serotype replacement remains a concern. Serotype 24F is one of the major invasive serotypes mediating serotype replacement in France and multiple other countries. We aimed to identify the major pneumococcal lineage that has driven the increase of serotype 24F in France, to place the findings in context by investigating the global diversity of serotype 24F pneumococci, and to characterise the driver lineage from a global perspective and elucidate its spatiotemporal transmission in France and across the world.
Methods We whole-genome sequenced a collection of 419 serotype 24F S. pneumoniae isolates from asymptomatic carriers and invasive disease cases among individuals <18 years old in France between 2003 and 2018. Genomes were clustered into Global Pneumococcal Sequence Clusters (GPSCs) and clonal complexes (CCs) to identify the lineages that drove the increase in serotype 24F in France. For each serotype 24F lineage, we evaluated the invasive disease potential and propensity to cause meningitis by comparing the proportion of invasive disease cases with that of carriers. To provide a global context for serotype 24F and the driver lineage, we extracted relevant genomes and metadata from the Global Pneumococcal Sequencing (GPS) project database (n=25,590) and, for comparison, additionally sequenced a collection of 91 pneumococcal isolates belonging to the lineage that was responsible for the serotype 24F increase in Spain during PCV introduction. Phylogenetic, evolutionary, and spatiotemporal analyses were conducted to understand the mechanism underlying the global spread of serotype 24F and the evolutionary history and long-range transmission of the driver lineage.
Findings A multidrug-resistant pneumococcal lineage, GPSC10 (CC230), drove the serotype 24F increase in both carriage and invasive disease in France after PCV13 introduction. When compared with other serotype 24F lineages, it exhibited a 1.4-fold higher invasive disease potential and a 1.6-fold higher propensity to cause meningitis. Globally, serotype 24F was widespread, largely owing to clonal dissemination of GPSC10, GPSC16 (CC66) and GPSC206 (CC7701) rather than recent capsular switching. Among these lineages, only GPSC10 was multidrug-resistant. It expressed 17 serotypes, of which only 6 are included in PCV13, and none of the expected PCVs covers all serotypes expressed by this lineage. The global phylogeny of GPSC10 showed that all serotype 24F isolates except one clustered together, regardless of their country of origin. Long-range transmissions of GPSC10-24F from Europe to Israel, Morocco and India were detected. Spatiotemporal analysis revealed that it took ∼5 years for GPSC10-24F to spread across French provinces. In Spain, we detected that the serotype 24F driver lineage GPSC10 underwent a rapid change in serotype composition from serotype 19A to 24F during the introduction of PCV13 (which targets 19A but not 24F), indicating that pre-existing serotype variants enabled GPSC10 to survive and expand under vaccine-selective pressure.
Interpretation Our work further shows the utility of bacterial genome sequencing for understanding the pneumococcal lineages behind serotype changes and reveals that GPSC10 alone is a challenge for a serotype-based vaccine strategy. More systematic investigation to identify lineages like GPSC10 will better inform and improve next-generation preventive strategies against pneumococcal diseases.
Funding Bill & Melinda Gates Foundation, Wellcome Sanger Institute, and the US Centers for Disease Control and Prevention.
Introduction
Pneumococcal conjugate vaccines (PCVs), which target up to 13 capsular serotypes of Streptococcus pneumoniae that account for most disease in infants, have been very effective at reducing disease worldwide.1 However, increases in non-PCV serotypes through serotype replacement remain a concern,2–6 as exemplified by serotype 19A after PCV7 in the USA.7 In France, a sharp increase in pneumococcal meningitis cases occurred five years after the roll-out of PCV13, mainly driven by the non-PCV13 serotype 24F.8, 9 This serotype also mediated serotype replacement in multiple other countries, such as Argentina,10 Canada,11 Denmark,12 Germany,13 Israel,14 Italy,15 Japan,16 Lebanon,17 Norway,18 Spain19, 20 and the UK,21 and was reported to be the predominant serotype causing invasive pneumococcal disease (IPD) in Portugal22 after PCV13 introduction.
Compared with most non-PCV13 serotypes, serotype 24F has a high invasive disease potential23 and propensity to cause meningitis.24 In France, the fatality rate for meningitis due to serotype 24F pneumococci was 13%, similar to that (11%) for meningitis caused by pneumococci expressing other serotypes.8 In some countries, the serotype 24F increase was concomitant with an increasing prevalence of penicillin resistance in IPD overall8, 15 and in IPD due to non-vaccine serotypes.10 Despite serotype 24F being an important emerging serotype with high invasiveness and potential for antimicrobial resistance, it is not included, to our knowledge, in any expected future PCV formulation (PCV15, 20, 24).
By delineating pneumococcal lineages into Global Pneumococcal Sequence Clusters (GPSCs, hereafter also referred to as lineages) using variation across the entire genome25 and/or into clonal complexes (CCs) defined by the nucleotide sequences of seven housekeeping genes,26 we observed that the pneumococcal lineages driving the increase in serotype 24F varied between countries. For example, the increase was mainly driven by GPSC10 (CC230) in Argentina,10 Lebanon17 and Spain,27 by GPSC6 (CC156) in Denmark12 and by GPSC106 (CC2572) in Japan.16 Here we applied whole-genome sequencing to investigate the pneumococcal lineages driving the increase in serotype 24F after PCV13 introduction in France. By combining these data with >25,000 pneumococcal genomes from 56 countries in the Global Pneumococcal Sequencing (GPS) project database, we place the findings from France in context by investigating the global diversity of serotype 24F pneumococci. We characterise the driver lineage from a global perspective and elucidate its spatiotemporal transmission in France and across the world.
Methods
Study design
A representative set of French serotype 24F pneumococcal isolates, collected through nationwide hospital-based active surveillance for IPD8, 9 and a carriage survey28 across the country, was whole-genome sequenced. The collection included isolates from invasive disease cases (n=190) and asymptomatic colonisation (n=229) among individuals <18 years old between 2003 and 2018 (Figure S1). The study period spanned different phases of PCV introduction: from PCV7 use in targeted groups of children (e.g. children attending daycare) to generalised use of PCV13 for all children and the subsequent increase of 24F in 2015. To identify the pneumococcal lineage(s) driving the increase in 24F, we grouped isolates into GPSCs and CCs, and evaluated each lineage's invasive disease potential and propensity to cause meningitis by calculating odds ratios (ORs) relative to carriage.29
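The lineage-level odds ratio compares how often a lineage appears in disease versus carriage, relative to all other serotype 24F lineages. A minimal Python sketch follows, assuming a standard 2×2 table and a Woolf (log-scale) 95% confidence interval; the interval method, the function name `lineage_odds_ratio` and the example counts are illustrative assumptions, not the authors' implementation (their statistics were run in R).

```python
from __future__ import annotations

import math
from statistics import NormalDist
from typing import NamedTuple


class ORResult(NamedTuple):
    odds_ratio: float
    ci_low: float
    ci_high: float


def lineage_odds_ratio(
    lineage_disease: int,
    lineage_carriage: int,
    other_disease: int,
    other_carriage: int,
    alpha: float = 0.05,
) -> ORResult:
    """OR of a lineage being found in disease rather than carriage.

    2x2 table (rows: lineage vs all other lineages; columns: disease vs carriage):
        lineage         lineage_disease   lineage_carriage
        other lineages  other_disease     other_carriage
    The CI uses the Woolf log method (an assumption; the text does not say).
    """
    cells = (lineage_disease, lineage_carriage, other_disease, other_carriage)
    if min(cells) == 0:
        raise ValueError("zero cell: apply a continuity correction first")
    odds_ratio = (lineage_disease * other_carriage) / (lineage_carriage * other_disease)
    se_log_or = math.sqrt(sum(1.0 / c for c in cells))  # SE of log(OR)
    z = NormalDist().inv_cdf(1 - alpha / 2)              # ~1.96 for a 95% CI
    log_or = math.log(odds_ratio)
    return ORResult(
        odds_ratio,
        math.exp(log_or - z * se_log_or),
        math.exp(log_or + z * se_log_or),
    )


# Hypothetical counts, not taken from the study.
result = lineage_odds_ratio(20, 10, 30, 30)
print(result)
```

With these hypothetical counts, OR = (20×30)/(10×30) = 2.0; an OR above 1 indicates the lineage is over-represented in disease relative to carriage, and an interval spanning 1 (as for GPSC10 in this study) means the difference is not statistically significant.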
We then placed serotype 24F and the driver lineage (GPSC10) in a global context by including additional genomes from the GPS project database (n=25,590 from 55 countries, last accessed on 2nd October 2021). For comparison, we additionally sequenced a collection of 91 pneumococcal isolates that were responsible for the serotype 24F increase in Spain. These isolates were collected from children aged <5 years between 2009 and 2018 by the Catalan support laboratory for non-mandatory molecular surveillance of IPD at Hospital Sant Joan de Déu, Barcelona.27 To understand the genetic diversity, phylogenetic analysis was carried out on 642 serotype 24F isolates from 29 countries across six continents (Africa, Asia, Australia, Europe, North America, and Latin America). An international collection of 888 GPSC10 isolates from 33 countries, regardless of serotype, was included for phylogenetic, evolutionary, and spatiotemporal analysis.
Sequencing and genomic characterisation
The pneumococcal isolates in this study were whole-genome sequenced at the Wellcome Sanger Institute (Hinxton, UK) on an Illumina HiSeq sequencer. Sequence reads were subjected to quality control based on previously described criteria.25 We characterised each genome by assigning GPSCs using PopPUNK,25, 30 sequence types (STs) using MLSTcheck31 and CCs by grouping STs using eBURST,32 predicting serotypes using SeroBA,33 and predicting resistance to 17 antibiotics, including penicillin, chloramphenicol, erythromycin, cotrimoxazole and tetracycline, using a pipeline developed by the Streptococcus Laboratory at the Centers for Disease Control and Prevention, Atlanta, USA.33, 34 Multidrug resistance (MDR) was defined as resistance to ≥3 antibiotic classes. In our previous large-scale analysis, we showed high concordance between GPSC and CC; therefore, STs reported in previous studies were used to infer GPSCs in this study. All sequencing reads were deposited in the European Nucleotide Archive (ENA), and accession numbers are listed with the metadata and in silico output in supplementary file 1. Additional details on DNA extraction, genome quality control criteria, de novo assembly, annotation and in silico serotyping within serogroup 24 are described in supplementary file 2.
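Because MDR is defined by the number of antibiotic classes rather than individual drugs, resistance to two β-lactams counts only once. The Python sketch below illustrates that rule; the drug-to-class mapping covers only antibiotics named in this paper (not all 17 reported by the CDC pipeline), only "R" calls are treated as resistant, and the names `ANTIBIOTIC_CLASS`, `resistant_classes` and `is_mdr` are hypothetical, not part of the pipeline.

```python
from __future__ import annotations

# Hypothetical mapping for the antibiotics named in this paper only;
# the real pipeline reports 17 antibiotics.
ANTIBIOTIC_CLASS = {
    "penicillin": "beta-lactams",
    "cefotaxime": "beta-lactams",
    "erythromycin": "macrolides",
    "tetracycline": "tetracyclines",
    "chloramphenicol": "phenicols",
    "cotrimoxazole": "folate pathway inhibitors",
}


def resistant_classes(calls: dict[str, str]) -> set[str]:
    """Return the antibiotic classes with at least one resistant ('R') call."""
    classes: set[str] = set()
    for drug, call in calls.items():
        if drug not in ANTIBIOTIC_CLASS:
            raise KeyError(f"no class defined for antibiotic: {drug}")
        if call == "R":
            classes.add(ANTIBIOTIC_CLASS[drug])
    return classes


def is_mdr(calls: dict[str, str], min_classes: int = 3) -> bool:
    """MDR: resistant to at least `min_classes` distinct antibiotic classes."""
    return len(resistant_classes(calls)) >= min_classes


# Two beta-lactams plus a macrolide span only two classes: not MDR.
isolate_a = {"penicillin": "R", "cefotaxime": "R", "erythromycin": "R", "tetracycline": "S"}
# Adding tetracycline resistance brings the count to three classes: MDR.
isolate_b = {**isolate_a, "tetracycline": "R"}
print(is_mdr(isolate_a), is_mdr(isolate_b))  # False True
```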
Phylogenetic analysis
We performed phylogenetic analysis on all serotype 24F isolates by constructing maximum likelihood trees using FastTree (version 2.1.10) with the general time reversible substitution model.34 Phylogenies were built from single nucleotide polymorphisms (SNPs) extracted from alignments generated by mapping reads (i) to the S. pneumoniae ATCC 700669 reference genome (NCBI accession number FM211187) using SMALT (version 0.7.4) with default settings35 and (ii) to a reference sequence of the serotype 24F capsular polysaccharide locus (cps; NCBI accession number CR931688) using Burrows-Wheeler Aligner (BWA, version 0.7.17-r1188).36 The phylogenetic trees were then annotated with epidemiological data and the in silico output described above, and visualised in Microreact37 at https://microreact.org/project/global_24F and https://microreact.org/project/global_24F_cps, respectively.
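Building a tree from SNPs means discarding alignment columns that carry no variation. The Python sketch below, assuming per-isolate sequences already aligned to the same reference coordinates, keeps only columns with at least two distinct unambiguous bases (A, C, G, T), ignoring gaps and Ns. It is a simplified illustration of SNP extraction, not the pipeline used in this study, and `snp_alignment` is a hypothetical name.

```python
from __future__ import annotations

VALID_BASES = set("ACGT")


def snp_alignment(alignment: dict[str, str]) -> tuple[list[int], dict[str, str]]:
    """Return variable column indices and a SNP-only alignment.

    A column is kept if at least two different unambiguous bases occur in it;
    gaps ('-') and ambiguous calls ('N') are ignored when judging variability
    but are retained in the output sequences.
    """
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must be aligned to the same length")
    positions = [
        i for i in range(length)
        if len({s[i].upper() for s in seqs} & VALID_BASES) > 1
    ]
    snps = {name: "".join(seq[i] for i in positions) for name, seq in alignment.items()}
    return positions, snps


# Toy pseudo-alignment (hypothetical sequences).
aln = {
    "ref":  "ACGTACGT",
    "iso1": "ACGTACGA",
    "iso2": "ACTTAC-T",
    "iso3": "NCGTACGT",
}
positions, snps = snp_alignment(aln)
print(positions, snps)
```

Columns 0 and 6 are dropped because the only non-reference characters there are an N and a gap, which is why dedicated SNP callers treat ambiguity and gaps separately from true substitutions.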
Evolutionary and spatiotemporal analysis
Sequence reads of GPSC10 (CC230) isolates were mapped to the GPSC10 reference genome Denmark14-32 (ENA accession number ERS1706837) using BWA (version 0.7.17-r1188).36 Recombination was detected and removed using Gubbins (version 2.4.1).38 A recombination-free phylogeny was produced with RAxML (version 8.2.8)39 and visualised in Microreact37 together with metadata (https://microreact.org/project/global_GPSC10). Bayesian phylogenetic analysis of a subset of serotype 24F isolates within GPSC10 was conducted using the Bayesian skyline model in BEAST (version 2.6.3)40 to generate a time-resolved phylogeny and to estimate the median effective population size over time, with 95% highest posterior density intervals, in order to detect any exponential population increase. The time-resolved phylogeny can be visualised at https://microreact.org/project/GPSC10-24F.
To calculate the time taken for GPSC10-24F to spread across France, we inferred the evolutionary time between all pairs of genomes from the time-resolved phylogeny. We then used a risk ratio framework to compare the probability that a pair of genomes isolated within the same province (0-50 km apart) had a specified time to most recent common ancestor (tMRCA) with the corresponding probability for pairs isolated 250-350 km apart (the mean distance between different French provinces is 330 km), across rolling 2-year divergence time windows from 2 to 20 years.41 We restricted the analysis to pairs isolated within the same year to mitigate variable geographic sampling across years. A risk ratio close to one indicated an equal chance that a pair of isolates sharing a given tMRCA was found within versus between provinces. To estimate uncertainty, we ran 100 bootstrap iterations, sampling with replacement. We assessed the significance of the relationship between tMRCA and risk ratio using a generalised linear mixed model. The spatiotemporal analysis was run using R version 3.6.0.
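The pairwise risk ratio can be sketched as the proportion of same-province pairs whose tMRCA falls in a divergence window, divided by the same proportion for distant pairs. The Python sketch below follows the distance bands and same-year restriction described above; everything else is an assumption: the `Pair` record and function names are hypothetical, and the bootstrap resamples pairs for brevity, whereas resampling isolates would better respect the non-independence of pairs that share an isolate. The mixed-model significance test is not reproduced.

```python
from __future__ import annotations

import math
import random
from typing import NamedTuple


class Pair(NamedTuple):
    distance_km: float  # geographic distance between the two isolates
    tmrca_years: float  # divergence time read from the time-resolved tree
    same_year: bool     # both isolates collected in the same calendar year


def _prop_in_window(pairs, d_lo, d_hi, t_lo, t_hi):
    """Share of pairs in a distance band whose tMRCA falls in [t_lo, t_hi)."""
    band = [p for p in pairs if d_lo <= p.distance_km <= d_hi]
    if not band:
        return math.nan
    return sum(t_lo <= p.tmrca_years < t_hi for p in band) / len(band)


def risk_ratio(pairs, t_lo, t_hi, near=(0.0, 50.0), far=(250.0, 350.0)):
    """RR = P(tMRCA in window | near pair) / P(tMRCA in window | far pair)."""
    same_year = [p for p in pairs if p.same_year]  # mitigate sampling bias across years
    p_near = _prop_in_window(same_year, *near, t_lo, t_hi)
    p_far = _prop_in_window(same_year, *far, t_lo, t_hi)
    if math.isnan(p_near) or math.isnan(p_far) or p_far == 0:
        return math.nan
    return p_near / p_far


def bootstrap_rr(pairs, t_lo, t_hi, n_boot=100, seed=0):
    """Bootstrap distribution of the RR, resampling pairs with replacement."""
    rng = random.Random(seed)
    return [risk_ratio(rng.choices(pairs, k=len(pairs)), t_lo, t_hi) for _ in range(n_boot)]


# Hypothetical pairs for illustration only.
pairs = [
    Pair(10, 1.0, True), Pair(20, 1.5, True), Pair(30, 10.0, True), Pair(40, 12.0, True),
    Pair(300, 1.0, True), Pair(260, 8.0, True), Pair(280, 10.0, True), Pair(320, 15.0, True),
    Pair(15, 1.0, False),  # excluded: collected in different years
    Pair(150, 1.0, True),  # excluded: outside both distance bands
]
print(risk_ratio(pairs, 0, 2), risk_ratio(pairs, 9, 11))
```

Rolling 2-year windows could be generated as `[(t, t + 2) for t in range(2, 19)]`; the exact window placement used in the study is not specified here, so that choice is an assumption. A ratio above 1 for recent windows that decays to about 1 for older ones is the signature of spatial homogenisation over time.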
Statistical analysis
To determine whether differences in proportions between two groups were significant, two-sided Fisher's exact tests were used. Two-sided p values of less than 0.05 were considered significant. Multiple testing correction was carried out using the Benjamini-Hochberg procedure with a false discovery rate of 5%.42 We grouped the French isolates into four vaccine periods as previously described by Ouldali et al.8: 1) targeted PCV7 period (2003-2005), in which PCV7 was only reimbursed and recommended for children in a day-care centre with ≥2 other children, children in families with >2 children, or children breastfed for fewer than 2 months; 2) generalised PCV7 period (2006-2010), in which PCV7 was used for all children younger than 2 years; 3) early PCV13 period (2011-2014); and 4) late PCV13 period (2015-2018), during which PCV7 had been replaced with PCV13, without catch-up vaccination. Using these four periods, we compared the prevalence of GPSCs. Statistical tests were performed in R version 3.6.0.
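Both tests named above are simple enough to sketch in plain Python. The block below is an illustration, not the R code actually used: it computes a two-sided Fisher's exact p value by summing the probabilities of all 2×2 tables with the observed margins that are no more likely than the observed table, and applies the Benjamini-Hochberg step-up adjustment. The 1e-7 relative tolerance for near-ties is similar to the approach in R's `fisher.test`; the function names are hypothetical.

```python
from __future__ import annotations

from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Include tables no more likely than the observed one (with tie tolerance).
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """BH-adjusted p values (q values), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Step up from the largest p value, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # 0.4857
print([round(q, 3) for q in benjamini_hochberg([0.01, 0.04, 0.03, 0.005])])  # [0.02, 0.04, 0.04, 0.02]
```

To compare a lineage between two vaccine periods, the table would hold the lineage and non-lineage counts for each period; the resulting p values across lineages and period contrasts would then be passed together to `benjamini_hochberg` and compared against the 5% threshold.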
Role of funding source
The funders of the study had no role in the study design, data collection, data analysis, data interpretation, or writing of the report. The corresponding author had full access to all the data in the study and had final responsibility for the decision to submit for publication.
Results
All 419 French serotype 24F pneumococcal isolates were sequenced and passed sequencing quality control. The collection comprised mainly three lineages, GPSC10 (CC230, 41.5%), GPSC6 (CC156, 38.2%) and GPSC16 (CC66, 17.7%), with a small percentage of isolates belonging to other lineages: GPSC44 (CC177, 1.4%), GPSC18 (CC15, 0.7%) and GPSC5 (CC172, 0.5%). Over the study period, clonal replacement was observed in overall disease, meningitis and carriage isolates (Figure 1). Overall, GPSC16 decreased significantly from the generalised PCV7 period (2006-2010) onwards. In contrast, GPSC6 significantly increased from 9.9% in the generalised PCV7 period to 62.7% in the early PCV13 period (2011-2014). Whilst GPSC6 decreased to 30.2% in the late PCV13 period (2015-2018), GPSC10 significantly increased from 23% in the early PCV13 period to 65.1% in the late PCV13 period (p<0.0001 for all changes, Table S1). In the late PCV13 period, GPSC10 became the predominant lineage, accounting for 100% (10/10) of pneumonia cases caused by serotype 24F pneumococci, 74% (35/47) of meningitis cases, 67% (4/6) of other infections, 50% (11/22) of bacteraemias and 60% (50/84) of asymptomatic colonisation. In contrast to the pan-susceptible GPSC16, the replacement lineages GPSC6 and GPSC10 were cotrimoxazole-resistant and MDR, respectively.
(A-C) Proportion of serotype 24F pneumococcal lineages from France between 2003 and 2018 and (D) predominant antibiotic resistance profiles of each lineage. GPSC, global pneumococcal sequence cluster; PCV, pneumococcal conjugate vaccine; Pen, penicillin; Tax, cefotaxime; Chl, chloramphenicol; Ery, erythromycin; Tet, tetracycline.
Compared with other serotype 24F lineages, only GPSC10 was more frequently detected in overall invasive disease (OR 1.38, 95% CI 0.93-2.04, p=0.107) and meningitis (OR 1.57, 95% CI 0.98-2.51, p=0.06) cases than in asymptomatic colonisation. Although these differences did not reach statistical significance, they suggest that this lineage may have a relatively high invasive disease potential and propensity to cause meningitis (Figure S2 and Table S2). In contrast, GPSC6 was more frequently identified in carriage than in meningitis isolates (OR 0.58, 95% CI 0.35-0.97, p=0.038).
The global collection of 642 serotype 24F isolates revealed that this serotype was widely distributed across 29 countries (Figure S3) and comprised 20 GPSCs. GPSC10, 16, 150 and 206 were the most common lineages, accounting for 68% (439/642) of the overall GPS collection and 78% (96/123) of a sub-collection of isolates randomly selected from disease surveillance systems and carriage surveys in 21 countries (Figure S3). GPSC10, 16 and 206 are globally spreading lineages, whereas GPSC150 was detected only in West Africa, except for three isolates from Israel. Among these most common lineages, only GPSC10 was MDR, with a majority of isolates exhibiting resistance to penicillin, cotrimoxazole, erythromycin and tetracycline (Figure S4).
Phylogenetic analysis of the serotype 24F cps revealed a strong clonal, but not geographical, structure (Figure S5). The cps sequences belonging to the same GPSC clustered together regardless of the isolates' country of origin. This finding indicates that, after a single capsular switch to 24F in each of these lineages, the serotype variants disseminated clonally across geographical regions, mainly in the genetic backgrounds of GPSC10, 16 and 206. Only a few capsular switching events between lineages were inferred from highly similar cps sequences. For example, three GPSC18 isolates from France shared highly similar cps with GPSC10, suggesting a capsular switch between GPSC10 and GPSC18. GPSC18 expressing serotype 24F was not detected outside France in the two largest pneumococcal isolate databases, the GPS project and PubMLST (last accessed on 12th August 2021). GPSC18-24F was first detected in 2011 in this collection, a year after the introduction of PCV13 in France, and did not expand significantly (Figure 1). GPSC18 was an MDR lineage exhibiting resistance to penicillin, cotrimoxazole and erythromycin, and was found to cause both invasive disease and asymptomatic colonisation.
Of all serotype 24F lineages, GPSC10 was the only one showing both high invasive disease potential and multidrug resistance. It was responsible for the increase of serotype 24F in France and is one of the major lineages mediating the global spread of serotype 24F. We further investigated this lineage from a global perspective using an international collection of 888 GPSC10 isolates. This lineage was detected in 33 countries across Africa, Asia, Europe, and North and South America (Figure S6). It expressed 17 different serotypes: 3, 6A, 6C, 7B, 10A, 11A, 13, 14, 15B, 15C, 17F, 19A, 19F, 23A, 23B, 23F and 24F. Only 6 of them (serotypes 3, 6A, 14, 19A, 19F and 23F) are covered by the current PCVs (PCV10/13) approved for use in children. An additional 3 serotypes (10A, 11A and 15B) are covered by Pfizer's planned 20-valent vaccine, and 4 (10A, 11A, 15B and 17F) by Merck's planned 24-valent vaccine; neither of these was approved for use in children at the time of writing. At present, no vaccine is known to cover all 17 serotypes expressed by GPSC10. Concerningly, GPSC10 is among the top 5 lineages in India,43 Pakistan and Nepal (Table S4), where pneumococcal disease burden is highest.1 In these three countries, 15 serotype variants were detected in GPSC10 and only 6 were included in PCV13, underlining its potential to cause serotype replacement in the future. Internationally, GPSC10 was consistently found to express multiple serotypes and to be multidrug-resistant.
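The vaccine-coverage arithmetic above can be checked with set operations. The sketch below assumes the standard PCV13 composition and the PCV20 composition (PCV13 plus 8, 10A, 11A, 12F, 15B, 22F and 33F); it counts exact serotype matches only, ignores any cross-protection (e.g. between 6A and 6C), and omits the 24-valent formulation, whose full composition is not given in this paper. Variable and function names are illustrative.

```python
from __future__ import annotations

import re

# Serotypes expressed by GPSC10, as listed in the text.
GPSC10_SEROTYPES = {
    "3", "6A", "6C", "7B", "10A", "11A", "13", "14", "15B",
    "15C", "17F", "19A", "19F", "23A", "23B", "23F", "24F",
}
PCV13 = {"1", "3", "4", "5", "6A", "6B", "7F", "9V", "14", "18C", "19A", "19F", "23F"}
PCV20 = PCV13 | {"8", "10A", "11A", "12F", "15B", "22F", "33F"}


def serotype_key(serotype: str) -> tuple[int, str]:
    """Sort serotypes numerically, then by letter suffix (6A < 6C < 10A)."""
    match = re.fullmatch(r"(\d+)([A-Z]*)", serotype)
    if match is None:
        raise ValueError(f"unrecognised serotype: {serotype}")
    return int(match.group(1)), match.group(2)


def coverage(serotypes: set[str], vaccine: set[str]) -> tuple[list[str], list[str]]:
    """Split serotypes into those included in a vaccine and those not included."""
    covered = sorted(serotypes & vaccine, key=serotype_key)
    missing = sorted(serotypes - vaccine, key=serotype_key)
    return covered, missing


for name, vaccine in (("PCV13", PCV13), ("PCV20", PCV20)):
    covered, missing = coverage(GPSC10_SEROTYPES, vaccine)
    print(f"{name}: {len(covered)} covered {covered}; {len(missing)} not covered {missing}")
```

The printed counts reproduce the text: 6 serotypes for PCV13 and 9 (6 + 3) for PCV20, leaving 8 GPSC10 serotypes, including 24F, outside PCV20.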
The global GPSC10 phylogeny showed that all serotype 24F isolates but one clustered together, regardless of their country of origin (Figure 2A). This finding is consistent with the 24F cps phylogeny and indicates that the global dissemination of GPSC10-24F was largely mediated by clonal spread rather than capsular switching. We analysed a collection of 91 GPSC10 isolates from Spain and detected a rapid change in serotype composition from serotype 19A to 24F after the implementation of PCV13 in the target group of children (Figure S7). The 19A and 24F serotype variants from Spain were separated by long branches on the global phylogeny, indicating that the capsular switch predated the vaccine introduction and that both GPSC10 serotype variants coexisted in Spain.
(A) Global phylogeny of GPSC10 and (B) time-resolved phylogeny of a cluster of GPSC10-24F Streptococcus pneumoniae. All but one of the serotype 24F isolates clustered together, regardless of their country of origin, indicating clonal spread of GPSC10-24F across the world. The GPSC10 global phylogeny can be interactively visualised at https://microreact.org/project/global_GPSC10/21948517 and the time-resolved phylogeny at https://microreact.org/project/GPSC10-24F/8735223a
A time-resolved phylogeny built on the cluster of 276 GPSC10-24F isolates revealed four sub-clusters: EU-clade-I (dominated by ST4253), EU-clade-II (ST230 and ST4677), EU-clade-III (ST230) and the American clade (ST230) (Figure 2B). EU-clade-I and -II were estimated to have emerged around the 1990s and then expanded clonally. EU-clade-III was relatively small and was estimated to have emerged between the late 1990s and early 2000s; most of its isolates were from France, with one from Qatar. Throughout the study period, EU-clade-I accounted for most (70%, 112/160) of the French GPSC10-24F isolates and drove the 24F increase, whereas most Spanish isolates (88%, 58/66) belonged to EU-clade-II (Figure S8). In EU-clade-I, long-range transmission from Europe to Israel was observed in the late 2000s. In EU-clade-II, multiple transmissions from Europe to Morocco were detected, and a single transmission from Europe to India in the early 2000s was followed by clonal expansion.
We reconstructed the population dynamics of EU-clade-I and EU-clade-II over time using the Bayesian skyline model. EU-clade-I was predicted to have undergone three exponential increases in effective population size, around 1995, 2004 and 2013, whereas EU-clade-II had one, around 2010 (Figure 3A and 3B). The most recent increase in EU-clade-I coincided with the observed increase in prevalence of GPSC10-24F in both overall disease and carriage isolates from France (Figure 3A, 3C-D). Spatiotemporal analysis of the 174 GPSC10-24F isolates from France indicated that a pair of pneumococcal isolates was more likely to be recovered within the same province if they had diverged from their MRCA <5 years earlier (p=0.0083). Pairs that had diverged 5 or more years earlier had a risk ratio stably close to 1, with no downward trend (p=0.4449). This suggests that the GPSC10-24F population was homogeneous across French provinces after 4 years (Figure 4 and Table S3).
Bayesian skyline plots of the estimated median effective population size of EU-clade-I and EU-clade-II of GPSC10-24F Streptococcus pneumoniae over time (A and B), and observed prevalence of GPSC10-24F S. pneumoniae in overall invasive disease and carriage in France by collection year (C and D). The figure shows three and one exponential increases in effective population size in EU-clade-I and EU-clade-II, respectively. The most recent increase in EU-clade-I coincided with the observed prevalence increase of GPSC10-24F in both overall disease and carriage isolates in France.
Spatiotemporal analysis of the GPSC10-24F sub-lineage from France. A pairwise odds ratio was calculated for two samples sharing a given time to most recent common ancestor (tMRCA) and being recovered from the same French province. An odds ratio higher than one indicates that a pair of isolates is more likely to be recovered within the same French province. Pairs that diverged 5 or more years ago had an odds ratio of ∼1 without an upward or downward trend, indicating an equal chance of recovering the sample pair within and between French provinces.
Discussion
Our results show the emergence of a virulent MDR pneumococcal lineage, GPSC10, that was responsible for the increase in invasive pneumococcal disease in France and the global spread of the invasive serotype 24F, which is not included in current or expected PCVs. Owing to its recombinogenic nature,44 this lineage expresses a wide range of serotypes, which may facilitate its adaptation under vaccine-selective pressure. Together with its transmissibility, this means that GPSC10 should be regarded as a high-risk lineage that could diminish the benefits of vaccination programmes worldwide over time.
Over the two decades since the advent of PCV, GPSC10 has mediated serotype replacement in multiple countries. After the introduction of PCV7/10, 19A became the predominant serotype causing invasive disease in Europe; this increase was partially driven by GPSC10, together with GPSC1 (CC320).45–50 In France and Spain, the major contributor to serotype 19A in the post-PCV7 period was GPSC10.50, 51 The use of PCV13 (which targets 19A) effectively reduced serotype 19A but was accompanied by a concurrent increase in serotype 24F in Europe.8, 15, 22, 27, 52 Using the Spanish collection, we observed a rapid change in serotype composition within GPSC10 from 19A to 24F after PCV13. A similar serotype change within GPSC10 was also observed in Argentina10 and Israel53 after PCV13, suggesting that pre-existing serotype variants enable GPSC10 to survive and expand under vaccine-selective pressure.
Serotype 24F pneumococci were infrequent causes of invasive disease in Europe in the 1980s,54 and the earliest GPSC10 isolate was detected in Denmark in 1996, expressing serotype 14.55 GPSC10-24F was first reported from three adult patients in Naples, Italy, between 1997 and 1998;56 these findings coincide with our model's estimate of when this clone emerged in Europe. Since then, GPSC10-24F has been detected more frequently in children8, 45 and adults57, 58 from Southern Europe. Despite the geographical proximity of France and Spain, the increase in serotype 24F in each country was mainly driven by a different clone, GPSC10 EU-clade-I and EU-clade-II, respectively. This finding suggests that trans-border transmission occurs but may be less frequent than transmission within a country, resulting in the parallel evolution of two clones.
Although the global spread of serotype 24F is largely due to the clonal spread of three pneumococcal lineages (GPSC10, 16 and 206), the lineage that drove the serotype 24F increase differs between countries. This variation could be partially explained by differences in antibiotic-selective pressure. For example, the 24F driver lineage in Denmark was GPSC6.12 Unlike GPSC10, GPSC6 was susceptible to penicillin and erythromycin. This observation coincided with the lower consumption of penicillin and macrolides (the antibiotic class that includes erythromycin) in Denmark compared with countries such as France,8, 9 Lebanon17 and Spain,19, 20 where the serotype 24F increase was mediated by the multidrug-resistant GPSC10 (Table S5). These three countries consumed 1.8-2.2 times more penicillin and 1.2-2.0 times more macrolides than Denmark. GPSC10 also mediated the serotype 24F increase in Argentina,10 where macrolide consumption was 1.2 times higher than in Denmark, although penicillin consumption was similar. High consumption of penicillin and/or erythromycin in Argentina, France, Lebanon and Spain may have selected for GPSC10, whereas serotype replacement in settings with low antibiotic consumption was mainly mediated by susceptible lineages.12, 59 Among countries with a serotype 24F increase, Japan had the highest consumption of erythromycin and the lowest of penicillin, and its serotype 24F driver lineage, GPSC106 (CC2572), exhibits only erythromycin resistance.16 These observations suggest that vaccine- and antibiotic-selective pressures are shaping the post-vaccine population structure, and that the reduction in antibiotic resistance achieved by PCVs may gradually diminish.53 However, resistance alone was not sufficient for clonal success, as GPSC10-24F was detected in all six countries but expanded only in settings with high antibiotic use.
This finding echoes observations in other bacterial species such as Escherichia coli.60 Because acquired antibiotic resistance genes are part of the accessory genome (genes not present in all isolates of a species), their frequencies are under negative frequency-dependent selection,61 which may explain the coexistence of susceptible and resistant strains in the pneumococcal population.
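How negative frequency-dependent selection can maintain both susceptible and resistant strains can be illustrated with a toy model. In the Python sketch below, which is a hypothetical illustration and not the model of reference 61, the relative fitness of a resistant genotype is 1 + s(p_eq − p), so resistance is favoured when rare and penalised when common; the parameters `p_eq` and `s` are arbitrary assumptions.

```python
from __future__ import annotations


def nfds_trajectory(p0: float, p_eq: float, s: float, generations: int) -> list[float]:
    """Frequency of a resistant genotype over time under a toy NFDS model.

    Relative fitness: resistant w_R = 1 + s * (p_eq - p), susceptible w_S = 1.
    Requiring 0 < s <= 1 keeps w_R positive for all frequencies p < 1.
    """
    if not (0.0 < p0 < 1.0 and 0.0 < p_eq < 1.0 and 0.0 < s <= 1.0):
        raise ValueError("require 0 < p0 < 1, 0 < p_eq < 1 and 0 < s <= 1")
    p = p0
    trajectory = [p]
    for _ in range(generations):
        w_r = 1.0 + s * (p_eq - p)
        p = p * w_r / (p * w_r + (1.0 - p))  # discrete-generation selection update
        trajectory.append(p)
    return trajectory


# Starting rare or common, the resistant frequency settles near p_eq = 0.3.
for start in (0.01, 0.99):
    final = nfds_trajectory(start, p_eq=0.3, s=1.0, generations=200)[-1]
    print(start, round(final, 4))
```

Under directional selection (fitness independent of frequency), one genotype would instead go to fixation; the stable intermediate frequency is what allows resistant and susceptible strains to coexist.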
The strength of this study lies in a large global collection of pneumococcal genomes with comprehensive epidemiological metadata. Although the sampling strategy and collection time frame were not consistent between countries, the GPS project was designed to create pre- and post-PCV datasets in each participating country to evaluate the impact of PCV on the pneumococcal population. This overarching study is complemented by the knowledge we gained from a number of country-specific analyses to provide an international perspective on GPSC10 and to highlight this lineage as a future risk for pneumococcal disease prevention. This study also underlines the need for policymakers to evaluate the overall impact of PCVs, including changes in IPD incidence and the detection of emerging non-vaccine serotypes. Focusing only on vaccine serotypes when estimating impact could be misleading, as it would identify a rise in serotype 19A, observed in some settings with PCV10 use, but would not detect the increase in serotype 24F associated with PCV13 use, as noted in the current study. Knowledge gained through an effective and sustainable surveillance system for S. pneumoniae could guide timely policy making, including the choice of PCV, to stabilise IPD incidence at a low level or reduce it further.
Our work further shows the usefulness of bacterial genome sequencing to better understand the pneumococcal lineages behind the serotype changes and reveals that GPSC10 alone is a challenge for a serotype-based vaccine strategy. More systematic investigation to identify lineages like GPSC10 will better inform and improve next-generation preventive strategies against pneumococcal disease.
Data Availability
All data produced in the present study are available upon reasonable request to the authors.
The Global Pneumococcal Sequencing Project Consortium
Abdullah W Brooks, Alejandra Corso, Alexander Davydov, Alison Maguire, Anmol Kiran, Benild Moiane, Bernard Beall, Chunjiang Zhao, David Aanensen, Dean B Everett, Diego Faccone, Ebenezer Foster-Nyarko, Ebrima Bojang, Ekaterina Egorova, Elena Voropaeva, Eric Sampane-Donkor, Ewa Sadowy, Geetha Nagaraj, Helio Mucavele, Houria Belabbès, Idrissa Diawara, Jennifer Verani, Jeremy Keenan, John A Lees, Jyothish N Nair Thulasee Bhai, Kedibone Ndlangisa, Khalid Zerouali, Leon Bentley, Leonid Titov, Linda De Gouveia, Maaike Alaerts, Margaret Ip, Maria Cristina de Cunto Brandileone, Md Hasanuzzaman, Metka Paragi, Michele Nurse-Lucas, Mignon du Plessis, Mushal Ali, Nicholas Croucher, Nicole Wolter, Noga Givon-Lavi, Nurit Porat, Özgen Köseoglu Eser, Pak Leung Ho, Patrick Eberechi Akpaka, Paula Gagetti, Peggy-Estelle Tientcheu, Pierra Law, Rachel Benisty, Rafal Mostowy, Roly Malaker, Samanta Cristine Grassi Almeida, Sanjay Doiphode, Shabir A. Madhi, Shamala Devi Sekaran, Stuart C Clarke, Somporn Srifuengfung, Susan A Nzenze, Tamara Kastrin, Theresa J. Ochoa, Waleria Hryniewicz, Yulia Urban
Research in context
Evidence before this study
We searched PubMed using the terms “streptococcus pneumoniae” AND “24F” OR “CC230” OR “GPSC10” for papers published in English between Jan 1 2000 and Feb 19 2021. We searched for population-based studies which reported changes in serotype 24F before and after the introduction of pneumococcal conjugate vaccine in the country or region. After reviewing 59 articles, 28 met the inclusion criteria. The effects of 7-valent PCV were measured in 6 studies and 13-valent PCV (PCV13) in 23 studies. The majority of studies utilised samples from children and/or adults with IPD (n=20), four studies included isolates from both carriage and IPD cases and four studies analysed isolates from carriage alone.
Studies were conducted at the national or regional level and typed isolates using the Quellung reaction, latex agglutination and/or PCR-based methods. Amongst IPD cases in children, 24F was identified as the predominant serotype post PCV13 in eight studies representing cases from France, Denmark, Spain, Italy and Japan. In the three studies that stratified serotypes by child age, the predominance of 24F was observed only in children up to 5 years of age. Serotype 24F was the second most common serotype, or jointly the most common with 12F, in four further studies of IPD cases post PCV13 in Germany, France and the UK. An increase in 24F post PCV7 was reported amongst IPD cases in Spain, Italy and France, amongst carriage and IPD cases in Norway, and amongst carriage isolates in Portugal.
Added value of this study
This study provides an enhanced understanding of the multidrug-resistant lineage of S. pneumoniae that has driven the increase in serotype 24F in France post PCV13. Additionally, we used a global collection of S. pneumoniae isolates from 56 countries to contextualise the isolates from France, identifying the predominant lineage (GPSC10) associated with the increase of multidrug resistance in serotype 24F globally. Together with knowledge gained from a number of country-specific analyses, this study indicates that GPSC10 may pose a global threat after PCV13 due to a high risk of vaccine evasion.
Implications of all the available evidence
The increase in serotype 24F post PCV13 in France was attributed to the multidrug-resistant lineage GPSC10. Concerningly, GPSC10 has a relatively high invasive disease potential and propensity to cause meningitis, independent of serotype. GPSC10 appears to be highly capable of acquiring DNA that may result in antimicrobial resistance and serotype switches. Analyses of GPSC10 isolates from a global dataset of S. pneumoniae genomes identified expression of an additional 16 serotypes, of which only six are included in PCV13. Antimicrobial use may have contributed to the selection of GPSC10 in France and Spain, decreasing the benefit of PCV for reducing antimicrobial resistance (AMR). GPSC10 has spread amongst European countries, with long-range transmissions to other continents. The evidence suggests that the expansion of GPSC10 may be difficult to address using a serotype-based vaccine strategy.
A collection of serotype 24F Streptococcus pneumoniae from France, 2003-2018, by (A) clinical sample source, (B) age and (C) year of collection. The collection shows an almost 1:1 ratio of samples from invasive disease (cerebrospinal fluid, blood and others) and asymptomatic colonisation (nasopharyngeal swab), both overall and across years. The majority of samples are from children aged 2 years and under.
Prevalence of pneumococcal lineages by clinical manifestation, and odds ratio for causing (A) overall invasive disease and (B) meningitis relative to carriage. The odds ratios and 95% confidence intervals were calculated using Fisher’s exact test.
(A) The geographical distribution of serotype 24F Streptococcus pneumoniae (n=642) in the Global Pneumococcal Sequencing (GPS) project database, including 419 isolates from France. (B) Proportion of pneumococcal lineages, or Global Pneumococcal Sequence Clusters (GPSCs), in the overall collection of serotype 24F pneumococci (n=642) and in a sub-collection (n=123) of isolates randomly selected from disease surveillance systems and carriage surveys. Pneumococcal lineages with a prevalence below 3% are grouped as “Others” in the pie charts. The geographical distribution can be interactively visualised at https://microreact.org/project/global_24F/7d36573f.
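A minimal sketch of the 3% pooling rule used for the pie charts, assuming each isolate is represented only by its GPSC label. The function name and the toy input are hypothetical, and applying the threshold as a strict “below 3%” comparison is an assumption based on the caption wording.

```python
from collections import Counter


def lineage_proportions(gpscs: list[str], threshold: float = 0.03) -> dict[str, float]:
    """Proportion of each lineage, pooling lineages below `threshold` into "Others"."""
    counts = Counter(gpscs)
    total = sum(counts.values())
    shares: dict[str, float] = {}
    others = 0.0
    # most_common() yields lineages from most to least frequent
    for lineage, n in counts.most_common():
        share = n / total
        if share < threshold:
            others += share  # minor lineage: pooled into the "Others" slice
        else:
            shares[lineage] = share
    if others > 0:
        shares["Others"] = others
    return shares


# Toy input only: 100 hypothetical isolates labelled by GPSC
toy = ["GPSC10"] * 50 + ["GPSC16"] * 40 + ["GPSC206"] * 8 + ["GPSC1"] * 2
print(lineage_proportions(toy))
```

Pooling after computing proportions keeps the slices summing to one, so the “Others” slice directly shows how much of the collection the minor lineages account for.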
Phylogeny of 642 serotype 24F Streptococcus pneumoniae from 29 countries across six continents overlaid with antibiotic resistance profiles. GPSC, global pneumococcal sequence cluster; CC, clonal complex; PEN, penicillin; AMO, amoxicillin; MER, meropenem; TAX, cefotaxime; CFT, ceftriaxone; CFX, cefuroxime; ERY, erythromycin; COT, cotrimoxazole; TET, tetracycline; CHL, chloramphenicol; MDR, multidrug resistance. This figure can be interactively visualised at https://microreact.org/project/global_24F/e1acf229.
A phylogeny built from the genetic variants identified in the capsular polysaccharide encoding region (cps) of 642 serotype 24F Streptococcus pneumoniae isolates, overlaid with Global Pneumococcal Sequence Clusters (GPSCs).
Geographical distribution of Global Pneumococcal Sequence Cluster (GPSC)10 (n=888) from 33 countries. This figure can be interactively viewed at https://microreact.org/project/global_GPSC10/21948517
Rapid changes in serotype composition within Global Pneumococcal Sequence Cluster (GPSC)10 during PCV introductions among 91 isolates from Spain. Serotype 19A is targeted by PCV13 but not PCV10. Serotype 24F is not targeted by either.
The proportion of Global Pneumococcal Sequence Cluster (GPSC)10 clades and other GPSCs in serotype 24F Streptococcus pneumoniae isolates causing invasive pneumococcal disease in France (A) and Spain (B).
Prevalence of GPSCs in serotype 24F Streptococcus pneumoniae causing invasive disease (n=190) and asymptomatic colonisation (n=229) in France over vaccine periods.
Odds ratio for invasiveness and propensity to cause meningitis of six pneumococcal lineages expressing serotype 24F from France.
Pairwise odds ratio of two samples being recovered from the same French province, given the time since their divergence from their most recent common ancestor (tMRCA).
The prevalence, serotypes and resistance profile of GPSC10 by country in the Global Pneumococcal Sequencing (GPS) database
The relationship between pneumococcal 24F driver lineages and antibiotic consumption.
Supplementary files 2
The -80°C stock of each S. pneumoniae isolate was plated on an agar plate with 5% sheep blood and incubated overnight at 37°C in 5% CO2. A single colony from the overnight culture was inoculated into 5 ml Todd Hewitt broth and incubated overnight at 37°C in 5% CO2. The bacterial pellet from the overnight broth culture was then subjected to DNA extraction. Pneumococcal DNA was extracted using a modified version of the QIAamp DNA Mini Kit (QIAGEN Inc., Valencia, CA) protocol as previously described.1 DNA quantity was evaluated by Qubit, and the DNA was then sequenced on an Illumina HiSeq platform at the Wellcome Sanger Institute, generating paired-end reads of ≥100 bp. The reads were assembled and annotated as previously described.2 Quality control criteria for the genome sequences were as follows: 1) overall sequencing depth >20X; 2) >60% of reads assigned to Streptococcus pneumoniae using Kraken; 3) >60% mapping coverage of the reference genome (PMEN global clone Spain23F-1, accession number FM211187); 4) heterozygous sites ≤15% of the total number of single nucleotide polymorphisms (SNPs); 5) fewer than 500 contigs; and 6) total assembled genome length between 1.9 and 2.3 Mb.
Serotypes were predicted from the sequence reads using SeroBA.3 At the time of writing, SeroBA cannot differentiate serotypes within serogroup 24. Therefore, serogroup 24 isolates (n=674) in the Global Pneumococcal Sequencing (GPS) project identified by SeroBA, including those from France, were subjected to phylogenetic analysis. Reads of serogroup 24 genomes were mapped to the reference sequence of the 24F capsular polysaccharide encoding region cps (CR931688) using Burrows-Wheeler Aligner (BWA) version 0.7.17-r1188.4 The alignment was then aligned with the reference sequences of 24A (CR931686) and 24B (CR931687), and SNPs were extracted using snp-sites.5 A maximum likelihood tree was constructed with the GTR substitution model using FastTree version 2.1.10,6 and overlaid with phenotypic serotyping results where available.
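The six quality-control thresholds listed above can be expressed as a simple filter. The Python sketch below is illustrative only: it assumes the per-genome metrics (depth, Kraken assignment, reference coverage, heterozygosity, contig count, assembly length) have already been computed upstream, and the class, field and function names and the toy values are hypothetical rather than part of the original pipeline.

```python
from dataclasses import dataclass


@dataclass
class GenomeQCStats:
    """Per-genome summary metrics (hypothetical field names)."""
    depth: float              # overall sequencing depth (X)
    pct_reads_spn: float      # % of reads assigned to S. pneumoniae by Kraken
    pct_ref_coverage: float   # % of Spain23F-1 (FM211187) covered by mapped reads
    pct_het_sites: float      # heterozygous sites as % of all SNPs
    n_contigs: int            # number of contigs in the assembly
    assembly_mb: float        # total assembled genome length (Mb)


def failed_qc_criteria(s: GenomeQCStats) -> list[str]:
    """Return the QC criteria a genome fails; an empty list means it passes."""
    checks = {
        "depth > 20X": s.depth > 20,
        "> 60% reads S. pneumoniae": s.pct_reads_spn > 60,
        "> 60% reference coverage": s.pct_ref_coverage > 60,
        "heterozygous sites <= 15% of SNPs": s.pct_het_sites <= 15,
        "< 500 contigs": s.n_contigs < 500,
        # Inclusive bounds are an assumption; the text says "between 1.9 and 2.3 Mb"
        "assembly length 1.9-2.3 Mb": 1.9 <= s.assembly_mb <= 2.3,
    }
    return [name for name, ok in checks.items() if not ok]


# Toy values for illustration only
good = GenomeQCStats(85.0, 97.5, 88.0, 2.1, 120, 2.08)
bad = GenomeQCStats(15.0, 97.5, 88.0, 22.0, 650, 2.08)
print(failed_qc_criteria(good))
print(failed_qc_criteria(bad))
```

Returning the list of failed criteria, rather than a single boolean, makes it easy to report why a genome was excluded.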
We identified 25 isolates that clustered with the 24A reference, 94 with the 24F reference, 549 in a group in which 250 isolates had been confirmed as 24F by the Quellung reaction in eight different laboratories, and two divergent isolates. No isolate’s cps clustered with the 24B reference. The 642 isolates predicted to be serotype 24F were included in further analysis.
Acknowledgement
The study was co-funded by the Bill and Melinda Gates Foundation (grant code OPP1034556) and the Wellcome Sanger Institute (core Wellcome grants 098051 and 206194). Particular thanks go to all members of the Global Pneumococcal Sequencing (GPS) Consortium for their contributions to sample collection and processing, and for their collaborative spirit. We are grateful for feedback from Dr Adam Cohen and Dr Xin Liu from the Centers for Disease Control and Prevention and Dr Chrispin Chaguza from Yale University. We acknowledge Dr Corinne Lévy, Dr Naïm Ouldali and Stéphane Béchet (ACTIV), who manage the collection of children’s data and helped establish the study sample for France. We are also grateful to Assiya El Mniai and Cécile Culeux (French NRL for pneumococci) for their technical assistance in preparing DNA for whole-genome sequencing. We acknowledge the support of the sequencing facility and the Pathogen Informatics team at the Wellcome Sanger Institute. The findings and conclusions detailed in this manuscript are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention. For the purpose of open access, the author has applied a CC BY public copyright license to any Author Accepted Manuscript version arising from this submission.