Abstract
We undertook a genome-wide association study of susceptibility to invasive group A streptococcal (GAS) disease combining data from distinct clinical manifestations and ancestral populations. Amongst other signals, we identified a susceptibility locus located 18kb from PAX5, an essential B-cell gene, which conferred a nearly two-fold increased risk of disease (rs1176842, odds ratio 1.8, 95% confidence intervals 1.5-2.3, P=3.2×10−7). While further studies are needed, this locus could plausibly explain some inter-individual differences in antibody-mediated immunity to GAS, perhaps providing insight into the effects of intravenous immunoglobulin in streptococcal toxic shock.
Main
Invasive group A streptococcal (GAS) disease is defined by the isolation of Streptococcus pyogenes from a normally sterile site. While various manifestations can occur, infections of the skin and soft tissues are most frequent. Relative to other infections, invasive GAS is associated with a high case fatality rate[1], with an estimated 163,000 deaths attributable to the condition each year worldwide.[2]
Over time the majority of the population experience some degree of exposure to GAS yet invasive infections are rare.[3] Moreover, while the majority of invasive infection occur at the extremes of age, it remains unclear why a minority of otherwise healthy children and young adults develop devastating invasive infections.[4] While several pathogen and environmental factors are implicated, a role for host genetic variation is certainly plausible given the heritability of other GAS diseases, including rheumatic fever[5] and recurrent pharyngitis.[6] Indeed, host genetic variation in the human leukocyte antigen (HLA) locus has previously been linked to susceptibility to invasive GAS disease,[7,8] perhaps reflecting preferential binding of certain alleles to GAS superantigens.[8-10] Additionally, a number of reports have now emerged linking host genetic variation elsewhere in the genome to susceptibility to invasive bacterial disease more broadly.[11-13] Accordingly, we undertook a genome-wide association study (GWAS) of susceptibility to invasive GAS disease, combining the results of four case-control studies covering distinct clinical manifestations and ancestral populations.
This study used genotyping data from several projects, brought together to form four case-control analyses (Suppl. Figure 1). Ethical approval was granted to each project by the relevant institutional authorities.
In the first, cases were children and adults of European ancestry with GAS-associated necrotising fasciitis (NF) (n=34) or other manifestations of invasive GAS (n= 9) treated at hospitals across the UK. We have previously investigated the relationship between the variants in HLA locus and risk of invasive GAS disease amongst these individuals finding the DQA1*01:03 associated with susceptibility.[14] Individuals aged 65 years or more or with co-morbidity were excluded. The youngest patient was 18 months of age and oldest was aged 63 years (median 35 years). In the second analysis, cases were children of European ancestry with GAS empyema thoracis (n=36) recruited from hospitals across the UK. The youngest patient was aged four months and the oldest 17 years (median 4 years). None had comorbidities or established immunodeficiency.
Both sets of cases were genotyped using the HumanCore platform (Illumina Inc., USA) or the Global Screening Array (Illumina) specifically for this analysis. Controls for both analyses were healthy adolescents of European ancestry recruited to vaccine studies based in Oxfordshire (n=1540) and had previously been genome-wide genotyped using the HumanOmniExpressExome platform (Illumina).
In the third analysis, cases were children of East African ancestry with GAS bacteraemia (n=66) recruited at a single rural hospital in Kilifi, Kenya, as part of a broader study of childhood bacteraemia in this setting. [15] The youngest patient was aged one day and the oldest nine years (median 162 days). Controls for this analysis were children recruited to a birth cohort set in the area surrounding that hospital (n=1689). Both cases and controls had been genotyped previously by the Wellcome Trust Case Control Consortium 2 using the Affymetrix 6.0 platform (Affymetrix, USA).[12,13]
In the fourth analysis, cases were children and adults without comorbidity of Oceanian ancestry with various invasive GAS phenotypes, including bacteraemia without source (n=7), soft tissue infection (n=3) and meningitis (n=2), treated at hospitals in New Caledonia (NC), a French territory in the South Pacific. The youngest patient was aged two months and the oldest patient 24 years (median three years). Controls for this analysis were healthy adult volunteers reporting Oceanian ancestry recruited in NC to a previously reported study of rheumatic heart disease (n=193).[16] Both cases and controls were genotyped using the HumanCore platform (Illumina).
Quality control was undertaken using standard approaches[17] but with an additional step to minimise confounding due to the use of multiple genotyping platforms (Suppl. Figure 1 & 2).[18] Crucially, after these steps, the resulting case-control datasets were well matched for genetic ancestry (Suppl. Fig. 3). Next, we randomly selected ten controls for each case carrying forward 161 cases and 1610 controls for analysis, equating to an effective sample size of 585 individuals. We then performed genotype imputation, run separately for the UK, Kenyan and NC datasets, before running case-control association analyses in each dataset using linear mixed models implemented in GCTA software[19] (v1.26.0) with no evidence of residual confounding based on the distribution of test statistics (Suppl. Figure 4). We then performed genome-wide meta-analysis using Metasoft software[20] (v2.0.1) for 4,234,532 variants present at minor allele frequency greater than 5% in all four datasets. Further analyses including estimation of effect sizes by transformation[21] were performed in R (v3.0). Finally, in the absence of a further dataset in which to confirm or refute our findings, we designed a three-step process to identify the most robust association signals that would be suitable for further investigation (Suppl. Fig. 5). This involved an initial screening genome-wide meta-analysis under the standard fixed-effects model followed by further analysis of putatively associated loci using the more stringent random-effects model to assess heterogeneity. Taking forward genomic regions containing at least one variant at suggestive significance in the random-effects meta-analysis (P<10−5), we performed a sensitivity analysis in which we reran the random selection of controls 100 times over before repeating the individual case-control analyses and rerunning the fixed-effects meta-analysis, prioritising those loci at which the association with susceptibility showed the greatest consistency.
In our initial genome-wide screen, we identified one genomic region that contained a variant associated with susceptibility at genome-wide significance (PFE<5×10−8) and 24 genomic regions that contained at least one variant associated at suggestive significance (PFE<10−5) (Suppl. Table 1; Suppl. Figure 5a). Thirteen of these regions also contained at least one variant at suggestive significance in the random-effects meta-analysis (Suppl. Table 2; Suppl. Figure 5b). In our sensitivity analysis, we found only five regions (nine variants) had no PFE values greater than 1×10−4 while only two regions (five variants), which we took forward for further analysis, had no PFE values greater than 5×10−5 (Suppl. Table 3).
The first of two regions shortlisted in the sensitivity analysis (maximum PFE<5×10−5) was on chromosome 7 in an intron of the MAGI2 gene. The lead variant (rs35089407, PFE=5.3×10−8) was well supported by surrounding variants (Figure 1a). The effect was strongest in the NC analysis where the minor allele (thymidine) was associated with a three-fold increased risk of disease declining to a 1.5-fold increase in the UK invasive disease analysis (combined odds ratio, OR, 2.25, 95% confidence interval, CI, 1.68– 3.02; Figure 1b). The overall minor allele frequency was 14.7% (range 7.5-21.9%) while the imputation accuracy was high (information metric, 0.96–0.97).
The second region was on chromosome 9 located 18kb from the PAX5 gene in the intron of a non-coding long RNA (termed AL450267.2). The peak compromised four neighbouring variants (PFE=3.2×10−7–7.3×10−7; Figure 1c) of which one was at suggestive significance in the random-effects analysis (rs1176844, PRE=7.6×10−6). The minor allele of rs1176844 (thymidine) was associated with an almost two-fold increased risk of the disease in the combined analysis (combined OR 1.9, 95% CI 1.4– 2.4; Figure 1d). Notably this variant, which had a minor allele frequency of 43.3%, had been directly genotyped in both the UK and NC datasets while the imputation accuracy in the Kenyan dataset was very high (information metric, 0.997).
We next investigated overlap between the five shortlisted variants in these two loci and known or predicted regulatory elements (Suppl. Table 3) using the RegulomeDB tool.[22] Overall evidence of regulatory function was minimal with nothing to suggest altered transcription factor binding or gene expression. However, rs1327501 within the PAX5 signal overlapped histone modifications consistent with enhancer elements found specifically in analyses of primary hematopoietic stem cells and primary B cells from cord blood but no other cell types. Additionally, the other three PAX5 variants overlapped an area of weak transcriptional activity in primary hematopoietic stem cells in short term culture but no other cell types (Suppl. Figure 6).
Finally, we reviewed the human leukocyte antigen (HLA) locus in light of our earlier report of an association between the DQA1*01:03 allele and susceptibility in the UK invasive disease component of our dataset. [14] While we did not perform HLA imputation for this analysis, it is noteworthy that the two missense variants that define the DQA1*01:03 allele (i.e. rs36219699 and rs41547417) were excluded from this analysis because their minor allele frequency was less than 5% in the Kenyan dataset. However, neither were associated with susceptibility in the UK empyema or NC analyses (P=0.50–0.65). Despite this, the strongest signal in the HLA locus mapped to within 1kb of HLA-DQA1 (rs17612669, OR 1.5, 95% 1.1–2.1, PFE=0.005). While this signal is difficult to interpret in the context of a GWAS, it is noteworthy that it showed negligible heterogeneity despite the diverse genetic ancestry of the study population.
DISCUSSION
This preliminary investigation of susceptibility to invasive GAS disease provides the first evidence that common host genetic variants may influence the risk of this infection across a range of populations, age groups and sub-phenotypes.
The signal located 18kb from PAX5 is of particular interest because this gene is widely considered a key regulator of B cell maturation and fundamental to the genetic processes that lead to antibody production.[23] This is relevant because deficient antibody-mediated immunity to key GAS virulence factors has long been proposed as a key determinant of invasive GAS susceptibility in the general population. [24,25] Indeed, antibody-mediated neutralisation of virulence factors including superantigens may drive the protective effects of intravenous immunoglobulin, which, when administered in streptococcal toxic syndrome, was recently shown to be associated with a two-fold reduction in mortality at 30 days.[26] Moreover, the overlap with putative regulatory elements in haemopoietic stem cells, while far from definitive in the absence of experimental validation, raises the possibility these variants somehow impact differentiation of haemopoietic stem cells into B cell precursors, which might have downstream effects on antibody-mediated immunity.
In contrast, the MAGI2 signal is harder to relate to invasive GAS disease. MAGI2 plays a fundamental role in maintenance of synapses, endocytosis and postendocytic trafficking. [27] The gene is large, stretching 1.4Mb, and it is of interest that the locus has previously been to linked to infantile spasms. [28] Additionally, the gene has been linked by GWAS to traits including blood lipid and bone metabolism phenotypes[29] but overall it remains unclear how this locus could influence susceptibility to invasive GAS disease.
This study has a number of limitations, not least of which is the small sample size relative to most modern genetic association studies. Consequently, our analysis has limited power to detect all but the strongest signals while our estimates of effect size are imprecise and should be interpreted with caution. While follow-up of these preliminary findings is a vital next step, it is important to emphasise that invasive GAS is a rare disease and sporadic which renders recruitment challenging. Indeed, despite our focus on children and young adults, this report is, to our knowledge, the largest study of genetic susceptibility to invasive GAS to date.
Moreover, we deliberately set out to identify variants influencing susceptibility to GAS across a range of clinical manifestations and ancestral populations, as a consequence of which our study largely ignores the heterogeneous nature of this disease. Indeed, such heterogeneity might explain our findings in the HLA locus since it is possible the DQA1*01:03 allele effects predisposition to only a subset of clinical phenotypes or is dependent on the presence of specific bacterial factors such superantigens.[14] Future larger-scale prospective studies with access to more precise clinical data coupled with information about the pathogen should help clarify this issue. Nonetheless, our use of data from three distinct ancestral populations has substantial advantages, not least of which is improved detection of underlying causal variants.[30]
Overall, this is the first time a GWAS of susceptibility to invasive GAS has been undertaken. While the small sample size necessitates caution, these results suggest common genetic variation in the PAX5 locus, as well as potentially elsewhere in the genome, impacts susceptibility to invasive GAS disease. While follow-up of our findings in independent datasets is warranted, this analysis underscores the potential of investigating genetic susceptibility to GAS disease to bring about an improved understanding of pathogenesis.
Data Availability
Several components of the data referred to in the manuscript are available through the European Genome-phenome Archive including: the UK Invasive GAS study data (accession number EGAS0000100342); the Kenyan Bacteraemia Study Group data (EGAS00001001756); the Pacific Islands Rheumatic Heart Disease Genetics Network data (EGAS00001001881). Other components will be submitted to the European Genome-phenome Archive over the coming months and/or for the most part be available from the authors on reasonable request.
https://www.ebi.ac.uk/ega/studies/EGAS00001003421
Acknowledgments
This research was supported by grants awarded to T.P. from the Medical Research Council (G1100449), the European Society of Clinical Microbiology & Infectious Diseases (Research Grant 2013), the National Institute for Health Research (ACF-2016-20-001) and the British Heart Foundation (PG/14/26/30509). In addition, the genotyping undertaken by the Kenyan Bacteraemia Study Group was funded as part of the Wellcome Trust Case Control Consortium 2 (084716/Z/08/Z, 085475/B/08/Z and 085475/Z/08/Z). The genotyping undertaken by the Oxford Vaccine Group as part of EUCLIDS was funded through the European Union’s Seventh Framework Programme (EC-GA no. 279185). A.J.M. was supported by a Wellcome Trust Clinical PhD Fellowship (106289/Z/14/Z) while A.J.M., S.J.C., D.O. and A.J.P. were supported by the National Institute for Health Research Oxford Biomedical Research Centre, Oxford, UK. Additionally, M.B. was supported by a Medical Research Council Clinician Scientist Fellowship (MR/M008797/1), J.J.G. was supported by a Wellcome Trust Clinical PhD Fellowship (102342/Z/13/Z), J.A.G.S. was supported by a Wellcome Trust Senior Research Fellowship (098532), T.N.W. was supported by a Wellcome Trust Senior Research Fellowship (091758), J.C.K. was supported by a Wellcome Trust Investigator Award (204969/Z/16/Z) and A.V.S.H. was supported by a Wellcome Trust Senior Investigator Award (HCUZZ0) and by a European Research Council Advanced Grant (294557). S.S. was supported by the National Institute for Health Research Health Protection Research Unit in Healthcare Associated Infections and Antimicrobial Resistance, Imperial College London, London, UK.
We thank the Oxford Genomics Centre at the Wellcome Centre for Human Genetics for generating the genotyping data, subsidized by Wellcome Trust Core Awards (090532/Z/09/Z and 203141/Z/16/Z). We also thank the Oxford Biomedical Research Computing facility, a joint development between the Wellcome Centre for Human Genetics and the Big Data Institute supported by Health Data Research UK and the National Institute for Health Research Oxford Biomedical Research Centre with funding from the Wellcome Trust Core Award Grant Number (203141/Z/16/Z). We also acknowledge the support of the National Institute for Health Research Oxford Biomedical Research Centre, Oxford, the National Institute for Health Research Imperial Biomedical Research Centre, London, UK, and the National Institute for Health Research Newcastle Biomedical Research Centre, Newcastle upon Tyne, UK. None of these funders had any role in study design, data collection and analysis, decision to publish or preparation of the manuscript. Moreover, the views expressed here are those of the authors and not necessarily those of the National Health Service, the National Institute for Health Research or the Department of Health.