Abstract
Background Rheumatoid arthritis (RA) is a chronic inflammatory autoimmune disease associated with reduced life expectancy. It is heritable and an extensive repertoire of genetic variants have been identified. The gut microbiota may represent an environmental risk factor for RA. Indeed, Prevotella copri is a candidate keystone species, but whether it lies on the causal pathway for disease or is simply a bystander reflecting host-genetic predisposition to RA, remains to be determined. The study of disease-microbiota associations may be confounded by the presence of the disease of interest or by its treatment. To circumvent this issue, we assessed whether known RA risk alleles were associated with the gut microbiota, in a large who do not have RA.
Methods Blood and stool acquired from volunteers from TwinsUK were used for genotyping and assessment of the gut microbiota, respectively. A weighted polygenic risk score (PRS) for RA was calculated in 1650 unaffected twins from the TwinsUK cohort, based on 233 published GWAS-identified published RA associated single nucleotide polymorphisms (SNPs). Amplicon sequence variants (ASVs) were generated from 16S rRNA sequencing of stool samples and assessed for association with RA PRS. Confirmation of findings was performed using an independent sample comprised of first-degree relatives of RA patients from the SCREEN-RA cohort (n=133).
Findings We found that Prevotella spp was positively associated with RA PRS in TwinsUK participants (p.adj<1e-7). This finding was validated in SCREEN-RA participants carrying the shared epitope risk alleles (p.adj=1.12e-3). An association of Prevotella spp. with pre-clinical RA phases was also demonstrated (p.adj=0.021).
Interpretation Prevotella in the gut microbiota is associated with RA genotype in the absence of RA, as well as in subjects at high risk of developing RA. This work suggests that host genotype is associated with microbiota profile prior to disease onset.
Evidence before this study?Prevotella copri is increased within the gut microbiota in rheumatoid arthritis (RA) patients, predominantly those with early disease, before treatment is initiated. Prevotellaceae (family) has been demonstrated to be increased in pre-clinical RA cases compared to controls. Prevotella copri is posited to be an inflammatory driver, contributing to RA pathology by promoting a pro-inflammatory cytokine milieu.
What does this study add?This study is the first to demonstrate that, in a large human cohort, carrying genetic risk factors for RA is associated with a higher abundance of Prevotella spp. in the absence of RA.
How might this impact on clinical practice or future developments This work suggests that any potential causal role of Prevotella spp. occurs very early in disease development. Strategies to intervene should target early or asymptomatic at-risk groups
Introduction
Rheumatoid arthritis (RA) is a debilitating chronic autoimmune condition, associated with reduced life expectancy. There is a substantial genetic component to RA aetiology, with heritability estimated at 65%.1 Known environmental risk factors include periodontal disease, tobacco smoking and diet, and appear to trigger disease onset in genetically susceptible individuals.2 More recently proposed RA risk factors include the mucosal commensal microbiota. There is extensive cross-talk between the microbiota and the host, starting early in life with the development of the normal immune system; the microbiota may be implicated in the development of RA.
The gut lumen holds the vast majority of the commensal microbiota, and has intimate proximity to both the immune system via the gut associated lymphoid tissue, and the systemic circulation. A key gut microbiota taxonomic association demonstrated in RA patients, which presents early in the course of RA is an abundance of Prevotella spp, and particularly Prevotella copri (P.copri).3,4 Further evidence supporting the pathophysiological relevance of P.copri in RA is provided by a positive correlation with clinical parameters in RA patients.4 In addition to promoting disease activity, the gut microbiota may also influence patient response to treatment.4,5 The gut microbiota therefore represents a potential therapeutic target in RA, both for modulation of disease and for improving response to established therapeutics.
Identifying how RA genetic and environmental risk factors interact with one another may shed light on the underlying biology of disease. The influence of host genetic factors on the microbiota in RA remains relatively unexplored. An important influence is highly plausible: host genetics shape the biochemical and immune environment in which the microbiota reside, and furthermore the cumulative influence of the genetic risk loci in RA is predominantly mediated by immune pathways.2 Whilst there are a number of factors which influence microbiota composition, host genetic factors account for a considerable proportion of variance, with some taxa being 40% heritable.6
The purpose of this study was to investigate the influence of host genetic factors on the gut microbiota in RA in the absence of disease. The use of a population sample TwinsUK,7 whilst excluding RA affected participants allows for isolation of RA genetic factors and gets around the important issues of confounding by RA disease and its treatment - thereby achieving a human model of the genetic association with the microbiota in RA. RA polygenic risk score in unaffected TwinsUK participants was calculated and association determined with composition of the gut microbiota. Validation studies confirmed and extended our findings by examining the gut microbiota composition in first-degree relatives of RA patients (SCREEN-RA cohort) in relation to shared epitope positivity, and a pre-disease state as defined by clinical RA parameters.8
Methods
Participants: TwinsUK
Participants recruited to the study were members of the TwinsUK cohort, the largest UK registry of adult twins,7 with the exception that participants having a diagnosis of RA (identified by questionnaire, follow-up telephone and clinical visit confirmation) as well as their unaffected co-twin were excluded. TwinsUK participants comprised 93% females, having a median age of 63 (Table 1).
Polygenic Risk Score for RA in TwinsUK
Blood samples from TwinsUK participants obtained at clinical visit were used to determine genotype using the Illumina HumanHap300 BeadChip and the Illumina HumanHap610 QuadChip. Non-genotyped variants were imputed using 1000 Genomes and Haplotype Reference Consortium (HRC) reference panels.7 The polygenic risk score (PRS) assigns an individual a single numerical value for risk of disease conferred by genetic factors.9
The NCBI database of GWAS summary statistics for RA was used to identify 233 published single nucleotide polymorphisms (SNPs) associated with RA at genome-wide significance (p=5e-8) (Supplementary Table 1), of which 117 had been replicated across studies (Supplementary Table 2).10 A study inclusion criterion of European ancestry ensured ethnic concordance with TwinsUK.7,11 The PRS was tested for RA predictive value in 6,776 participants from UK Biobank including 2,686 RA cases and 4,090 controls. Diagnosis of RA in UK Biobank participants had been determined using hospital episode statistics (HES) data supplied by NHS Digital. Logistic regression of RA cases and unselected controls against PRS, adjusting for age, sex and smoking history, was applied. Standardised coefficients are reported.
Risk allele dosage of SNPs present within TwinsUK was extracted using Plink 1.9. Of the SNPs identified in the literature 227 were available in TwinsUK. Pruning was applied to account for linkage disequilibrium 12 Missing allele dosages were imputed and replaced with the mean value across the respective SNPs. The risk allele dosage was multiplied by the SNP-RA association effect size, to produce a weighted PRS.9
High PRS individuals in our dataset did not demonstrate RA seropositivity: only 9/500 individuals in the TwinsUK sample with available serum were anti-citrullinated protein antibody (ACPA positive; defined as >5u/mL), distributed similarly between high and low genetic risk groups (high 2:93; low 7:398).
Microbiota profiling: TwinsUK
Microbiota composition of gut (stool, n=1650) samples was assessed using the 16S rRNA marker gene, with sequencing of the V4 variable region using barcoded primers (F515/R806). Samples were processed as previously described.6 Briefly, faecal samples were collected during clinical visits or were posted in sealed ice packs and frozen on arrival at the lab at −80° C. Stool samples were sent as 35 batches on dry ice to Cornell University, where DNA was extracted and sequenced on an Illumina MiSeq platform.6
16S sequences were demultiplexed in QIIME. Amplicon sequence variants (ASVs) were generated using the DADA2 package in R13 using the pipeline described below, applied separately for each sequencing run (n = 35), until the final merge step.14 Sequences were trimmed, error estimated within the forward and reverse reads for each sample, and the ASV algorithm applied to infer the biological sequence. Forward and reverse ASVs were joined. Chimeras were removed and the total dataset merged, followed by taxonomic assignment with SILVA1.3.2.14 Samples having a sequencing depth of less than 10,000 reads were excluded. A phylogenetic tree was generated using the Phangorn R package. Alpha diversity was calculated on untrimmed ASV tables using four measures: Shannon index, Simpson index, observed ASVs and Faith’s phylogenetic diversity.
Association of the gut microbiota with genetic risk of RA
Linear mixed-effects models were used to determine association between RA PRS and alpha diversity, using alpha diversity as a response variable to the RA PRS. Modelling was performed using the lme4 package in R,15 with fixed effect covariates age, BMI and sequencing depth and technical covariates as random effects. Standardised coefficients were reported. Differential abundance of ASVs present in more than 5% of samples at the genus level against the RA PRS was assessed using the DESeq2 R package,16 using fixed-effects covariates as above. Genus level refers to grouping together all ASVs with an identical genus annotation. Taxonomic assignment using the SILVA database resulted in some instances in grouping ASVs at the genus level, which are predicted to be one of a group of species. These annotations were preserved when grouping at genus level. False discovery rate (FDR) adjustment for multiple testing was applied to all models.
Phylogenetic and community relationship within the Prevotellaceae Family
To investigate further the Prevotella spp. associations identified, the phylogenetic and community relationships between Prevotella_7 and Prevotella_9 were investigated. To examine community relationships, ASVs were grouped into compositional clusters, or balances. Briefly, ASVs present in at least 10 individuals were correlated and transformed as implemented in Morton et al.17 This creates clades in which interacting species are closer neighbours in a clade than loosely related ones. The phylogenetic tree was subsetted to Prevotellaceae and visualised using the Phyloseq R package.18
Abundance of Prevotella spp. in RA discordant twin pairs
To be confident that the Prevotella spp findings in non-RA individuals with genetic risk for RA were not reflecting a “resilience-associated” microbiota (i.e. protective factors against developing disease despite having high genetic risk), we calculated the abundance of these species in 19 monozygotic RA discordant female twin pairs from TwinsUK, which had been excluded from the previous analysis because of the presence of disease. If associated with resilience we would expect the Prevotella spp to be preferentially expressed in the unaffected twin of each discordant pair. RA diagnosis had been confirmed during a clinical visit to a Rheumatologist or RA-trained research assistant. ASVs were generated as above and relative abundance of Prevotella spp. in RA discordant twin pairs calculated using the Phyloseq R package18 (Figure 6).
Participants: SCREEN-RA
To validate the results from TwinsUK, an analysis of RA genetic predisposition and microbiota association was performed in participants of SCREEN-RA, a Swiss multicentre cohort of first-degree relatives of RA patients.8 Within SCREEN-RA participants (n=133), pre-clinical RA cases (n = 83) were identified based on the European League Against Rheumatism terminology for pre-clinical phases of RA,8 and matched with 50 controls who were also first degree relatives of RA patients. Briefly, pre-clinical RA was detedefined on the basis of serum positivity for anti-citrullinated protein antibody (ACPA) or rheumatoid factor and/or symptoms and signs associated with possible RA with or without undifferentiated arthritis.8 Patients with a diagnosis of RA were excluded from the analysis and all participants were examined by a trained nurse. Of SCREEN-RA participants, 84% were female and median age was 57 (Table 2).
Shared Epitope Genotyping: SCREEN- RA cohort
Uncoagulated blood was taken from patients and controls and stored frozen at –20° until DNA extraction. DNA was extracted using a modification of the salt-out technique (Nucleon TM, Scotlab, UK). HLA-DRB1 polymorphism was determined by reverse polymerase chain reaction - sequence-specific oligonucleotide primers hybridization and by polymerase chain reaction - sequence-specific primers using commercial reagents validated at the National Reference Laboratory for Histocompatibility. Within the DRB1*04 genotype the method discriminates all major subtypes in different allele groups. HLA-DR1, DR4 and DR14 alleles that are negative for the SE70-74 motif are also discriminated. In a second step, the SE-positive typing ambiguities were analyzed by PCR-SSP in order to determine the final 4-digit typing result.
Microbiota Profiling and Statistical Analysis: SCREEN-RA cohort
Gut microbiota 16S rRNA data was collected as previously described.8 Briefly, the DNA Genotek OMNIgene·Gut Stool Microbiome Kit was used to collect, store and ship the stool samples. After sample processing and DNA extraction, the V4 region of the 16S rRNA gene was amplified using barcoded primers (F515/R806), and sequenced on an Illumina MiSeq, with ASVs generated as per the TwinsUK cohort to ensure compatibility. Differential abundance of taxa in the gut microbiota in association with pre-clinical RA and SE positivity was assessed against all genus present in greater than 5% of samples using the DESeq2 R package.16 Biological covariates were not required to adjust for as these factors were matched between the case and control groups.
Results
TwinsUK participants comprised 1650 adult twin volunteers the majority of whom were female. In order to consider genetic risk factors isolated from the presence of disease, the exclusion criteria were as follows: diagnosis of RA, or twin sibling with a diagnosis of RA. Table 1 summarises the characteristics of the sample.
Polygenic Risk Score
The polygenic risk score was normally distributed in the TwinsUK sample with RA cases and twin siblings excluded (Shapiro-Wilk normality test P= 0.24). The score ranged from 12.6-71.3; the median score was 42.63, the interquartile range 38.89 – 48.23. Application of the PRS in 6,776 UK biobank participants, of which 2,686 had a HES diagnosis of RA, confirmed that the score is predictive of RA. The odds ratio (OR) for the PRS was modest but highly statistically significant:OR = 1.34; p = 4.17e-8).
Genetic risk for RA and alpha diversity
We determined whether genetic risk of RA is associated with alpha (within sample) diversity. There was no detectable association between RA PRS and Shannon Index, Simpson Index, Observed ASVs or Faith’s phylogenetic diversity (Table 3).
RA PRS and the gut microbiota
The gut microbiota was assessed for association with RA PRS following a non-targeted approach at the genus level. Of all microbial taxa present in more than 5% of stool samples, Prevotella was the strongest association and showed an 18-fold log base 2 higher differential abundance with genetic risk for RA (P.adj < 1e-7).
SCREEN-RA Cohort: association with pre-clinical RA and shared epitope alleles
We sought to confirm our findings in gut microbiota from the SCREEN-RA cohort by analysing the association between the main genetic risk factor for RA - HLA-DRB1 shared epitope (SE), and the intestinal microbiota. As the cohort comprises first-degree relatives of RA patients it has a higher genetic risk for RA than the general population. Participants had also been genotyped for SE risk alleles.
Prevotella (genus) was positively associated with pre-clinical RA (p.adj = 0.021; Figure 2). Prevotella_9 is listed as potential P.copri at 97% similarity within the SILVA database.14 Prevotella (genus) was associated with HLA-DRB1 SE RA risk alleles in the SCREEN-RA cohort (n=133; p.adj = (0.0348) and more strongly so in a subgroup analysis in which the 44 participants with swollen joints were removed, in order to isolate genotype from RA pathophysiology (n=89; p.adj=1.12e-3), Figure 3. In the subgroup analysis of differential abundance against SE positivity in asymptomatic participants only, Prevotella_7 was the only remaining taxon association. This finding validates the TwinsUK Prevotella_7 association.
Phylogenetic and community relationship within Prevotellaceae
As Prevotella spp. are the strongest taxon associations in our results and of particular interest in RA, we further investigated the phylogenetic and community relationships within Prevotellaceae. In particular, we were interested in the relationship between Prevotella_7 and Prevotella_9 as these were associated with genetic risk and pre-clinical RA, respectively. Cluster analysis demonstrated a community relationship between Prevotella spp. As exact matches between ASV sequences and reference species were not made, the SILVA database facilitated annotation at the species level according to groups of similar species ASV sequence.14 Using this database annotation at 97% similarity, it is inferred that Prevotella_9 potentially represents Prevotella copri, whilst Prevotella_7 is a different species of uncertain annotation. Prevotella_7 and Prevotella_9 were the only 2 species clusters within the Prevotellaceae family, and clustered together more frequently than any other group, suggesting an inter-dependent community relationship between these taxa (Figure 4). Visualisation of the Prevotellaceae phylogenetic tree demonstrated that Prevotella_7 and Prevotella_9 are phylogenetically distinct from one another (Figure 5).
Abundance of Prevotella spp. in RA discordant twin pairs
We sought to investigate the potential role of Prevotella_7 in RA. The Prevotella_7 associations demonstrated may be implicated in RA pathogenesis, and may, indeed, be increased in RA patients. Conversely, given that the RA-unaffected TwinsUK participants are beyond the mean age of RA onset, yet have high genetic risk, these participants may be resilient to RA. Correspondingly, the genetic risk and microbiota associations demonstrated may be microbial markers of RA resilience. In this instance, the ratio of Prevotella_7:Prevotella_9 (copri) would be expected to be higher in unaffected twins.
The relative abundance of Prevotella spp. was calculated in 18 monozygotic female RA-discordant twin pairs excluded from our previous analysis. A higher abundance of Prevotella_7 was demonstrated in the RA affected twins compared to control twins: 0.005 and 0.001, respectively, supporting the hypothesis that Prevotella_7 is implicated in RA aetiology. The difference in abundance however was not statistically significant. The proportion of Prevotella_7 (species prediction uncertain) to Prevotella_9 (predicted Prevotella copri) was higher in the RA affected twins than the control twins: 0.4 and 0.01, respectively, suggesting that Prevotella_7 is not a marker of RA resilience.
Discussion
This study demonstrates a link between host genetic risk for RA and the gut microbiota in a large human cohort. Gut microbiota associations have been demonstrated in the absence of clinically detectable disease because participants with RA, along with their unaffected co-twins, were removed from the study. The strongest association with genetic risk for RA in the gut microbiota was an increase in abundance of Prevotella_7 (18-fold log-change; p.adj<1e-7), and the Prevotella_7 association was confirmed by our analysis of the SCREEN-RA cohort in which this taxon was positively correlated with shared epitope positivity (7-fold log change; p.adj=1.12e-3). Prevotella_7 was demonsrated to share a biological interdependence and niche similarity with Prevotella_9 (copri).
There is considerable interest in Prevotella copri as a potential mediator of RA pathology, and it is a candidate keystone taxon enriched in the gut microbiota of newly diagnosed RA patients. Since this observation was made, the human immune response to this microbe has been of considerable interest. Antibodies to P.copri have been shown to associate with disease severity and innate- and Th1 and Th17 immune responses in RA4,19.
Functional work has suggested a role for P.copri in Th17 cell differentiation.20 That the abundance of P.copri abates after treatment with DMARDS, as shown in a longitudinal study, 3 and is also associated with other inflammatory conditions,21 has fuelled speculation that inflammation is a pre-requisite for P.copri proliferation within the gut, relative to other taxa.19,22 P.copri may have adapted to thrive in a pro-inflammatory environment, and may further promote the inflammatory milieu, thereby enhancing its own favoured environmental niche.22 In doing so, P.copri is suggested to contribute to RA pathology. According to this model, a human–microbiota interspecies positive feedback loop is proposed. Another example of such a model is the hijacking of complement cascade by Porphyromonas gingivalis.23
An increase in abundance of three other taxa - Ruminococaceae, Rikenella and Shigella - in association with RA PRS was also observed. Ruminococcaceae and Rikenella have been shown previously to be in higher abundance in RA patients compared to controls.3,24 The evidence for a role of these taxa in RA is much weaker than for P.copri - with lower replication across studies, and no functional link reported to date. However, the association both with RA patients and with genetic risk for RA in a large, non-disease cohort is interesting, and merits further investigation.
Whilst causality is not determinable from a cross sectional association study, our results provide robust evidence to indicate that genetic factors in RA may regulate the abundance of Prevotella seen in the gut microbiota of people with RA. Both TwinsUK and SCREEN RA cohorts were balanced for ACPA positivity in relation to RA genetic risk loci. A previous study of new onset RA patients and healthy participants reported an inverse association of HLA genotype with abundance of Prevotella copri, in the opposite direction to that expected.4,25 This study used operational taxonomic units (OTUs) and potential confounding factors were not examined; it is not possible accurately to determine species from OTUs. The present study using ASVs in a very large sample provides more substantial evidence for an association of genotype with the RA gut microbiota.
The findings suggest that the microbiota is altered prior to disease and are in accord with previous reports of Prevotella spp. in pre-RA8 and RA. Indeed, we find similar in cases of pre-clinical RA in first degree relatives of RA patients. In a recent study of the SCREEN-RA cohort an enrichment of Prevotellaceae in the gut microbiota in pre-RA was shown. In the study by Alpizaar-Rodriguez et al.8 no statistically significant Prevotellaceae genera were shown to drive the association. Therefore in the present study data were re-analysed using the more recent method of ASVs. ASVs offer an updated alternative to traditional clustering based methods generating OTUs from 16S sequence. As opposed to grouping sequences based on a similarity threshold as for OTUs, it is now possible to use the error rate to infer the original biological sequence, and produce units of matched sequence. ASVs offer higher resolution, therefore, and have greater biological relevance.26 The ASV analysis showed an FDR-adjusted significant Prevotella genus association with a species level group annotation indicating potential Prevotella copri. Our follow-up study of pre-RA further revealed six novel genus associations which had not been evident in the original OTUs– Lactobacillus, Enterococcus, Faecalibacterium, Ruminococcaceae, Veilllonella and Butyrivibrio. Of the six associations, the first four have been reported associated with RA previously,3,27,28 and Ruminococcaceae was seen associated with RA PRS in TwinsUK.
There are a number of limitations to the study. First, in the pre-clinical RA follow-up analysis both cases and controls were first degree relatives of RA patients, with increased genetic risk compared to the general population. It is possible that the pre-clinical RA cases had higher genetic risk than the controls, but as full genotyping was not available, we were unable to confirm this. The TwinsUK cohort is predominantly female – 93% of participants in our study were female. The SCREEN-RA cohort is slightly more balanced in terms of sex, however population prevalence of RA is much higher in females than males. As the age of onset of RA is 30-65,29 the TwinsUK participants are less likely to develop disease than the SCREEN-RA participants. However, this is of benefit to the design of our study of genetically high risk yet unaffected individuals and helps us to understand microbiota differences in the absence of disease.
On a technical note, this work highlights the value of using updated methods of amplicon sequence variants – ASVs - which can detect taxonomic variation overlooked by OTU based methods: there are likely to be further microbiota associations with RA as yet undetected due to limitations in 16S methods. Annotation of ASVs with the SILVA database demonstrates differences at the species level which would be overlooked using other reference databases. Finally, modelling of the community relationship provides valuable insight into the underlying biology. Future studies should take advantage of these methods.
Taken together, these results support the hypothesis that microbiota is altered in individuals with genetic predisposition to RA before the onset even of pre-clinical RA. The work sheds light on our understanding of microbiota in RA and addresses the cause vs consequence issue; – if microbial alteration precedes disease, the microbiota may lie on the causal pathway. However we cannot yet exclude the possibility that Prevotella spp. are bystanders. Additionally, having RA genetic risk and Prevotella spp. is not sufficient for disease development, but rather may be one of several “hits” required for progression to RA. The identification of pre-clinical RA represents an important clinical target in early disease intervention and is currently the subject of multiple immune-modulating clinical trials. The microbiota may offer the opportunity for modulation of pre-disease pathways.
Conclusions
Gut microbiota abundance is associated with RA risk genotype even in the absence of disease. Genotype may mediate key taxonomic associations of the gut microbiota with RA, particularly Prevotella spp.3,4,30 suggesting that theses species play a role very early in development of RA.
Data Availability
Data is available from TwinsUK.
Conflict of Interest Statement
The authors declare no condlict of interest
Author Contributions
CS, FW, BK and AC conceived the project. PW developed the theory, performed the experiments, analysed and interpreted the data, and took the lead in writing the manuscript. CS and FW supervised the analysis. CS encouraged development of the work through leading collaboration with the SCREEN-RA cohort, and inclusion of investigation of community structure, and abundance of taxa in RA discordant twins. AA performed analysis of the community structure. FW, PW and CS collected data from TwinsUK. MF assisted with UKBiobank genotype data. All authors provided ctitical feedback and contributed to the final manuscript.
Acknowledgements and affiliations
This work was funded by Arthritis Research UK Special Strategic Award (grant 21227). Twins UK receives funding from the Wellcome Trust; the National Institute for Health Research (NIHR) Clinical Research Facility at Guy’s & St Thomas’ NHS Foundation Trust and NIHR Biomedical Research Centre based at Guy’s and St Thomas’ NHS Foundation Trust and King’s College London. Claire J. Steves is funded by the Wellcome Trust (grant WT081878MA). Philippa Wells is supported by a joint PhD studentship from the Medical Research Council (MRC) and Clinical Disease Reasearch Foundation (CDRF).
The authors would like thank the participants of the TwinsUK cohort.