ABSTRACT
Individuals growing up during childhood in a socioeconomically disadvantaged family experience a higher rate of inflammation-related diseases later in life. Little is known about the mechanisms linking early life experiences to the functioning of the immune system decades later. Here we explore the relationship across social-to-biological layers of early life social exposures on levels of adulthood inflammation (C-reactive protein) and the mediating role of gene regulatory mechanisms, epigenetic and transcriptomic profiling from blood, in 2,329 individuals from two European cohort studies. Consistently across both studies, we find transcriptional activity explains a substantive proportion (up to 78%) of the estimated effect of early life disadvantaged social exposures on levels of adulthood inflammation. Furthermore, we show that mechanisms other than DNA methylation potentially regulate those transcriptional fingerprints. These results further our understanding of social-to-biological transitions by pinpointing the role of pro-inflammatory genes regulation that cannot fully be explained by differential DNA methylation.
INTRODUCTION
Inflammatory processes have been suggested as a potential pathophysiological mechanism underlying the development of several major chronic diseases, including cardiovascular, metabolic and psychotic disorders1,2,3. Systemic inflammation is also one of the most studied pathways in the context of the social-to-biological transition4,5,6,7,8. Although early life disadvantaged socioeconomic conditions have been consistently shown to lead to elevated levels of inflammation in adulthood9, the underlying biological processes through which those socioeconomic exposures are embodied remain unclear.
Various studies have suggested a dysregulation of pro-inflammatory genes transcription in leukocytes as a potential biological mechanism for the embodiment of socioeconomic disadvantage10,11,12,13,14. This line of reaserch rests on the hypothesis that socioeconomic disadvantage activates the sympathetic nervous system associated to body’s flight or fight response to stressors that eventually leave a leukocyte transcriptional fingerprint. Similar findings have been reported in experiments conducted with macaques15,16. Other studies have focused on the epigenetic regulators of transcriptional activity in leukocytes, in particular the DNA methylation levels at specific CpG sites17,15,18. These research endeavors were conducted without directly relating DNA methylation to gene transcription levels, whose functional relationship does not always hold19,20.
To date, the contribution of pro-inflammatory genes regulation to socioeconomic inequalities in inflammatory levels has not been investigated empirically, although frequently discussed theoretically10,17. To explore the chain of biological mechanisms linking early life socioeconomic disadvantage to adult inflammatory levels via DNA methylation and gene transcription in leukocytes, we used data from a Finnish (1,623 participants) and a Swiss (706 participants) population-based cohort study, and evaluated the consistency of findings across these two populations. First, we calculated the portion of the effect of early life socioeconomic disadvantage on adult heightened inflammatory levels explained by transcriptional level of pro-inflammatory genes either inferred by applying 2-sample Mendelian randomization21 on the whole transcriptome (2sMR) or belonging to the conserved transcriptional response to adversity (CTRA) model12. We measured levels of systemic inflammation through circulating C-reactive protein (CRP), as it is a sensitive marker of inflammation and elevated amounts have been associated with increased risk for several diseases3. Second, we established the most likely functional relationship between cis (based on their genomic location) epigenetic and transcriptional changes induced by early life socioeconomic disadvantage via Bayesian networks model selection22.
RESULTS
Pro-inflammatory genes selection via 2sMR
We used summary statistics of the associations between genetic instruments (Single Nucleotide Polymorphisms, SNPs) and gene transcription in leukocytes or CRP in blood obtained via large-scale GWAS conducted by the eQTLGen consortium23 and the CHARGE consortium3, respectively (see Methods). There were 10,701 genes for which 2sMR provided a test of the causal association between gene transcription and CRP levels. There was an average of 39 semi-independent instruments (cis eQTLs in a linkage disequilibrium r2≤0.25, see Methods) per each gene and the average F-statistic was 97. Finally, 81 genes were significant at q≤0.05 (see Table S1 and Methods). Of note, one of them, NFKB1, was as well one of the CTRA genes. Functional enrichment analysis revealed enriched inflammatory bowel disease, influenza A and prion disease pathways.
Characteristics of populations
As reported in Table 1, on average Finnish participants (YFS study) were 42-year old (age ranging between 34 and 49 years), while Swiss participants (SKIPOGH study) were 53-year old (age ranging between 25 and 88 years). Women represented 46% of the YFS population and 52% of the SKIPOGH population. CRP levels in SKIPOGH were distributed differently than in YFS, as participants in the upper quintile had CRP≥3 mg/L in SKIPOGH while had CRP>2.2 mg/L in YFS (see Table 1). Individuals with high/low parental occupational position were 18%/26% in YFS and 25%/28% in SKIPOGH.
Mediation by genes transcription
Individuals with low parental occupational position (potentially counter to fact) had an increase in (the geometric mean of) adulthood CRP of 22.6% [95% confidence intervals (CI): 2.7%, 46.4%] for YFS and 65.1% [95% CI: 32.7%,106.9%] for SKIPOGH compared to having (potentially counter to fact) high parental occupational position (total effect, see Figure 1). The point estimate of the geometric mean of CRP for individuals with low parental occupational position was 0.75 mg/L in YFS and 0.52 mg/L in SKIPOGH, while for individuals with high parental occupational position was 0.61 mg/L in YFS and 0.32 mg/L in SKIPOGH.
The portion of the total effect accounted by 2sMR-implicated genes en-bloc (indirect effect, whereby 2sMR genes transcription levels were summarized by the first 23 principal components (PCs) corresponding to 49.9% of transcription variance, see Methods) resulted in an increased (geometric mean of) CRP of 15.1% [95% CI: 6.7%, 24.5%] in YFS, and of 12.7% [95% CI: 1.0%,24.1%] in SKIPOGH (first 23 PCs corresponding to 50.4% of transcription variance). This corresponded to a proportion mediated (PM, see Methods) of 69.0% [95% CI: 29.4%, 96.5%] in YFS and of 23.9% [95% CI: 2.3%, 47.2%] in SKIPOGH. The indirect effect accounted by CTRA genes en-bloc (first 12 PCs corresponding to 49.3% of transcription variance) resulted in an increased (geometric mean of) CRP of 8.8% [95% CI: 1.9%,16.7%] in YFS, and of 10.8% [95% CI: 1.3%,20.2%] in SKIPOGH (first 10 PCs corresponding to 49.8% of transcription variance). The corresponding PM was 41.5% [95% CI: 10.3%, 88.7%] in YFS and 20.5% [95% CI: 2.8%, 39.6%] in SKIPOGH. When considering 2sMR-implicated and CTRA genes together (133 in total, see Table S1), their joint indirect effect resulted in an increased (geometric mean of) CRP of 17.2% [95% CI: 7.1%,28.5%] in YFS (first 33 PCs corresponding to 50.1% of transcription variance) and of 13.9% [95% CI: 0.0%,27.2%] in SKIPOGH (first 30 PCs corresponding to 49.8% of transcription variance). This corresponded to a PM of 77.8% [95% CI: 32.4%, 97.3%] in YFS and of 25.9% [95% CI: 0.02%, 50.3%] in SKIPOGH. The partial redundancy between the two gene sets was supported by correlation patterns between the principal components and between the transcription levels (see Figures S1 and S2).
Overall, in both cohorts the magnitude of effects was in the same direction and that of indirect effects was similar, with estimates of 17.2% and 13.9% indicating that pro-inflammatory genes transcription mediates a substantive portion of the estimated total effect of parental occupational position on adult CRP.
Regulation of genes transcription by local DNA methylation
We applied Bayesian network scoring to 4,076 CpGs located nearby (<1500 bp) the 2sMR-implicated and CTRA genes (see Table S2), for five causal structures (Group A to E, see Figure S5 and Methods). As described in Table 2, for most CpGs (Group A: N=3,612 (88.6%) and N=3,253 (79.8%) in YFS and SKIPOGH, respectively), there was no substantive evidence (logarithm of Bayes factor larger than one, see Methods) for an influence of parental occupational position on adult regulation of genes transcription. For one CpG (cg19527571), there was evidence for the role of DNA methylation as cis-regulator of MED24 in SKIPOGH only (Group B). There was evidence for parental occupational position driving levels of gene’s transcription and DNA methylation being either unrelated to transcription (Group C: N=30 and N=36 CpGs in YFS and SKIPOGH, respectively) or driven by transcription (Group D: N=11 and N=17 CpGs in YFS and SKIPOGH, respectively). For 9 and 24 CpGs (Group E, YFS and SKIPOGH, respectively) there was evidence for parental occupational position driving levels of methylation but not transcription levels. Finally, there was no substantive evidence for one causal structure over the others in N=414 (10.2%) and N=745 (18.3%) CpGs in YFS and SKIPOGH, respectively.
Overall, a non-empty set of CpGs was present across the two cohort studies for Groups A, C, D and E. Furthermore, this consistency at pattern level did correspond to a replication at the level of single site for 2,894 CpGs belonging to Group A only (see Table S2).
Compared to CpGs in Group A, sites in either Group C or D (N=41 and N=53 in CRYFS and SKIPOGH, respectively) had similar distribution in CpG island position (island or shore or shelf vs open sea, χ2=0.1 and 0.2, P value=0.76 and 0.48, %, YFS and SKIPOGH, respectively) and increased TSS or UTR annotation (56% vs 44% and 64% vs 44%, χ2=2.8 and 7.9, P value=0.15 and 0.005, YFS and SKIPOGH, respectively).
Sensitivity analyses
Total and indirect effect estimates were in the same direction and of similar (for YFS) and higher (for SKIPOGH) magnitude to those reported in main analysis when excluding participants with high values of CRP (>10 mg/L, N=34 [2.1%] in YFS and N=12 [1.7%] in SKIPOGH) (see Figure S3). By varying the amount of variance explained by principal components summarizing transcription levels of the genes’ sets, indirect effect estimates slightly increased when variance moved from 30% to 50% and remained stable when variance moved from 50% to 60% (see Table S3). This finding supports the fact that indirect effects reported in main analyses do not underestimate the amount of mediation by pro-inflammatory genes regulation. In SKIPOGH, when using a dichotomous CRP (≥3 mg/L) as outcome, the estimated indirect effect was substantive (see Table S4), backing the results presented in main analyses with an imputed continuous CRP.
When parental education was considered as the exposure, the magnitude of indirect effects was in the same direction and slightly lower than that reported in main analysis (see Figure S4). Furthermore, the Bayesian scoring procedure supported a pattern of models similar to the one found for parental occupational position, but no consistent evidence for models whereby parental education drives transcription which then affects DNA methylation levels (see Table S5). When adding a group of negative control DAGs to the groups described in Figure S5, the Bayesian scoring procedure correctly did not assign that group to any of the CpGs.
DISCUSSION
We found that both Finnish and Swiss adults having experienced disadvantaged socioeconomic conditions early in life displayed heightened levels of systemic inflammation compared to their more advantaged counterparts, and that a substantive portion of this effect was transmitted by shifts in leukocytes pro-inflammatory genes transcription. Via 2sMR, we identified a new set of pro-inflammatory genes putatively influenced by socioeconomic conditions in early life. Finally, we showed a consistent pattern of inter-individual variation in local DNA methylation being either unrelated to or driven by inter-individual variation in transcription.
To our knowledge, this is the first study examining regulatory mechanisms linking early life socioeconomic conditions to inflammatory levels in adulthood by integrating epigenetic and transcriptomic profiling from blood in two European populations. The observed socioeconomic inequalities in CRP (that is the estimated total effect) were in line with a previous multi-cohort study (N=13,078) from four European countries6, whereby the pooled association between paternal occupational position and CRP resulted in an increased (geometric mean of) CRP of 20.9% [95% CI: 11.6%, 31.0%]. In our study, some differences across the two cohorts were observed in the estimated magnitude of total and indirect effects. As populations were sampled using different designs, were from different European countries and had a different age distribution, these characteristics may have influenced those inter-cohort differences. Remarkably, in both populations those effects were in the same direction and the magnitudes of mediation were similar. Indeed, the joint indirect effect estimates were 17.2% and 13.9% (2sMR+CTRA genes in YFS and SKIPOGH, respectively), indicating a substantive mediation. However, the magnitude of the indirect effects can be an overestimate since unmeasured confounding may have increased the association between genes transcription and heightened inflammation. Furthermore, another limitation of our study is that CRP was not measured through a high-sensitivity assay in SKIPOGH. However, main and sensitivity analysis provided consistent results, and two thirds of SKIPOGH participants had CRP measured with high precision at 2 mg/L (coefficient of variation<5%). Overall, our finding of a mediating role of pro-inflammatory genes regulation is supported by previous studies in humans and primates. Those research endeavours supported transcriptional fingerprint of decreased glucocorticoid and increased pro-inflammatory signalling as key mechanism in the embodiment of socioeconomic disadvantage10,15.
Our transcriptome-wide analysis also identified 81 pro-inflammatory genes via 2sMR, a method that is protected from confounding and reverse causation biases. By using summary statistics from two large data sets we increased statistical power, and by performing inference via three 2sMR methods based on orthogonal assumptions we strengthened the reliability of inference. Nevertheless, we cannot exclude bias in our 2sMR analysis due to selection of instruments in the same data set where we conducted inference25, which leads to underestimation of causal effect sizes. Furthermore, we applied a stringent, rather conservative threshold for genes’ selection, and we could not apply 2sMR to all human genes, therefore we cannot exclude the existence of other undetected pro-inflammatory genes.
We showed in both Finnish and Swiss individuals that early in life socioeconomic conditions may regulate gene transcription in leukocytes without involving local DNA methylation changes, or with local DNA methylation playing the role of regulated event. This finding is in line with recent experimental observations obtained via a novel sequencing technology19, Bayesian network analyses of ancestry-related differences in DNA methylation/gene transcription in cord blood35 and whole blood monocytes36, and a Mendelian randomization study24. As pointed out therein24, the inferred direction of causality between DNA methylation and gene transcription can be biased because their measurement is noisy. Nevertheless, our findings are based on interpreting only models with substantial statistical evidence (logarithm of Bayesian factor larger than one). At the same time, that threshold could have biased our findings, as we could not assign any of the investigated causal structures to about 10% of the CpGs. Our results do not exclude the possibility that at some cis CpGs methylation levels do drive gene transcription and they point to the presence of other regulating mechanisms. Overall, this finding calls attention to potential weaknesses of current epidemiologic approaches based on investigating a single epigenetic layer uncoupled with downstream functional molecular phenotypes.
Our study supports regulation of pro-inflammatory genes in leukocytes as a mechanism through which disadvantaged early life socioeconomic conditions heightens inflammatory levels in adult life, but it is still unclear when that happens and via which intermediate exposure. Repeated measurements across the life course are needed to answer that question. Our work could extend in prioritizing repeated measurements in life course longitudinal studies to identify critical/sensitive windows of socioeconomic exposures.
METHODS
Study populations
We used individual-participant data from two multi-centre population-based studies: the Cardiovascular Risk in Young Finns Study (YFS)26 and the Swiss Kidney Project on Genes in Hypertension (SKIPOGH)27.
YFS participants were selected from five Finnish areas according to the location of the university cities with a medical school (Helsinki, Kuopio, Oulu, Tampere and Turku). In each area, urban and rural boys and girls were randomly selected on the basis of their personal social security number from the Social Insurance Institution’s population register, which covers the whole Finnish population. The baseline survey was conducted in six age cohorts of children (aged 3, 6 and 9) and adolescents (aged 12, 15 and 18) in 1980, with a participation rate of 83%, i.e. 3,596 of those invited. Participants were re-examined every 3 years between 1980 and 1992, in 2001 and in 2011, when they had reached mid adulthood (aged 34–49). The YFS study was conducted according to the guidelines of the Declaration of Helsinki, and the protocol was approved by local ethics committees. Own/parental consent was obtained for all participants.
SKIPOGH participants were recruited in the cantons of Bern and Geneva and the city of Lausanne, Switzerland. Participants were randomly selected from the population-based CoLaus study in Lausanne28, and from the population-based Bus Santé study in Geneva29. In Bern, participants were randomly selected using the cantonal phone directory. Inclusion criteria were: (1) written informed consent; (2) minimum age of 18 years; (3) white ethnicity; (4) at least one, and preferably three, first-degree family members also willing to participate. Participation rate was 20% in Lausanne, 22% in Geneva, and 21% in Bern. The baseline assessment initiated in December 2009 and ended in April 2013, including 1,128 participants aged between 18 and 90 years. A follow-up survey (1,033 individuals) started in October 2012 and was completed in December 2016. The SKIPOGH study was approved by the ethical committees of Lausanne University Hospital, Geneva University Hospitals and Bern University Hospital.
Causal models and measures
The causal structures posited in our study are represented in the directed acyclic graphs (DAGs) displayed in Figure 2. The DAG in figure 2A evaluates the portion of the effect of early life socioeconomic conditions (exposure) on adult inflammatory levels (outcome) transmitted by intracellular processes regulating transcription of genes in immune cells (mediators). This “indirect” effect captures all potential pre- and post-transcriptional regulating mechanisms including DNA methylation, histone modifications, transcription factors binding and microRNAs30,31. These processes mutually regulate in both physiological and diseases conditions. In the second part of our study, we focused on local DNA methylation as regulating mechanism (mediator) of a gene’ transcription (outcome) according to the causal structure in Figure 2B. We chose DNA methylation as it is the most widely studied epigenetic mechanism in the field of social epidemiology32 and as it represents a potential therapeutic target33. Since the relationship between levels of local DNA methylation and genes transcription can be bidirectional34,35,36 or absent19, for each pro-inflammatory gene and DNA methylation site we did not posit any specific causal structure between gene transcrition and DNA methylation measurements. Instead, we investigated the most likely one among all possible causal structures (see Figure S5 and section entitled Bayesian network model selection).
Parental occupational position defined as the highest household occupation was used as an indicator of socioeconomic conditions in early life (main analysis). In YFS, the occupational position (manual, lower-grade non-manual and higher-grade non-manual, farmer) of the head of the household was assessed in 1980 (study’s baseline), while SKIPOGH participants reported at the study’s follow-up the profession of their father when they were children. Paternal occupational position is a commonly used indicator of socioeconomic conditions in early life37 and is closely related to parental occupational position in the Swiss context until the early 1980s (see SI Appendix). In a sensitivity analysis, we adopted a measure of parental education as an alternative proxy of socioeconomic conditions in early life (see SI Appendix).
Occupational position was categorized according to the European SocioEconomic Classification (ESEC) scheme into high (higher professionals and managers, higher clerical, services, and sales workers (European socioeconomic class 1-3)), intermediate (small employers and self-employed, farmers, lower supervisors and technicians, class 4-6), or low (lower clerical, services, and sales workers, skilled workers, semi-skilled and unskilled workers, class 7-9)38,39.
Systemic inflammatory status was assessed through the amount of C-reactive protein (CRP [mg/L]) in serum (YFS) and plasma (SKIPOGH). CRP is a sensitive marker of inflammation and elevated amounts of CRP have been associated with increased risk for several diseases3. In YFS, CRP levels were measured from blood collected in 2011 through an automated analyzer (Olympus AU400; Tokyo, Japan) and a highly sensitive turbidimetric immunoassay kit (‘CRP-UL’-assay; Wako Chemicals, Neuss, Germany). Detection limit of the assay was 0.06 mg/L.
In SKIPOGH, CRP levels were measured from blood collected between 2012 and 2016 through standard immunoturbidimetry with different detection limits (1 mg/L or 2 mg/L or 3 mg/L) depending on the centre or diagnostic machine used (Cobas 8000 Roche Diagnostics, Modular Roche Diagnostics, Beckmann). Inter-assay coefficients of variation were 7.5% at 10 mg/L for N=317 (Beckmann), 3.3% at 1.5 mg/L for N=372 (Modular Roche), and 4.0% at 0.9 mg/L for N=344 (Cobas 8000 Roche). Since ∼60% of SKIPOGH participants had CRP values below the detection limits, we imputed left censored values via censored maximum likelihood multiple imputation40. We ran 30 imputed data sets with a model of CRP including paternal occupational position (or parental education), body mass index, age, sex, and centre as predictors. Parameters (effects described in section Mediation approach) were estimated for each data set of imputed CRPs and pooled following Rubin’s rule.
Gene transcription and DNA methylation in adulthood were considered as potential mediators and were measured from a mixture of leukocytes cells collected in 2011 for YFS and between 2012 and 2016 for SKIPOGH. Transcriptomic data were available for 1,650 YFS and 723 SKIPOGH participants via Illumina chip and sequencing technologies, respectively (see SI Appendix). DNA methylation was measured in 1,548 YFS and 701 SKIPOGH participants via Illumina Infinium chips (see SI Appendix).
Age, sex, centre of blood collection were included as potential confounders.
In main analyses, we excluded individuals with unknown parental occupational position (N=27 [1.6%] in YFS and N=15 [2%] in SKIPOGH), father not working for health reasons (N=1 in SKIPOGH) and missing CRP (N=1 in SKIPOGH) for a total of 1,623 YFS and 706 SKIPOGH participants (mediation analysis). Parental occupational position (or parental education) was modelled as ordered variable, hypothesizing a dose-response relationship41. CRP was modelled as a continuous outcome and transformed via the natural logarithm to symmetrize its distribution. Finally, parental occupational position, DNA methylation and transcriptomic data were available in 1,367 YFS and 661 SKIPOGH participants (Bayesian network scoring analysis).
Pro-inflammatory genes selection
We selected pro-inflammatory genes based on two strategies independently of the study populations. First, we selected genes based on a transcriptome-wide causal inference analysis via 2-sample Mendelian Randomization (2sMR) aiming at establishing leukocytes transcripts (exposure) driving levels of CRP in blood (outcome)21. 2sMR is potentially robust with respect to unmeasured confounders and exploits publicly available summary statistics of large-scale GWAS42. In our study, we used summary statistics of the associations between about 10M genetic instruments (Single Nucleotide Polymorphisms, SNPs) and gene transcription in leukocytes or CRP levels in blood obtained via GWAS conducted by the eQTLGen consortium23 (N=31,684) and the CHARGE consortium3 (N=148,164), respectively (see SI Appendix).
Critical to 2sMR are assumptions about validity of genetic variants as instrumental variables43. To satisfy the assumption that chosen variants are predictive of the exposure, we selected only eQTLs (P value<1.9*10−5 corresponding to a false discovery rate <0.05)23 and – to reduce bias due to reverse causation – discarded those with a strong association to CRP (P value<1.9*10−5). Remaining assumptions were addressed by using cis eQTLs (likely direct effect of SNP on nearby gene transcription) and applying three complementary 2sMR methods43 (see SI Appendix). All 2sMR methods were run with a minimum of 10 instruments. We estimated F-statistics to assess instruments strengths (F-stat>10 is suggestive of strong instruments). We computed the q value44 from the P values estimated via the three 2sMR methods and declared a gene transcription to be driving CRP levels when q≤0.05 in all three 2sMR methods.
Alternatively, we defined our set of pro-inflammatory genes as those having been identified by the conserved transcriptional response to adversity (CTRA) model12. The CTRA is composed by 53 genes characterized by up/down-regulation depending on whether their immune function is inflammation- or antiviral/antibody-related, respectively14. This gene set has been associated to both early life and adult socioeconomic conditions12,14.
Functional enrichment
We run gene ontology (GO) and pathways enrichment analysis of the list of genes selected via 2sMR with g:Profiler45. The list of genes tested via 2sMR was submitted as the background gene list and multiple testing correction was performed with the g:SCS algorithm that takes into account the structure of functionally annotated gene sets45. GOs or pathways were declared as significantly enriched when g:SCS adjusted P values were <0.05.
Mediation approach
We adopted a counterfactual mediation framework based on natural effect models46 to estimate the portion of the effect of early life socioeconomic conditions on adulthood inflammation transmitted simultaneously or en-bloc by transcription levels of multiple genes in leukocytes. Importantly, the identification of these joint indirect effects is not dependent on the true causal order of the mediators46,47. This property is useful in our case since the interrelations between the transcription of different genes cannot be determined unequivocally. The magnitude of total and joint indirect effects corresponded to exponentiated parameters of natural effects models and is interpretable as percentage of change in the geometric mean of CRP. The proportion mediated (PM) by a bloc of genes was derived by the ratio of the joint indirect and total effect parameters. See SI Appendix for detailed information on interpretation and estimation of natural effect models’ parameters.
We applied principal component analysis (PCA)48 to the selected pro-inflammatory gene sets for estimating a representation of genes’ transcription. In our mediation model, using principal components of genes’ transcription instead of measured transcription levels has the advantage of increasing the reliability of joint indirect effect estimates since it allows to both reduce the number of mediators and the impact of measurement errors in the mediators49. Indeed, PCA is an established dimensionality reduction method allowing the data to be described using a small number of uncorrelated variables (the principal components, PCs) while retaining as much information (variance in the original data) as possible. Furthermore, given the ordering of PCs according to decreasing values of explained variance, PCA is also a noise reduction method when retaining only the principal components with highest explained variance48.
In main analyses, we selected the number of principal components (PCs) in order to explain about 50% of transcription variance. Confidence intervals (95%) of total and joint indirect effects were estimated through percentiles from 5,000 bootstrap draws (with replacement).
Local DNA methylation
For each of the pro-inflammatory genes selected either via 2sMR or the CTRA model, we considered DNA methylation sites (CpGs) within 1,500 bp of the annotated gene location. So defined cis CpGs should ensure that their potential effect on transcription is direct and not through other genes. Furthermore, we removed cross-reactive CpGs and CpGs with known SNPs in probes having MAF>0.01 to avoid potentially spurious or genetically driven signals.
Bayesian network model selection
We applied Bayesian network scoring22 to select the most likely causal pathway between local DNA methylation and gene transcription that are affected by early life socioeconomic conditions. For each CpG we evaluated the DAG structures represented in Figure S1. For each DAG, a posterior probability was estimated through the Bayesian Gaussian score and uniform priors across the eleven DAGs (bnlearn R package50). Finally, to alleviate the potential brittleness of inference from a large number of causal structures, we assessed posterior probabilities of 5 groups of DAGs, by specifying a partition of the DAGs51. Namely, group A encapsulates models where paternal occupational position does not influence DNA methylation nor transcription levels; group B encapsulates models where paternal occupational position causes changes in DNA methylation and transcription levels, and DNA methylation also causes changes in transcription; group C encapsulates models where paternal occupational position drives transcription levels, but methylation and transcription levels are not related; group D encapsulates models where paternal occupational position drives transcription levels that in turn influences methylation levels; group E encapsulates models where paternal occupational position influences DNA methylation but not transcription levels (see Figure S1). To select the group of DAGs with substantial evidence, we computed a Bayes factor (BF) from the two highest posterior probabilities and choose the group with the natural logarithm of BF larger than 1. In other words, a group of DAGs is selected only if its posterior probability is at least roughly three times larger than any other group’s posterior probability52.
Sensitivity analyses
We investigated the stability of analyses upon removing participants with CRP>10mg/L, which may reflect short term immune responses. We computed indirect effects when summarizing transcription levels with 30% and 60% of explained variance by principal components to assess whether the effect sizes reported in main analyses underestimate the amount of mediation. In SKIPOGH, we run mediation analysis with a binary outcome (CRP≥3 mg/L) as the so defined heightened inflammation was independent of the imputation procedure.
We ran mediation and Bayesian scoring analyses when parental education was used as an indicator of early life socioeconomic conditions. Finally, we repeated the model selection via Bayesian scoring when adding a group of negative control DAGs, namely models where uncoupled DNA methylation and gene transcription drive (both or either one) paternal occupational position.
Data Availability
Summary statistics for eQTLs were downloaded from https://eqtlgen.org/. Summary statistics for CRP are available upon request to the CHARGE consortium. SKIPOGH and YFS individual data are available from the PIs of the studies upon reasonable request.
DATA AVAILABILITY
Summary statistics for eQTLs were downloaded from https://eqtlgen.org/. Summary statistics for CRP are available upon request to the CHARGE consortium.
YFS and SKIPOGH individual data are available from the PIs of the study upon reasonable request.
Author contributions
S.S. designed the study; M.B., N.D., B.P., M.P., M.Kä., T.L. and O.R. collected the data; C.C., Z.K., P.M., E.P., C.D., O.D., M.K.-I., G.E., M.C.-H., P.V., M.Ki., E.D. and N.V. contributed reagents, materials or analysis tools; C.C. analysed the data; C.C. and S.S. discussed the results and wrote the paper. All authors reviewed and commented on the manuscript.
Competing interests
The authors declare no competing interests.
Acknowledgements
This work was supported by the European Commission [grant Horizon 2020 number 633666] and the Swiss State Secretariat for Education, Research and Innovation SERI, and by an Ambizione Grant from the Swiss National Science Foundation (PZ00P3_147998).
The YFS study has been financially supported by the Academy of Finland: grants 322098, 286284, 134309 (Eye), 126925, 121584, 124282, 129378 (Salve), 117787 (Gendi), and 41071 (Skidi); the Social Insurance Institution of Finland; Competitive State Research Financing of the Expert Responsibility area of Kuopio, Tampere and Turku University Hospitals (grant X51001); Juho Vainio Foundation; Paavo Nurmi Foundation; Finnish Foundation for Cardiovascular Research ; Finnish Cultural Foundation; The Sigrid Juselius Foundation; Tampere Tuberculosis Foundation; Emil Aaltonen Foundation; Yrjö Jahnsson Foundation; Signe and Ane Gyllenberg Foundation; Diabetes Research Foundation of Finnish Diabetes Association; EU Horizon 2020 (grant 755320 for TAXINOMISIS); European Research Council (grant 742927 for MULTIEPIGEN project); and Tampere University Hospital Supporting Foundation.
The SKIPOGH study was supported by a grant from the Swiss National Science Foundation [grant number 33CM30-124087].
M Kivimäki is supported by the UK Medical Research Council (R024227, S011676), US National Institute on Aging (NIH, R01AG056477), NordForsk and the Academy of Finland (311492). Z Kutalik was supported by the Swiss National Science Foundation (31003A_169929).
The funding sources had no involvement in the study design, data collection, analysis and interpretation, writing of the report, or decision to submit the article for publication.
We are grateful to the CHARGE consortium for sharing their association summary statistics.