The association between plasma chemokines and breast cancer risk and prognosis: A mendelian randomization study

Background: Despite the potential role of several chemokines in the migration of cytotoxic immune cells to prohibit breast cancer cell proliferation, a comprehensive view of chemokines and the risk and prognosis of breast cancer is scarce, and little is known about their causal associations. Methods: With a two-sample Mendelian randomization (MR) approach, genetic instruments associated with 30 plasma chemokines were created. Their genetic associations with breast cancer and its survival by molecular subtypes were extracted from the recent genome-wide association study of 133,384 breast cancer cases and 113,789 controls, with available survival information for 96,661 patients. We further tested the associations between the polygenic risk score (PRS) for chemokines and breast cancer in the UK Biobank cohort using logistic regression models, while the association with breast cancer survival was tested using Cox regression models. In addition, the association between chemokine expression in tumors and breast cancer survival was also analyzed in the TCGA cohort using Cox regression models. Results: Plasma CCL5 was causally associated with breast cancer in the MR analysis, which was significant in the luminal and HER-2 enriched subtypes and further confirmed using PRS analysis (OR = 0.94, 95% CI = 0.89–1.00). A potential causal association with breast cancer survival was only found for plasma CCL19, especially for ER-positive patients. Although not replicated in the UK Biobank, we still found an inverse association between CCL19 expression in tumors and breast cancer overall and relapse-free survival in the TCGA cohort (HR = 0.58, 95% CI = 0.35–0.95). Conclusion: We observed an inverse association between genetic predisposition to CCL5 and breast cancer, while CCL19 was associated with breast cancer survival. These associations suggested the potential of these chemokines as tools for breast cancer prevention and treatment.


Introduction
Breast cancer is one of the most common malignant tumors, which seriously affects the physical and mental health of women worldwide and accounts for 24.2% and 15.0% of the total number of new cancers and deaths in women, respectively (Bray et al., 2018). Although breast cancer has caused a considerable global burden, its etiology is not yet completely understood. Currently, the known breast cancer risk factors include age, sex, race, family history, genetic mutations, early menarche/late menopause and hormone replacement therapy, while evidence for the causal effect of inflammation in breast cancer carcinogenesis is still limited (Winters et al., 2017;Deshmukh et al., 2019). To prevent the incidence of breast cancer and find potential therapeutic targets, we need to better understand the role of inflammatory biomarkers in breast cancer.
At the first stage of malignant evolution, chronic inflammation may play an important role during the initiation, development and prognosis of breast cancer (Danforth, 2021), in which chemokines might regulate the migration of immune cells toward the tumor and act as biomarkers for breast cancer. Studies have shown that  were increased in breast cancer compared with normal adjacent breast tissue (Abrahamsson et al., 2017), and genetic polymorphisms near genes encoding CCL-2, -4, -5, -20, -22, CXCL-10, -12 and XCL-1 were found to be associated with the risk and prognosis of breast cancer (Eskandari-Nasab et al., 2014;Li et al., 2014;Jafarzadeh et al., 2015;Jafarzadeh et al., 2016;Stephens et al., 2017;He and Zhang, 2018;Hu et al., 2018;Chou et al., 2020). However, traditional observational studies are susceptible to reverse causation and residual confounding (Smith and Ebrahim, 2003). Contrary to previous studies, Bowen Chen et al. (2020) used several bioinformatics analysis tools to analyze the expression data of CC chemokines in patients with breast cancer and yielded inconsistent results. Therefore, it is still a challenge to fully understand the causal association between chemokines and breast cancer.
Mendelian randomization (MR) uses genetic variation as an instrumental variable (IV) to infer the association between exposure factors and disease outcomes. Because the genotypes are presumed to be randomly allocated in the process of gamete formation, MR analyses are largely free of reverse causality and less sensitive to confounding factors, which overcomes the limitations of conventional observational studies (Smith and Ebrahim, 2003;Davies et al., 2018).
In the present study, we implemented a two-sample MR design to investigate the potential causal effect of plasma chemokines on the risk and prognosis of breast cancer and to analyze their molecular subtypes to provide more credible evidence on the potential role of chemokines in the development of breast cancer. We further used the UK Biobank and TCGA cohorts to reaffirm our findings.

Study population
Mendelian randomization analysis was used to assess the causal association between plasma chemokines and breast cancer. In the first step, the associations were tested using a two-sample approach with summary statistics from genomewide association studies (GWAS), which does not need individual-level data (Pierce and Burgess, 2013). Genetic instrumental variables for plasma chemokines were created by identifying single nucleotide polymorphisms (SNPs) associated with 41 plasma chemokines in a recent GWAS on the human plasma proteome at the GWAS significant level (p < 5 × 10 −8 ) (Sun et al., 2018). From the 5,965 SNPs reported in the original study, 91 SNPs for 30 plasma chemokines were selected after excluding SNPs with linkage disequilibrium (LD threshold: r 2 < 0.01). The SNPs used in this study and their corresponding plasma chemokines and proportion of variance explained are listed in Supplementary Table S1. We used R 2 and F statistics to estimate the proportion of variance explained by SNPs, with the following formulas And F R 2 p(N−1−k) (1−R 2 )*k , in which β is the coefficient for the SNP, MAF is the minor allele frequency, se(β) is the standard error of the coefficient, N is the sample size and k is the number of SNPs used.
GWAS summary statistics for breast cancer overall and by molecular subtypes (luminal A, luminal B, Her-2 enriched and triple negative) were downloaded from the Breast Cancer Association Consortium (Zhang et al., 2020), with 133,384 breast cancer cases and 113,789 controls. The GWAS summary statistics for breast cancer survival were also downloaded from BCAC, including 96,661 patients with 7,697 breast cancer mortality cases; among them, 64,171 were ER-positive patients, and 16,172 were ERnegative (Escala-Garcia et al., 2019). Among the 91 selected SNP, 7 SNPs were removed during the harmonization process of the dataset, because they were palindromic with intermediate allele frequencies (Supplementary Table S1), leaving 84 SNPs in the final analysis. Heterogeneity among genetic variants was quantified by I 2 statistic (Supplementary Table S2). All studies involved in the GWAS on the human plasma proteome and the Breast Cancer Association Consortium were approved by the Frontiers in Genetics frontiersin.org relevant research ethics committee, and all participants provided informed consent. For those identified plasma chemokines in the two sample Mendelian randomization studies for breast cancer, Mendelian randomization was further applied to validate the findings in UK Biobank. Briefly, the UK Biobank is a prospective cohort comprising approximately 502,505 participants (55% women) of various ethnicities, aged 40-69 at recruitment between 2006 and 2010. At enrolment, information on the participants' health-related factors and socio-demographic characteristics and biological samples were collected in 22 assessment centers around United Kingdom.
Blood samples from women in the UK Biobank were genotyped using a custom UK Biobank Axiom Array, comprising 825,927 SNPs. Details of the array design, sample handling, quality control processes and imputation of non-genotyped variants are described elsewhere (Bycroft et al., 2018). Breast cancer cases were obtained through linkage with the national cancer registries in the United Kingdom and updated until 2015. The ICD-10 code C50 and ICD-9 code 174 were used to identify breast cancer cases, while the controls were women without a diagnosis of breast cancer. To assess whether breast cancer is associated with a genetic predisposition to plasma chemokines, we selected the same genome-wide significant SNPs used in the two-sample MR. For all individuals, a weighted polygenic risk score (PRS) was calculated using the following formula: where β is the per-allele log odds ratio (OR) of the breast cancerassociated risk allele for SNP k , x k is the number of alleles for the same SNP (0, 1, 2), and n is the total number of breast cancer SNPs included in the profile. For the analyses, PRS was categorized into tertiles.
To further explore the effect of the identified chemokines associated with breast cancer survival, we explored the association between PRS of the chemokines and breast cancer mortality in the UK Biobank. After that, 1,077 women in the TCGA breast cancer cohort were followed for their overall and breast cancer survival. Gene expression data of the chemokines in tumor tissues were also downloaded and categorized into tertiles.

Statistical analysis
Several methods were used to estimate the potential causal effect of plasma chemokines on breast cancer and its prognosis. The main analysis was based on a fixed-effects meta-analysis using an inverse-variance weighting model (IVW), in which β and se were calculated using the following formulas β IVW : N is the total number of SNPs, βexp.i is the coefficient of the i-th SNP on the exposure, βout.i is the coefficient of the i-th SNP on the outcome, and σout.i is the standard error of βout.i. If only one SNP was identified for the chemokine, the Wald ratio method was used by calculating the ratio of βexp and βout, while se could be approximately estimated using the Delta method (Pagoni et al., 2019). We further performed sensitivity analysis to test horizontal pleiotropy using weighted median and Egger regression methods. The weighted median estimator is consistent with the IVW estimation even when up to 50% of the SNPs are invalid instrumental variables, while MR-Egger regression simply adds an intercept term of pleiotropy into the IVW regression model (Bowden et al., 2016).
To test the key assumption of no horizontal pleiotropy in MR, we further analyzed the effect of plasma chemokines on potential risk factors for breast cancer, e.g., age at menarche, age at menopause, age at smoking initiation, and BMI, using their corresponding GWAS summary statistics (Wood et al., 2016;Liu et al., 2019;Elsworth et al., 2020).
To validate the discovered potential causal association between plasma chemokines and breast cancer, we used logistic regression models to calculate the ORs of breast cancer among UK biobank women by tertiles of PRS for plasma chemokines. These associations were adjusted for age, the first ten principal components, and the assessment center. We additionally adjusted age at diagnosis, year of breast cancer diagnosis in the Cox regression model to validate the association between PRS for chemokines and breast cancer survival in the UK biobank. We also used the Cox model to estimate the hazard ratios of overall survival and relapse-free survival in the TCGA cohort by tertiles of gene expression of the chemokines in tumor tissues and adjusted for age at diagnosis, estrogen, progestogen, and tumor stage. Missingness in the tumor characteristics were categorized with missing indicator.
All statistical tests were two-sided and performed using R 3.6.3 and Stata 15.1.

Results
In the main analysis, the selected SNPs explained 0.9%-36.3% of the variance in plasma chemokines, and the corresponding F-statistic ranged from 29.7 to 1,879.3 (Supplementary Table S1), suggesting no evidence of weak instrument bias. A higher plasma CCL5 was inversely associated with breast cancer (OR = 0.89, 95% CI: 0.85 to 0.94, p < 0.001), which was primarily attributed to the luminal A and luminal B subtypes (OR for luminal A = 0.88, 95% CI: 0.82 to 0.94; OR for luminal B = 0.80, 95% CI: 0.69-0.94). Plasma growth-regulated alpha protein and Ck-beta-8 were also associated with breast cancer, although the p-value did not survive the Bonferroni correction for multiple testing (Figure 1; Supplementary Table S3).
In the subtype-specific analysis, plasma CCL19 was associated with luminal A breast cancer, while CXCL16 was associated with HER-2-enriched breast cancer. In addition, plasma CCL16 was positively associated with luminal A breast cancer but inversely associated with triple-negative breast cancer. Sensitivity analysis using MR Egger and weighted median methods yielded similar results (Supplementary Table S3).

Frontiers in Genetics frontiersin.org
To validate the findings in the two-sample MR analysis, 264,187 women who had available genotyped data from the UK Biobank were involved in the analysis, with 14,760 breast cancer patients. In multivariable logistic regression analysis, a higher polygenic risk scores for plasma CCL5 was inversely associated with breast cancer (OR for the highest tertile = 0.94, 95% CI = 0.89-1.00, p = 0.045). However, no statistically significant associations for plasma growth-regulated alpha protein, Ck-beta-8 and breast cancer were found. (Table 1).
In the analysis of breast cancer survival, we found a potential causal effect of plasma CCL19 on breast cancer survival (HR = 0.85, 95% CI = 0.75-0.96, p < 0.01), especially for ER-positive breast cancer patients (HR = 0.82, 95% CI = 0.70-0.97) (Figure 2; Supplementary Table S4). However, when we further investigated the association between PRS of CCL19 and breast cancer mortality in the UK Biobank cohort, no statistically significant association was found between them (Table 2). Despite this, in the TCGA breast cancer patient cohort, CCL19 expression in tumor tissues was associated with increased overall and relapse-free survival (HR for the last tertile vs. first tertile = 0.58, 95% CI = 0.35-0.95). This association was also observed for ER-positive patients (Table 3).
MR estimates for the effects of plasma chemokines on breast cancer risk factors indicated little evidence of horizontal pleiotropy (Supplementary Table S5).

Discussion
In this Mendelian randomization study of 30 plasma chemokines, we found an inverse association between genetically predicted plasma CCL5 and breast cancer in both the two-sample and one-sample analyses. This association was more pronounced in luminal and HER-2-enriched breast cancer patients. An effect on cancer survival was only found for plasma CCL19, especially for ER-positive patients. In addition, we also found an inverse association between CCL19 expression and breast cancer mortality using tumor tissue samples.
As key proteins in immune cell migration, chemokines have been studied for their effect on inflammation-associated cancers. One previous Mendelian randomization analysis on cytokines found an association between genetically predicted CCL2, CCL4, and GROα and breast cancer. As the SNPs associated with plasma CCL2 and CCL4 did not exceed the GWAS significance threshold, we were not able to replicate these findings in our study. However, both our study and a previous study confirmed the association between GROα and breast cancer, reaffirming the role of GROα in breast cancer carcinogenesis and probably through the effect of Th17 cells (Ma et al., 2018).
Our analysis found an inverse association between CCL5 and breast cancer, with a similar effect size in a recent phenome-wide Mendelian randomization (Zheng et al., 2020). We further suggested that the association was driven by luminal and HER-2-enriched subtypes of breast cancer. CCL5 might play an antitumor role as it is negatively related to the number of tumor associated macrophages (Fujimoto et al., 2009), which are the major components of the breast tumor stroma and contribute to tumor growth and development. In addition, many autoimmune diseases are associated with increased levels of CCL5 and inversely associated with breast cancer risk (Wadström et al., 2020;Zeng et al., 2022). All the above evidence supported the role of plasma CCL5 in breast tumor carcinogenesis. Considering its strong correlation with the pathogenesis of luminal and HER-2 enriched breast cancer, CCL5 might be used as a The model was adjusted for age at recruitment, assessment center and top 10 principal genetic components. Statistically significant results with p-value < 0.05 are bolded.
Frontiers in Genetics frontiersin.org potential biomarker for breast cancer screening and risk stratification.
In addition, both genetically predicted plasma CCL19 and CCL19 gene expression in tumors was associated with increased survival of breast cancer in our study. As downregulation of the CCL19/CCR7 axis has been observed in metastatic breast cancer compared to the primary tumor (Szekely et al., 2018) and CCL19 suppresses angiogenesis and proliferation of several other types of cancers (Xu et al., 2018;Zhou et al., 2020), it is biologically plausible that CCL19 inhibits breast cancer metastasis through similar AIM2 and Met/ERK/Elk-1/HIF-1α/ VEGF-A pathways (Chen et al., 2006) and therefore increases patient survival. Since we did not replicate the association between CCL19 and breast cancer in the UK biobank, this finding should be interpreted with caution.
Interestingly, we also found that many of the associations between chemokines and breast cancer or its survival were more pronounced in luminal A-or ER-positive cancers, suggesting that the effect of chemokines might be related to estrogen-related carcinogenesis (Lim and Lin, 2021). However, we still found a potential causal association between CCL16 and triple-negative breast cancer. Further study is therefore needed to confirm this association and test its potential role as a biomarker or treatment target for triple-negative breast cancer.
The strength of our study included the use of the most recent and largest summary statistics for breast cancer with subtype-specific information and a comprehensive dataset for plasma chemokines. Our study had several limitations. As no sex-specific GWAS for chemokines has been reported, we assumed that the sex difference in the chemokine GWAS was small and that the original GWAS had already adjusted for age, sex and principal components. In addition, as both the BCAC and the UK biobank dataset included prevalent breast cancer patients, it is possible that the association between chemokines and breast cancer may suffer from survivorship bias, and may not represent patients with more aggressive

FIGURE 1
The causal association between plasma chemokines and breast cancer risk overall and by molecular subtype. OR: Odds ratio. Statistical significance is indicated by *** (p < 0.001), ** (p < 0.01) and * (p < 0.05). The estimates are based on the fixed effect of the inverse variance weighted method or Wald ratio method for meta-analyzing individual SNP results. ORs show the risk of breast cancer overall and by subtypes per SD increase of ranked-inverse normalized concentration in exposure to plasma chemokines.
Frontiers in Genetics frontiersin.org tumors. However, results by different molecular subtypes still indicated some associations with HER-2 enriched and triple negative cancer. Finally, CCL19 can be secreted by cells other than cancer cells, while in our study only CCL19 in tumor tissues were used for the validation of its association with breast cancer survival. The generalizability of the findings is also affected by the use of European ancestry, and the results should be replicated in other populations.

FIGURE 2
The causal association between plasma chemokines and breast cancer survival. HR: hazard ratio. Statistical significance is indicated by ** (p < 0.01) and * (p < 0.05). The estimates are based on the fixed effect of the inverse variance weighted method or Wald ratio method for meta-analyzing individual SNP results. HRs show the risk of breast cancer overall and by subtypes per SD increase of ranked-inverse normalized concentration in exposure to plasma chemokines.

Conclusion
In conclusion, we found that lower plasma CCL5 was causally associated with breast cancer, while plasma and tumor-derived CCL19 might be associated with increased survival of breast cancer patients. Both findings were more pronounced in ER-positive patients, indicating the potential use of CCL5 and CCL19 as biomarkers or drug targets in future studies for breast cancer prevention and treatment.

Ethics statement
The studies involving human participants were reviewed and approved by the Institutional Review Board of Fujian Medical University (No. 2020KY055). The patients/participants provided their written informed consent to participate in this study.

Author contributions
XY and HY had full access to all data and take responsibility for the integrity of the data and the accuracy of the analysis. Study concept and design: HY acquisition, analysis, and interpretation of data: XY, SZ, and PZ Drafting of the manuscript: XY, MS, and YZ Critical revision of the manuscript for important intellectual content: HY, XY, YL, and FF Statistical analysis: XY and YZ Funding: HY.  Frontiers in Genetics frontiersin.org design, data collection, data analysis, data interpretation, or writing of the report.