Introduction

Chronic kidney disease (CKD) is chiefly defined by a sustained reduction in glomerular filtration rate and/or increased urinary albumin excretion for more than 3 months. It is a complex multifactorial disease influenced by variation in many genetic and non-genetic components [1]. Risk of kidney disease has a notable genetic component, and identified genes have provided new insights into relevant abnormalities in renal structure, function and essential homoeostatic processes. Since 2005, unbiased genome-wide mapping approaches such as genome-wide association (GWA) studies have emerged as methods to search for the causes of complex disease. GWA studies in nephrology have focused on two areas: CKD-defining traits such as serum creatinine or albuminuria, and specific CKD etiologies such as IgA nephropathy or membranous nephropathy [2]. For CKD-defining traits, Köttgen et al. identified several genetic loci associated with cross-sectional estimated glomerular filtration rate (GFR), such as UMOD, SHROOM3, GATM and STC1 [3]. More recently, additional genetic variants associated with cross-sectionally estimated GFR were identified in a GWA study meta-analysis of up to 175,579 individuals [4]. Several of the identified loci were also associated with incident CKD or end-stage renal disease (ESRD) [5, 6]. Some of the loci detected in association with estimated GFR were characterized in experimental models and contributed to the understanding of the development of kidney diseases [7].

Chronic kidney disease has been recognized as a major public health burden in the last decade. The population prevalence of CKD exceeds 10% worldwide [8] and is more than 50% in high-risk subpopulations [9]. In Korea, the prevalence of CKD was over 13% in 2009 [10]. Moreover, Korea is one of the countries with the greatest proportionate increases in the incidence of ESRD according to the United States Renal Data System Annual Report. Asian patients appear to have a higher tendency to progress to ESRD than other racial groups [11]. Until now, most GWA studies have been performed in CKD populations of European ancestry, and there may still be undiscovered genetic markers associated with the development of CKD in people of Asian ethnicity [12].

We postulated that the genetic influence on the development of CKD in East Asians might differ from that in European populations. Hence, we sought to identify genetic loci associated with incident CKD and to assess the effect of genetic variation on the development of CKD in two population-based cohorts in Korea.

Materials and methods

Study population and design

This longitudinal study on two population-based cohorts from Ansan (urban) and Ansung (rural) areas, Korea, was conducted by the Korean National Institute of Health as part of the Korean Genome and Epidemiology Study (KoGES), a Korean government-funded epidemiological survey to investigate trends in chronic diseases [13]. All the participants volunteered and provided written informed consent prior to their enrollment. A total of 10,038 individual participants were examined biannually using laboratory tests, electrocardiograms, chest X-rays, and health questionnaires, and a 14-year follow-up study was recently completed. All the participants’ records, excluding the survey date and home region, were anonymized and deidentified before being analyzed by the authors. The study protocol was approved by the Institutional Review Board of the Korea Centers for Disease Control and Prevention (KBP-2016-056).

From the Ansan and Ansung cohorts, we included only participants aged 40–49 years to limit the effect of the age-dependent decline in eGFR. Accordingly, 4711 participants aged between 40 and 49 years were screened. Among them, 545 were excluded during the quality control process of genotyping, and a further 549 were excluded for having CKD at the initial visit or for lacking any follow-up creatinine data. Finally, we analyzed 3617 participants (Fig. 1). All eligible participants were subdivided into two groups: those who developed CKD during follow-up (incident CKD group) and those who remained free of CKD (control group). Incident CKD cases were defined as participants who were free of CKD at baseline (estimated GFR ≥ 60 mL/min/1.73 m²) but had an estimated GFR < 60 mL/min/1.73 m² at any visit during the follow-up period. Controls were those who remained free of CKD throughout the follow-up period. We used the Ansan cohort as the discovery set and the Ansung cohort as the replication set for the GWA study (Table 1).
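The case and control definitions above can be expressed as a small classification rule. The following is a minimal sketch, assuming each participant is represented by a baseline eGFR and a list of follow-up eGFR values; the function name and the returned labels are hypothetical and are not taken from the study's own pipeline.

```python
from typing import Sequence

# Threshold from the text: CKD is eGFR < 60 mL/min/1.73 m^2.
CKD_THRESHOLD = 60.0


def classify_participant(baseline_egfr: float,
                         followup_egfrs: Sequence[float]) -> str:
    """Return 'excluded', 'incident_ckd' or 'control' (hypothetical labels)."""
    # Participants with CKD at the initial visit are not eligible.
    if baseline_egfr < CKD_THRESHOLD:
        return "excluded"
    # Participants without any follow-up creatinine data are not eligible.
    if len(followup_egfrs) == 0:
        return "excluded"
    # One follow-up value below the threshold is enough to define a case.
    if any(value < CKD_THRESHOLD for value in followup_egfrs):
        return "incident_ckd"
    return "control"
```

Under this rule a single follow-up value below the threshold classifies a participant as a case, which mirrors the "at any visit" wording of the definition.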

Fig. 1

Defining the study population. We reviewed the 4711 participants aged 40–49 and analyzed 3617 participants, divided into incident CKD and control groups

Table 1 Characteristics of the study population

Serum creatinine was measured using the Jaffe method. Estimated GFR was calculated using the following Modification of Diet in Renal Disease Study (MDRD) equation, because creatinine was not calibrated to an isotope dilution mass spectrometry reference method [14]: $\text{estimated GFR} = 186.3 \times (\text{Scr in mg/dL})^{-1.154} \times \text{age}^{-0.203} \times (0.742\ \text{if female})$.
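As an illustration, the equation above can be written as a short function. This is a minimal sketch that uses only the terms stated in the text; the function and parameter names are assumptions made for the example.

```python
def mdrd_egfr(scr_mg_dl: float, age_years: float, female: bool) -> float:
    """Estimated GFR (mL/min/1.73 m^2) from the MDRD equation given in the text."""
    if scr_mg_dl <= 0 or age_years <= 0:
        raise ValueError("serum creatinine and age must be positive")
    # 186.3 x Scr^-1.154 x age^-0.203
    egfr = 186.3 * scr_mg_dl ** -1.154 * age_years ** -0.203
    # Multiply by 0.742 for female participants.
    return egfr * 0.742 if female else egfr


if __name__ == "__main__":
    # Hypothetical participant: 45-year-old woman, creatinine 0.9 mg/dL.
    print(round(mdrd_egfr(0.9, 45, female=True), 1))
```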

Hypertension was defined as a systolic blood pressure ≥ 140 mmHg or a diastolic blood pressure ≥ 90 mmHg, and/or the use of antihypertensive medication. Diabetes mellitus was defined as a fasting plasma glucose level of 126 mg/dL or above or the use of hypoglycemic medication.

Genotyping and quality control

The single nucleotide polymorphism (SNP) genotypes used for the GWA study were obtained from the public KoGES data, which were generated previously in the Korea Association Resource project [15]. Genomic DNA from the Ansan and Ansung cohorts was genotyped with the Affymetrix Genome-Wide Human SNP array 5.0, and genotypes were imputed with the IMPUTE program. The genotype data covered 4711 individuals and were passed through quality control procedures. During quality control, participants were excluded on account of sample contamination, gender inconsistency, cryptic relatedness or serious concomitant illness. Finally, 351,228 SNPs were available for analysis after excluding SNPs with a high missing call rate (> 5%), low minor allele frequency (< 1%) or low Hardy–Weinberg equilibrium P value (P < 1 × 10⁻⁶).
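The three SNP-level filters can be sketched per SNP from genotype counts. The thresholds below come from the text; everything else is an assumption made for illustration. In particular, the Hardy–Weinberg P value here uses a 1-degree-of-freedom chi-square approximation, which is not necessarily the test used in the original quality control, and the function names are hypothetical.

```python
import math


def hwe_chi2_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Chi-square (1 df) P value for deviation from Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def snp_passes_qc(n_aa: int, n_ab: int, n_bb: int, n_missing: int,
                  max_missing: float = 0.05, min_maf: float = 0.01,
                  min_hwe_p: float = 1e-6) -> bool:
    """Apply the missing-call-rate, MAF and HWE filters described in the text."""
    called = n_aa + n_ab + n_bb
    if called == 0:
        return False
    if n_missing / (called + n_missing) > max_missing:
        return False
    freq_a = (2 * n_aa + n_ab) / (2 * called)
    if min(freq_a, 1.0 - freq_a) < min_maf:
        return False
    return hwe_chi2_p(n_aa, n_ab, n_bb) >= min_hwe_p
```

The filters are applied in order of cost, so the Hardy–Weinberg statistic is only computed for SNPs that already pass the call-rate and allele-frequency checks.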

Statistical analysis

Descriptive statistics on the baseline characteristics of the study population were calculated for the incident CKD group and the control group. Results are expressed as mean ± standard deviation or as numbers and percentages. Student's t test was used to evaluate differences in means between two groups, and one-way analysis of variance was used for more than two groups. Categorical variables were assessed using the Chi-square test, or Fisher's exact test when the number of data points was small. Cox proportional hazards regression analysis was performed to estimate independent associations with incident CKD. We used Harrell's C to compare the discrimination of the survival models [16].
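For readers unfamiliar with Harrell's C, the statistic is the proportion of comparable participant pairs in which the participant with the earlier event also has the higher predicted risk. The following is a minimal pure-Python sketch, not the STATA routine used in the study; it ignores pairs with tied follow-up times for simplicity, and the function name is hypothetical.

```python
from typing import Sequence


def harrell_c(times: Sequence[float], events: Sequence[int],
              risks: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored data (O(n^2) sketch)."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when i had the event strictly before
            # j's last observed time (event or censoring).
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to discrimination no better than chance and 1.0 to perfect ranking, which is the scale on which the C statistics of the Cox models are compared.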

To explore genetic loci associated with incident CKD, we performed a case–control logistic regression analysis under an additive model, adjusted for age, gender, hypertension, diabetes mellitus and initial eGFR, using PLINK software (version 1.07). SNPs with a p value less than 5 × 10⁻³ were selected, and this threshold was applied to both the Ansan and Ansung cohorts. SNPs that were significant in both cohorts were regarded as SNPs associated with incident CKD. After choosing a representative SNP from each linkage disequilibrium block containing SNPs with analogous allele patterns, a genetic risk score (GRS) was calculated by multiplying the number of risk alleles at each selected SNP by the corresponding regression coefficient. The GRS was entered into the Cox regression analysis as a continuous variable to assess it as a prognostic factor for incident CKD.
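The two-cohort selection rule can be sketched as an intersection of per-cohort results. This is a minimal illustration, assuming the association results of each cohort are available as a mapping from SNP identifier to p value; the function name and the example identifiers are hypothetical.

```python
from typing import Dict, List


def replicated_snps(discovery_p: Dict[str, float],
                    replication_p: Dict[str, float],
                    threshold: float = 5e-3) -> List[str]:
    """SNPs whose p value is below the threshold in both cohorts."""
    return sorted(
        snp for snp, p in discovery_p.items()
        if p < threshold and replication_p.get(snp, 1.0) < threshold
    )


if __name__ == "__main__":
    # Hypothetical p values for three placeholder SNPs.
    ansan = {"snp_a": 1e-4, "snp_b": 2e-3, "snp_c": 0.2}
    ansung = {"snp_a": 4e-3, "snp_b": 0.3, "snp_c": 1e-5}
    print(replicated_snps(ansan, ansung))  # only snp_a passes in both cohorts
```

SNPs absent from the replication results are treated as not replicated, which is a design choice of this sketch rather than something stated in the text.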

Statistical analyses were conducted using STATA 13 (StataCorp LP, College Station, TX). R software (version 3.3.2) and LocusZoom (version 0.4.8) were used to draw the regional association plots.

Results

Baseline characteristics

The characteristics of each cohort and the study design are shown in Table 1. Of the 3617 individuals included in the study, 281 developed CKD during the follow-up period. Participants were assigned to either the incident CKD group or the control group, depending on whether they developed CKD during follow-up.

The clinical characteristics and laboratory data of each group are shown in Table 2. Participants in the incident CKD group were older, included a smaller proportion of men and had a higher body mass index (BMI) at baseline. The proportions of current smokers and current drinkers were smaller, and the education level was lower, in the incident CKD group. There were no differences in income level or average daily dietary intake between the two groups. The incident CKD group had higher proportions of hypertension and diabetes mellitus and higher serum creatinine at baseline (Table 2).

Table 2 Baseline characteristics grouped according to incident CKD or control

Because the gender proportion differed between the two groups, we performed a further analysis stratified by gender (Additional file 1). Within each gender, there was no difference in smoking history, drinking history or education level between the two groups.

GWA study on incident CKD

First, we analyzed genetic variation data from 192 participants with incident CKD (cases) and 2318 who did not develop CKD (controls) in the Ansan cohort. We carried out logistic regression analysis for incident CKD with age, gender, diabetes, hypertension and initial eGFR as covariates, and calculated the p value under an additive genetic model. After a standard quality control procedure, we obtained genotyping results on 1,590,162 SNPs and generated a quantile–quantile plot (Additional file 2). The genomic inflation factor was 0.9954, indicating no appreciable inflation of the test statistics. The association analysis showed that a total of 7765 SNPs met the significance threshold for association with incident CKD in the Ansan cohort (P_discovery ranging from 1.54 × 10⁻⁶ to 5.00 × 10⁻³).
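The genomic inflation factor is conventionally computed as the median observed chi-square statistic divided by the median of a 1-degree-of-freedom chi-square distribution (about 0.455). The following is a minimal sketch under that convention, assuming a list of two-sided p values from 1-df tests; the exact procedure used in the study is not described, and the function name is hypothetical.

```python
from statistics import NormalDist, median
from typing import Iterable


def genomic_inflation_factor(p_values: Iterable[float]) -> float:
    """Median-based lambda from two-sided p values of 1-df association tests."""
    norm = NormalDist()
    chi2_values = []
    for p in p_values:
        if not 0.0 < p <= 1.0:
            raise ValueError("p values must lie in (0, 1]")
        # Convert p back to a z score, then square it to get the chi-square.
        z = norm.inv_cdf(p / 2.0)
        chi2_values.append(z * z)
    # Median of the chi-square(1) distribution, derived rather than hard-coded.
    expected_median = norm.inv_cdf(0.25) ** 2
    return median(chi2_values) / expected_median
```

Values close to 1, such as the 0.9954 reported above, suggest that the test statistics follow the null distribution across most of the genome.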

We then performed the association analysis in the Ansung cohort and found that 12 SNPs met the significance threshold in both Ansan and Ansung (Table 3): rs2025936, rs11166378 and rs10783124 on chromosome 1; rs1505141, rs2777732, rs1146883, rs1146888, rs2657128, rs1700826, rs2657132 and rs1146890 on chromosome 13; and rs236586 on chromosome 17. Most of them are intergenic SNPs. Several SNPs fell within the same linkage disequilibrium (LD) block, so their association statistics were similar to each other. The SNPs on chromosome 1 are close to the genes Amylo-Alpha-1,6-Glucosidase, 4-Alpha-Glucanotransferase (AGL) and Solute Carrier Family 35 Member A3 (SLC35A3). The SNPs on chromosome 13 are close to LMO7 Downstream Neighbor (LMO7DN) and Potassium Channel Tetramerization Domain Containing 12 (KCTD12). The SNP on chromosome 17 is close to Potassium Voltage-Gated Channel Subfamily J Member 2 (KCNJ2) and Cancer Susceptibility Candidate 17 (CASC17). Three regional association plots show the genomic location of the replicated SNPs (Fig. 2): the three SNPs on chromosome 1 lie between AGL and SLC35A3, the eight SNPs on chromosome 13 lie around LMO7 and KCTD12, and the SNP on chromosome 17 lies near KCNJ2.

Table 3 Single nucleotide polymorphisms (SNPs) that showed the association with incident chronic kidney disease
Fig. 2

Regional association plots. Three regional association plots showing the genomic location of the replicated SNPs on chromosomes 1, 13 and 17, which were associated with incident CKD

Creating genetic risk score

We selected three SNPs (rs11166378, rs1146890, rs236586) as representatives of the LD blocks containing SNPs with analogous allele patterns. These three SNPs were located in different LD blocks, and their p values were the highest, so that the analysis would be conservative. LD-based pruning was performed; one SNP with the lowest P value was selected from each LD block [17]. A weighted genetic risk score (GRS) was calculated by multiplying the number of risk alleles at each locus (0, 1, 2) by the corresponding effect size, as previously reported [18]. The mean GRS was higher in the incident CKD group than in the control group (1.32 ± 1.41 vs 0.80 ± 1.18, P < 0.001).
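A weighted GRS of this kind can be sketched as follows. The sketch assumes that the per-locus products are summed across the three SNPs, which is the usual construction but is not spelled out in the text; the effect sizes shown are placeholders, not the study's regression coefficients, and the function name is hypothetical.

```python
from typing import Dict


def weighted_grs(risk_allele_counts: Dict[str, int],
                 effect_sizes: Dict[str, float]) -> float:
    """Sum over SNPs of (number of risk alleles) x (effect size)."""
    score = 0.0
    for snp, beta in effect_sizes.items():
        count = risk_allele_counts[snp]
        if count not in (0, 1, 2):
            raise ValueError(f"{snp}: risk allele count must be 0, 1 or 2")
        score += count * beta
    return score


if __name__ == "__main__":
    # Placeholder weights (NOT the study's coefficients) for the three SNPs.
    weights = {"rs11166378": 0.5, "rs1146890": 0.25, "rs236586": 1.0}
    participant = {"rs11166378": 2, "rs1146890": 0, "rs236586": 1}
    print(weighted_grs(participant, weights))
```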

Risk factors for incident CKD

The study population was followed up for 8.33 ± 2.55 years (8.47 ± 2.43 years in the control group vs 6.73 ± 3.26 years in the incident CKD group). Overall, 281 of 3617 participants (7.77%) developed CKD during the 10-year follow-up period; the incidence rate of CKD was 931 per 100,000 person-years.
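The incidence rate can be checked approximately from the summary figures above. The sketch below reconstructs person-years as group size multiplied by mean follow-up, which is an approximation because the individual follow-up times are not available here; the function name is hypothetical.

```python
def incidence_per_100k(events: int, person_years: float) -> float:
    """Incidence rate expressed per 100,000 person-years."""
    if person_years <= 0:
        raise ValueError("person-years must be positive")
    return events / person_years * 100_000


# Summary figures from the text.
n_total, n_cases = 3617, 281
n_controls = n_total - n_cases
# Approximate person-years from the group mean follow-up durations.
person_years = n_controls * 8.47 + n_cases * 6.73
rate = incidence_per_100k(n_cases, person_years)

if __name__ == "__main__":
    print(f"{person_years:.0f} person-years, {rate:.0f} per 100,000 person-years")
```

The result lands within a few units of the reported 931 per 100,000 person-years, with the small gap attributable to rounding of the mean follow-up times.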

Stepwise multivariate models were applied to examine the association between GRS and incident CKD (Table 4). GRS was a significant factor for incident CKD (HR 1.311, 95% CI 1.201–1.431, P < 0.001) in multivariate Cox regression analysis adjusted for age, gender, diabetes, hypertension, estimated GFR and BMI. Model II with GRS added exhibited a higher C statistic than model II without GRS (0.779 vs 0.762, P = 0.011).
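The reported hazard ratio and confidence interval can be related to the P value through a Wald test on the log scale. This is a minimal sketch, assuming the interval is a symmetric normal-approximation interval for the log hazard ratio; the function name is hypothetical and the calculation is a consistency check, not the study's own analysis.

```python
import math
from statistics import NormalDist
from typing import Tuple


def wald_from_hr_ci(hr: float, lower: float, upper: float,
                    level: float = 0.95) -> Tuple[float, float, float, float]:
    """Return (log HR, standard error, z, two-sided p) from an HR and its CI."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(0.5 + level / 2.0)
    beta = math.log(hr)
    # The CI spans 2 * z_crit standard errors on the log scale.
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = beta / se
    p = 2.0 * (1.0 - norm.cdf(abs(z)))
    return beta, se, z, p


beta, se, z, p = wald_from_hr_ci(1.311, 1.201, 1.431)

if __name__ == "__main__":
    print(f"log HR = {beta:.3f}, SE = {se:.3f}, z = {z:.2f}")
```

With the values from the text, the z statistic is well above the level needed for P < 0.001, in agreement with the reported significance.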

Table 4 Multiple Cox regression analysis of incident chronic kidney disease

GWA study on eGFR 30% reduction

We performed a post hoc GWA study with a 30% reduction in eGFR over 2–4 years as an alternative outcome. No SNP or gene variation was shared between the original analysis for incident CKD and this post hoc analysis (Additional file 3).

Discussion

In the present study, we calculated the incidence rate of CKD in a general population. We then identified several SNPs associated with incident CKD and proposed a GRS associated with the risk of CKD. Beyond traditional risk factors, our GRS might help to predict the risk of incident CKD during follow-up.

In these 3617 individuals in their 40s from the Ansan and Ansung cohorts, the incidence of CKD was 931 per 100,000 person-years. We included only subjects in their 40s because there is still debate about labeling abnormal kidney function as CKD simply on the basis of estimated GFR in the elderly population.

We found 12 novel SNPs associated with incident CKD. All were intergenic, and none of them were among the 53 SNPs so far reported to be associated with estimated GFR [4] or with incident CKD and ESRD [5]. However, most of the genes related to our 12 novel SNPs are listed in existing databases as genes associated with kidney diseases. LMO7DN (C13orf45) is included in the chronic renal insufficiency dataset in PADB [19]. AGL, SLC35A3, KCTD12 and KCNJ2 are in the Harmonizome database [20]. Their function is not fully known, and the biological pathways need to be validated. Isoform 1 of AGL is predominantly expressed in the liver, kidney and lymphoblastoid cells and is associated with glycogen metabolism, disturbance of which could cause kidney dysfunction [21]. The KCNJ2 gene encodes one of the voltage-gated potassium channels and is associated with Andersen–Tawil syndrome and short QT syndrome; renal hypoplasia with possible renal failure has been described in this setting [22]. CASC17 is known to interact functionally, through transcription factors, with SOX9 [23], which is required for recovery from acute kidney injury [24].

In this study, each one-unit increase in GRS was associated with an approximately 30% higher hazard of incident CKD, even after adjustment for covariates. However, GRS only slightly improved the discrimination of CKD incidence beyond traditional risk factors. Kidney disease certainly has an underlying genetic susceptibility. Among 5883 patients who initiated renal replacement therapy in the US population, approximately 23% of incident dialysis patients had close relatives with ESRD, even after excluding genetic disorders such as polycystic kidney disease [25]. Similarly, patients with type 1 diabetes who had siblings with diabetic nephropathy (DN) had a fivefold increased risk of DN compared with diabetic patients without a sibling with DN [26]. Genome-wide linkage analysis of 1224 subjects from the Framingham Heart Study offspring cohort also showed that the multivariable-adjusted heritability estimates for creatinine, GFR and creatinine clearance were 0.29, 0.33 and 0.46, respectively [27]. In GWA studies performed to date, however, the effect of genetic factors has appeared to be small. The genetic loci identified in the GWA study of Köttgen et al. explain only 0.7% of the population variability in estimated GFR [3]. Ma et al. also reported that a GRS based on 53 SNPs associated with estimated GFR did not improve the discrimination of CKD [28].

The minimal increase in discriminative power in this study could be explained as follows. First, correlations among multiple genes, or between genotype and environmental factors, could not be considered. Second, the genomic data we analyzed had already been filtered by a criterion of minor allele frequency over 0.05 in order to find common etiological genetic factors, so we could not obtain information about rare variants. Third, structural variation such as copy number variation was not analyzed in this study [29, 30]. Data that allow structural variations to be analyzed, together with accurate analytical pipelines, are critical and necessary [31]; they might help to reveal the function and structure of the genome in more detail.

This study has some limitations. There were no common SNPs or gene variations between the original analysis (incident CKD as the outcome) and the post hoc analysis (30% eGFR reduction as the outcome). We chose to emphasize incident CKD as the final outcome because a 30% eGFR reduction is used as a surrogate marker over a relatively short duration (2–4 years), whereas our cohort subjects were followed over a much longer period, sufficient to reach an endpoint such as incident CKD. In addition, the measurement of initial creatinine was not calibrated to an isotope dilution mass spectrometry reference measurement procedure, because the Ansan and Ansung cohort studies started long before creatinine measurement became standardized. Therefore, we think it is not appropriate to analyze 30% eGFR reduction as the main outcome. Moreover, the original cohort data did not include quantitation of proteinuria. Our population size was relatively small for a GWA study, given that large sample sizes are required to increase statistical power. An angiotensin converting enzyme (ACE) polymorphism, previously known to affect kidney diseases, was not evident in our GWA study, possibly because of the small population size. Since information on family history was not available in this KoGES dataset, we could not estimate heritability. Nevertheless, our study has the strength that we performed a GWA study in an Asian cohort with a large, community-based sample and a relatively long duration of follow-up.

Conclusions

In conclusion, we identified several loci associated with incident CKD. We developed a GRS from these SNPs that was associated with the risk of incident CKD. The GRS slightly improved the discrimination of CKD incidence beyond traditional risk factors.