Introduction

The Gail model for predicting the absolute risk of invasive breast cancer in white women combined relative risks associated with four traditional risk factors (age at menarche, number of breast biopsies, age at first live birth, and number of first-degree relatives with breast cancer) derived from a case–control study conducted in the Breast Cancer Detection Demonstration Project [1] with baseline age-specific incidence rates of invasive breast cancer from population-based US cancer registries in the Surveillance, Epidemiology, and End Results Program [2]. This prediction model has been validated in several cohorts from the United States, including large general populations [3–5], regularly screened subpopulations at elevated risk [2–4], and small studies in high-risk clinics [6, 7]. The Gail model showed heterogeneous but generally acceptable calibration with modest discrimination ability among white US women [8], and it has been widely used to design international prevention trials [9, 10] and to counsel women about their individual risk [11].

Only a few, relatively small validation studies have been conducted in Western non-US populations, and none of them used a population-based cohort design. In the United Kingdom, the Gail model significantly underestimated breast cancer risk in 3,150 women attending a family history clinic [12]. In Italy, the Gail model showed good overall calibration but modest individual discrimination in 5,383 hysterectomized women enrolled in a breast cancer chemoprevention trial [13] and, more recently, in 10,031 female volunteers with a high prevalence of risk factors who participated in the Florence cohort of the European Prospective Investigation into Cancer and Nutrition study [14].

The Gail model could be useful to predict the risk of developing invasive breast cancer in Spain, where all women aged 50–69 years are currently covered by population-based mammographic screening programs [15]. However, age-standardized breast cancer incidence rates in Spain (61 cases per 100,000 women in 2008) are substantially lower than those in the United States (76) and most countries in Northern (84), Western (90), and Southern Europe (69) [16]. Thus, to avoid a systematic overestimation of breast cancer risk among Spanish women, it may be necessary to recalibrate the Gail model for the different incidence rates of invasive breast cancer and prevalences of risk factors in the Spanish population. In this study, we evaluated the predictive accuracy of the Gail model in its original form and after recalibration in a large population-based cohort of women who participated in the Navarre Breast Cancer Screening Program (NBCSP), and compared its performance with that of a similar prediction model fully developed from this Spanish cohort.

Methods

Navarre Breast Cancer Screening Program

The NBCSP belongs to the European Breast Cancer Network and, when it was implemented in September 1990, was the first population-based mammographic screening program in Spain. The initial target population covered all women aged 45–65 years residing in the northern Spanish province of Navarre, but this age range was extended to 69 years in 1998 (77,455 female inhabitants aged 45–69 years in 2001). The program achieved full coverage of the target population in 2 years, the period established as the screening interval. All performance indicators of the NBCSP during the period 1990–2004, including a participation rate for first invitation of 84 % and an adherence to successive invitations of 97 % [17], have consistently exceeded the reference levels set by European guidelines [18]. The population impact of the NBCSP on breast cancer incidence and mortality rates in Navarre has recently been reported [19, 20].

Study cohort, baseline assessment, and follow-up

A total of 62,909 women with no history of invasive or in situ breast cancer who resided in Navarre and were born between January 1, 1931 and December 31, 1952 were invited to participate in the fourth screening round of the fully consolidated NBCSP. Of these, 54,995 women agreed to participate and were mammographically screened between September 1996 and July 1998 (participation rate 87.4 %).

Baseline information on age at menarche, previous breast biopsy, number of births, age at first live birth, and number of first-degree relatives (mother or sisters) with breast cancer was obtained from structured questionnaires administered by trained interviewers in the fourth screening round. Most women who reported ever having a breast biopsy referred to tests performed outside the NBCSP, and hence we were unable to determine the precise number of previous breast biopsies. Also, atypical hyperplasia was ascertained only in a small subset of women with biopsies performed within the NBCSP and was therefore not considered in risk predictions.

For the present study, we excluded 168 women with prevalent breast cancer at their baseline mammographic examination in the fourth screening round, as well as 3 women who developed breast cancer and 35 women who died within 180 days from baseline. We also excluded 113 women lost to follow-up after baseline examination and 27 women with missing baseline information on the required risk factors. Thus, the starting cohort consisted of 54,649 women aged 45–68 years who were followed for the period beginning 180 days after the 1996–1998 baseline examination through December 31, 2005. Breast cancer cases were ascertained through linkage with the population-based Navarre Cancer Registry [21], which records all incident cases of invasive or in situ breast cancer diagnosed since 1973 in women residing in Navarre. Case ascertainment during follow-up was likely to be complete, since the registry searched all relevant case sources in addition to the NBCSP, with 99 % of breast cancer cases histologically verified and 0.8 % registered solely on the basis of death certificates in 1998–2002 [22]. Deaths from other causes were identified through the Navarre Mortality Registry, which includes all deaths registered in Spain among residents in Navarre. The municipal register of inhabitants and the regional health system were consulted to confirm that disease-free women were still living in Navarre at the end of follow-up. Only 292 women were lost to follow-up and censored at the time of their last visit to the NBCSP, while the remaining women were followed disease free through December 31, 2005.

During an average follow-up of 7.7 years, 835 cases of invasive breast cancer, 150 cases of ductal carcinoma in situ, and 2 cases of non-epithelial breast tumor were diagnosed. In addition, 1,218 other women died from causes not related to breast cancer. Hormone receptor status could be determined from pathology reports in 767 of the 835 invasive breast cancers (91.9 %), with 653 tumors positive for either estrogen (634) or progesterone receptors (486) and 114 tumors negative for both receptors.

Statistical analysis

The baseline hazards and hazard ratios of invasive breast cancer in the NBCSP cohort were estimated from a piecewise exponential model [23] with constant baseline hazards in each 5-year age interval from 45 to 74 years and the same ordinal risk factors as the original Gail model [1], except for the simpler never/ever classification for previous breast biopsy. In particular, the risk factors included in this model were age at menarche (coded as 0, 1, or 2 for ≥14, 12–13, or <12 years, respectively), previous breast biopsy (coded as 0 if no and 1 if yes), age at first live birth (coded as 0, 1, 2, or 3 for <20, 20–24, 25–29 or nulliparous, or ≥30 years, respectively), and number of first-degree relatives with breast cancer (coded as 0, 1, or 2 for 0, 1, or ≥2 affected relatives, respectively). The model also included interaction terms between age at first birth and number of affected relatives and between breast biopsy and age (coded as 0 if <50 and 1 if ≥50 years), so that the hazard ratio for breast biopsy was allowed to differ between age intervals below and above 50 years. A detailed justification and specification of this model are provided in the statistical appendix (Supplementary Material 1). The composite hazards of death from other causes were calculated by dividing the observed number of deaths in the NBCSP cohort by the woman-years at risk in each 5-year age interval.
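As an illustration only, the ordinal coding described above can be sketched in Python. The function name `code_risk_factors`, its arguments, and the dictionary keys are hypothetical and not part of the study. Representing each interaction as the product of the corresponding codes is an assumption made here for illustration; the exact parameterization is the one given in Supplementary Material 1.

```python
def code_risk_factors(age, age_menarche, ever_biopsy, age_first_birth, n_relatives):
    """Return the ordinal codes described in the text (hypothetical helper).

    age_first_birth is None for nulliparous women.
    """
    # Age at menarche: 0, 1, 2 for >=14, 12-13, <12 years
    if age_menarche >= 14:
        menarche = 0
    elif age_menarche >= 12:
        menarche = 1
    else:
        menarche = 2

    # Previous breast biopsy: never/ever
    biopsy = 1 if ever_biopsy else 0

    # Age at first live birth: 0, 1, 2, 3 for <20, 20-24, 25-29 or nulliparous, >=30
    if age_first_birth is None:
        first_birth = 2
    elif age_first_birth < 20:
        first_birth = 0
    elif age_first_birth < 25:
        first_birth = 1
    elif age_first_birth < 30:
        first_birth = 2
    else:
        first_birth = 3

    # Affected first-degree relatives: 0, 1, 2 for 0, 1, >=2
    relatives = min(n_relatives, 2)

    # Age indicator used in the biopsy-by-age interaction
    age_ge_50 = 1 if age >= 50 else 0

    return {
        "menarche": menarche,
        "biopsy": biopsy,
        "first_birth": first_birth,
        "relatives": relatives,
        "age_ge_50": age_ge_50,
        # Assumption: interactions entered as products of the codes
        "first_birth_x_relatives": first_birth * relatives,
        "biopsy_x_age_ge_50": biopsy * age_ge_50,
    }
```

For example, a nulliparous 55-year-old woman with menarche at 11 years, a previous biopsy, and three affected relatives would receive the highest codes for menarche and family history and the code 2 for age at first birth.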

Following standard competing risk methods [1, 23], three alternative models were used to predict the absolute risk of invasive breast cancer for each NBCSP woman according to her own risk factor profile. The Navarre model was based on baseline hazards and hazard ratios of invasive breast cancer estimated from the above piecewise exponential model in the NBCSP cohort, as well as on composite hazards of competing death among NBCSP women. The original Gail model used Gail relative risk estimates [1] and invasive breast cancer and mortality rates for white US women [2], whereas the recalibrated Gail model combined the original relative risk estimates [1] with composite incidence rates of invasive breast cancer, composite mortality rates, and risk factor prevalences among cases from the NBCSP cohort. The Gail relative risk for women with any previous breast biopsy was calculated as a weighted average of the reported relative risks [1] for one and two or more biopsies. Further details on the development of these prediction models are provided in the statistical appendix (Supplementary Material 1).
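The competing risk calculation can be illustrated with a minimal Python sketch, assuming hazards that are constant within 5-year age intervals starting at age 45. Within each interval, the probability of developing breast cancer is the share of the total hazard due to breast cancer, multiplied by the probability of reaching the interval event free and by the probability of any event occurring within it. The function `absolute_risk` and the hazard values below are hypothetical and are not the estimates used in this study.

```python
import math

def absolute_risk(age_start, age_end, rel_risk, bc_hazard, death_hazard,
                  first_age=45, width=5):
    """Absolute risk of breast cancer between age_start and age_end.

    bc_hazard and death_hazard hold yearly baseline hazards of breast cancer
    and of competing death for consecutive age intervals of the given width,
    starting at first_age. rel_risk multiplies the breast cancer hazard.
    """
    last_age = first_age + width * len(bc_hazard)
    if age_start < first_age or age_end > last_age or age_start >= age_end:
        raise ValueError("age range not covered by the hazard tables")

    risk = 0.0
    surv = 1.0  # probability of being alive and cancer free so far
    age = age_start
    while age < age_end:
        j = int((age - first_age) // width)
        piece_end = min(age_end, first_age + (j + 1) * width)
        dt = piece_end - age
        h1 = bc_hazard[j] * rel_risk       # cause-specific breast cancer hazard
        total = h1 + death_hazard[j]       # hazard of any first event
        if total > 0:
            risk += (h1 / total) * surv * (1.0 - math.exp(-total * dt))
        surv *= math.exp(-total * dt)
        age = piece_end
    return risk

# Hypothetical yearly hazards for ages 45-49, 50-54, ..., 70-74
bc = [0.0015, 0.0020, 0.0022, 0.0024, 0.0023, 0.0022]
death = [0.0010, 0.0015, 0.0025, 0.0040, 0.0070, 0.0120]
five_year_risk = absolute_risk(52, 57, 1.5, bc, death)
```

In this formulation, recalibrating a model to a new population amounts to replacing the two hazard tables while keeping the relative risk unchanged.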

Calibration and discrimination of the three prediction models among NBCSP women were evaluated through a 10-fold cross-validation to correct for the optimistic bias induced by testing the Navarre model on the same NBCSP data used to fit it [24]. Calibration was assessed by comparing the observed cases of invasive breast cancer in the NBCSP cohort by age interval, risk factor category, and quintile of predicted 5-year risk with those expected under the Navarre, original Gail, and recalibrated Gail models [25]. Discrimination was evaluated using overall and age-specific C indexes [26], which extend the area under the receiver operating characteristic curve to censored time-to-event data. The discrimination ability of the Gail model remained unchanged after recalibration. Further details on cross-validated calibration and discrimination statistics are provided in the statistical appendix (Supplementary Material 1).
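To make the notion of a C index for censored data concrete, the following Python sketch computes a simple Harrell-type concordance. This is one common estimator and is shown only as an illustration; the estimator actually used in this study [26] may differ, and the function name `c_index` is hypothetical. A pair of women is comparable when the one with the shorter follow-up time developed the event, and it is concordant when that woman also had the higher predicted risk.

```python
def c_index(times, events, scores):
    """Harrell-type concordance for right-censored data (illustrative).

    times: follow-up times; events: 1 if the event occurred, 0 if censored;
    scores: predicted risks, where higher means higher expected risk.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5  # ties in predicted risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to predictions no better than chance and a value of 1 to perfect ranking of event times.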

Results

Cause-specific hazards and hazard ratios from NBCSP cohort

The hazard of invasive breast cancer was higher in NBCSP women with previous benign breast biopsies, and it increased with decreasing age at menarche and with increasing age at first live birth and number of affected first-degree relatives. These hazard ratios were similar in direction to, but lower in magnitude than, those from the Gail model, particularly for the strata of age at first birth by number of affected relatives (Table 1). In contrast to the Gail model, there was no significant interaction between breast biopsy and age (P = 0.97) or between age at first birth and number of affected relatives (P = 0.23). The population attributable risk for all four factors was 0.280 and varied little with age.

Table 1 Hazard ratios of invasive breast cancer by risk factor category in the Navarre Breast Cancer Screening Program cohort, 1996–1998 to 2005

The baseline hazards of invasive breast cancer from the NBCSP cohort increased steadily in screened women aged 45–64 years and declined in unscreened older women. These baseline incidence rates were similar to those derived from the Navarre Cancer Registry, except that the latter also included prevalent cases aged 45–49 years detected at their first participation in the NBCSP (Table 2). The composite mortality rates from other causes in the NBCSP cohort increased sharply with age but were 18.8 % [standardized mortality ratio 0.812, 95 % confidence interval (CI) 0.768–0.859] lower than those registered in the entire female population of Navarre (Table 2), suggesting that self-selected women in the NBCSP cohort were somewhat healthier than the general female population.

Table 2 Age-specific incidence rates of invasive breast cancer and mortality rates from other causes (per 100,000 woman-years) in the Navarre Breast Cancer Screening Program cohort, 1996–1998 to 2005

Calibration of prediction models

The Navarre model showed good cross-validated calibration overall (ratio of expected to observed cases 820.1/835 = 0.98, 95 % CI 0.92–1.05), as well as across categories of age at menarche (goodness-of-fit P = 0.42), breast biopsy by age (P = 0.99), and age at first birth by number of affected relatives (P = 0.95). The original Gail model significantly overestimated the absolute risk of invasive breast cancer in the NBCSP cohort by 46 % (expected-to-observed ratio 1215.5/835 = 1.46, 95 % CI 1.36–1.56), with greater overprediction in the older age intervals (Table 3). This systematic overestimation disappeared after recalibrating the Gail model (expected-to-observed ratio 836.4/835 = 1.00, 95 % CI 0.94–1.07), with no significant lack of fit across the three risk factor categorizations (P = 0.48, 0.36, and 0.15, respectively).
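The expected-to-observed ratios and their confidence intervals can be reproduced with a short Python sketch. The interval formula below, which treats the observed count as Poisson and builds the interval on the log scale, is an assumption, since the method is described only in the statistical appendix; it does, however, return the intervals reported above. The function name `expected_to_observed` is hypothetical.

```python
import math

def expected_to_observed(expected, observed, z=1.96):
    """E/O ratio with an approximate 95 % CI (Poisson, log scale; assumed)."""
    ratio = expected / observed
    half_width = z / math.sqrt(observed)  # approximate SE of log(E/O)
    return ratio, ratio * math.exp(-half_width), ratio * math.exp(half_width)

for label, expected in [("Navarre", 820.1),
                        ("original Gail", 1215.5),
                        ("recalibrated Gail", 836.4)]:
    ratio, lower, upper = expected_to_observed(expected, 835)
    print(f"{label}: {ratio:.2f} ({lower:.2f}-{upper:.2f})")
```

If the 1,218 deaths from other causes are taken as the observed count, the same approximation applied to the standardized mortality ratio of 0.812 gives an interval of 0.768–0.859.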

Table 3 Ratios of the expected cases of invasive breast cancer under the Navarre, original Gail, and recalibrated Gail prediction models to the observed cases in the Navarre Breast Cancer Screening Program cohort by age interval and risk factor category, 1996–1998 to 2005

The median predicted 5-year risks of invasive breast cancer were 0.93, 1.31, and 0.95 % under the Navarre, original Gail, and recalibrated Gail models, respectively, with 2.9, 25.6, and 4.1 % of NBCSP women above the standard risk threshold of 1.67 %. The Navarre model showed good agreement between observed and expected cases by quintile of predicted 5-year risk (goodness-of-fit P = 0.36). The original Gail model significantly overpredicted invasive breast cancer cases in all quintiles of risk (Table 4). The recalibrated Gail model corrected this systematic overprediction (goodness-of-fit P = 0.25), but because of the larger Gail relative risks, it still showed a significant positive trend in the expected-to-observed ratios across quintiles of risk (P for linear trend = 0.01).

Table 4 Ratios of expected to observed cases of invasive breast cancer in the Navarre Breast Cancer Screening Program cohort by quintile of predicted 5-year risk based on the Navarre, original Gail, and recalibrated Gail prediction models, 1996–1998 to 2005

Discrimination of prediction models

Overall, the cross-validated discrimination indexes among NBCSP women were modest and equal to 0.542 (95 % CI 0.521–0.564) for the Navarre model and 0.544 (95 % CI 0.523–0.565) for the Gail model, with no significant difference between models (P = 0.67). Discrimination remained similar in age intervals below 70 years and increased marginally to 0.628 for the Navarre model and 0.626 for the Gail model among women aged 70–74 years (P for deviation from overall discrimination = 0.09 and 0.08, respectively; Table 5).

Table 5 Overall and age-specific discrimination of the Navarre and Gail prediction models among women in the Navarre Breast Cancer Screening Program cohort, 1996–1998 to 2005

The cross-validated discrimination indexes were somewhat better for hormone receptor-positive invasive breast cancers (0.545, 95 % CI 0.521–0.569, for the Navarre model and 0.543, 95 % CI 0.519–0.567, for the Gail model) than for hormone receptor-negative cancers (0.508, 95 % CI 0.446–0.571, and 0.530, 95 % CI 0.469–0.591, respectively).

Discussion

The original Gail model overestimated the actual invasive breast cancer incidence by 46 % in a large population-based cohort of biennially screened Spanish women aged 45–68 years who were followed for an average of 7.7 years. The recalibrated Gail model was well calibrated overall, but it still underestimated breast cancer risk for women with a low risk-factor profile and overestimated risk for women with a high risk-factor profile. The Navarre model showed good cross-validated calibration overall and in different cohort subsets. Nevertheless, both the Navarre and Gail models had limited discrimination ability of 0.54 in this cohort.

Comparison with other studies

Model calibration is strongly affected by temporal and geographical variations in disease incidence. The Gail model used invasive breast cancer rates among white US women for the period 1983–1987 [2]. Since breast cancer incidence increased steadily during the 1990s in the United States [27], subsequent validation studies of the Gail model resulted in overall underestimations of invasive breast cancer risk by 6 % in the Nurses’ Health Study [3], by 21 % in the Women’s Health Initiative [4], and by 13–14 % in two other recent US cohorts [5]. Thus, calls have been made to update the invasive breast cancer rates used in the Gail model to ensure good overall calibration in recent US cohorts [3, 5, 28]. Our results further highlight that, because of large worldwide variations in breast cancer incidence [16], the Gail model should also be recalibrated when applied in international settings [29]. The lower breast cancer incidence rates in Spain compared with the United States caused the Gail model to overestimate breast cancer risk by 46 % in this Spanish cohort. This systematic overprediction was corrected after recalibrating the Gail model to the lower incidence rates and risk factor prevalences in the study cohort.

The lower incidence of breast cancer in Spain can hardly be explained by differences in regular mammography use since its prevalence is similar in Spain (59 % of women aged 45 years or older in 2006) [30] and the United States (67 % of women aged 40 years or older in 2005) [31]. The distribution of Gail risk factors could better account for part of the observed differences in countrywide rates, as women younger than 12 years at menarche, with biopsy examinations, and with affected first-degree relatives were half as prevalent in the 1996–1998 baseline assessment of this Spanish cohort as in concurrent assessments of large representative US cohorts [4, 5]. Nevertheless, the baseline incidence rates of invasive breast cancer for NBCSP women aged 45–74 years were still 16 % lower than those used in the Gail model [2], suggesting that other factors may contribute to these differences. Obesity is more prevalent among adult white US women [32] than their counterparts in Spain [33]. Moreover, more than one-third of postmenopausal women in the United States were taking hormone replacement therapy between 1995 and 2001 [34], whereas this therapy was rarely used in Spain [35].

The relative risks estimated from this Spanish cohort were lower than those reported in the Gail model [1] which, combined with the smaller risk factor prevalences, resulted in an attributable risk of 0.28, substantially lower than the value of 0.42 found in white US women [2]. The lower relative risks for age at menarche, age at first birth, and number of affected relatives may be explained by the later age at diagnosis of breast cancer cases: only 6 % of cases in our cohort were diagnosed before 50 years of age, as opposed to the 29 % enrolled in the Gail analysis [1]. There is compelling evidence that reproductive [36] and familial factors [37] have stronger effects on the risk of early-onset than late-onset breast tumors. In fact, these risk factors showed consistently weaker associations in three large US cohorts of postmenopausal women [4, 5] than in the Gail model.

Clinical and public health implications and future research

The less pronounced relative risks observed in this Spanish population resulted in a modest discrimination of 0.54 for both the Navarre and Gail models, somewhat lower than the values of 0.58–0.59 reported for the Gail model among white US [3–5] and Italian women [14]. Well-calibrated prediction models with limited discrimination ability, such as the Navarre and recalibrated Gail models, may be useful in clinical practice for counseling individual patients on the risks and benefits of a preventive treatment [38], as well as for designing adequately powered intervention trials. However, higher discrimination is required for implementing an effective prevention strategy in high-risk subsets of the general population, in order to achieve large reductions in disease incidence [39]. The inclusion of 7–18 common genetic variants for breast cancer has been shown to increase discrimination of the Gail model by 0.03–0.07 [40–42]. Apart from the substantial costs of obtaining genetic information, this modest improvement in discrimination was similar to the increase of 0.05 obtained from adding only mammographic density, a strong and highly prevalent risk factor [43]. A nested case–control study is currently being conducted within the NBCSP to obtain mammographic density measurements in nearly 1,000 incident cases of invasive breast cancer and 4,000 disease-free women. This case–control study might provide valuable data to improve the discrimination accuracy of the Navarre model among Spanish women by including mammographic density and enhanced family history information on breast cancer.

Strengths and limitations of the study

The strengths of this study include the use of a large representative cohort of regularly screened Spanish women, the high participation rate, and the relatively long follow-up period with negligible losses to follow-up, virtually complete case ascertainment, and information on tumor receptor status.

The study has several limitations. First, nearly all breast cancer cases were diagnosed in women aged 50 years or older, so our findings may not apply to younger premenopausal women in regular screening. Second, information on atypical hyperplasia was not available in 4,462 of the 4,983 women with previous biopsy because they referred to tests performed outside the NBCSP. Of the remaining 521 women with biopsies performed within the program, 16 had atypical hyperplasia. Thus, we can infer that roughly 0.3 % of the entire NBCSP cohort had atypical hyperplasia (3.1 % with atypia out of 9.1 % with biopsy) and that the overall performance of the Navarre and Gail models was little affected by knowledge of atypical hyperplasia status. However, atypical hyperplasia is a strong risk factor for breast cancer [44] and these models will substantially underestimate breast cancer risk in women with atypia, as has already been reported in other cohorts [7]. Third, nondifferential misclassification of baseline exposure [45] might have partially accounted for the low relative risks and discrimination ability of the Gail model in this Spanish cohort. Nevertheless, data were collected from structured personal interviews and self-reported Gail model variables, including family history of breast cancer in first-degree relatives [46], are typically accurate in this setting. Finally, cross-validation was used to obtain overfitting-corrected estimates of the expected internal validity of the Navarre model in new subjects from the same population, but a more stringent external validation would be required in related but different populations.
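The arithmetic behind the estimated cohort-wide prevalence of atypical hyperplasia in the second limitation can be checked with a few lines of Python. The calculation assumes, as the text does implicitly, that the proportion with atypia among biopsies performed within the program also applies to biopsies performed outside it; the variable names are illustrative.

```python
cohort = 54649             # women in the study cohort
biopsied = 4983            # women with any previous breast biopsy
biopsied_in_program = 521  # biopsies performed within the NBCSP
with_atypia = 16           # atypical hyperplasia among in-program biopsies

atypia_given_biopsy = with_atypia / biopsied_in_program  # about 3.1 %
biopsy_prevalence = biopsied / cohort                     # about 9.1 %
atypia_prevalence = atypia_given_biopsy * biopsy_prevalence  # about 0.3 %

print(f"{100 * atypia_given_biopsy:.1f} % x {100 * biopsy_prevalence:.1f} % "
      f"= {100 * atypia_prevalence:.1f} %")
```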

Conclusions

The Gail model cannot be applied directly to populations with different underlying rates of invasive breast cancer, but it can readily be recalibrated to provide unbiased estimates of absolute risk in these populations. In our study, the original Gail model showed a substantial overestimation of breast cancer risk that was corrected after recalibrating the model to the lower breast cancer incidence rates and risk factor prevalences in this Spanish cohort. Nevertheless, the limited discrimination ability of the Navarre and Gail models among Spanish women precludes their use for screening applications and highlights the need to develop extended models with additional strong risk factors, such as mammographic density and detailed family history.