The Net Reclassification Index (NRI): A Misleading Measure of Prediction Improvement Even with Independent Test Data Sets

Pepe, Margaret S.; Fan, Jing; Feng, Ziding; Gerds, Thomas; Hilden, Jorgen

doi:10.1007/s12561-014-9118-0

Published: 23 August 2014

Statistics in Biosciences, volume 7, pages 282–295 (2015)

Margaret S. Pepe¹,
Jing Fan¹,
Ziding Feng²,
Thomas Gerds³ &
Jorgen Hilden³

Abstract

The Net Reclassification Index (NRI) is a very popular measure for evaluating the improvement in prediction performance gained by adding a marker to a set of baseline predictors. However, the statistical properties of this novel measure have not been explored in depth. We demonstrate the alarming result that the NRI statistic calculated on a large test dataset using risk models derived from a training set is likely to be positive even when the new marker has no predictive information. A related theoretical example is provided in which an incorrect risk function that includes an uninformative marker is proven to erroneously yield a positive NRI. Some insight into this phenomenon is provided. Since large values for the NRI statistic may simply be due to use of poorly fitting risk models, we suggest caution in using the NRI as the basis for marker evaluation. Other measures of prediction performance improvement, such as measures derived from the receiver operating characteristic curve, the net benefit function, and the Brier score, cannot be large due to poorly fitting risk functions.
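For readers unfamiliar with the statistic, the following is a minimal sketch of how a category-free NRI is typically computed from a test set. The abstract itself does not state a formula; the definition assumed here is the common one (net proportion of events whose predicted risk rises under the expanded model, plus net proportion of non-events whose predicted risk falls), as in Pencina et al. (2011). The function name `continuous_nri` and the toy data are hypothetical and chosen only for illustration.

```python
import numpy as np

def continuous_nri(risk_old, risk_new, event):
    """Category-free NRI under the assumed definition:
    [P(up | event) - P(down | event)] + [P(down | non-event) - P(up | non-event)].
    Returns (total, event component, non-event component)."""
    risk_old = np.asarray(risk_old, dtype=float)
    risk_new = np.asarray(risk_new, dtype=float)
    event = np.asarray(event, dtype=bool)

    up = risk_new > risk_old      # risk increased under the expanded model
    down = risk_new < risk_old    # risk decreased under the expanded model

    nri_event = up[event].mean() - down[event].mean()
    nri_nonevent = down[~event].mean() - up[~event].mean()
    return nri_event + nri_nonevent, nri_event, nri_nonevent

# Hypothetical toy data: six subjects, the last three experience the event.
risk_old = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60]
risk_new = [0.05, 0.25, 0.20, 0.50, 0.45, 0.70]
event = [0, 0, 0, 1, 1, 1]

nri_total, nri_event, nri_nonevent = continuous_nri(risk_old, risk_new, event)
print(nri_total, nri_event, nri_nonevent)
```

Note that the statistic depends only on the direction of each subject's change in predicted risk, not on its size or on whether either model is well calibrated. This is consistent with the abstract's warning that poorly fitting risk models can yield a positive NRI for an uninformative marker.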

Acknowledgments

This work was supported in part by National Institutes of Health Grants R01 GM054438, U24 CA086368, and R01 CA152089.

Conflict of interest

None declared.

Author information

Authors and Affiliations

Biostatistics and Biomathematics Program, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave N, M2B500, Seattle, WA, 98109, USA
Margaret S. Pepe & Jing Fan
The University of Texas MD Anderson Cancer Center, 1515 Holcombe Blvd., Houston, TX, 77030, USA
Ziding Feng
Department of Biostatistics, University of Copenhagen, Oster Farimsgade 5, Copenhagen, Denmark
Thomas Gerds & Jorgen Hilden

Corresponding author

Correspondence to Margaret S. Pepe.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 416 KB)

About this article

Cite this article

Pepe, M.S., Fan, J., Feng, Z. et al. The Net Reclassification Index (NRI): A Misleading Measure of Prediction Improvement Even with Independent Test Data Sets. Stat Biosci 7, 282–295 (2015). https://doi.org/10.1007/s12561-014-9118-0

Received: 20 August 2013
Accepted: 23 July 2014
Published: 23 August 2014
Issue Date: October 2015
DOI: https://doi.org/10.1007/s12561-014-9118-0
