Calculating Confidence Intervals for the Number Needed to Treat
Introduction
The number needed to treat (NNT) has gained much attention in recent years as a useful way of reporting the results of randomized controlled trials with a binary outcome [1], [2], [3]. Defined as the reciprocal of the absolute risk reduction (ARR), the NNT is the estimated average number of patients who need to be treated to prevent an adverse outcome in one additional patient. A negative NNT is the estimated average number of patients who need to be treated with the new rather than the standard treatment for one additional patient to be harmed. Although clinicians and patients often understand this measure better than risk ratios or risk reductions, the NNT has undesirable mathematical and statistical properties. In particular, the confidence interval for the NNT is not straightforward to interpret, although an excellent explanation was recently given by Altman [4]. The mathematical and statistical properties of the NNT statistic are described in more detail by Lesaffre and Pledger [5].
The key to understanding the confidence interval for NNT is that, in principle, the domain of NNT is the union of 1 to ∞ and −∞ to −1. The best value of NNT, indicating the largest possible beneficial treatment effect, is 1; the NNT value indicating no treatment effect (ARR = 0) is ±∞; and the worst NNT value, indicating the largest possible harmful effect, is −1. Thus, the result NNT = 10 with confidence limits 4 and −20 means that the two regions 4 to ∞ and −20 to −∞ together form the confidence interval. Altman proposed two new abbreviations, namely the number needed to treat for one patient to benefit (NNTB) or to be harmed (NNTH) [4]. This concept avoids the awkward term “number needed to harm” (NNH), which is used, for example, in the journal Evidence-Based Medicine. An estimated NNT with its confidence interval can then be presented as NNTB = 10 (NNTB 4 to ∞ to NNTH 20) [4].
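The mapping from a confidence interval for ARR to the NNTB/NNTH presentation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `format_nnt_interval`, `_label`, and `_side` are hypothetical and do not come from the paper, and the ARR limits are taken as given, whatever method produced them.

```python
def _label(nnt):
    """Render a signed NNT in Altman's notation: NNTB if positive, NNTH if negative."""
    kind = "NNTB" if nnt > 0 else "NNTH"
    return f"{kind} {abs(nnt):.3g}"


def _side(arr_limit):
    """Invert one ARR limit; an ARR of exactly 0 corresponds to an infinite NNT."""
    if arr_limit == 0:
        return "∞"
    return _label(1.0 / arr_limit)


def format_nnt_interval(arr_lower, arr_upper):
    """Turn a confidence interval for ARR into an NNTB/NNTH interval string.

    Hypothetical helper: if the ARR interval contains 0, the NNT interval
    consists of two regions joined at infinity.
    """
    if arr_lower > arr_upper:
        raise ValueError("arr_lower must not exceed arr_upper")
    if arr_lower < 0 < arr_upper:
        # Benefit region, through infinity, into the harm region.
        return f"{_label(1.0 / arr_upper)} to ∞ to {_label(1.0 / arr_lower)}"
    # Interval lies entirely on one side: invert both limits directly.
    return f"{_side(arr_upper)} to {_side(arr_lower)}"


# ARR limits of -0.05 and 0.25 correspond to the NNT limits -20 and 4 in the text.
print(format_nnt_interval(-0.05, 0.25))
```

The limits are listed from the upper ARR limit to the lower one, which matches the order used in the NNTB 4 to ∞ to NNTH 20 example above.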
Altman recommended that a confidence interval should always be given when an NNT is reported as a study result [4]. However, the usual Wald method for calculating such confidence intervals is frequently inappropriate. By using examples from the literature and artificial examples, it is shown that the application of the Wilson score method [6] improves the calculation and presentation of confidence intervals for the number needed to treat.
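To make the contrast between the two approaches concrete, the sketch below computes both intervals for ARR from event counts. It assumes that the Wilson score method for the risk difference is the hybrid score interval that combines the Wilson limits of the two single proportions, as described in reference [6]; the method's formulas are not reproduced in this excerpt, so this is an illustration rather than the paper's exact procedure. The function names and the example counts are hypothetical.

```python
import math

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def wilson_interval(events, n, z=Z_95):
    """Wilson score interval for a single proportion events/n."""
    p = events / n
    centre = (2 * events + z * z) / (2 * (n + z * z))
    half = z * math.sqrt(z * z + 4 * events * (1 - p)) / (2 * (n + z * z))
    return centre - half, centre + half


def wald_arr_interval(e1, n1, e2, n2, z=Z_95):
    """Simple Wald interval for ARR = p1 - p2 (control risk minus treatment risk)."""
    p1, p2 = e1 / n1, e2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    arr = p1 - p2
    return arr - z * se, arr + z * se


def wilson_arr_interval(e1, n1, e2, n2, z=Z_95):
    """Hybrid score interval for ARR built from the two Wilson intervals (assumed form)."""
    p1, p2 = e1 / n1, e2 / n2
    l1, u1 = wilson_interval(e1, n1, z)
    l2, u2 = wilson_interval(e2, n2, z)
    arr = p1 - p2
    lower = arr - math.sqrt((p1 - l1) ** 2 + (u2 - p2) ** 2)
    upper = arr + math.sqrt((u1 - p1) ** 2 + (p2 - l2) ** 2)
    return lower, upper


# Hypothetical trial: 15/100 events under control, 10/100 under treatment.
print(wald_arr_interval(15, 100, 10, 100))
print(wilson_arr_interval(15, 100, 10, 100))

# With no events in either group the Wald interval collapses to a single point.
print(wald_arr_interval(0, 10, 0, 10))
print(wilson_arr_interval(0, 10, 0, 10))
```

Taking the reciprocals of the resulting ARR limits gives the corresponding limits on the NNT scale. The zero-event case shows one way the Wald interval can be inappropriate: it has zero width, whereas the score-based interval does not.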
Section snippets
Methods to calculate confidence intervals for NNT
Let π1 and π2 be the true probabilities (risks) of an adverse event in the control group (group 1) and the treatment group (group 2), respectively. The true ARR is the difference of the two risks π1 − π2. The true NNT is the reciprocal 1/(π1 − π2) of the true ARR. To estimate these measures a randomized clinical trial can be performed. Let n1 and n2 be the number of patients randomized in the control group and the treatment group, respectively, and let e1 and e2 be the number of patients having
Shortcomings of the simple Wald method
In principle, the shortcomings of the Wald confidence intervals carry over from ARR to NNT. For interpretation, however, the NNT scale has to be taken into account. In the following, the confidence intervals for NNT based on Wilson scores are compared with the Wald confidence intervals by means of published and artificial examples. The published examples are estimated NNT values found in the journal Evidence-Based Medicine [18], [19], [20], [21]. Here, we concentrate on the comparison of the confidence
Using NNT for equivalence trials
The possible aberrations of the simple Wald method for calculating confidence intervals for ARR and NNT are especially relevant to equivalence trials [22]. To demonstrate equivalence in therapeutic clinical trials, the use of confidence intervals with a coverage probability of 95% or more is recommended [23]. Frequently, the objective of a study is to show that the new treatment is not inferior to the standard treatment. In such trials, one possibility to demonstrate equivalence between treatments
Discussion and conclusion
NNT has become a popular summary statistic to describe the absolute effect of a given treatment in comparison to a standard treatment or control. It was first introduced for use in randomized placebo-controlled clinical trials [24], then adopted as the primary outcome measure for systematic reviews such as meta-analyses [25], extended to the statistic “number needed to screen” to compare strategies for disease screening [26], and is now applied also in epidemiology to express the magnitude of
Acknowledgements
I thank Robert G. Newcombe for his valuable and helpful comments, which improved the paper considerably.
References (34)
et al. A note on the number needed to treat. Control Clin Trials (1999)
et al. The number needed to treat: A clinically useful measure of treatment effect. BMJ (1995)
On some clinically useful measures of the effects of treatment. Evidence-Based Med (1996)
et al. The number needed to treat: A clinically useful nomogram in its proper context. BMJ (1996)
Confidence intervals for the number needed to treat. BMJ (1998)
Interval estimation for the difference between independent proportions: Comparison of eleven methods. Stat Med (1998)
Confidence limits made easy: Interval estimation using a substitution method. Am J Epidemiol (1998)
et al. Comparative analysis of two rates. Stat Med (1985)
Asymptotic confidence intervals for the difference between binomial parameters for the use with small samples. Biometrics (1987)
A non-iterative accurate asymptotic confidence interval for the difference between two proportions. Stat Med (1997)
Computer software that can calculate confidence intervals is now available (letter). BMJ
Confidence intervals rather than P values: Estimating rather than hypothesis testing. BMJ
StatXact 4 for Windows. Statistical Software for Exact Nonparametric Inference
Approximate is better than “exact” for interval estimation of binomial proportions. Am Statistn
Confidence intervals for a binomial proportion. Stat Med
Two-sided confidence intervals for the single proportion: Comparison of seven methods. Stat Med
SAS/IML User's Guide, Version 5 Edition