Calculating Confidence Intervals for the Number Needed to Treat

doi:10.1016/S0197-2456(00)00134-3

Controlled Clinical Trials

Volume 22, Issue 2, February 2001, Pages 102-110

https://doi.org/10.1016/S0197-2456(00)00134-3

Abstract

The number needed to treat (NNT) has gained much attention in recent years as a useful way of reporting the results of randomized controlled trials with a binary outcome. Defined as the reciprocal of the absolute risk reduction (ARR), the NNT is the estimated average number of patients who need to be treated to prevent an adverse outcome in one additional patient. As with other estimated effect measures, it is important to document the uncertainty of the estimate by means of an appropriate confidence interval. Confidence intervals for NNT can be obtained by inverting and exchanging the confidence limits for the ARR, provided that the NNT scale, which ranges from 1 through ∞ to −1, is taken into account. Unfortunately, the only method used in practice to calculate confidence intervals for ARR seems to be the simple Wald method, which in many cases yields confidence intervals that are too short. This paper shows that applying the Wilson score method improves the calculation and presentation of confidence intervals for the number needed to treat. Control Clin Trials 2001;22:102–110

Introduction

The number needed to treat (NNT) has gained much attention in recent years as a useful way of reporting the results of randomized controlled trials with a binary outcome [1, 2, 3]. Defined as the reciprocal of the absolute risk reduction (ARR), the number needed to treat is the estimated average number of patients who need to be treated to prevent an adverse outcome in one additional patient. A negative NNT is the estimated average number of patients who need to be treated with the new rather than the standard treatment for one additional patient to be harmed. Although clinicians and patients often understand this measure better than risk ratios or risk reductions, the NNT has undesirable mathematical and statistical properties, and the confidence interval for NNT is not straightforward to interpret. An excellent explanation was, however, recently given by Altman [4]. The mathematical and statistical properties of the NNT statistic are described in more detail by Lesaffre and Pledger [5].

The key to understanding the confidence interval for NNT is that, in principle, the domain of NNT is the union of the two intervals from 1 to ∞ and from −∞ to −1. The best value of NNT, indicating the largest possible beneficial treatment effect, is 1; the NNT value indicating no treatment effect (ARR = 0) is ±∞; and the worst NNT value, indicating the largest possible harmful effect, is −1. Thus, the result NNT = 10 with confidence limits 4 and −20 means that the two regions 4 to ∞ and −20 to −∞ together form the confidence interval. Altman proposed two new abbreviations, namely the number needed to treat for one patient to benefit (NNTB) or to be harmed (NNTH) [4]. This concept avoids the awkward term “number needed to harm” (NNH), which is used, for example, in the journal Evidence-Based Medicine. The result of an estimated NNT with confidence interval can then be presented as NNTB = 10 (NNTB 4 to ∞ to NNTH 20) [4].
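The inversion described above can be sketched in a few lines of Python. This is only an illustration, not code from the paper: the function names `nnt_from_arr`, `label`, and `nnt_interval` are hypothetical, and the ARR limits −0.05 and 0.25 are simply the reciprocals of the NNT limits −20 and 4 used in the example. The limits are reported in the order of the NNT scale, from the most beneficial end through ∞ to the most harmful end.

```python
import math

def nnt_from_arr(arr):
    """Reciprocal of an ARR value; ARR = 0 (no effect) maps to infinity."""
    return math.inf if arr == 0 else 1.0 / arr

def label(nnt):
    """Format an NNT value using the NNTB / NNTH abbreviations."""
    if math.isinf(nnt):
        return "∞"
    kind = "NNTB" if nnt > 0 else "NNTH"
    return f"{kind} {abs(nnt):.1f}"

def nnt_interval(arr_lower, arr_upper):
    """Turn confidence limits for ARR into a confidence interval for NNT.

    The limits are inverted and exchanged: the upper ARR limit gives the
    most beneficial NNT limit, the lower ARR limit the least beneficial.
    """
    if arr_lower > arr_upper:
        raise ValueError("arr_lower must not exceed arr_upper")
    best = label(nnt_from_arr(arr_upper))
    worst = label(nnt_from_arr(arr_lower))
    if arr_lower < 0 < arr_upper:
        # The ARR interval contains 0, so the NNT interval passes through ∞.
        return f"{best} to ∞ to {worst}"
    return f"{best} to {worst}"

print(nnt_interval(-0.05, 0.25))  # NNTB 4.0 to ∞ to NNTH 20.0
```

When both ARR limits are positive, the same function returns an ordinary interval such as `NNTB 4.0 to NNTB 20.0`, with no passage through ∞.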

Altman recommended that a confidence interval should always be given when an NNT is reported as a study result [4]. However, the usual Wald method for calculating such confidence intervals is frequently inappropriate. By using examples from the literature and artificial examples, it is shown that the application of the Wilson score method [6] improves the calculation and presentation of confidence intervals for the number needed to treat.
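To illustrate the difference between the two approaches, the following Python sketch computes confidence limits for the ARR with the simple Wald method and with a score-based method. It assumes that the Wilson score method for the ARR means the hybrid interval that combines single-proportion Wilson limits for the two groups (without continuity correction), as described by Newcombe; the paper's exact formulation is not shown in this excerpt. The function names and the event counts are hypothetical; `e1`/`n1` and `e2`/`n2` stand for events and patients in the control and treatment groups.

```python
from math import sqrt
from statistics import NormalDist

def wald_arr_ci(e1, n1, e2, n2, level=0.95):
    """Simple Wald interval for ARR = p1 - p2."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p1, p2 = e1 / n1, e2 / n2
    arr = p1 - p2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return arr - z * se, arr + z * se

def wilson_ci(e, n, z):
    """Wilson score interval for a single proportion e / n."""
    p = e / n
    centre = 2 * n * p + z * z
    half = z * sqrt(z * z + 4 * n * p * (1 - p))
    denom = 2 * (n + z * z)
    return max(0.0, (centre - half) / denom), min(1.0, (centre + half) / denom)

def score_arr_ci(e1, n1, e2, n2, level=0.95):
    """Hybrid score interval for ARR built from the two Wilson intervals
    (assumed here to be the method meant by 'Wilson score method')."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p1, p2 = e1 / n1, e2 / n2
    l1, u1 = wilson_ci(e1, n1, z)
    l2, u2 = wilson_ci(e2, n2, z)
    arr = p1 - p2
    lower = arr - sqrt((p1 - l1) ** 2 + (u2 - p2) ** 2)
    upper = arr + sqrt((u1 - p1) ** 2 + (p2 - l2) ** 2)
    return lower, upper

# Hypothetical extreme case: no events in either group of 10 patients.
print(wald_arr_ci(0, 10, 0, 10))   # degenerate zero-width interval
print(score_arr_ci(0, 10, 0, 10))  # interval of positive width around 0
```

The extreme case shows one way the Wald interval can be too short: with no observed events its estimated standard error is zero, so the interval collapses to a single point, whereas the score-based interval still reflects the uncertainty of the small samples.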

Section snippets

Methods to calculate confidence intervals for NNT

Let π₁ and π₂ be the true probabilities (risks) of an adverse event in the control group (group 1) and the treatment group (group 2), respectively. The true ARR is the difference of the two risks π₁ − π₂. The true NNT is the reciprocal 1/(π₁ − π₂) of the true ARR. To estimate these measures a randomized clinical trial can be performed. Let n₁ and n₂ be the number of patients randomized in the control group and the treatment group, respectively, and let e₁ and e₂ be the number of patients having

Shortcomings of the simple Wald method

In principle, the shortcomings of the Wald confidence intervals carry over from ARR to NNT. For interpretation, however, the NNT scale has to be taken into account. In the following, the confidence intervals for NNT based on Wilson scores are compared with the Wald confidence intervals by means of published and artificial examples. The published examples are estimated NNT values found in the journal Evidence-Based Medicine [18, 19, 20, 21]. Here, we concentrate on the comparison of the confidence

Using NNT for equivalence trials

The possible aberrations of the simple Wald method for calculating confidence intervals for ARR and NNT matter especially for equivalence trials [22]. To demonstrate equivalence in therapeutic clinical trials, the use of confidence intervals with a coverage probability of 95% or more is recommended [23]. Frequently, the objective of a study is to show that the new treatment is not inferior to the standard treatment. In such trials, one possibility to demonstrate equivalence between treatments

Discussion and conclusion

NNT has become a popular summary statistic for describing the absolute effect of a given treatment in comparison with a standard treatment or control. It was first introduced for use in randomized placebo-controlled clinical trials [24], then adopted as the primary outcome measure for systematic reviews such as meta-analyses [25], extended to the statistic “number needed to screen” to compare strategies for disease screening [26], and is now also applied in epidemiology to express the magnitude of

Acknowledgements

I thank Robert G. Newcombe for his valuable and helpful comments, which improved the paper considerably.

References (34)

E. Lesaffre et al. A note on the number needed to treat. Control Clin Trials (1999)
R.J. Cook et al. The number needed to treat: A clinically useful measure of treatment effect. BMJ (1995)
D.L. Sackett. On some clinically useful measures of the effects of treatment. Evidence-Based Med (1996)
G. Chatellier et al. The number needed to treat: A clinically useful nomogram in its proper context. BMJ (1996)
D.G. Altman. Confidence intervals for the number needed to treat. BMJ (1998)
R.G. Newcombe. Interval estimation for the difference between independent proportions: Comparison of eleven methods. Stat Med (1998)
L.E. Daly. Confidence limits made easy: Interval estimation using a substitution method. Am J Epidemiol (1998)
O.S. Miettinen et al. Comparative analysis of two rates. Stat Med (1985)
S.L. Beal. Asymptotic confidence intervals for the difference between binomial parameters for the use with small samples. Biometrics (1987)
S. Wallenstein. A non-iterative accurate asymptotic confidence interval for the difference between two proportions. Stat Med (1997)
I.E. Buchan. Computer software that can calculate confidence intervals is now available (letter). BMJ (1995)
M.J. Gardner et al. Confidence intervals rather than P values: Estimating rather than hypothesis testing. BMJ (1986)
C.R. Mehta et al. StatXact 4 for Windows. Statistical Software for Exact Nonparametric Inference (1999)
A. Agresti et al. Approximate is better than “exact” for interval estimation of binomial proportions. Am Statistn (1998)
S.E. Vollset. Confidence intervals for a binomial proportion. Stat Med (1993)
R.G. Newcombe. Two-sided confidence intervals for the single proportion: Comparison of seven methods. Stat Med (1998)
SAS/IML User's Guide, Version 5 Edition (1985)

