A comparative investigation of methods for logistic regression with separated or nearly separated data

Georg Heinze

doi:10.1002/sim.2687

Stat Med. 2006 Dec 30;25(24):4216-26. doi: 10.1002/sim.2687.

Author

Georg Heinze¹

Affiliation

¹ Section of Clinical Biometrics, Core Unit for Medical Statistics and Informatics, Medical University of Vienna, Spitalgasse 23, A-1090 Vienna, Austria. georg.heinze@meduniwien.ac.at

PMID: 16955543

Abstract

In logistic regression analysis of small or sparse data sets, results obtained by classical maximum likelihood methods cannot be generally trusted. In such analyses it may even happen that the likelihood meets the convergence criteria while at least one parameter estimate diverges to ±infinity. This situation has been termed 'separation', and it typically occurs whenever no events are observed in one of the two groups defined by a dichotomous covariate. More generally, separation is caused by a linear combination of continuous or dichotomous covariates that perfectly separates events from non-events. Separation implies infinite or zero maximum likelihood estimates of odds ratios, which are usually considered unrealistic. I provide some examples of separation and near-separation in clinical data sets and discuss some options to analyse such data, including exact logistic regression analysis and a penalized likelihood approach. Both methods supply finite point estimates in case of separation. Profile penalized likelihood confidence intervals for parameters show excellent behaviour in terms of coverage probability and provide higher power than exact confidence intervals. General advantages of the penalized likelihood approach are discussed.
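The following is a minimal sketch, not code from the paper, of how separation shows up in a fit and how a penalized likelihood avoids it. It assumes a Firth-type penalty (adding ½·log|I(β)| to the log-likelihood, where I(β) is the Fisher information), because the abstract does not name the penalty; the toy data and the function names sigmoid and fit_logistic are hypothetical. The data mimic the dichotomous-covariate case described above: the group with x = 1 contains no events. Both fits use Fisher scoring and stop when the (penalized) log-likelihood changes by less than tol.

```python
import math


def sigmoid(eta):
    """Numerically stable logistic function 1 / (1 + exp(-eta))."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def fit_logistic(x, y, firth=False, tol=1e-10, max_iter=50):
    """Fisher scoring for logit P(y=1) = b0 + b1*x with one covariate.

    firth=False: ordinary maximum likelihood.
    firth=True:  penalized likelihood l*(b) = l(b) + 0.5*log|I(b)|
                 (assumption: Firth-type penalty; the abstract does not name it),
                 solved via the modified score X'(y - p + h*(1/2 - p)).
    Stops when the (penalized) log-likelihood changes by less than tol.
    Returns (b0, b1, number of updates, |last change in b1|).
    Toy sketch: no guards against p reaching exactly 0 or 1.
    """
    b0 = b1 = 0.0
    prev_ll, step = None, float("nan")
    for it in range(1, max_iter + 1):
        p = [sigmoid(b0 + b1 * xi) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Fisher information I = X'WX for design rows (1, x_i).
        i00 = sum(w)
        i01 = sum(wi * xi for wi, xi in zip(w, x))
        i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = i00 * i11 - i01 * i01
        ll = sum(math.log(pi) if yi else math.log(1.0 - pi) for yi, pi in zip(y, p))
        if firth:
            ll += 0.5 * math.log(det)  # Jeffreys-prior (Firth-type) penalty
        # Convergence criterion on the (penalized) log-likelihood only.
        if prev_ll is not None and abs(ll - prev_ll) < tol:
            return b0, b1, it - 1, step
        prev_ll = ll
        inv00, inv01, inv11 = i11 / det, -i01 / det, i00 / det
        if firth:
            # Hat values h_i = w_i * x_i' I^{-1} x_i.
            h = [wi * (inv00 + 2.0 * inv01 * xi + inv11 * xi * xi) for wi, xi in zip(w, x)]
            r = [yi - pi + hi * (0.5 - pi) for yi, pi, hi in zip(y, p, h)]
        else:
            r = [yi - pi for yi, pi in zip(y, p)]
        u0 = sum(r)
        u1 = sum(ri * xi for ri, xi in zip(r, x))
        # Fisher-scoring step: b <- b + I^{-1} U
        d0 = inv00 * u0 + inv01 * u1
        d1 = inv01 * u0 + inv11 * u1
        b0, b1, step = b0 + d0, b1 + d1, abs(d1)
    raise RuntimeError("no convergence within max_iter")


if __name__ == "__main__":
    # Hypothetical toy data: 5/10 events when x = 0, no events among 10 with x = 1.
    x = [0] * 10 + [1] * 10
    y = [1] * 5 + [0] * 5 + [0] * 10
    for firth in (False, True):
        b0, b1, n, step = fit_logistic(x, y, firth=firth)
        print(f"firth={firth}: b1={b1:.4f} after {n} updates, last step {step:.2e}")
```

With ordinary maximum likelihood the stopping rule on the log-likelihood is met, yet b1 ends up a large negative number (below -15 with these settings) that only reflects where the iterations happened to stop, and the last step is still close to 1, so the estimate is still drifting toward -infinity. The penalized fit converges to b1 = -log(21) ≈ -3.04; for a single dichotomous covariate this Firth-type estimate coincides with the log odds ratio obtained after adding 1/2 to every cell of the 2×2 table (odds 0.5/10.5 versus 5.5/5.5).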

Publication types

Comparative Study

MeSH terms

Amniotic Fluid / chemistry
Blood Sedimentation
Data Interpretation, Statistical*
Fibrinogen / physiology
Humans
Infant, Newborn
Infant, Premature / physiology
Likelihood Functions*
Logistic Models*
Lung Diseases / etiology
Odds Ratio
Ureaplasma urealyticum
Urinary Incontinence / psychology
Urinary Incontinence / therapy
gamma-Globulins / physiology

Substances

gamma-Globulins
Fibrinogen