Elsevier

Journal of Econometrics

Volume 100, Issue 2, February 2001, Pages 381-427

Benchmark priors for Bayesian model averaging

https://doi.org/10.1016/S0304-4076(00)00076-2

Abstract

In contrast to a posterior analysis given a particular sampling model, posterior model probabilities in the context of model uncertainty are typically rather sensitive to the specification of the prior. In particular, ‘diffuse’ priors on model-specific parameters can lead to quite unexpected consequences. Here we focus on the practically relevant situation where we need to entertain a (large) number of sampling models and we have (or wish to use) little or no subjective prior information. We aim at providing an ‘automatic’ or ‘benchmark’ prior structure that can be used in such cases. We focus on the normal linear regression model with uncertainty in the choice of regressors. We propose a partly non-informative prior structure related to a natural conjugate g-prior specification, where the amount of subjective information requested from the user is limited to the choice of a single scalar hyperparameter g0j. The consequences of different choices for g0j are examined. We investigate theoretical properties, such as consistency of the implied Bayesian procedure. Links with classical information criteria are provided. More importantly, we examine the finite sample implications of several choices of g0j in a simulation study. The use of the MC3 algorithm of Madigan and York (Int. Stat. Rev. 63 (1995) 215), combined with efficient coding in Fortran, makes it feasible to conduct large simulations. In addition to posterior criteria, we shall also compare the predictive performance of different priors. A classic example concerning the economics of crime will also be provided and contrasted with results in the literature. The main findings of the paper will lead us to propose a ‘benchmark’ prior specification in a linear regression context with model uncertainty.

Introduction

The issue of model uncertainty has permeated the econometrics and statistics literature for decades. An enormous volume of references can be cited (only a fraction of which is mentioned in this paper), and special issues of the Journal of Econometrics (1981, Vol. 16, No. 1) and Statistica Sinica (1997, Vol. 7, No. 2) are merely two examples of the amount of interest this topic has generated in the literature. From a Bayesian perspective, dealing with model uncertainty is conceptually straightforward: the model is treated as a further parameter which lies in the set of models entertained (the model space). A prior now needs to be specified for the parameters within each model as well as for the models themselves, and Bayesian inference can be conducted in the usual way, with one level (the prior on the model space) added to the hierarchy; see, e.g., Draper (1995) and the ensuing discussion. Unfortunately, the influence of the prior distribution, which is often straightforward to assess for inference given the model, is much harder to identify for posterior model probabilities. It is acknowledged (e.g., Kass and Raftery, 1995; George, 1999) that posterior model probabilities can be quite sensitive to the specification of the prior distribution.

In this paper, we consider a particular instance of model uncertainty, namely uncertainty about which variables should be included in a linear regression problem with k available regressors. A model here will be identified by the set of regressors that it includes and, thus, the model space consists of 2^k elements. Given the issue of sensitivity to the prior distribution alluded to above, the choice of prior is quite delicate, especially in the absence of substantial prior knowledge. Our aim here is to come up with a prior distribution that leads to sensible results, in the sense that data information dominates prior assumptions. Whereas we acknowledge the merits of using substantive prior information whenever available, we shall be concerned with providing the applied researcher with a ‘benchmark’ method for conducting inference in situations where incorporating such information into the analysis is deemed impossible, impractical or undesired. In addition, this provides a useful backdrop against which results arising from Bayesian analyses with informative priors could be contrasted.
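To make the size of the model space concrete, the following minimal sketch represents each model as a vector of inclusion indicators, one per candidate regressor. The function name enumerate_models is hypothetical and introduced only for this illustration.

```python
from itertools import product

def enumerate_models(k):
    """Return every model as a tuple of 0/1 inclusion indicators.

    Entry i is 1 if regressor i is included and 0 otherwise, so the
    list has 2**k elements. The all-zero tuple is the model that
    contains no regressors.
    """
    return list(product((0, 1), repeat=k))

# Example with k = 3 candidate regressors (an arbitrary illustrative value).
models = enumerate_models(3)
n_models = len(models)
```

Explicit enumeration of this kind is only practical for small k, because the list doubles in length with every additional regressor.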

We will focus on Bayesian model averaging (BMA), rather than on selecting a single model. BMA follows directly from the application of Bayes’ theorem in the hierarchical model described in the first paragraph, which implies mixing over models using the posterior model probabilities as weights. This is very reasonable as it allows for propagation of model uncertainty into the posterior distribution and leads to more sensible uncertainty bands. From a decision-theory point of view, Min and Zellner (1993) show that such mixing over models minimizes expected predictive squared error loss, provided the set of models under consideration is exhaustive. Raftery et al. (1997) state that BMA is optimal if predictive ability is measured by a logarithmic scoring rule. The latter result also follows from Bernardo (1979), who shows that the usual posterior distribution leads to maximal expected utility under a logarithmic proper utility function. Such a utility function was argued by Bernardo (1979) to be ‘often the more appropriate description for the preferences of a scientist facing an inference problem’. Thus, in the context of model uncertainty, the use of BMA follows from sensible utility considerations. This is the scenario that we will focus on. However, our results should also be useful under other utility structures that lead to decisions different from model averaging — e.g. model selection. This is because the posterior model probabilities will intervene in the evaluation of posterior expected utility. Thus, finding a prior distribution that leads to sensible results in the absence of substantive prior information is relevant in either setting.
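The mixing described above can be sketched in a few lines: posterior model probabilities are proportional to the marginal likelihood times the prior model probability, and a model-averaged quantity is the weighted sum of the model-specific quantities. The function names and all numbers below are hypothetical and serve only as an illustration.

```python
import math

def posterior_model_probs(log_marglik, prior_probs):
    """Normalize marginal likelihood times prior into posterior model probabilities.

    Works on the log scale and subtracts the maximum before
    exponentiating to avoid numerical underflow.
    """
    log_post = [lm + math.log(p) for lm, p in zip(log_marglik, prior_probs)]
    top = max(log_post)
    unnorm = [math.exp(lp - top) for lp in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def bma_average(model_values, weights):
    """Average a model-specific quantity using posterior model probabilities as weights."""
    return sum(w * v for w, v in zip(weights, model_values))

# Hypothetical log marginal likelihoods for three models, with a uniform model prior.
log_ml = [-10.0, -11.0, -13.0]
prior = [1.0 / 3, 1.0 / 3, 1.0 / 3]
weights = posterior_model_probs(log_ml, prior)

# Hypothetical point predictions from each model, combined by BMA.
prediction = bma_average([1.0, 2.0, 4.0], weights)
```

With a uniform prior over models, the ratio of any two weights equals the corresponding Bayes factor.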

Broadly speaking, we can distinguish three strands of related literature in the context of model uncertainty. Firstly, we mention the fundamentally oriented statistics and econometrics literature on prior elicitation and model selection or model averaging, such as exemplified in Box (1980), Zellner and Siow (1980), Draper (1995) and Phillips (1995) and the discussions of these papers. Secondly, there is the recent statistics literature on computational aspects. Markov chain Monte Carlo methods are proposed in George and McCulloch (1993), Madigan and York (1995), Geweke (1996) and Raftery et al. (1997), while Laplace approximations are found in Gelfand and Dey (1994) and Raftery (1996). Finally, there exists a large literature on information criteria, often in the context of time series, see, e.g., Hannan and Quinn (1979), Akaike (1981), Atkinson (1981), Chow (1981) and Foster and George (1994). This paper provides a unifying framework in which these three areas of research will be discussed.

In line with the bulk of the literature, the context of this paper will be normal linear regression with uncertainty in the choice of regressors. We abstract from any other issue of model specification. We present a prior structure that can reasonably be used in cases where we have (or wish to use) little prior information, partly based on improper priors for parameters that are common to all models, and partly on a g-prior structure as in Zellner (1986). The prior is not in the natural-conjugate class, but is such that marginal likelihoods can still be computed analytically. This allows for a simple treatment of potentially very large model spaces through Markov chain Monte Carlo model composition (MC3) as introduced in Madigan and York (1995). In contrast to some of the priors proposed in the literature, the prior we propose leads to valid conditioning in the posterior distribution (i.e., the latter can be interpreted as a conditional distribution given the observables) as it avoids dependence on the values of the response variable. The only hyperparameter left to elicit in our prior is a scalar g0j for each of the models considered. Theoretical properties, such as consistency of posterior model probabilities, are linked to functional dependencies of g0j on sample size and the number of regressors in the corresponding model. In addition (and perhaps more importantly), we conduct an empirical investigation through simulation. This will allow us to suggest specific choices for g0j to the applied user. As we have conducted a large simulation study, efficient coding was required. This code (in Fortran-77) has been made publicly available on the World Wide Web.

Section 2 introduces the Bayesian model and the practice of Bayesian model averaging. The prior structure is explained in detail in Section 3, where expressions for Bayes factors are also given. The setup of the empirical simulation experiment is described in Section 4, while results are provided in Section 5. Section 6 presents an illustrative example using the economic model of crime from Ehrlich (1973, 1975), and Section 7 gives some concluding remarks and practical recommendations. The appendix presents results about asymptotic behaviour of Bayes factors.


The model and Bayesian model averaging

We consider n independent replications from a linear regression model with an intercept, say α, and k possible regression coefficients grouped in a k-dimensional vector β. We denote by Z the corresponding n×k design matrix and we assume that r(ιn:Z)=k+1, where r(·) indicates the rank of a matrix and ιn is an n-dimensional vector of 1's.

This gives rise to 2^k possible sampling models, depending on whether we include or exclude each of the regressors. In line with the bulk of the literature in

Priors for model parameters and the corresponding Bayes factors

In this section, we present several priors — i.e., several choices for the density in (1.2) — and derive the expressions of the resulting Bayes factors. In the sequel of the paper, we shall examine the properties (both finite-sample and asymptotic) of the Bayes factors.
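As a rough illustration of how a Bayes factor of this kind can be evaluated, the sketch below uses a closed form often associated with a g-prior on the regression coefficients combined with improper priors on the intercept and the scale. The formula, the function names, the use of a single g for both models, and all numbers are assumptions made for this example; they are not quoted from the expressions in this section and should be checked against the paper's own equations before use.

```python
import math

def log_marginal_likelihood(n, k_j, g, ssr_j, sst):
    """Log marginal likelihood of model j, up to a constant shared by all models.

    Assumed form (not quoted from the text):
        (g / (g + 1))**(k_j / 2)
        * (ssr_j / (g + 1) + g * sst / (g + 1))**(-(n - 1) / 2)
    n      sample size
    k_j    number of regressors included in model j
    g      scalar prior hyperparameter
    ssr_j  residual sum of squares from least squares with intercept
    sst    total sum of squares of the response about its mean
    """
    shrink = g / (g + 1.0)
    blended = ssr_j / (g + 1.0) + shrink * sst
    return 0.5 * k_j * math.log(shrink) - 0.5 * (n - 1) * math.log(blended)

def log_bayes_factor(n, g, k_j, ssr_j, k_s, ssr_s, sst):
    """Log Bayes factor of model j against model s under the assumed form."""
    return (log_marginal_likelihood(n, k_j, g, ssr_j, sst)
            - log_marginal_likelihood(n, k_s, g, ssr_s, sst))

# Hypothetical data summary: a two-regressor model against the intercept-only model.
n, sst = 47, 100.0
g = 1.0 / n  # one possible choice of g, used here purely for illustration
log_b = log_bayes_factor(n, g, k_j=2, ssr_j=40.0, k_s=0, ssr_s=sst, sst=sst)
```

Under this assumed form, the first factor penalizes model size and the second rewards fit, so a smaller g strengthens the penalty on each additional regressor.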

Convergence and implementation

The implementation of the simulation study described in the previous section will be conducted through the MC3 methodology mentioned in Section 1. This Metropolis algorithm generates a new candidate model, say Mj, from a Uniform distribution over the subset of M consisting of the current state of the chain, say Ms, and all models containing either one regressor more or one regressor fewer than Ms. The chain moves to Mj with probability min(1,Bjs), where Bjs is the Bayes factor in (2.16).
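The sampler described above can be sketched as follows. This is a minimal illustration, not the authors' Fortran implementation: the function mc3, its arguments and the toy scoring function are hypothetical, and the acceptance rule min(1, Bjs) is used as stated, which corresponds to assuming equal prior probabilities for all models. Because every model has the same number of neighbours (k), the uniform proposal is symmetric.

```python
import math
import random

def mc3(log_marglik, k, n_iter, seed=0):
    """Run a simple MC3 chain over models coded as 0/1 inclusion tuples.

    log_marglik  callable returning the log marginal likelihood of a model
    k            number of candidate regressors
    n_iter       number of iterations
    Returns a dict mapping each visited model to its visit frequency.
    """
    rng = random.Random(seed)
    current = (0,) * k
    current_lml = log_marglik(current)
    visits = {}
    for _ in range(n_iter):
        # Draw uniformly from the current model and its k one-flip neighbours.
        idx = rng.randrange(k + 1)
        if idx < k:
            flipped = list(current)
            flipped[idx] = 1 - flipped[idx]  # add or drop regressor idx
            candidate = tuple(flipped)
            cand_lml = log_marglik(candidate)
            log_bf = cand_lml - current_lml
            # Move with probability min(1, Bayes factor).
            if log_bf >= 0 or rng.random() < math.exp(log_bf):
                current, current_lml = candidate, cand_lml
        # idx == k proposes the current model itself, so the chain stays put.
        visits[current] = visits.get(current, 0) + 1
    return {m: c / n_iter for m, c in visits.items()}

def toy_log_marglik(model):
    """Toy log score for illustration only: favours regressor 0, disfavours regressor 1."""
    return 2.0 * model[0] - 1.0 * model[1]

freqs = mc3(toy_log_marglik, k=2, n_iter=20000, seed=1)
```

In the toy example the visit frequencies approximate probabilities proportional to exp(toy_log_marglik(model)), so the model that includes only regressor 0 should be visited most often.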


An empirical example: Crime data

The literature on the economics of crime has been critically influenced by the seminal work of Becker (1968) and the empirical analysis of Ehrlich (1973, 1975). The underlying idea is that criminal activities are the outcome of some rational economic decision process, and, as a result, the probability of punishment should act as a deterrent. Raftery et al. (1997) have used the Ehrlich data set corrected by Vandaele (1978). These are aggregate data for 47 U.S. states in 1960, which will be

Conclusions

We consider the normal linear regression model with uncertainty regarding the choice of regressors. The prior structure we have proposed in Section 3 leads to a valid interpretation of the posterior distribution as a conditional and only requires the choice of one scalar hyperparameter, called g0j. We make g0j a possible function of the sample size, n, the number of regressors in the model under consideration, kj, and the total number of available regressors, k. Theoretical results on

Acknowledgements

We thank Arnold Zellner, Dennis Lindley and two anonymous referees for their useful suggestions. Carmen Fernández gratefully acknowledges financial support from a Training and Mobility of Researchers grant awarded by the European Commission (ERBFMBICT # 961021). Carmen Fernández and Mark Steel were affiliated to CentER and the Department of Econometrics, Tilburg University, The Netherlands, and Eduardo Ley was at FEDEA, Madrid, Spain during the early stages of the work on this paper. Some of

References (64)

  • J.M. Bernardo (1979). Expected information as expected utility. The Annals of Statistics.
  • J.M. Bernardo. A Bayesian analysis of classical hypothesis testing (with discussion).
  • G.E.P. Box (1980). Sampling and Bayes’ inference in scientific modelling and robustness (with discussion). Journal of the Royal Statistical Society, Series A.
  • S. Chib et al. (1995). Understanding the Metropolis-Hastings algorithm. The American Statistician.
  • H. Chipman (1996). Bayesian variable selection with related predictors. Canadian Journal of Statistics.
  • M. Clyde et al. (1996). Prediction via orthogonalized model mixing. Journal of the American Statistical Association.
  • C. Cornwell et al. (1994). Estimating the economic model of crime with panel data. Review of Economics and Statistics.
  • A.P. Dawid (1984). Statistical theory: the prequential approach. Journal of the Royal Statistical Society, Series A.
  • A.P. Dawid. Probability forecasting.
  • D. Draper (1995). Assessment and propagation of model uncertainty (with discussion). Journal of the Royal Statistical Society, Series B.
  • I. Ehrlich (1973). Participation in illegitimate activities: a theoretical and empirical investigation. Journal of Political Economy.
  • I. Ehrlich (1975). The deterrent effect of capital punishment: a question of life and death. American Economic Review.
  • D.P. Foster et al. (1994). The risk inflation criterion for multiple regression. The Annals of Statistics.
  • D.A. Freedman (1983). A note on screening regressions. The American Statistician.
  • S. Geisser et al. (1979). A predictive approach to model selection. Journal of the American Statistical Association.
  • A.E. Gelfand et al. (1994). Bayesian model choice: asymptotics and exact calculations. Journal of the Royal Statistical Society, Series B.
  • George, E.I., 1999. Bayesian model selection, Encyclopedia of Statistical Sciences Update, Vol. 3. (eds.) S. Kotz, C....
  • George, E.I., Foster, D.P., 1997. Calibration and empirical Bayes variable selection. Mimeo, University of Texas,...
  • E.I. George et al. (1993). Variable selection via Gibbs sampling. Journal of the American Statistical Association.
  • E.I. George et al. (1997). Approaches for Bayesian variable selection. Statistica Sinica.
  • J. Geweke. Variable selection and model comparison in regression.
  • I.J. Good (1952). Rational decisions. Journal of the Royal Statistical Society, Series B.