Advantages and challenges of Bayesian networks in environmental modelling
Introduction
Bayesian networks (BNs), also called belief networks, Bayesian belief networks, Bayes nets, and sometimes causal probabilistic networks, are an increasingly popular method for modelling uncertain and complex domains such as ecosystems and environmental management. They emerged from artificial intelligence research and have been applied to a wide range of problems, from text analysis (Dong and Agogino, 1997) to medical diagnosis (Kahn et al., 1997) and the evaluation of scientific evidence (Garbolino and Taroni, 2002). They are also increasingly used in environmental modelling and management (e.g. Varis et al., 1990, Lee and Rieman, 1997, Varis, 1997, Reckhow, 1999, Marcot et al., 2001, Borsuk et al., 2004, Little et al., 2004, Wooldridge and Done, 2004, Bromley et al., 2005, Uusitalo et al., 2005).
Bayesian modelling techniques have several features that make them useful in many real-life data analysis and management questions. They provide a natural way to handle missing data, they allow combination of data with domain knowledge, they facilitate learning about causal relationships between variables, they provide a method for avoiding overfitting of data (Heckerman, 1995), they can show good prediction accuracy even with rather small sample sizes (Kontkanen et al., 1997a), and they can be easily combined with decision analytic tools to aid management (Kuikka et al., 1999, Marcot et al., 2001, Jensen, 2001, Ch. 4). On the other hand, their ability to deal with continuous data is limited (Jensen, 2001, p. 69), and such data generally needs to be discretized, which may cause certain difficulties. Bayesian networks are also a useful tool for expert elicitation and combining uncertain knowledge when used with care. Furthermore, building models forces us to think clearly about the subject, and articulate that thinking in the form of the model. This is often beneficial in and of itself (Marcot et al., 2001, Walters and Martell, 2004, p. 3).
Bayesian networks represent one branch of Bayesian modelling, the other major approach being hierarchical simulation-based modelling (Gilks et al., 1994, Gelman et al., 1995). In simulation-based modelling, the often analytically intractable probability distributions are estimated by generating samples from these distributions by simulation (Gelman et al., 1995), whereas in Bayesian networks, the probability distributions are generally expressed in discrete form and solved analytically. Both approaches share the idea of conditional dependence between variables and the updating of knowledge based on Bayes's theorem. Despite this similarity in aims and ideas, the practical modelling work is quite different, and the ideal method depends on the modelling needs in each case. Hierarchical modelling is especially suitable for time-sliced models and for cases with relatively abundant knowledge of complicated interactions between the model variables, particularly if this knowledge can be expressed with parametric distributions. Bayesian networks are at their best with discrete domains and when reviewing and comparing different management choices or other courses of action. For many applications, either approach is appropriate.
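The contrast between analytical and simulation-based solutions can be illustrated with a minimal sketch. The two-node network (Nutrients -> Bloom), its variable names, and all probabilities below are hypothetical and chosen only for illustration. The simulation uses plain forward sampling with rejection, which is a far simpler scheme than the samplers typically used in hierarchical modelling.

```python
import random

# Hypothetical two-node network: Nutrients -> Bloom (illustrative numbers only).
P_HIGH = 0.3                                  # P(Nutrients = high)
P_BLOOM_GIVEN = {"high": 0.8, "low": 0.1}     # P(Bloom = yes | Nutrients)


def exact_posterior():
    """Exact P(Nutrients = high | Bloom = yes) from Bayes's theorem."""
    joint_high = P_HIGH * P_BLOOM_GIVEN["high"]
    joint_low = (1 - P_HIGH) * P_BLOOM_GIVEN["low"]
    return joint_high / (joint_high + joint_low)


def simulated_posterior(n_samples=200_000, seed=1):
    """Estimate the same posterior by sampling the network forward
    and keeping only the samples in which a bloom occurred."""
    rng = random.Random(seed)
    blooms = 0
    highs = 0
    for _ in range(n_samples):
        nutrients = "high" if rng.random() < P_HIGH else "low"
        if rng.random() < P_BLOOM_GIVEN[nutrients]:
            blooms += 1
            if nutrients == "high":
                highs += 1
    return highs / blooms


exact = exact_posterior()
approx = simulated_posterior()
print(f"exact: {exact:.4f}, simulated: {approx:.4f}")
```

In a discrete network of this size the exact answer is cheap to compute, while the simulated estimate only approaches it as the number of samples grows.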
In this paper I give an overview of the advantages and weak points of Bayesian networks, especially in relation to environmental research, and try to summarise the practical issues that often arise when applying BNs to the field. I review the current use of BNs in environmental research, and give some pointers to those who wish to apply BNs but do not know where to start. All along the way, I give references to books and articles that might prove useful in getting to know BNs.
Bayesian networks and their advantages
Bayesian networks are mathematical models represented graphically: each variable is a node, and directed links (arcs) connect the nodes. The information content of each variable is represented as one or several probability distributions. If a variable has no incoming arcs and is hence not dependent on any other variables in the model universe (i.e. has no parents), it has one probability distribution, and if it has parents, it has one probability distribution per each
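As a rough sketch of this structure, the following Python fragment stores a small discrete network as plain dictionaries. The network (Rainfall -> Runoff <- LandUse), its states, and all probabilities are hypothetical. It assumes the standard convention that a node with parents holds one distribution for every combination of its parents' states, and a node without parents holds a single distribution.

```python
from itertools import product

# Hypothetical network: Rainfall -> Runoff <- LandUse (illustrative numbers only).
states = {
    "Rainfall": ["low", "high"],
    "LandUse": ["forest", "farmland"],
    "Runoff": ["low", "high"],
}
parents = {"Rainfall": [], "LandUse": [], "Runoff": ["Rainfall", "LandUse"]}

# One distribution per combination of parent states; () means "no parents".
cpt = {
    "Rainfall": {(): {"low": 0.7, "high": 0.3}},
    "LandUse": {(): {"forest": 0.6, "farmland": 0.4}},
    "Runoff": {
        ("low", "forest"): {"low": 0.95, "high": 0.05},
        ("low", "farmland"): {"low": 0.85, "high": 0.15},
        ("high", "forest"): {"low": 0.60, "high": 0.40},
        ("high", "farmland"): {"low": 0.30, "high": 0.70},
    },
}


def check_network():
    """Verify each node has one normalised distribution per parent configuration."""
    for node, table in cpt.items():
        expected = set(product(*(states[p] for p in parents[node])))
        assert set(table) == expected, f"missing configurations for {node}"
        for dist in table.values():
            assert abs(sum(dist.values()) - 1.0) < 1e-9


def joint(assignment):
    """Joint probability of a full assignment: product of each node's CPT entry."""
    prob = 1.0
    for node in states:
        key = tuple(assignment[p] for p in parents[node])
        prob *= cpt[node][key][assignment[node]]
    return prob


def marginal(node, value):
    """Marginal probability obtained by summing the joint over all assignments."""
    names = list(states)
    total = 0.0
    for combo in product(*(states[n] for n in names)):
        assignment = dict(zip(names, combo))
        if assignment[node] == value:
            total += joint(assignment)
    return total


check_network()
print(round(marginal("Runoff", "high"), 3))
```

Summing over every assignment is only practical for very small networks; BN software relies on more efficient exact inference algorithms.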
Discretization of continuous variables
In environmental research as well as in many other fields, data and parameters often have continuous values. Bayesian networks can, however, deal with continuous variables in only a limited manner (Friedman and Goldszmidt, 1996, Jensen, 2001, p. 69). The usual solution is to discretize the variables and build the model over the discrete domain. There is a trade-off, however, as the discretization can only capture rough characteristics of the original distribution (Friedman and Goldszmidt, 1996
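A minimal sketch of two common discretization schemes, equal-width and equal-frequency binning, shows how the choice of bins changes what the discrete variable retains. The measurements and the function names are hypothetical, and neither scheme is claimed to be the one recommended in the paper.

```python
from bisect import bisect_right


def equal_width_edges(values, n_bins):
    """Interior bin edges that split the observed range into equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]


def equal_frequency_edges(values, n_bins):
    """Interior bin edges chosen so each bin holds roughly the same number of values."""
    ordered = sorted(values)
    return [ordered[(i * len(ordered)) // n_bins] for i in range(1, n_bins)]


def discretize(values, edges):
    """Map each value to the index of the bin it falls into."""
    return [bisect_right(edges, v) for v in values]


def bin_counts(values, edges):
    """Count how many values land in each bin."""
    counts = [0] * (len(edges) + 1)
    for b in discretize(values, edges):
        counts[b] += 1
    return counts


# Hypothetical right-skewed measurements (e.g. a concentration in arbitrary units).
measurements = [1.2, 1.5, 1.9, 2.1, 2.4, 2.8, 3.0, 3.6, 4.1, 5.0, 9.5, 18.0]

width_counts = bin_counts(measurements, equal_width_edges(measurements, 3))
freq_counts = bin_counts(measurements, equal_frequency_edges(measurements, 3))
print("equal width:    ", width_counts)
print("equal frequency:", freq_counts)
```

With skewed data, equal-width bins place most observations in a single state and leave the others nearly empty, whereas equal-frequency bins balance the counts but hide how far the extreme values lie from the rest.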
Environmental applications of Bayesian networks
Data analysis performed with BNs is quite rare in the field of environmental sciences. Varis and Kuikka (1997) built a BN into which they embedded computational and regression models describing Baltic salmon stock dynamics. Wooldridge and Done (2004) predicted coral bleaching in the Great Barrier Reef using a BN based on various data.
Models summarising and incorporating simulation models are more common. Lee and Rieman (1997) created a Bayesian network model for the assessment of fish
State of the art: how to apply Bayesian networks?
It is advisable to become familiar with the basics of the reasoning behind BNs before starting to build a model. Very good introductions to the topic include Charniak (1991), Heckerman (1995), Heckerman and Wellman (1995) and Jensen (2001). They also cover some issues of modelling technique.
There are a number of software packages for building BNs on desktop computers, and many of those have been ported to several common operating systems. These packages are briefly introduced
Conclusions
Bayesian networks can be a useful addition to the toolkit of environmental scientists, especially if their work is related to environmental management. Explicit accounting for uncertainty can add substantial insight to many real-life problems, and the graphical representation of model structures and probability distributions is very useful in communicating theories and results to colleagues, students, and decision-makers. The readily available BN development packages are relatively advanced and
Acknowledgements
This work was funded by the Finnish Biological Interactions Graduate School and Jenny and Antti Wihuri Foundation. I wish to thank Prof. Sakari Kuikka for introducing me to Bayesian networks and Sanna Koulu, Dr. Samu Mäntyniemi, Dr. Jani Pellikka, and an anonymous reviewer for constructive comments on the manuscript.
References (51)
- et al. The use of the decomposition principle in making judgments. Org. Behav. Hum. Perform. (1975)
- et al. A Bayesian network of eutrophication models for synthesis, prediction, and uncertainty analysis. Ecol. Model. (2004)
- et al. The use of Hugin® to develop Bayesian networks as an aid to integrated water resource planning. Environ. Modell. Software (2005)
- et al. Text analysis for constructing design representations. Artif. Intell. Eng. (1997)
- et al. Evaluation of scientific evidence using Bayesian networks. Forensic Sci. Int. (2002)
- et al. Construction of a Bayesian network for mammographic diagnosis of breast cancer. Computers Biol. Med. (1997)
- et al. Information flow among fishing vessels modelled using a Bayesian network. Environ. Modell. Software (2004)
- et al. A probabilistic and decision-theoretic approach to the management of infectious disease at the ICU. Artif. Intell. Med. (2000)
- et al. Using Bayesian Belief Networks to evaluate fish and wildlife population viability under land management alternatives from an environmental impact statement. Forest Ecol. Manage. (2001)
- Bayesian decision analysis for environmental and resource management. Environ. Modell. Software (1997)