Extension of Latin hypercube samples with correlated variables

https://doi.org/10.1016/j.ress.2007.04.005

Abstract

A procedure for extending the size of a Latin hypercube sample (LHS) with rank correlated variables is described and illustrated. The extension procedure starts with an LHS of size m and associated rank correlation matrix C and constructs a new LHS of size 2m that contains the elements of the original LHS and has a rank correlation matrix that is close to the original rank correlation matrix C. The procedure is intended for use in conjunction with uncertainty and sensitivity analysis of computationally demanding models in which it is important to make efficient use of a necessarily limited number of model evaluations.

Introduction

The evaluation of the uncertainty associated with analysis outcomes is now widely recognized as an important part of any modeling effort [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11]. A number of approaches to such evaluations are in use, including differential analysis [12], [13], [14], [15], [16], [17], response surface methodology [18], [19], [20], [21], [22], [23], [24], [25], [26], variance decomposition procedures [27], [28], [29], [30], [31], and Monte Carlo (i.e., sampling-based) procedures [32], [33], [34], [35], [36], [37], [38], [39], [40], [41], [42]. Additional information is available in a number of reviews [43], [44], [45], [46], [47], [48], [49], [50], [51]. Monte Carlo analysis employing Latin hypercube sampling [52], [53] is one of the most popular and effective approaches for the evaluation of the uncertainty associated with analysis outcomes and is the focus of this presentation.

Conceptually, an analysis can be formally represented by a function of the form

$$\mathbf{y} = f(\mathbf{x}), \tag{1.1}$$

where

$$\mathbf{x} = [x_1, x_2, \ldots, x_n] \tag{1.2}$$

is a vector of analysis inputs and

$$\mathbf{y} = [y_1, y_2, \ldots, y_p] \tag{1.3}$$

is a vector of analysis results. In turn, uncertainty with respect to the appropriate values to use for the elements of x leads to uncertainty with respect to the values for the elements of y. Most analyses use probability to characterize the uncertainty associated with the elements of x and hence the uncertainty associated with the elements of y. In particular, a sequence of probability distributions

$$D_1, D_2, \ldots, D_n \tag{1.4}$$

is used to characterize the uncertainty associated with the elements of x, where the distribution $D_j$ characterizes the uncertainty associated with the element $x_j$ of x. The definition of the preceding distributions is often accomplished through an expert review process and can be accompanied by the specification of correlations and other restrictions involving the interplay of the possible values for the elements of x [54], [55], [56], [57], [58], [59], [60], [61], [62], [63], [64], [65], [66], [67], [68], [69].

In a Monte Carlo (i.e., sampling-based) analysis, a sample

$$\mathbf{x}_i = [x_{i1}, x_{i2}, \ldots, x_{in}], \quad i = 1, 2, \ldots, m, \tag{1.5}$$

is generated from the possible values for x consistent with the distributions indicated in Eq. (1.4) and any associated restrictions. In turn, the evaluations

$$\mathbf{y}_i = f(\mathbf{x}_i), \quad i = 1, 2, \ldots, m, \tag{1.6}$$

create a mapping

$$[\mathbf{x}_i, \mathbf{y}_i], \quad i = 1, 2, \ldots, m, \tag{1.7}$$

between analysis inputs and analysis outcomes that forms the basis for uncertainty analysis (i.e., the determination of the uncertainty in the elements of y that derives from uncertainty in the elements of x) and sensitivity analysis (i.e., the determination of how the uncertainty in individual elements of x contributes to the uncertainty in elements of y).
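As a concrete illustration of Eqs. (1.5) to (1.7), the minimal Python sketch below builds the mapping $[\mathbf{x}_i, \mathbf{y}_i]$ with simple random sampling and then performs a rudimentary uncertainty analysis by estimating the mean and standard deviation of y. The model `f`, the helper `sample_based_mapping`, and the uniform input distributions are hypothetical placeholders, not part of the source; in a real analysis each evaluation of f may be expensive, which is why the efficiency of the sampling design matters.

```python
import random
import statistics


def f(x):
    """Hypothetical stand-in for an expensive model y = f(x) with n = 2 inputs and p = 1 result."""
    return [x[0] + 2.0 * x[1]]


def sample_based_mapping(m, samplers, model, seed=0):
    """Build the mapping [x_i, y_i], i = 1..m, from a simple random sample.

    samplers[j] draws one value of x_j consistent with its distribution D_j.
    """
    rng = random.Random(seed)
    mapping = []
    for _ in range(m):
        x_i = [draw(rng) for draw in samplers]  # sample element x_i, Eq. (1.5)
        mapping.append((x_i, model(x_i)))       # evaluation y_i = f(x_i), Eq. (1.6)
    return mapping


# Hypothetical input distributions: D_1 uniform on [0, 1], D_2 uniform on [0, 2].
samplers = [lambda r: r.uniform(0.0, 1.0), lambda r: r.uniform(0.0, 2.0)]
pairs = sample_based_mapping(1000, samplers, f)

# Uncertainty analysis: summarize the uncertainty in y induced by the uncertainty in x.
ys = [y[0] for _, y in pairs]
print(f"mean of y: {statistics.mean(ys):.3f}, standard deviation of y: {statistics.stdev(ys):.3f}")
```

For this toy model the exact mean of y is 0.5 + 2 × 1 = 2.5, so the printed sample mean can be checked directly.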

As previously indicated, Latin hypercube sampling is a very popular method for generating the sample indicated in Eq. (1.5). Further, this generation is often performed in conjunction with a procedure introduced by Iman and Conover to induce a desired rank correlation structure on the resultant sample [70], [71]. Reflecting this popularity, the original paper introducing Latin hypercube sampling was recently declared a Technometrics classic in experimental design [72]. The effectiveness of Latin hypercube sampling, and hence the reason for its popularity, derives from the fact that it provides a dense stratification over the range of each uncertain variable with a relatively small sample size while preserving the desirable probabilistic features of simple random sampling. More specifically, Latin hypercube sampling combines the desirable features of simple random sampling with those of a multilevel, highly fractionated fractional factorial design. It accomplishes this by using a highly structured, randomized procedure to generate the sample indicated in Eq. (1.5) consistent with the distributions indicated in Eq. (1.4).

A drawback to Latin hypercube sampling is that its highly structured form makes it difficult to increase the size of an already generated sample while simultaneously preserving the stratification properties that make Latin hypercube sampling so effective. Unlike a simple random sample, a Latin hypercube sample (LHS) cannot be enlarged simply by generating additional sample elements, because the new sample containing the original LHS and the additional sample elements will, in general, no longer have the structure of an LHS. For the new sample to also be an LHS, the additional sample elements must be generated with a procedure that takes into account both the existing LHS being enlarged and the definition of Latin hypercube sampling.
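The stratification requirement can be checked mechanically. The Python sketch below assumes that each value has already been mapped through the CDF of its distribution, u = F_j(x), so that the m equal-probability intervals become [k/m, (k+1)/m). The function names are hypothetical. It confirms a one-dimensional LHS of size 4 and then shows, with a fixed example, what happens when four further values are simply appended: with 8 values the intervals become [k/8, (k+1)/8), the appended value 0.05 shares the first interval with 0.10, and the interval [6/8, 7/8) stays empty.

```python
def is_lhs_column(u):
    """Return True if the probabilities u (each in [0, 1)) place exactly one value
    in each of the len(u) equal-probability intervals [k/m, (k+1)/m)."""
    m = len(u)
    counts = [0] * m
    for p in u:
        counts[min(int(p * m), m - 1)] += 1  # index of the interval containing p
    return all(c == 1 for c in counts)


def is_lhs(rows):
    """rows[i][j] = F_j(x_ij); every variable's column must be stratified."""
    n = len(rows[0])
    return all(is_lhs_column([row[j] for row in rows]) for j in range(n))


original = [0.10, 0.30, 0.60, 0.90]              # one value per interval of width 1/4
appended = original + [0.05, 0.20, 0.45, 0.70]   # naive enlargement to size 8

print(is_lhs_column(original))           # True
print(is_lhs_column(appended))           # False: 0.05 and 0.10 both fall in [0, 1/8)
print(is_lhs([[0.1, 0.6], [0.6, 0.1]]))  # True: a two-variable LHS of size 2
```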

The purpose of this presentation is to describe a procedure for the extension of the size of an LHS that results in a new LHS with a correlation structure close to that of the original LHS. The basic idea is to start with an LHS

$$\mathbf{x}_i = [x_{i1}, x_{i2}, \ldots, x_{in}], \quad i = 1, 2, \ldots, m, \tag{1.8}$$

of size m and then to generate a second sample

$$\tilde{\mathbf{x}}_i = [\tilde{x}_{i1}, \tilde{x}_{i2}, \ldots, \tilde{x}_{in}], \quad i = 1, 2, \ldots, m, \tag{1.9}$$

of size m such that

$$\mathbf{x}_i = \begin{cases} \mathbf{x}_i & \text{for } i = 1, 2, \ldots, m \\ \tilde{\mathbf{x}}_{i-m} & \text{for } i = m+1, m+2, \ldots, 2m \end{cases} \tag{1.10}$$

is an LHS of size 2m and also such that the correlation structures associated with the original LHS in Eq. (1.8) and the extended LHS in Eq. (1.10) are similar. A related extension technique for LHSs has been developed by Tong [73] but does not consider correlated variables. Extensions to other integer multiples of the original sample size are also possible.
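For a single variable, the stratification part of the extension has a simple structure: each of the m original intervals splits into two of the 2m intervals of the extended sample, the original value occupies one half, and the other half is empty. The Python sketch below, again working with CDF values in [0, 1) and using hypothetical function names, draws one new value uniformly inside each empty half. It covers only this marginal step; how the new values are paired across variables to control the correlation structure is a separate question and is not addressed here.

```python
import random


def extend_lhs_column(u, rng):
    """Given probabilities u forming a 1-D LHS of size m (one value in each interval
    [k/m, (k+1)/m)), return m new probabilities, one in the empty half of each
    interval, so that u + new is a 1-D LHS of size 2m."""
    m = len(u)
    new = []
    for p in u:
        fine = int(p * 2 * m)                              # size-2m interval holding p
        empty = fine + 1 if fine % 2 == 0 else fine - 1    # other half of the same size-m interval
        new.append((empty + rng.random()) / (2 * m))       # uniform draw inside the empty half
    return new


def occupancy(u, k):
    """Number of values of u in each interval [i/k, (i+1)/k)."""
    counts = [0] * k
    for p in u:
        counts[min(int(p * k), k - 1)] += 1
    return counts


rng = random.Random(1)
u = [0.10, 0.30, 0.60, 0.90]     # LHS of size m = 4 in probability space
new = extend_lhs_column(u, rng)
print(occupancy(new, 4))         # [1, 1, 1, 1]
print(occupancy(u + new, 8))     # [1, 1, 1, 1, 1, 1, 1, 1]
```

Because every new value lies in a different one of the m original intervals, the new values by themselves also form an LHS of size m.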

There are at least three reasons why such extensions of the size of an LHS might be desirable. First, an analysis could have been performed with a sample size that was subsequently determined to be too small. The extension would permit the use of a larger LHS without the loss of any of the already performed, and possibly quite expensive, calculations. Second, the implementation of the Iman and Conover procedure to induce a desired rank correlation structure on an LHS of size m requires the inversion of an m×m matrix, which can be computationally demanding when a large sample is to be generated. The presented extension procedure provides a way to generate an LHS of size 2m with a specified correlation structure at a computational expense approximately equal to that of generating two LHSs of size m with the desired correlation structure. Third, the extension procedure provides a way to perform replicated Latin hypercube sampling [74], [75] to test the stability of results; because the pooled replicates themselves form an LHS, pooling enhances the quality of the resulting estimates.


Definition of Latin hypercube sampling

Latin hypercube sampling operates in the following manner to generate a sample of size m from n variables with the distributions $D_1, D_2, \ldots, D_n$ indicated in Eq. (1.4). The range $\mathcal{X}_j$ of each variable $x_j$ is divided into m contiguous intervals

$$\mathcal{X}_{ij}, \quad i = 1, 2, \ldots, m,$$

of equal probability consistent with the corresponding distribution $D_j$. A value for the variable $x_j$ is selected at random from the interval $\mathcal{X}_{ij}$, consistent with the distribution $D_j$, for i = 1, 2, …, m and j = 1, 2, …, n. Then, the m values for $x_1$ are
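A minimal Python sketch of the construction above, assuming each distribution $D_j$ is supplied through its inverse CDF (quantile function): a value is drawn at random from the interval $\mathcal{X}_{ij}$ by drawing a uniform probability in [(i - 1)/m, i/m) and mapping it through the inverse CDF. The excerpt breaks off before describing how the values of different variables are combined, so the sketch assumes the standard random pairing (an independent random permutation of each variable's m values) with no correlation control. The function name `latin_hypercube` and the example distributions are hypothetical.

```python
import math
import random


def latin_hypercube(m, inverse_cdfs, seed=None):
    """Return an LHS of size m as a list of rows [x_i1, ..., x_in].

    inverse_cdfs[j] is the quantile function of D_j. Random pairing across
    variables is assumed (no correlation control).
    """
    rng = random.Random(seed)
    columns = []
    for ppf in inverse_cdfs:
        # One value from each equal-probability interval, drawn consistent with D_j.
        values = [ppf((i + rng.random()) / m) for i in range(m)]
        rng.shuffle(values)  # random pairing with the other variables
        columns.append(values)
    return [list(row) for row in zip(*columns)]


def uniform_ppf(p):
    """Quantile function of the uniform distribution on [0, 1]."""
    return p


def exponential_ppf(p):
    """Quantile function of the exponential distribution with rate 1."""
    return -math.log(1.0 - p)


sample = latin_hypercube(5, [uniform_ppf, exponential_ppf], seed=42)
for row in sample:
    print(row)
```

Sorting any one column of `sample` recovers exactly one value per equal-probability interval; only the pairing across columns is random.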

Extension algorithm

The extension algorithm starts with an LHS of size m of the form indicated in Eq. (2.4) and an associated rank correlation matrix D1, as indicated in Eq. (2.6), that was generated with the Iman and Conover procedure so that D1 is close to the target correlation matrix C. The problem under consideration is how to extend this sample to an LHS of size 2m with a rank correlation matrix D that is again close to C. This extension can be accomplished with the following algorithm:

Step 1. Let kj be a
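The steps of the algorithm are cut off in this excerpt and are not reconstructed here. As a hedged illustration only, the Python sketch below implements one simple way to obtain a doubled LHS with controlled rank correlation for n = 2, consistent with the outline given in the Correlation section (keep the original LHS, generate a second LHS of size m with its own rank correlation D2, and combine the two). Its assumptions are not taken from the source: values are handled as CDF probabilities in [0, 1); each new value is drawn uniformly in the unoccupied half of an original interval; and the new values are paired with an Iman and Conover style transformation of normal (van der Waerden) scores $\Phi^{-1}(i/(m+1))$. All function names are hypothetical. Because rank correlations are unchanged by monotone inverse-CDF maps, the marginal distributions could be applied afterwards without affecting the printed values.

```python
import math
import random
from statistics import NormalDist


def ranks(v):
    """0-based ranks of v (continuous data, so ties are not expected)."""
    r = [0] * len(v)
    for pos, idx in enumerate(sorted(range(len(v)), key=v.__getitem__)):
        r[idx] = pos
    return r


def pearson(a, b):
    """Sample (Pearson) correlation coefficient."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def spearman(x, y):
    """Rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def iman_conover_ranks_2d(m, r, rng):
    """Iman-Conover style pairing for n = 2: return target ranks for the two variables.
    The normal scores used to build them have Pearson correlation exactly r."""
    scores = [NormalDist().inv_cdf((i + 1) / (m + 1)) for i in range(m)]
    a, b = scores[:], scores[:]
    rng.shuffle(a)
    rng.shuffle(b)
    e = pearson(a, b)  # spurious correlation of the randomly permuted scores
    # Second row of P Q^-1, with P and Q the Cholesky factors of the target and observed correlation matrices.
    s21 = r - math.sqrt(1 - r * r) * e / math.sqrt(1 - e * e)
    s22 = math.sqrt(1 - r * r) / math.sqrt(1 - e * e)
    b_star = [s21 * x + s22 * y for x, y in zip(a, b)]
    return ranks(a), ranks(b_star)


def extend_lhs_2d(rows, r, rng):
    """Extend a two-variable LHS of size m (probabilities in [0, 1)) to size 2m."""
    m = len(rows)
    new_cols = []
    for j in range(2):
        col = []
        for row in rows:
            fine = int(row[j] * 2 * m)                       # size-2m interval holding the old value
            empty = fine + 1 if fine % 2 == 0 else fine - 1  # its unoccupied twin
            col.append((empty + rng.random()) / (2 * m))
        new_cols.append(sorted(col))  # k-th entry lies in the k-th size-m interval
    ra, rb = iman_conover_ranks_2d(m, r, rng)
    new_rows = [[new_cols[0][ra[k]], new_cols[1][rb[k]]] for k in range(m)]
    return rows + new_rows  # original elements first, as in Eq. (1.10)


# Usage: build an original LHS of size m with the same kind of pairing, then extend it.
rng = random.Random(7)
m, r = 50, -0.7
u1 = [(i + rng.random()) / m for i in range(m)]  # stratified, in increasing order
u2 = [(i + rng.random()) / m for i in range(m)]
ra, rb = iman_conover_ranks_2d(m, r, rng)
original = [[u1[ra[k]], u2[rb[k]]] for k in range(m)]
extended = extend_lhs_2d(original, r, rng)
for label, rows in [("original", original), ("new half", extended[m:]), ("extended", extended)]:
    print(label, round(spearman([x for x, _ in rows], [y for _, y in rows]), 3))
```

The first m rows of `extended` are the original rows, so no existing model evaluations are lost; the printed rank correlations of the original sample, the new half, and the combined sample should typically all lie near the target r = -0.7.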

Illustration of extension algorithm

The extension algorithm is illustrated for the generation of LHSs from

$$\mathbf{x} = [x_1, x_2],$$

with (i) $x_1$ having a triangular distribution on [0, 1] with mode at 0.5, (ii) $x_2$ having a triangular distribution on [1, 10] with mode at 7.0, and (iii) $x_1$ and $x_2$ having a rank correlation of −0.7. Thus, n = 2 in Eq. (1.2); the distributions $D_1$ and $D_2$ in Eq. (1.4) correspond to triangular distributions; and

$$C = \begin{bmatrix} 1.0 & -0.7 \\ -0.7 & 1.0 \end{bmatrix}$$

is the correlation matrix in Eq. (2.5). The extension of an LHS of size m = 10 to an LHS of size 2m
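Because the triangular distribution has a closed-form inverse CDF, the equal-probability intervals for this illustration can be computed directly. The Python sketch below uses the hypothetical name `triangular_ppf` for the standard triangular quantile function and computes the endpoints of the m = 10 intervals for $x_1$ and $x_2$. For $x_2$, the CDF at the mode is (7 - 1)/(10 - 1) = 2/3, so the mode 7.0 is the 2/3 quantile.

```python
import math


def triangular_ppf(p, a, c, b):
    """Inverse CDF of the triangular distribution on [a, b] with mode c, for p in [0, 1]."""
    fc = (c - a) / (b - a)  # CDF value at the mode
    if p < fc:
        return a + math.sqrt(p * (b - a) * (c - a))
    return b - math.sqrt((1.0 - p) * (b - a) * (b - c))


m = 10
# Endpoints of the m equal-probability intervals for x_1 (triangular on [0, 1], mode 0.5)
# and x_2 (triangular on [1, 10], mode 7.0).
edges_x1 = [triangular_ppf(k / m, 0.0, 0.5, 1.0) for k in range(m + 1)]
edges_x2 = [triangular_ppf(k / m, 1.0, 7.0, 10.0) for k in range(m + 1)]
print([round(e, 3) for e in edges_x1])
print([round(e, 3) for e in edges_x2])
```

The intervals are narrow near each mode and wide in the tails, which is exactly how equal-probability stratification adapts to the shape of each distribution.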

Correlation

The extension algorithm described in Section 3 and illustrated in Section 4 starts with an initial LHS of size m with a rank correlation matrix D1, generates a second LHS of size m with a rank correlation matrix D2, and then constructs an LHS of size 2m that includes the elements of the first LHS and has a rank correlation matrix D close to (D1+D2)/2. This section demonstrates that the resultant rank correlation matrix D is indeed close to (D1+D2)/2.
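One way to see why D should be close to (D1+D2)/2 (offered as an illustration, not as the demonstration given in this section): in a doubled LHS, each half contains exactly one value of each variable in each of the m original intervals, so a value with rank s within its half has rank 2s - 1 or 2s in the combined sample. The combined ranks are therefore twice the within-half ranks up to an offset of 0 or 1, and the combined rank correlation differs from the average of the two within-half rank correlations only by terms of order 1/m. The Python sketch below checks this numerically for a randomly paired doubled LHS in probability space; the function names and the choice m = 50 are assumptions for illustration.

```python
import random


def ranks(v):
    """0-based ranks of v (continuous data, so ties are not expected)."""
    r = [0] * len(v)
    for pos, idx in enumerate(sorted(range(len(v)), key=v.__getitem__)):
        r[idx] = pos
    return r


def spearman(x, y):
    """Rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mean = (len(rx) - 1) / 2
    num = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
    den = sum((a - mean) ** 2 for a in rx)  # same for ry: both are permutations of 0..n-1
    return num / den


def doubled_lhs(m, rng):
    """Randomly paired two-variable LHS of size m plus its extension (probabilities)."""
    cols_old, cols_new = [], []
    for _ in range(2):
        old = [(i + rng.random()) / m for i in range(m)]
        new = []
        for p in old:
            fine = int(p * 2 * m)
            empty = fine + 1 if fine % 2 == 0 else fine - 1
            new.append((empty + rng.random()) / (2 * m))  # fill the empty half-interval
        rng.shuffle(old)  # independent random pairing within each half
        rng.shuffle(new)
        cols_old.append(old)
        cols_new.append(new)
    return cols_old, cols_new


rng = random.Random(3)
m = 50
old, new = doubled_lhs(m, rng)
d1 = spearman(old[0], old[1])
d2 = spearman(new[0], new[1])
d = spearman(old[0] + new[0], old[1] + new[1])
print(f"D1 = {d1:.3f}, D2 = {d2:.3f}, (D1 + D2)/2 = {(d1 + d2) / 2:.3f}, D = {d:.3f}")
```

For m = 50 the rank argument above bounds the gap between D and (D1+D2)/2 by roughly 0.03 for any doubled LHS, so the agreement does not depend on the particular random seed.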

This demonstration is based on considering

Discussion

Latin hypercube sampling is the preferred sampling procedure for the assessment of the implications of epistemic uncertainty in complex analyses because of its probabilistic character (i.e., each sample element has a weight equal to the reciprocal of the sample size that can be used in estimating probability-based quantities such as means, standard deviations, distribution functions, and standardized regression coefficients) and efficient stratification properties (i.e., a dense stratification

Acknowledgments

Work was performed for Sandia National Laboratories (SNL), which is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL-85000. Review at SNL was provided by L. Swiler and R. Jarek. Editorial support was provided by F. Puffer and J. Ripple of Tech Reps, a division of Ktech Corporation.

References (90)

  • M.C. Thorne. The use of expert opinion in formulating conceptual models of underground disposal systems and the treatment of associated bias. Reliab Eng Syst Saf (1993).
  • S. Chhibber et al. A taxonomy of issues related to the use of expert judgments in probabilistic safety studies. Reliab Eng Syst Saf (1992).
  • M.C. Thorne et al. A review of expert judgement techniques with reference to nuclear safety. Prog Nucl Saf (1992).
  • C. Tong. Refinement strategies for stratified sampling methods. Reliab Eng Syst Saf (2006).
  • J.C. Helton et al. Characterization of subjective uncertainty in the 1996 performance assessment for the Waste Isolation Pilot Plant. Reliab Eng Syst Saf (2000).
  • R.J. Breeding et al. Summary description of the methods used in the probabilistic risk assessments for NUREG-1150. Nucl Eng Des (1992).
  • R.J. Breeding et al. The NUREG-1150 probabilistic risk assessment for the Surry Nuclear Power Station. Nucl Eng Des (1992).
  • A.C. Payne et al. The NUREG-1150 probabilistic risk assessment for the Peach Bottom Atomic Power Station. Nucl Eng Des (1992).
  • T.D. Brown et al. The NUREG-1150 probabilistic risk assessment for the Grand Gulf Nuclear Station. Nucl Eng Des (1992).
  • J.C. Helton et al. Robustness of an uncertainty and sensitivity analysis of early exposure results with the MACCS reactor accident consequence model. Reliab Eng Syst Saf (1995).
  • J.C. Helton et al. A comparison of uncertainty and sensitivity analysis results obtained with random and Latin hypercube sampling. Reliab Eng Syst Saf (2005).
  • M.A. Christie et al. Error analysis and simulations of complex phenomena. Los Alamos Sci (2005).
  • D.H. Sharp et al. QMU and nuclear weapons certification: what's under the hood? Los Alamos Sci (2003).
  • R.L. Wagner. Science, uncertainty and risk: the problem of complex phenomena. APS News (2003).
  • Guiding principles for Monte Carlo analysis, EPA/630/R-97/001 (1997).
  • NCRP (National Council on Radiation Protection and Measurements). A guide for uncertainty analysis in dose and risk...
  • Science and judgment in risk assessment (1994).
  • Issues in risk assessment (1993).
  • An SAB report: multi-media risk assessment for radon, review of uncertainty analysis of risks associated with exposure to radon, EPA-SAB-RAC-93-014 (1993).
  • IAEA (International Atomic Energy Agency). Evaluating the reliability of predictions made using environmental... (1989).
  • M.B. Beck. Water-quality modeling: a review of the analysis of uncertainty. Water Resour Res (1987).
  • D.G. Cacuci (2003).
  • T. Turányi. Sensitivity analysis of complex kinetic systems. Tools and applications. J Math Chem (1990).
  • H. Rabitz et al. Sensitivity analysis in chemical kinetics.
  • P.M. Frank. Introduction to system sensitivity theory (1978).
  • R. Tomovic et al. General sensitivity theory (1972).
  • R.H. Myers et al. Response surface methodology: a retrospective and literature review. J Qual Technol (2004).
  • R.H. Myers. Response surface methodology—current status and future directions. J Qual Technol (1999).
  • T.H. Andres. Sampling methods and sensitivity analysis for large parameter sets. J Stat Comput Simulat (1997).
  • J.P.C. Kleijnen. Sensitivity analysis and related analyses: a review of some statistical techniques. J Stat Comput Simulat (1997).
  • J. Sacks et al. Design and analysis of computer experiments. Stat Sci (1989).
  • R.H. Morton. Response surface methodology. Math Sci (1983).
  • R. Mead et al. A review of response surface methodology from a biometric viewpoint. Biometrics (1975).
  • R.H. Myers. Response surface methodology (1971).