Extension of Latin hypercube samples with correlated variables

doi:10.1016/j.ress.2007.04.005

Reliability Engineering & System Safety

Volume 93, Issue 7, July 2008, Pages 1047-1059

https://doi.org/10.1016/j.ress.2007.04.005

Abstract

A procedure for extending the size of a Latin hypercube sample (LHS) with rank correlated variables is described and illustrated. The extension procedure starts with an LHS of size m and associated rank correlation matrix C and constructs a new LHS of size 2m that contains the elements of the original LHS and has a rank correlation matrix that is close to the original rank correlation matrix C. The procedure is intended for use in conjunction with uncertainty and sensitivity analysis of computationally demanding models in which it is important to make efficient use of a necessarily limited number of model evaluations.

Introduction

The evaluation of the uncertainty associated with analysis outcomes is now widely recognized as an important part of any modeling effort [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11]. A number of approaches to such evaluations are in use, including differential analysis [12], [13], [14], [15], [16], [17], response surface methodology [18], [19], [20], [21], [22], [23], [24], [25], [26], variance decomposition procedures [27], [28], [29], [30], [31], and Monte Carlo (i.e., sampling-based) procedures [32], [33], [34], [35], [36], [37], [38], [39], [40], [41], [42]. Additional information is available in a number of reviews [43], [44], [45], [46], [47], [48], [49], [50], [51]. Monte Carlo analysis employing Latin hypercube sampling [52], [53] is one of the most popular and effective approaches for the evaluation of the uncertainty associated with analysis outcomes and is the focus of this presentation.

Conceptually, an analysis can be formally represented by a function of the form $y = f (x),$ where $x = [x_{1}, x_{2}, \dots, x_{n}]$ is a vector of analysis inputs and $y = [y_{1}, y_{2}, \dots, y_{p}]$ is a vector of analysis results. In turn, uncertainty with respect to the appropriate values to use for the elements of x leads to uncertainty with respect to the values for the elements of y. Most analyses use probability to characterize the uncertainty associated with the elements of x and hence the uncertainty associated with the elements of y. In particular, a sequence of probability distributions $D_{1}, D_{2}, \dots, D_{n}$ is used to characterize the uncertainty associated with the elements of x, where the distribution D_j characterizes the uncertainty associated with the element x_j of x. The definition of the preceding distributions is often accomplished through an expert review process and can be accompanied by the specification of correlations and other restrictions involving the interplay of the possible values for the elements of x [54], [55], [56], [57], [58], [59], [60], [61], [62], [63], [64], [65], [66], [67], [68], [69].

In a Monte Carlo (i.e., sampling-based) analysis, a sample $x_{i} = [x_{i 1}, x_{i 2}, \dots, x_{in}], i = 1, 2, \dots, m,$ is generated from the possible values for x in consistency with the distributions indicated in Eq. (1.4) and any associated restrictions. In turn, the evaluations $y_{i} = f (x_{i}), i = 1, 2, \dots, m,$ create a mapping $[x_{i}, y_{i}], i = 1, 2, \dots, m,$ between analysis inputs and analysis outcomes that forms the basis for uncertainty analysis (i.e., the determination of the uncertainty in the elements of y that derives from uncertainty in the elements of x) and sensitivity analysis (i.e., the determination of how the uncertainty in individual elements of x contributes to the uncertainty in elements of y).

As previously indicated, Latin hypercube sampling is a very popular method for the generation of the sample indicated in Eq. (1.5). Further, this generation is often performed in conjunction with a procedure introduced by Iman and Conover to induce a desired rank correlation structure on the resultant sample [70], [71]. As a result of this popularity, the original paper introducing Latin hypercube sampling was recently declared a Technometrics classic in experimental design [72]. The effectiveness of Latin hypercube sampling, and hence the cause of its popularity, derives from the fact that it provides a dense stratification over the range of each uncertain variable with a relatively small sample size while preserving the desirable probabilistic features of simple random sampling. More specifically, Latin hypercube sampling combines the desirable features of simple random sampling with the desirable features of a multilevel, highly fractionated fractional factorial design. Latin hypercube sampling accomplishes this by using a highly structured, randomized procedure to generate the sample indicated in Eq. (1.5) in consistency with the distributions indicated in Eq. (1.4).
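The rank-correlation step mentioned above can be sketched as follows. This is a minimal illustration of the Iman and Conover rearrangement as it is commonly described, not code taken from the paper; the function names `iman_conover` and `rank_corr`, the choice of van der Waerden scores, and the NumPy layout are assumptions made for the example. Each column is only reordered, so the values in the column (and hence its stratification) are unchanged.

```python
import numpy as np
from statistics import NormalDist


def rank_corr(x):
    """Spearman rank correlation matrix of the columns of x (no ties assumed)."""
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)
    return np.corrcoef(ranks, rowvar=False)


def iman_conover(x, target_corr, rng):
    """Reorder each column of x so the rank correlation is close to target_corr."""
    m, n = x.shape
    # Van der Waerden scores, independently shuffled for each variable.
    scores = np.array([NormalDist().inv_cdf(i / (m + 1)) for i in range(1, m + 1)])
    r = np.column_stack([rng.permutation(scores) for _ in range(n)])
    # Remove the accidental correlation of the scores, then impose the target.
    q = np.linalg.cholesky(np.corrcoef(r, rowvar=False))
    p = np.linalg.cholesky(target_corr)
    r_star = r @ (p @ np.linalg.inv(q)).T
    # Give each column of x the same rank order as the matching column of r_star.
    out = np.empty((m, n))
    for j in range(n):
        ranks = np.argsort(np.argsort(r_star[:, j]))
        out[:, j] = np.sort(x[:, j])[ranks]
    return out


rng = np.random.default_rng(0)
x = rng.random((500, 2))                      # stand-in for an uncorrelated sample
c = np.array([[1.0, -0.7], [-0.7, 1.0]])      # illustrative target matrix
y = iman_conover(x, c, rng)
print(rank_corr(y)[0, 1])                     # should be near, not exactly, -0.7
```

The achieved rank correlation is only approximately equal to the target, which is consistent with the text's wording that the sample correlation matrix is "close to" C.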

A drawback to Latin hypercube sampling is that its highly structured form makes it difficult to increase the size of an already generated sample while simultaneously preserving the stratification properties that make Latin hypercube sampling so effective. Unlike simple random sampling, the size of a Latin hypercube sample (LHS) cannot be increased simply by generating additional sample elements, because the new sample containing the original LHS and the additional sample elements will no longer have the structure of an LHS. For the new sample to also be an LHS, the additional sample elements must be generated with a procedure that takes into account both the existing LHS that is being increased in size and the definition of Latin hypercube sampling.
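The loss of structure can be checked numerically. The sketch below works on the unit hypercube with independent uniform variables, which is an assumption made to keep the example short; `unit_lhs` and `is_lhs` are hypothetical helper names. It builds an LHS of size 10, appends 10 simple random points, and tests whether the pooled sample of size 20 still has exactly one point in each of the 20 equal-probability intervals of every variable.

```python
import numpy as np


def unit_lhs(m, n, rng):
    """LHS of size m on [0, 1)^n with independently permuted columns."""
    strata = np.column_stack([rng.permutation(m) for _ in range(n)])
    return (strata + rng.random((m, n))) / m


def is_lhs(u):
    """True if each column has exactly one point in each of its equal-probability strata."""
    size, n = u.shape
    strata = np.floor(u * size).astype(int)
    return all(
        np.array_equal(np.sort(strata[:, j]), np.arange(size)) for j in range(n)
    )


rng = np.random.default_rng(0)
original = unit_lhs(10, 2, rng)
pooled = np.vstack([original, rng.random((10, 2))])  # naive extension

print(is_lhs(original))  # the original sample satisfies the definition
print(is_lhs(pooled))    # the naive extension almost never does
```

The pooled sample fails the check because the random points land in intervals that are already occupied and leave others empty.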

The purpose of this presentation is to describe a procedure for the extension of the size of an LHS that results in a new LHS with a correlation structure close to that of the original LHS. The basic idea is to start with an LHS $x_{i} = [x_{i 1}, x_{i 2}, \dots, x_{in}], i = 1, 2, \dots, m,$ of size m and then to generate a second sample ${\tilde{x}}_{i} = [{\tilde{x}}_{i 1}, {\tilde{x}}_{i 2}, \dots, {\tilde{x}}_{in}], i = 1, 2, \dots, m,$ of size m such that $x_{i} = \begin{cases} x_{i} & \text{for } i = 1, 2, \dots, m \\ {\tilde{x}}_{i - m} & \text{for } i = m + 1, m + 2, \dots, 2 m \end{cases}$ is an LHS of size 2m and also such that the correlation structures associated with the original LHS in Eq. (1.8) and the extended LHS in Eq. (1.10) are similar. A related extension technique for LHSs has been developed by Tong [73] but does not consider correlated variables. Extensions to other integer multiples of the original sample size are also possible.
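The stratification part of this idea can be sketched under simplifying assumptions: independent variables, all expressed on the unit interval, and no control of rank correlation. It is therefore not the paper's algorithm, which additionally keeps the correlation structure close to the original; the names `unit_lhs` and `extend_unit_lhs` are hypothetical. Each of the m original intervals is split into two halves of equal probability, the original point occupies one half, and the second sample places one point in each of the m empty halves.

```python
import numpy as np


def unit_lhs(m, n, rng):
    """LHS of size m on [0, 1)^n with independently permuted columns."""
    strata = np.column_stack([rng.permutation(m) for _ in range(n)])
    return (strata + rng.random((m, n))) / m


def extend_unit_lhs(u, rng):
    """Extend an LHS of size m on [0, 1)^n to size 2m, keeping the original rows."""
    m, n = u.shape
    new = np.empty((m, n))
    for j in range(n):
        # Split [0, 1) into 2m equal strata; the original points occupy m of them.
        occupied = np.floor(u[:, j] * 2 * m).astype(int)
        empty = np.setdiff1d(np.arange(2 * m), occupied)
        if empty.size != m:
            raise ValueError("input column is not an LHS of size m")
        # One new point in each empty stratum, paired at random across variables.
        new[:, j] = rng.permutation((empty + rng.random(m)) / (2 * m))
    return np.vstack([u, new])


rng = np.random.default_rng(0)
original = unit_lhs(10, 2, rng)
extended = extend_unit_lhs(original, rng)
print(extended.shape)  # (20, 2); the first 10 rows are the original sample
```

The random pairing in the last step of the loop is where a correlation-aware procedure would differ: instead of a free permutation, the order of the new points would be chosen so that the pooled rank correlation stays near the target.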

There are at least three reasons why such extensions of the size of an LHS might be desirable. First, an analysis could have been performed with a sample size that was subsequently determined to be too small. The extension would permit the use of a larger LHS without the loss of any of the already performed, and possibly quite expensive, calculations. Second, the implementation of the Iman and Conover procedure to induce a desired rank correlation structure on an LHS of size m requires the inversion of an m×m matrix. This inversion can be computationally demanding when a large sample is to be generated. The presented extension procedure provides a way to generate an LHS of size 2m with a specified correlation structure at a computational expense that is approximately equal to that of generating two LHSs of size m with the desired correlation structure. Third, the extension procedure provides a way to perform replicated Latin hypercube sampling [74], [75] to test the stability of results that enhances the quality of results obtained when the replicates are pooled.

Section snippets

Definition of Latin hypercube sampling

Latin hypercube sampling operates in the following manner to generate a sample of size m from n variables with the distributions D₁, D₂,…,D_n indicated in Eq. (1.4). The range $X_{j}$ of each variable x_j is divided into m contiguous intervals $X_{ij}, i = 1, 2, \dots, m,$ of equal probability in consistency with the corresponding distribution D_j. A value for the variable x_j is selected at random from the interval $X_{ij}$ in consistency with the distribution D_j for i=1,2,…,m and j=1,2,…,n. Then, the m values for x₁ are
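The stratify-then-sample step described above can be sketched as follows, assuming each distribution D_j is available through its inverse cumulative distribution function. The function name `latin_hypercube` and the two example distributions (uniform and unit exponential) are assumptions for illustration, and the values for different variables are paired at random, without any correlation control.

```python
import numpy as np


def latin_hypercube(m, inv_cdfs, rng):
    """LHS of size m; inv_cdfs[j] maps probabilities in [0, 1) to values of x_j."""
    n = len(inv_cdfs)
    sample = np.empty((m, n))
    for j, inv_cdf in enumerate(inv_cdfs):
        # One uniform draw inside each of the m equal-probability intervals.
        u = (np.arange(m) + rng.random(m)) / m
        # Map to values of x_j, then shuffle so the pairing across variables is random.
        sample[:, j] = rng.permutation(inv_cdf(u))
    return sample


rng = np.random.default_rng(0)
inv_cdfs = [
    lambda u: u,                 # uniform on [0, 1)
    lambda u: -np.log1p(-u),     # exponential with unit rate
]
sample = latin_hypercube(8, inv_cdfs, rng)
print(sample.shape)  # (8, 2)
```

Sampling in probability space and mapping through the inverse CDF ensures that the intervals have equal probability under D_j even when they have unequal width in the units of x_j.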

Extension algorithm

The extension algorithm starts with an LHS of size m of the form indicated in Eq. (2.4) and an associated rank correlation matrix D₁ as indicated in Eq. (2.6) generated with the Iman and Conover procedure so that D₁ is close to the target correlation matrix C. The problem under consideration is how to extend this sample to an LHS of size 2m with a rank correlation matrix D that is again close to C. This extension can be accomplished by application of the following algorithm:

Step 1. Let k_j be a

Illustration of extension algorithm

The extension algorithm is illustrated for the generation of LHSs from $x = [x_{1}, x_{2}],$ with (i) x₁ having a triangular distribution on [0, 1] with mode at 0.5, (ii) x₂ having a triangular distribution on [1, 10] with mode at 7.0, and (iii) x₁ and x₂ having a rank correlation of −0.7. Thus, n=2 in Eq. (1.2); the distributions D₁ and D₂ in Eq. (1.4) correspond to triangular distributions; and $C = \begin{bmatrix} 1.0 & -0.7 \\ -0.7 & 1.0 \end{bmatrix}$ is the correlation matrix in Eq. (2.5). The extension of an LHS of size m=10 to an LHS of size 2m
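For the two triangular distributions in this illustration, the equal-probability intervals for m=10 can be computed from the closed-form inverse CDF of a triangular distribution. The sketch below is an aid to the reader rather than part of the paper; `triangular_inv_cdf` is a hypothetical helper name, and the rank correlation of −0.7 is not handled here.

```python
import numpy as np


def triangular_inv_cdf(u, low, mode, high):
    """Inverse CDF of the triangular distribution on [low, high] with the given mode."""
    u = np.asarray(u, dtype=float)
    split = (mode - low) / (high - low)  # CDF value at the mode
    left = low + np.sqrt(u * (high - low) * (mode - low))
    right = high - np.sqrt((1.0 - u) * (high - low) * (high - mode))
    return np.where(u <= split, left, right)


m = 10
probs = np.linspace(0.0, 1.0, m + 1)  # interval endpoints in probability space
x1_edges = triangular_inv_cdf(probs, 0.0, 0.5, 1.0)
x2_edges = triangular_inv_cdf(probs, 1.0, 7.0, 10.0)
print(np.round(x1_edges, 3))
print(np.round(x2_edges, 3))
```

The intervals are narrowest near the mode and widest in the tails, which is what gives each of them the same probability 1/m.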

Correlation

The extension algorithm described in Section 3 and illustrated in Section 4 starts with an initial LHS of size m with a rank correlation matrix D₁, generates a second LHS of size m with a rank correlation matrix D₂, and then constructs an LHS of size 2m that includes the elements of the first LHS and has a rank correlation matrix D close to (D₁+D₂)/2. This section demonstrates that the resultant rank correlation matrix D is indeed close to (D₁+D₂)/2.

This demonstration is based on considering

Discussion

Latin hypercube sampling is the preferred sampling procedure for the assessment of the implications of epistemic uncertainty in complex analyses because of its probabilistic character (i.e., each sample element has a weight equal to the reciprocal of the sample size that can be used in estimating probability-based quantities such as means, standard deviations, distribution functions, and standardized regression coefficients) and efficient stratification properties (i.e., a dense stratification

Acknowledgments

Work was performed for Sandia National Laboratories (SNL), which is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL-85000. Review at SNL was provided by L. Swiler and R. Jarek. Editorial support was provided by F. Puffer and J. Ripple of Tech Reps, a division of Ktech Corporation.

References (90)

W.L. Oberkampf et al. Error and uncertainty in modeling and simulation. Reliab Eng Syst Saf (2002)
J.P.C. Kleijnen. Sensitivity analysis of simulation experiments: regression analysis and statistical design. Math Comput Simulat (1992)
R.I. Cukier et al. Nonlinear sensitivity analysis of multiparameter model systems. J Comput Phys (1978)
J.C. Helton et al. Survey of sampling-based methods for uncertainty and sensitivity analysis. Reliab Eng Syst Saf (2006)
J.P.C. Kleijnen et al. Statistical analyses of scatterplots to identify important factors in large-scale simulations, 1: review and comparison of techniques. Reliab Eng Syst Saf (1999)
A. Saltelli et al. Non-parametric statistics in sensitivity analysis for model output. A comparison of selected techniques. Reliab Eng Syst Saf (1990)
J.C. Helton. Uncertainty and sensitivity analysis techniques for use in performance assessment for radioactive waste disposal. Reliab Eng Syst Saf (1993)
J.C. Helton et al. Latin hypercube sampling and the propagation of uncertainty in analyses of complex systems. Reliab Eng Syst Saf (2003)
N.O. Siu et al. Bayesian parameter estimation in probabilistic risk assessment. Reliab Eng Syst Saf (1998)
J.S. Evans et al. Use of probabilistic expert judgement in uncertainty analysis of carcinogenic potency. Regul Toxicol Pharmacol (1994)
Guiding principles for Monte Carlo analysis, EPA/630/R-97/001 (1997)
NCRP (National Council on Radiation Protection and Measurements). A guide for uncertainty analysis in dose and risk...
Science and judgment in risk assessment (1994)
Issues in risk assessment (1993)
An SAB report: multi-media risk assessment for radon, review of uncertainty analysis of risks associated with exposure to radon, EPA-SAB-RAC-93-014 (1993)
IAEA (International Atomic Energy Agency), 1989. Evaluating the reliability of predictions made using environmental...
M.B. Beck. Water-quality modeling: a review of the analysis of uncertainty. Water Resourc Res (1987)
D.G. Cacuci (2003)
T. Turányi. Sensitivity analysis of complex kinetic systems. Tools and applications. J Math Chem (1990)
H. Rabitz et al. Sensitivity analysis in chemical kinetics
P.M. Frank. Introduction to system sensitivity theory (1978)
R. Tomovic et al. General sensitivity theory (1972)
R.H. Myers et al. Response surface methodology: a retrospective and literature review. J Qual Technol (2004)
R.H. Myers. Response surface methodology—current status and future directions. J Qual Technol (1999)
T.H. Andres. Sampling methods and sensitivity analysis for large parameter sets. J Stat Comput Simulat (1997)
J.P.C. Kleijnen. Sensitivity analysis and related analyses: a review of some statistical techniques. J Stat Comput Simulat (1997)
J. Sacks et al. Design and analysis of computer experiments. Stat Sci (1989)
R.H. Morton. Response surface methodology. Math Sci (1983)
R. Mead et al. A review of response surface methodology from a biometric viewpoint. Biometrics (1975)
R.H. Myers. Response surface methodology (1971)


Published by Elsevier Ltd.
