simulateVector {EnvStats}    R Documentation
Simulate a Vector of Random Numbers From a Specified Theoretical or Empirical Probability Distribution
Description
Simulate a vector of random numbers from a specified theoretical probability distribution or empirical probability distribution, using either Latin Hypercube sampling or simple random sampling.
Usage
simulateVector(n, distribution = "norm", param.list = list(mean = 0, sd = 1),
sample.method = "SRS", seed = NULL, sorted = FALSE,
left.tail.cutoff = ifelse(is.finite(supp.min), 0, .Machine$double.eps),
right.tail.cutoff = ifelse(is.finite(supp.max), 0, .Machine$double.eps))
Arguments
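n
a positive integer indicating the number of random numbers to generate.
distribution
a character string denoting the distribution abbreviation (e.g., "norm", "lnorm", "lnormAlt"). The default value is distribution="norm". Alternatively, the character string "emp" may be used to denote sampling from an empirical distribution based on a set of observations; the vector containing the observations is specified in the argument param.list.
param.list
a list with values for the parameters of the distribution. The default value is param.list=list(mean=0, sd=1). Alternatively, if you specify an empirical distribution by setting distribution="emp", then param.list should be a list of the form list(obs=name), where name denotes the vector containing the observations that define the empirical distribution.
sample.method
a character string indicating whether to use simple random sampling (sample.method="SRS", the default) or Latin Hypercube sampling (sample.method="LHS").
seed
integer to supply to the R function set.seed. The default value is seed=NULL, in which case set.seed is not called and the current state of the random number generator is used.
sorted
logical scalar indicating whether to return the random numbers in sorted (ascending) order. The default value is sorted=FALSE.
left.tail.cutoff
a scalar between 0 and 1 indicating what proportion of the left tail of the probability distribution to omit for Latin Hypercube sampling. For densities with a finite support minimum (e.g., Lognormal or Empirical) the default value is left.tail.cutoff=0; for densities with a support minimum of -Inf (e.g., Normal) the default value is left.tail.cutoff=.Machine$double.eps.
right.tail.cutoff
a scalar between 0 and 1 indicating what proportion of the right tail of the probability distribution to omit for Latin Hypercube sampling. For densities with a finite support maximum (e.g., Beta or Empirical) the default value is right.tail.cutoff=0; for densities with a support maximum of Inf (e.g., Normal or Lognormal) the default value is right.tail.cutoff=.Machine$double.eps.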
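One way to see why the defaults differ by support: a quantile function is infinite at probability 0 (or 1) when the support is unbounded on that side. The base-R sketch below illustrates this; the helper lhs_strata is hypothetical, and the way it applies the cutoffs (splitting [left.tail.cutoff, 1 - right.tail.cutoff] into n equal-width strata) is an assumption, not taken from the EnvStats source.

```r
# Hypothetical helper (not part of EnvStats). ASSUMPTION: the probability range
# [left.tail.cutoff, 1 - right.tail.cutoff] is split into n equal-width strata.
lhs_strata <- function(n, left.tail.cutoff = 0, right.tail.cutoff = 0) {
  seq(left.tail.cutoff, 1 - right.tail.cutoff, length.out = n + 1)
}

# Normal: the quantile function is infinite at probabilities 0 and 1 ...
qnorm(c(0, 1))                      # -Inf  Inf
# ... so trimming .Machine$double.eps from each tail keeps the outer
# stratum boundaries finite (roughly -8.1 and 8.1).
b <- lhs_strata(4, .Machine$double.eps, .Machine$double.eps)
qnorm(range(b))

# Lognormal: the support minimum is 0, so no left-tail trimming is needed.
qlnorm(0)                           # 0
```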
Details
Simple Random Sampling (sample.method="SRS")
When sample.method="SRS", the function simulateVector simply calls the function rabb, where abb denotes the abbreviation of the specified distribution (e.g., rlnorm, remp, etc.).
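The SRS dispatch can be mimicked in a few lines of base R. This is a sketch for illustration only: srs_sketch is a hypothetical name, not the EnvStats implementation, and it uses match.fun() and do.call() to build the call to r followed by the distribution abbreviation.

```r
# Hypothetical helper (not the EnvStats source): SRS by dispatching to r<abb>
srs_sketch <- function(n, abb = "norm", param.list = list(mean = 0, sd = 1),
                       seed = NULL) {
  if (!is.null(seed)) set.seed(seed)       # mimic the 'seed' argument
  rfun <- match.fun(paste0("r", abb))      # e.g. "lnorm" -> rlnorm
  do.call(rfun, c(list(n = n), param.list))
}

x <- srs_sketch(5, "lnorm", list(meanlog = 0, sdlog = 1), seed = 47)
set.seed(47)
y <- rlnorm(5, meanlog = 0, sdlog = 1)
identical(x, y)   # TRUE: same seed, same generator call
```

The same reasoning explains why the first two calls in the Examples section return identical values.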
Latin Hypercube Sampling (sample.method="LHS")
When sample.method="LHS", the function simulateVector generates n random numbers using Latin Hypercube sampling. The distribution is divided into n intervals of equal probability 1/n and simple random sampling is performed once within each interval; i.e., Latin Hypercube sampling is simply stratified sampling without replacement, where the strata are defined by the 0'th, 100(1/n)'th, 100(2/n)'th, ..., and 100'th percentiles of the distribution.
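The stratification described above can be sketched in base R by drawing one uniform probability inside each interval ((k-1)/n, k/n) and mapping it through the quantile function. This is an illustrative sketch, not the EnvStats code: lhs_sketch is a hypothetical name, it ignores the tail cutoffs (runif() never returns exactly 0 or 1), and returning the values in random order is an assumption of the sketch.

```r
# Sketch of Latin Hypercube sampling in base R (NOT the EnvStats source).
# 'qfun' is any quantile function; extra arguments go to it via '...'.
lhs_sketch <- function(n, qfun = qnorm, ...) {
  k <- seq_len(n)
  # stratum k covers probabilities ((k - 1)/n, k/n); draw one uniform in each
  u <- (k - 1 + runif(n)) / n
  x <- qfun(u, ...)   # map each probability to the distribution's scale
  sample(x)           # random order (an assumption of this sketch)
}

set.seed(47)
x <- lhs_sketch(10, qlnorm, meanlog = 0, sdlog = 1)
# Exactly one value falls in each of the 10 equal-probability strata:
sort(floor(plnorm(x) * 10))
```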
Latin Hypercube sampling, sometimes abbreviated LHS, is a method of sampling from a probability distribution that ensures all portions of the probability distribution are represented in the sample. It was introduced in the published literature by McKay et al. (1979) to overcome the following problem in Monte Carlo simulation based on simple random sampling (SRS). Suppose we want to generate random numbers from a specified distribution. If we use simple random sampling, there is a low probability of getting very many observations in an area of low probability of the distribution. For example, if we generate n observations from the distribution, the probability that none of these observations falls at or above the 98'th percentile of the distribution (i.e., in its upper 2%) is 0.98^n. So, for example, there is a 13% chance (0.98^100 ≈ 0.133) that out of 100 random numbers, none will fall at or above the 98'th percentile. If we are interested in reproducing the shape of the distribution, we will need a very large number of observations to ensure that we can adequately characterize the tails of the distribution (Vose, 2008, pp. 59–62).
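The 13% figure can be checked directly, and a short simulation on the probability scale contrasts SRS with stratified (LHS-style) draws. Working with uniform probabilities rather than a particular distribution is a simplification chosen for this sketch; the stratified draw reuses the one-value-per-interval idea from the Latin Hypercube description, not EnvStats code.

```r
n <- 100
p_none <- 0.98^n   # P(no SRS value at or above the 98th percentile), about 0.133

# Monte Carlo check with simple random sampling on the probability scale:
set.seed(1)
p_srs <- mean(replicate(10000, all(runif(n) < 0.98)))   # close to 0.98^n

# Stratified draw on the probability scale: one value per interval of width 1/n
u <- (seq_len(n) - 1 + runif(n)) / n
sum(u >= 0.98)     # always 2: strata 99 and 100 lie in (0.98, 1)
```

Under simple random sampling the upper 2% is missed entirely about 13% of the time, whereas the stratified draw always places exactly two of the 100 values there.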
See Millard (2013) for a visual explanation of Latin Hypercube sampling.
Value
a numeric vector of random numbers from the specified distribution.
Note
Latin Hypercube sampling is often used in probabilistic risk assessment, specifically for sensitivity and uncertainty analysis (e.g., Iman and Conover, 1980; Iman and Helton, 1988; Iman and Helton, 1991; Vose, 1996).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Iman, R.L., and W.J. Conover. (1980). Small Sample Sensitivity Analysis Techniques for Computer Models, With an Application to Risk Assessment (with Comments). Communications in Statistics–Volume A, Theory and Methods, 9(17), 1749–1874.
Iman, R.L., and J.C. Helton. (1988). An Investigation of Uncertainty and Sensitivity Analysis Techniques for Computer Models. Risk Analysis 8(1), 71–90.
Iman, R.L., and J.C. Helton. (1991). The Repeatability of Uncertainty and Sensitivity Analyses for Complex Probabilistic Risk Assessments. Risk Analysis 11(4), 591–606.
McKay, M.D., R.J. Beckman, and W.J. Conover. (1979). A Comparison of Three Methods for Selecting Values of Input Variables in the Analysis of Output From a Computer Code. Technometrics 21(2), 239–245.
Millard, S.P. (2013). EnvStats: an R Package for Environmental Statistics. Springer, New York. https://link.springer.com/book/10.1007/978-1-4614-8456-1.
Vose, D. (2008). Risk Analysis: A Quantitative Guide. Third Edition. John Wiley & Sons, West Sussex, UK, 752 pp.
See Also
Probability Distributions and Random Numbers, Empirical, simulateMvMatrix, set.seed.
Examples
# Generate 10 observations from a lognormal distribution with
# parameters mean=10 and cv=1 using simple random sampling:
simulateVector(10, distribution = "lnormAlt",
param.list = list(mean = 10, cv = 1), seed = 47,
sorted = TRUE)
# [1] 2.086931 2.863589 3.112866 5.592502 5.732602 7.160707
# [7] 7.741327 8.251306 12.782493 37.214748
#----------
# Repeat the above example by calling rlnormAlt directly:
set.seed(47)
sort(rlnormAlt(10, mean = 10, cv = 1))
# [1] 2.086931 2.863589 3.112866 5.592502 5.732602 7.160707
# [7] 7.741327 8.251306 12.782493 37.214748
#----------
# Now generate 10 observations from the same lognormal distribution
# but use Latin Hypercube sampling. Note that the largest value
# is larger than for simple random sampling:
simulateVector(10, distribution = "lnormAlt",
param.list = list(mean = 10, cv = 1), seed = 47,
sample.method = "LHS", sorted = TRUE)
# [1] 2.406149 2.848428 4.311175 5.510171 6.467852 8.174608
# [7] 9.506874 12.298185 17.022151 53.552699
#==========
# Generate 50 observations from a Pareto distribution with parameters
# location=10 and shape=2, then use this resulting vector of
# observations as the basis for generating 3 observations from an
# empirical distribution using Latin Hypercube sampling:
set.seed(321)
pareto.rns <- rpareto(50, location = 10, shape = 2)
simulateVector(3, distribution = "emp",
param.list = list(obs = pareto.rns), sample.method = "LHS")
# [1] 11.50685 13.50962 17.47335
#==========
# Clean up
#---------
rm(pareto.rns)
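For intuition only, Latin Hypercube sampling from an empirical distribution can be imitated in base R by passing stratified probabilities to quantile(). This swaps in R's type-7 sample quantiles as a stand-in for the empirical quantile function EnvStats uses, and the data are hypothetical simulated exponential values, not the Pareto sample above.

```r
# Base-R stand-in (NOT remp/simulateVector): LHS from an empirical distribution,
# using quantile(type = 7) as the empirical quantile function (an assumption).
set.seed(321)
obs <- 10 + rexp(50, rate = 0.2)   # hypothetical data, not the Pareto sample above
n <- 3
u <- (seq_len(n) - 1 + runif(n)) / n   # one probability in each third of (0, 1)
x <- quantile(obs, probs = u, type = 7, names = FALSE)
x   # one value from each third of the empirical distribution
```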