clusterRunSimulation {simFrame}    R Documentation
Run a simulation experiment on a cluster
Description
Generic function for running a simulation experiment on a cluster.
Usage
clusterRunSimulation(cl, x, setup, nrep, control,
contControl = NULL, NAControl = NULL,
design = character(), fun, ...,
SAE = FALSE)
Arguments
cl
a cluster as generated by makeCluster.
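x
a data.frame (for design-based simulation or simulation based on real data) or a control object for data generation inheriting from "VirtualDataControl" (for model-based simulation or mixed simulation designs).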
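setup
an object of class "SampleSetup" containing previously set up samples, or a control object for setting up samples inheriting from "VirtualSampleControl" (for design-based simulation or mixed simulation designs).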
nrep
a non-negative integer giving the number of repetitions of the simulation experiment (for model-based simulation, mixed simulation designs or simulation based on real data).
control
a control object of class "SimControl".
contControl
a control object for adding contamination (e.g., as created by DARContControl or DCARContControl in the examples), or NULL (the default).
NAControl
a control object for inserting missing values, or NULL (the default).
design
a character vector specifying variables (columns) to be used
for splitting the data into domains. The simulations, including
contamination and the insertion of missing values (unless SAE is TRUE),
are then performed separately on every domain.
fun
a function to be applied in each simulation run.
...
for …
SAE
a logical indicating whether small area estimation will be used in the simulation experiment.
Details
Statistical simulation is embarrassingly parallel: the individual simulation
runs are independent of each other, hence computational performance can be
increased by distributing them over several processes. Since version 0.5.0,
parallel computing in simFrame is implemented using the package parallel,
which has been part of the R base distribution since version 2.14.0 and
builds upon work done for the contributed packages multicore and snow
(Rossini et al., 2007; Tierney et al., 2009). Note that all objects and
packages required for the computations (including simFrame) need to be made
available on every worker process unless the worker processes are created by
forking (see makeCluster).
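A minimal sketch using the parallel package alone (no simFrame involved) that illustrates the point about worker processes: objects that exist only in the master session must be exported to PSOCK workers, whereas forked workers inherit them. The names scale_factor and contaminate are hypothetical, and the FORK part only runs on Unix-alikes.

```r
library(parallel)

# Hypothetical objects that exist only in the master R session
scale_factor <- 25
contaminate <- function(v) v * scale_factor

# PSOCK workers are fresh R processes: without clusterExport(), the call
# below would fail because 'contaminate' is not found on the workers
cl <- makeCluster(2, type = "PSOCK")
clusterExport(cl, c("scale_factor", "contaminate"))
res_psock <- unlist(parLapply(cl, 1:4, function(i) contaminate(i)))
stopCluster(cl)

# Forked workers start as copies of the master process and therefore already
# see both objects (forking is not available on Windows)
res_fork <- NULL
if (.Platform$OS.type == "unix") {
  cl <- makeCluster(2, type = "FORK")
  res_fork <- unlist(parLapply(cl, 1:4, function(i) contaminate(i)))
  stopCluster(cl)
}
```

The same reasoning is why the examples below call clusterEvalQ() and clusterExport() on PSOCK clusters before running the simulation.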
To prevent the worker processes from generating overlapping or correlated
sequences of random numbers, and to make the results reproducible,
independent random number streams should be used (L'Ecuyer et al., 2002).
With parallel, such streams can be set up on the workers via the function
clusterSetRNGStream().
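The following sketch, again using only the parallel package, shows why clusterSetRNGStream() matters: re-initializing the streams with the same seed reproduces the draws exactly, while each worker still receives its own stream. The seed 12345 and the object names are arbitrary choices for illustration.

```r
library(parallel)

cl <- makeCluster(2, type = "PSOCK")

# Give each worker its own L'Ecuyer-CMRG stream, derived from one seed
clusterSetRNGStream(cl, iseed = 12345)
draws1 <- parSapply(cl, 1:4, function(i) rnorm(1))

# Re-initializing the streams with the same seed reproduces the draws
clusterSetRNGStream(cl, iseed = 12345)
draws2 <- parSapply(cl, 1:4, function(i) rnorm(1))

stopCluster(cl)

print(identical(draws1, draws2))
# parSapply() sends elements 1:2 to the first worker and 3:4 to the second,
# so draws1[1] and draws1[3] come from different streams
print(draws1[c(1, 3)])
```

Without the calls to clusterSetRNGStream(), freshly started PSOCK workers seed their generators independently, so the draws would differ from one execution to the next.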
There are some requirements for the slot fun of the control object control.
The function must return either a numeric vector, or a list with the two
components values (a numeric vector) and add (additional results of any
class, e.g., statistical models). Note that the latter is computationally
slightly more expensive. A data.frame is passed to fun in every simulation
run, and the corresponding argument must be called x. If comparisons with
the original data need to be made, e.g., for evaluating the quality of
imputation methods, the function should have an argument called orig. If
different domains are used in the simulation, the indices of the current
domain can be passed to the function via an argument called domain.
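To illustrate these requirements, here are three hypothetical simulation functions written in plain R, so they can be tried without a cluster. The column names value and group mirror the model-based example; the missing values and the mean imputation in the third function are illustrative assumptions (in an actual experiment, missing values would be inserted via NAControl).

```r
# (a) Return a named numeric vector, one element per estimator
sim_vec <- function(x) {
  c(mean = mean(x$value), median = median(x$value))
}

# (b) Return a list: 'values' holds the numeric results, 'add' any extra
#     object, here the fitted linear model
sim_list <- function(x) {
  fit <- lm(value ~ group, data = x)
  list(values = coef(fit), add = fit)
}

# (c) Use the 'orig' argument to compare with the original data, here the
#     root mean squared error of a simple mean imputation
sim_orig <- function(x, orig) {
  miss <- is.na(x$value)
  imputed <- x$value
  imputed[miss] <- mean(x$value, na.rm = TRUE)
  c(rmse = sqrt(mean((imputed[miss] - orig$value[miss])^2)))
}

# Toy data standing in for what the framework would pass to 'fun'
orig <- data.frame(group = rep(1:2, each = 5), value = c(1:5, 11:15))
x <- orig
x$value[c(2, 7)] <- NA

sim_vec(orig)
sim_list(orig)$values
sim_orig(x, orig)
```

Variant (b) keeps the whole model object for each run, which is what makes it slightly more expensive than returning a plain numeric vector.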
For small area estimation, the following points have to be kept in mind. The
slot design of control, which specifies the variables for splitting the data
into domains, must be supplied, and the slot SAE must be set to TRUE.
However, the data are not actually split into the specified domains.
Instead, the whole data set (sample) is passed to fun, and contamination and
missing values are likewise added to the whole data (sample). Finally, the
function must have a domain argument so that the current domain can be
extracted from the whole data (sample).
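A hypothetical sketch of a fun suitable for SAE = TRUE: it receives the whole sample as x and uses the domain argument to select the observations of the current domain. It assumes that domain holds row indices (integer or logical) and that the data contain a column value; the pooled estimate is a deliberately crude stand-in for an estimator that borrows strength from other domains, not a recommended small area method.

```r
# x: the whole sample; domain: indices of the current domain's rows
sim_sae <- function(x, domain) {
  c(direct = mean(x$value[domain]),  # uses only the current domain
    pooled = mean(x$value))          # uses the whole sample
}

# Toy data with two domains defined by 'group'
smp <- data.frame(group = rep(1:2, each = 5), value = c(1:5, 11:15))
sim_sae(smp, domain = which(smp$group == 2))
```

With simFrame, such a function would be supplied together with the design and SAE arguments, e.g. clusterRunSimulation(cl, dc, nrep = 100, design = "group", SAE = TRUE, fun = sim_sae) with dc as in the model-based example (not run here).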
In every simulation run, fun is evaluated using try. Hence, if the
computations fail in some of the simulation runs, the results of the
remaining runs are not lost.
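The effect of evaluating fun with try can be mimicked in plain R. The sketch below uses a hypothetical function fragile that fails on small data sets; it only illustrates the mechanism and does not reproduce how simFrame stores failed runs internally.

```r
# Hypothetical simulation function that fails for fewer than 3 observations
fragile <- function(x) {
  if (nrow(x) < 3) stop("too few observations")
  c(mean = mean(x$value))
}

# Three "simulation runs", the second of which fails
runs <- list(data.frame(value = 1:5), data.frame(value = 1:2),
             data.frame(value = 6:10))
out <- lapply(runs, function(d) try(fragile(d), silent = TRUE))

# The failure is recorded as a "try-error" object; the other runs are kept
failed <- vapply(out, inherits, logical(1), what = "try-error")
ok <- out[!failed]
```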
Value
An object of class "SimResults".
Methods
cl = "ANY", x = "ANY", setup = "ANY", nrep = "ANY", control = "missing"
convenience wrapper that allows the slots of control to be supplied as arguments.
cl = "ANY", x = "data.frame", setup = "missing", nrep = "numeric", control = "SimControl"
run a simulation experiment based on real data with repetitions on a cluster.
cl = "ANY", x = "data.frame", setup = "SampleSetup", nrep = "missing", control = "SimControl"
run a design-based simulation experiment with previously set up samples on a cluster.
cl = "ANY", x = "data.frame", setup = "VirtualSampleControl", nrep = "missing", control = "SimControl"
run a design-based simulation experiment on a cluster.
cl = "ANY", x = "VirtualDataControl", setup = "missing", nrep = "numeric", control = "SimControl"
run a model-based simulation experiment with repetitions on a cluster.
cl = "ANY", x = "VirtualDataControl", setup = "VirtualSampleControl", nrep = "numeric", control = "SimControl"
run a simulation experiment using a mixed simulation design with repetitions on a cluster.
Author(s)
Andreas Alfons
References
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi: 10.18637/jss.v037.i03.
L'Ecuyer, P., Simard, R., Chen, E. and Kelton, W. (2002) An Object-Oriented Random-Number Package with Many Long Streams and Substreams. Operations Research, 50(6), 1073–1075.
Rossini, A., Tierney, L. and Li, N. (2007) Simple Parallel Statistical Computing in R. Journal of Computational and Graphical Statistics, 16(2), 399–420.
Tierney, L., Rossini, A. and Li, N. (2009) snow: A Parallel Computing Framework for the R System. International Journal of Parallel Programming, 37(1), 78–90.
See Also
makeCluster, clusterSetRNGStream, runSimulation, "SimControl", "SimResults", simBwplot, simDensityplot, simXyplot
Examples
## Not run:
## these examples require at least a dual-core processor
# load packages
library("parallel")
library("simFrame")
## design-based simulation
data(eusilcP) # load data
# start cluster
cl <- makeCluster(2, type = "PSOCK")
# load package and data on workers
clusterEvalQ(cl, {
library(simFrame)
data(eusilcP)
})
# set up random number stream
clusterSetRNGStream(cl, iseed = 12345)
# control objects for sampling and contamination
sc <- SampleControl(size = 500, k = 50)
cc <- DARContControl(target = "eqIncome", epsilon = 0.02,
fun = function(x) x * 25)
# function for simulation runs
sim <- function(x) {
c(mean = mean(x$eqIncome), trimmed = mean(x$eqIncome, trim = 0.02))
}
# export objects to workers
clusterExport(cl, c("sc", "cc", "sim"))
# run simulation on cluster
results <- clusterRunSimulation(cl, eusilcP,
sc, contControl = cc, fun = sim)
# stop cluster
stopCluster(cl)
# explore results
head(results)
aggregate(results)
tv <- mean(eusilcP$eqIncome) # true population mean
plot(results, true = tv)
## model-based simulation
# start cluster
cl <- makeCluster(2, type = "PSOCK")
# load package on workers
clusterEvalQ(cl, library(simFrame))
# set up random number stream
clusterSetRNGStream(cl, iseed = 12345)
# function for generating data
rgnorm <- function(n, means) {
group <- sample(1:2, n, replace = TRUE)
data.frame(group = group, value = rnorm(n) + means[group])
}
# control objects for data generation and contamination
means <- c(0, 0.25)
dc <- DataControl(size = 500, distribution = rgnorm,
dots = list(means = means))
cc <- DCARContControl(target = "value",
epsilon = 0.02, dots = list(mean = 15))
# function for simulation runs
sim <- function(x) {
c(mean = mean(x$value),
trimmed = mean(x$value, trim = 0.02),
median = median(x$value))
}
# export objects to workers
clusterExport(cl, c("rgnorm", "means", "dc", "cc", "sim"))
# run simulation on cluster
results <- clusterRunSimulation(cl, dc, nrep = 100,
contControl = cc, design = "group", fun = sim)
# stop cluster
stopCluster(cl)
# explore results
head(results)
aggregate(results)
plot(results, true = means)
## End(Not run)