SimCollect {SimDesign} | R Documentation |
Collapse separate simulation files into a single result
Description
This function collects and aggregates the results from SimDesign's runSimulation into a single
object suitable for post-analyses, or combines all the saved results directories into one.
This is useful when results are run piece-wise on one node (e.g., 500 replications in one batch,
500 again at a later date; be careful with set.seed in that case, as the random numbers will
tend to correlate the more it is used) or run independently across different nodes/computing
cores (e.g., see runArraySimulation).
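The caution about set.seed can be illustrated with base R alone. The following is a minimal sketch, assuming two hypothetical batches that mistakenly reuse the same seed (the seed values and object names are illustrative, not taken from SimDesign): the second batch reproduces the first exactly, so pooling the two adds no independent replications.

```r
# Hypothetical batches; the seed values are illustrative only
set.seed(1234)
batch1 <- rnorm(5)

set.seed(1234)              # same seed reused by mistake
batch2 <- rnorm(5)

identical(batch1, batch2)   # TRUE: the draws are duplicated, not independent

set.seed(4321)              # a distinct seed starts a different stream
batch3 <- rnorm(5)
identical(batch1, batch3)
```

Generating a distinct seed per batch, as genSeeds is used for in the Examples, avoids this duplication.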
Usage
SimCollect(
  files = NULL,
  filename = NULL,
  dirs = NULL,
  results_dirname = "SimDesign_aggregate_results",
  select = NULL,
  check.only = FALSE,
  target.reps = NULL
)
aggregate_simulations(...)
Arguments
files |
a |
filename |
(optional) name of .rds file to save aggregate simulation file to. If not specified then the results will only be returned in the R console |
dirs |
a |
results_dirname |
the new directory to place the aggregated results files |
select |
a character vector indicating columns to variables to select from the
|
check.only |
logical; for larger simulations file sets, such as those generated by
|
target.reps |
(optional) number of replications to check against to evaluate whether the simulation files returned the desired number of replications. If missing, the highest detected value from the collected set of replication information will be used |
... |
not used |
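The fallback described for target.reps can be sketched in base R. This is an illustration of the stated rule only, not the package's internal code; the replication counts and the names reps, target, and missing_reps are hypothetical.

```r
# Hypothetical per-condition replication counts pooled across files
reps <- c(cond1 = 1000, cond2 = 750, cond3 = 1000)

# Stand-in for the target.reps argument; NULL means "not supplied"
target.reps <- NULL

# Fall back to the highest detected replication count when no target is given
target <- if (is.null(target.reps)) max(reps) else target.reps

# Replications still missing for each condition that falls short
missing_reps <- target - reps
missing_reps[missing_reps > 0]
```

Here only cond2 falls short of the inferred target of 1000, by 250 replications.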
Value
If files is used, the function returns a data.frame/tibble with the (weighted) average
of the simulation results. Otherwise, if dirs is used, the function returns NULL.
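What a weighted average of simulation results might look like can be sketched in base R. This assumes, as an illustration only, that each file is weighted by its number of replications; the data frames, the bias column, and its values are hypothetical and do not come from SimDesign.

```r
# Hypothetical summaries from two runs of the same two conditions
run1 <- data.frame(N = c(30, 60), bias = c(0.10, 0.04),
                   REPLICATIONS = c(500, 500))
run2 <- data.frame(N = c(30, 60), bias = c(0.20, 0.08),
                   REPLICATIONS = c(1500, 500))

# Assumed weighting: each run contributes in proportion to its replications
total_reps <- run1$REPLICATIONS + run2$REPLICATIONS
pooled <- data.frame(
  N = run1$N,
  bias = (run1$bias * run1$REPLICATIONS +
          run2$bias * run2$REPLICATIONS) / total_reps,
  REPLICATIONS = total_reps
)
pooled
```

When both runs use the same number of replications, as in the second row, the weighted average reduces to the simple mean of the two estimates.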
Author(s)
Phil Chalmers rphilip.chalmers@gmail.com
References
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations
with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280.
doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte
Carlo simulation. Journal of Statistics Education, 24(3), 136-156.
doi:10.1080/10691898.2016.1246953
See Also
Examples
## Not run:
setwd('my_working_directory')
## run simulations to save the .rds files (or move them to the working directory)
# seeds1 <- genSeeds(design)
# seeds2 <- genSeeds(design, old.seeds=seeds1)
# ret1 <- runSimulation(design, ..., seed=seeds1, filename='file1')
# ret2 <- runSimulation(design, ..., seed=seeds2, filename='file2')
# saves to the hard-drive and stores in workspace
final <- SimCollect(files = c('file1.rds', 'file2.rds'))
final
# If filename not included, can be extracted from results
# files <- c(SimExtract(ret1, 'filename'), SimExtract(ret2, 'filename'))
# final <- SimCollect(files = files)
# aggregate saved results for .rds files and results directories
# runSimulation(..., seed=seeds1, save_results = TRUE,
# save_details = list(save_results_dirname = 'dir1'))
# runSimulation(..., seed=seeds2, save_results = TRUE,
# save_details = list(save_results_dirname = 'dir2'))
# place new saved results in 'SimDesign_aggregate_results/' by default
SimCollect(files = c('file1.rds', 'file2.rds'),
           filename = 'aggregated_sim.rds',
           dirs = c('dir1', 'dir2'))
# If dirnames not included, can be extracted from results
# dirs <- c(SimExtract(ret1, 'save_results_dirname'),
#           SimExtract(ret2, 'save_results_dirname'))
# SimCollect(dirs = dirs)
#################################################
# Example where each row condition is repeated, evaluated independently,
# and later collapsed into a single analysis object
# Each condition repeated four times (hence, replications
# should be set to desired.reps/4)
Design <- createDesign(N = c(30, 60),
                       mu = c(0,5))
Design
Design4 <- expandDesign(Design, 4)
Design4
#-------------------------------------------------------------------
Generate <- function(condition, fixed_objects) {
    dat <- with(condition, rnorm(N, mean=mu))
    dat
}
Analyse <- function(condition, dat, fixed_objects) {
    ret <- c(mean=mean(dat), SD=sd(dat))
    ret
}
Summarise <- function(condition, results, fixed_objects) {
    ret <- colMeans(results)
    ret
}
#-------------------------------------------------------------------
# Generate fixed seeds to be distributed
set.seed(1234)
seeds <- genSeeds(Design)
seeds
# replications vector (constant is fine if the same across conditions;
# below is vectorized to demonstrate that this could change)
replications <- rep(250, nrow(Design))
# create directory to store all final simulation files
dir.create('sim_files/')
# distribute jobs independently (explicitly parallelize here on cluster,
# which is more elegantly managed via runArraySimulation)
sapply(1:nrow(Design), \(i) {
    runSimulation(design=Design[i, ], replications=replications[i],
                  generate=Generate, analyse=Analyse, summarise=Summarise,
                  filename=paste0('sim_files/job-', i)) |> invisible()
})
# check that all replications satisfy target
files <- paste0('sim_files/job-', 1:nrow(Design), ".rds")
SimCollect(files = files, check.only = TRUE)
# this would have been returned were target.reps supposed to be 1000
SimCollect(files = files, check.only = TRUE, target.reps=1000)
# aggregate into single object
sim <- SimCollect(files = paste0('sim_files/job-', 1:nrow(Design), ".rds"))
sim
## End(Not run)