SimSolve {SimDesign} | R Documentation |
One Dimensional Root (Zero) Finding in Simulation Experiments
Description
This function provides a stochastic root-finding approach to solving
specific quantities in simulation experiments (e.g., solving for a specific
sample size to meet a target power rate) using the
Probablistic Bisection Algorithm with Bolstering and Interpolations
(ProBABLI; Chalmers, accepted). The structure follows the
steps outlined in runSimulation
, however portions of
the design
input are taken as variables to be estimated rather than
fixed, and the constant b
is required in order to
solve the root equation f(x) - b = 0
. Stochastic root search is terminated
based on the successive behavior of the x
estimates.
For even greater advertised accuracy with ProBABLI, termination criteria
can be based on the width of the advertised predicting interval
(via predCI.tol
) or by specifying how long the investigator
is willing to wait for the final estimates (via wait.time
,
where longer wait times lead to progressively better accuracy in
the final estimates).
Usage
SimSolve(
design,
interval,
b,
generate,
analyse,
summarise,
replications = list(burnin.iter = 15L, burnin.reps = 100L, max.reps = 500L,
min.total.reps = 9000L, increase.by = 10L),
integer = TRUE,
formula = y ~ poly(x, 2),
family = "binomial",
parallel = FALSE,
cl = NULL,
save = TRUE,
resume = TRUE,
method = "ProBABLI",
wait.time = NULL,
ncores = parallel::detectCores() - 1L,
type = ifelse(.Platform$OS.type == "windows", "PSOCK", "FORK"),
maxiter = 100L,
check.interval = TRUE,
verbose = TRUE,
control = list(),
predCI = 0.95,
predCI.tol = NULL,
...
)
## S3 method for class 'SimSolve'
summary(object, tab.only = FALSE, reps.cutoff = 300, ...)
## S3 method for class 'SimSolve'
plot(x, y, ...)
Arguments
design |
a |
interval |
a vector of length two, or matrix with |
b |
a single constant used to solve the root equation |
generate |
generate function. See |
analyse |
analysis function. See |
summarise |
summary function that returns a single number corresponding
to a function evaluation |
replications |
a named list or vector indicating the number of replication to
use for each design condition per PBA iteration. By default the input is a
Vector inputs can specify the exact replications for each iterations. As a general rule, early iterations should be relatively low for initial searches to avoid unnecessary computations for locating the approximate root, though the number of replications should gradually increase to reduce the sampling variability as the PBA approaches the root. |
integer |
logical; should the values of the root be considered integer
or numeric? If |
formula |
regression formula to use when |
family |
Note that if individual results from the |
parallel |
for parallel computing for slower simulation experiments
(see |
cl |
see |
save |
logical; store temporary file in case of crashes. If detected
in the working directory will automatically be loaded to resume (see
|
resume |
logical; if a temporary |
method |
optimizer method to use. Default is the stochastic root-finder
|
wait.time |
(optional) argument passed to
|
ncores |
see |
type |
type of cluster object to define. If |
maxiter |
the maximum number of iterations (default 100) |
check.interval |
logical; should an initial check be made to determine
whether |
verbose |
logical; print information to the console? |
control |
a |
predCI |
advertised confidence interval probability for final
model-based prediction of target |
predCI.tol |
(optional) rather than relying on the changes between successive
estimates (default), if the predicting CI is consistently within this
supplied tolerance input range then terminate.
This provides termination behaviour based on the predicted
precision of the root solutions rather than their stability history, and therefore
can be used to obtain estimates with a particular level of advertised accuracy.
For example, when solving for a sample size value ( |
... |
additional arguments to be pasted to |
object |
object of class |
tab.only |
logical; print only the (reduce) table of estimates? |
reps.cutoff |
integer indicating the rows to omit from output if the number of replications do no reach this value |
x |
object of class |
y |
design row to plot. If omitted defaults to 1 |
Details
Root finding is performed using a progressively bolstered version of the
probabilistic bisection algorithm (PBA
) to find the
associated root given the noisy simulation
objective function evaluations. Information is collected throughout
the search to make more accurate predictions about the
associated root via interpolation. If interpolations fail, then the last
iteration of the PBA search is returned as the best guess.
Value
the filled-in design
object containing the associated lower and upper interval
estimates from the stochastic optimization
Author(s)
Phil Chalmers rphilip.chalmers@gmail.com
References
Chalmers, R. P. (accepted). Solving Variables with Monte Carlo Simulation Experiments: A
Stochastic Root-Solving Approach. Psychological Methods
.
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations
with the SimDesign Package. The Quantitative Methods for Psychology, 16
(4), 248-280.
doi:10.20982/tqmp.16.4.p248
See Also
Examples
## Not run:
##########################
## A Priori Power Analysis
##########################
# GOAL: Find specific sample size in each group for independent t-test
# corresponding to a power rate of .8
#
# For ease of the setup, assume the groups are the same size, and the mean
# difference corresponds to Cohen's d values of .2, .5, and .8
# This example can be solved numerically using the pwr package (see below),
# though the following simulation setup is far more general and can be
# used for any generate-analyse combination of interest
# SimFunctions(SimSolve=TRUE)
#### Step 1 --- Define your conditions under study and create design data.frame.
#### However, use NA placeholder for sample size as it must be solved,
#### and add desired power rate to object
Design <- createDesign(N = NA,
d = c(.2, .5, .8),
sig.level = .05)
Design # solve for NA's
#~~~~~~~~~~~~~~~~~~~~~~~~
#### Step 2 --- Define generate, analyse, and summarise functions
Generate <- function(condition, fixed_objects) {
Attach(condition)
group1 <- rnorm(N)
group2 <- rnorm(N, mean=d)
dat <- data.frame(group = gl(2, N, labels=c('G1', 'G2')),
DV = c(group1, group2))
dat
}
Analyse <- function(condition, dat, fixed_objects) {
p <- t.test(DV ~ group, dat, var.equal=TRUE)$p.value
p
}
Summarise <- function(condition, results, fixed_objects) {
# Must return a single number corresponding to f(x) in the
# root equation f(x) = b
ret <- c(power = EDR(results, alpha = condition$sig.level))
ret
}
#~~~~~~~~~~~~~~~~~~~~~~~~
#### Step 3 --- Optimize N over the rows in design
### (For debugging) may want to see if simulation code works as intended first
### for some given set of inputs
# runSimulation(design=createDesign(N=100, d=.8, sig.level=.05),
# replications=10, generate=Generate, analyse=Analyse,
# summarise=Summarise)
# Initial search between N = [10,500] for each row using the default
# integer solver (integer = TRUE). In this example, b = target power
solved <- SimSolve(design=Design, b=.8, interval=c(10, 500),
generate=Generate, analyse=Analyse,
summarise=Summarise)
solved
summary(solved)
plot(solved, 1)
plot(solved, 2)
plot(solved, 3)
# also can plot median history and estimate precision
plot(solved, 1, type = 'history')
plot(solved, 1, type = 'density')
plot(solved, 1, type = 'iterations')
# verify with true power from pwr package
library(pwr)
pwr.t.test(d=.2, power = .8) # sig.level/alpha = .05 by default
pwr.t.test(d=.5, power = .8)
pwr.t.test(d=.8, power = .8)
# use estimated N results to see how close power was
N <- solved$N
pwr.t.test(d=.2, n=N[1])
pwr.t.test(d=.5, n=N[2])
pwr.t.test(d=.8, n=N[3])
# with rounding
N <- ceiling(solved$N)
pwr.t.test(d=.2, n=N[1])
pwr.t.test(d=.5, n=N[2])
pwr.t.test(d=.8, n=N[3])
### failing analytic formula, confirm results with more precise
### simulation via runSimulation()
### (not required, if accuracy is important then ProBABLI should be run longer)
# csolved <- solved
# csolved$N <- ceiling(solved$N)
# confirm <- runSimulation(design=csolved, replications=10000, parallel=TRUE,
# generate=Generate, analyse=Analyse,
# summarise=Summarise)
# confirm
# Similarly, terminate if the prediction interval is consistently predicted
# to be between [.795, .805]. Note that maxiter increased as well
solved_predCI <- SimSolve(design=Design, b=.8, interval=c(10, 500),
generate=Generate, analyse=Analyse, summarise=Summarise,
maxiter=200, predCI.tol=.01)
solved_predCI
summary(solved_predCI) # note that predCI.b are all within [.795, .805]
N <- solved_predCI$N
pwr.t.test(d=.2, n=N[1])
pwr.t.test(d=.5, n=N[2])
pwr.t.test(d=.8, n=N[3])
# Alternatively, and often more realistically, wait.time can be used
# to specify how long the user is willing to wait for a final estimate.
# Solutions involving more iterations will be more accurate,
# and therefore it is recommended to run the ProBABLI root-solver as long
# the analyst can tolerate if the most accurate estimates are desired.
# Below executes the simulation for 5 minutes for each condition up
# to a maximum of 1000 iterations, terminating based on whichever occurs first
solved_5min <- SimSolve(design=Design, b=.8, interval=c(10, 500),
generate=Generate, analyse=Analyse, summarise=Summarise,
wait.time="5", maxiter=1000)
solved_5min
summary(solved_5min)
# use estimated N results to see how close power was
N <- solved_5min$N
pwr.t.test(d=.2, n=N[1])
pwr.t.test(d=.5, n=N[2])
pwr.t.test(d=.8, n=N[3])
#------------------------------------------------
#######################
## Sensitivity Analysis
#######################
# GOAL: solve effect size d given sample size and power inputs (inputs
# for root no longer required to be an integer)
# Generate-Analyse-Summarise functions identical to above, however
# Design input includes NA for d element
Design <- createDesign(N = c(100, 50, 25),
d = NA,
sig.level = .05)
Design # solve for NA's
#~~~~~~~~~~~~~~~~~~~~~~~~
#### Step 2 --- Define generate, analyse, and summarise functions (same as above)
#~~~~~~~~~~~~~~~~~~~~~~~~
#### Step 3 --- Optimize d over the rows in design
# search between d = [.1, 2] for each row
# In this example, b = target power
# note that integer = FALSE to allow smooth updates of d
solved <- SimSolve(design=Design, b = .8, interval=c(.1, 2),
generate=Generate, analyse=Analyse,
summarise=Summarise, integer=FALSE)
solved
summary(solved)
plot(solved, 1)
plot(solved, 2)
plot(solved, 3)
# plot median history and estimate precision
plot(solved, 1, type = 'history')
plot(solved, 1, type = 'density')
plot(solved, 1, type = 'iterations')
# verify with true power from pwr package
library(pwr)
pwr.t.test(n=100, power = .8)
pwr.t.test(n=50, power = .8)
pwr.t.test(n=25, power = .8)
# use estimated d results to see how close power was
pwr.t.test(n=100, d = solved$d[1])
pwr.t.test(n=50, d = solved$d[2])
pwr.t.test(n=25, d = solved$d[3])
### failing analytic formula, confirm results with more precise
### simulation via runSimulation() (not required; if accuracy is important
### PROBABLI should just be run longer)
# confirm <- runSimulation(design=solved, replications=10000, parallel=TRUE,
# generate=Generate, analyse=Analyse,
# summarise=Summarise)
# confirm
#------------------------------------------------
#####################
## Criterion Analysis
#####################
# GOAL: solve Type I error rate (alpha) given sample size, effect size, and
# power inputs (inputs for root no longer required to be an integer). Only useful
# when Type I error is less important than achieving the desired 1-beta (power)
Design <- createDesign(N = 50,
d = c(.2, .5, .8),
sig.level = NA)
Design # solve for NA's
# all other function definitions same as above
# search for alpha within [.0001, .8]
solved <- SimSolve(design=Design, b = .8, interval=c(.0001, .8),
generate=Generate, analyse=Analyse,
summarise=Summarise, integer=FALSE)
solved
summary(solved)
plot(solved, 1)
plot(solved, 2)
plot(solved, 3)
# plot median history and estimate precision
plot(solved, 1, type = 'history')
plot(solved, 1, type = 'density')
plot(solved, 1, type = 'iterations')
# verify with true power from pwr package
library(pwr)
pwr.t.test(n=50, power = .8, d = .2, sig.level=NULL)
pwr.t.test(n=50, power = .8, d = .5, sig.level=NULL)
pwr.t.test(n=50, power = .8, d = .8, sig.level=NULL)
# use estimated alpha results to see how close power was
pwr.t.test(n=50, d = .2, sig.level=solved$sig.level[1])
pwr.t.test(n=50, d = .5, sig.level=solved$sig.level[2])
pwr.t.test(n=50, d = .8, sig.level=solved$sig.level[3])
### failing analytic formula, confirm results with more precise
### simulation via runSimulation() (not required; if accuracy is important
### PROBABLI should just be run longer)
# confirm <- runSimulation(design=solved, replications=10000, parallel=TRUE,
# generate=Generate, analyse=Analyse,
# summarise=Summarise)
# confirm
## End(Not run)