find.design {mlpwr}R Documentation

Find optimal study designs

Description

Perform a surrogate modeling approach to search for optimal study design parameters. For further guidance on how to use the package and the find.design function specifically, see the Readme.md file.

Usage

find.design(
  simfun,
  boundaries,
  power = NULL,
  evaluations = 4000,
  ci = NULL,
  ci_perc = 0.95,
  time = NULL,
  costfun = NULL,
  cost = NULL,
  surrogate = NULL,
  n.startsets = 4,
  init.perc = 0.2,
  setsize = NULL,
  continue = NULL,
  dat = NULL,
  silent = FALSE,
  autosave_dir = NULL,
  control = list(),
  goodvals = "high",
  aggregate_fun = mean,
  noise_fun = "bernoulli",
  integer = TRUE,
  use_noise = TRUE
)

Arguments

simfun

function to generate hypothesis test results with. Takes design parameters as input and outputs a logical (result of the hypothesis test). The function can take the designs through one argument as a vector or through multiple arguments. For example, function(x) where x is later used with x=c(n,k) for two design parameters n and k is valid. Also valid is a definition using function(n,k).

boundaries

list containing lower and upper bounds of the design space. The list should consist of named vectors, each containing the upper and lower bound for the respective design parameter dimensions. For one design parameter dimension, can also be a vector containing the upper and lower bounds.

power

numeric; desired statistical power

evaluations

integer; number of simfun evaluations to be performed before termination

ci

numeric; desired width of the confidence interval at the predicted value on termination.

ci_perc

numeric; specifying the desired confidence interval, e.g. 95% or 99%.

time

integer; seconds until termination

costfun

function that takes a vector of design parameters as input and outputs a cost, e.g. monetary costs. Necessary for simfuns with multiple input dimensions.

cost

numeric; cost threshold. Design parameter set with highest power is searched among sets that fulfill this cost threshold.

surrogate

character; which surrogate model should be used. The default is 'logreg' for one design parameter and 'gpr' for multiple design parameters. The current options are: 'gpr', 'svr', 'logreg', 'reg' for one-dimensional designs and 'gpr' and 'svr' for multi-dimensional designs.

n.startsets

integer; number of startsets used per dimension of simfun

init.perc

numeric; percentage of evaluations used for the initialization phase

setsize

The number of draws from the simfun in each iteration

continue

Object of class designresult as created by the find.design function. Will be used to continue the search, using all collected simulation results so far.

dat

list of data from a previous design result.

silent

logical; suppresses output during the search.

autosave_dir

character; file location for saving the dat object after each update.

control

list specifying arguments passed to the surrogate models. For example, list(covtype='gauss') can be used with the gpr surrogate to use a different covariance structure than the default.

goodvals

character indicating whether higher or lower criterion values are preferable given equal cost; the default is "high" for statistical power, the other option is "low".

aggregate_fun

function to aggregate results of the evaluations of the simulation function; the default is mean, as for statistical power.

noise_fun

function to calculate the noise or variance of the aggregated results of the Monte Carlo evaluations; can also be the character value "bernoulli" (default) to indicate the variance of the Bernoulli distribution used for statistical power. This function is p(1-p)/n, where p is the statistical power and n is the number of performed evaluations.

integer

logical indicating whether the design parameters are integers or not; the default is TRUE, which is suitable for sample size, for example.

use_noise

logical indicating whether noise variance should be used; the default is TRUE.

Value

function returns an object of class designresult

Examples


## T-test example:

# Load a simulation function
simfun <- example.simfun('ttest')
# Perform the search
ds <- find.design(simfun = simfun, boundaries = c(100,300), power = .95)
# Output the results
summary(ds)
# Plot results
plot(ds)

## Two-dimensional simulation function:

simfun <- example.simfun('anova')
# Perform the search
ds <- find.design(simfun = simfun,
 costfun = function(n,n.groups) 5*n+20*n.groups,
 boundaries = list(n = c(10, 150), n.groups = c(5, 30)),
 power = .95)
# Output the results
summary(ds)
# Plot results
plot(ds)


##  Mixed model example with a custom, two-dimensional simulation function:

library(lme4)
library(lmerTest)

# Simulation function
simfun_multilevel <- function(n.per.school,n.schools) {

  # generate data
  group = rep(1:n.schools,each=n.per.school)
  pred = factor(rep(c("old","new"),n.per.school*n.schools),levels=c("old","new"))
  dat = data.frame(group = group, pred = pred)

  params <- list(theta = c(.5,0,.5), beta = c(0,1),sigma = 1.5)
  names(params$theta) = c("group.(Intercept)","group.prednew.(Intercept)","group.prednew")
  names(params$beta) = c("(Intercept)","prednew")
  dat$y <- simulate.formula(~pred + (1 + pred | group), newdata = dat, newparams = params)[[1]]

  # test hypothesis
  mod <- lmer(y ~ pred + (1 + pred | group), data = dat)
  pvalue <- summary(mod)[["coefficients"]][2,"Pr(>|t|)"]
  pvalue < .01
}
# Cost function
costfun_multilevel <- function(n.per.school, n.schools) {
  100 * n.per.school + 200 * n.schools
}
# Perform the search, can take a few minutes to run
ds <- find.design(simfun = simfun_multilevel, costfun = costfun_multilevel,
boundaries = list(n.per.school = c(5, 25), n.schools = c(10, 30)), power = .95,
evaluations = 1000)
# Output the results
summary(ds)
# Plot results
plot(ds)



[Package mlpwr version 1.1.0 Index]