R: Find optimal study designs

find.design {mlpwr}

R Documentation

Find optimal study designs

Description

Use a surrogate modeling approach to search for optimal study design parameters. For further guidance on the package, and on the find.design function specifically, see the Readme.md file.

Usage

find.design(
  simfun,
  boundaries,
  power = NULL,
  evaluations = 4000,
  ci = NULL,
  ci_perc = 0.95,
  time = NULL,
  costfun = NULL,
  cost = NULL,
  surrogate = NULL,
  n.startsets = 4,
  init.perc = 0.2,
  setsize = NULL,
  continue = NULL,
  dat = NULL,
  silent = FALSE,
  autosave_dir = NULL,
  control = list(),
  goodvals = "high",
  aggregate_fun = mean,
  noise_fun = "bernoulli",
  integer = TRUE,
  use_noise = TRUE
)

Arguments

`simfun`	function that generates hypothesis test results. It takes design parameters as input and returns a logical (the result of the hypothesis test). The design parameters can be passed through a single vector argument or through multiple arguments. For example, function(x), where x is later supplied as x=c(n,k) for two design parameters n and k, is valid. A definition using function(n,k) is also valid.
`boundaries`	list containing the lower and upper bounds of the design space. The list should consist of named vectors, each containing the lower and upper bound for the respective design parameter dimension. For a single design parameter dimension, it can also be a vector containing the lower and upper bound.
`power`	numeric; desired statistical power
`evaluations`	integer; number of simfun evaluations to be performed before termination
`ci`	numeric; desired width of the confidence interval at the predicted value on termination.
`ci_perc`	numeric; confidence level of the confidence interval, given as a proportion, e.g. 0.95 or 0.99.
`time`	integer; seconds until termination
`costfun`	function that takes a vector of design parameters as input and outputs a cost, e.g. monetary costs. Necessary for simfuns with multiple input dimensions.
`cost`	numeric; cost threshold. Design parameter set with highest power is searched among sets that fulfill this cost threshold.
`surrogate`	character; which surrogate model should be used. The default is 'logreg' for one design parameter and 'gpr' for multiple design parameters. The current options are: 'gpr', 'svr', 'logreg', 'reg' for one-dimensional designs and 'gpr' and 'svr' for multi-dimensional designs.
`n.startsets`	integer; number of startsets used per dimension of simfun
`init.perc`	numeric; proportion of evaluations used for the initialization phase (the default 0.2 corresponds to 20%)
`setsize`	integer; number of draws from the simfun in each iteration
`continue`	Object of class designresult as created by the find.design function. Will be used to continue the search, using all collected simulation results so far.
`dat`	list of data from a previous design result.
`silent`	logical; suppresses output during the search.
`autosave_dir`	character; file location for saving the dat object after each update.
`control`	list specifying arguments passed to the surrogate models. For example, list(covtype='gauss') can be used with the gpr surrogate to use a different covariance structure than the default.
`goodvals`	character indicating whether higher or lower criterion values are preferable given equal cost; the default is "high" for statistical power, the other option is "low".
`aggregate_fun`	function to aggregate results of the evaluations of the simulation function; the default is `mean`, as for statistical power.
`noise_fun`	function to calculate the noise or variance of the aggregated results of the Monte Carlo evaluations; can also be the character value "bernoulli" (default) to indicate the variance of the Bernoulli distribution used for statistical power. This function is `p(1-p)/n`, where `p` is the statistical power and `n` is the number of performed evaluations.
`integer`	logical indicating whether the design parameters are integers or not; the default is `TRUE`, which is suitable for sample size, for example.
`use_noise`	logical indicating whether noise variance should be used; the default is `TRUE`.
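
As a rough illustration of the two accepted `simfun` signatures and of the `p(1-p)/n` variance behind the "bernoulli" default, the following sketch uses only base R. The names `simfun_vec`, `simfun_args`, and `bernoulli_noise`, the effect size of 0.3, and the significance level of 0.05 are assumptions made for this example and are not part of the package. In particular, `bernoulli_noise` only spells out the formula and is not claimed to match the signature that `noise_fun` expects.

```r
# Hypothetical two-parameter design: n observations per group, k groups.

# Form 1: all design parameters arrive in a single vector x = c(n, k).
simfun_vec <- function(x) {
  n <- x[1]
  k <- x[2]
  group <- factor(rep(seq_len(k), each = n))
  # Assumed effect: group means increase in steps of 0.3
  y <- rnorm(n * k, mean = 0.3 * as.integer(group))
  p <- anova(lm(y ~ group))[["Pr(>F)"]][1]
  p < 0.05  # logical result of the hypothesis test
}

# Form 2: one argument per design parameter.
simfun_args <- function(n, k) simfun_vec(c(n, k))

# Variance of an estimated power p based on n_eval evaluations: p(1-p)/n
bernoulli_noise <- function(p, n_eval) p * (1 - p) / n_eval

set.seed(1)
hits  <- replicate(200, simfun_args(n = 20, k = 3))
p_hat <- mean(hits)                            # aggregate with mean, as for power
noise <- bernoulli_noise(p_hat, length(hits))  # variance of that estimate
```

Either form returns a single logical per call, so averaging repeated calls gives a Monte Carlo estimate of power at that design.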

Value

An object of class designresult.

Examples


## t-test example:

# Load a simulation function
simfun <- example.simfun('ttest')
# Perform the search
ds <- find.design(simfun = simfun, boundaries = c(100,300), power = .95)
# Output the results
summary(ds)
# Plot results
plot(ds)

## Two-dimensional simulation function:

simfun <- example.simfun('anova')
# Perform the search
ds <- find.design(simfun = simfun,
                  costfun = function(n, n.groups) 5 * n + 20 * n.groups,
                  boundaries = list(n = c(10, 150), n.groups = c(5, 30)),
                  power = .95)
# Output the results
summary(ds)
# Plot results
plot(ds)


##  Mixed model example with a custom, two-dimensional simulation function:

library(lme4)
library(lmerTest)

# Simulation function
simfun_multilevel <- function(n.per.school,n.schools) {

  # generate data
  group = rep(1:n.schools,each=n.per.school)
  pred = factor(rep(c("old","new"),n.per.school*n.schools),levels=c("old","new"))
  dat = data.frame(group = group, pred = pred)

  params <- list(theta = c(.5,0,.5), beta = c(0,1),sigma = 1.5)
  names(params$theta) = c("group.(Intercept)","group.prednew.(Intercept)","group.prednew")
  names(params$beta) = c("(Intercept)","prednew")
  dat$y <- simulate.formula(~pred + (1 + pred | group), newdata = dat, newparams = params)[[1]]

  # test hypothesis
  mod <- lmer(y ~ pred + (1 + pred | group), data = dat)
  pvalue <- summary(mod)[["coefficients"]][2,"Pr(>|t|)"]
  pvalue < .01
}
# Cost function
costfun_multilevel <- function(n.per.school, n.schools) {
  100 * n.per.school + 200 * n.schools
}
# Perform the search, can take a few minutes to run
ds <- find.design(simfun = simfun_multilevel, costfun = costfun_multilevel,
                  boundaries = list(n.per.school = c(5, 25), n.schools = c(10, 30)),
                  power = .95, evaluations = 1000)
# Output the results
summary(ds)
# Plot results
plot(ds)
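
The `cost` argument describes a search for the highest power among designs that satisfy a cost threshold. The sketch below reuses the cost function and boundaries from the ANOVA example to show, in base R, which designs such a threshold admits. The threshold of 600 and the names `max_cost`, `grid`, and `feasible` are assumptions for illustration; the grid enumeration is not how the package searches. The final call assumes that `find.design` accepts `cost` in place of `power`, as the argument description suggests, and only runs if mlpwr is installed.

```r
# Cost function and design space taken from the ANOVA example
costfun    <- function(n, n.groups) 5 * n + 20 * n.groups
boundaries <- list(n = c(10, 150), n.groups = c(5, 30))

# Enumerate every integer design inside the boundaries
grid <- expand.grid(
  n        = seq(boundaries$n[1], boundaries$n[2]),
  n.groups = seq(boundaries$n.groups[1], boundaries$n.groups[2])
)
grid$cost <- costfun(grid$n, grid$n.groups)

# Assumed cost threshold; only these designs would be candidates
max_cost <- 600
feasible <- grid[grid$cost <= max_cost, ]

# Search among feasible designs (assumes cost can replace power)
if (requireNamespace("mlpwr", quietly = TRUE)) {
  simfun  <- mlpwr::example.simfun("anova")
  ds_cost <- mlpwr::find.design(simfun = simfun, costfun = costfun,
                                boundaries = boundaries, cost = max_cost,
                                evaluations = 1000)
  summary(ds_cost)
}
```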

[Package mlpwr version 1.1.0 Index]