specify.formula {simpr}    R Documentation

Specify data-generating mechanisms

Description

Specify the data-generating mechanisms for the simulation using purrr-style lambda functions.

Usage

## S3 method for class 'formula'
specify(x = NULL, ..., .use_names = TRUE, .sep = "_")

Arguments

x

leave this argument blank (NULL); this argument is a placeholder and can be skipped.

...

named purrr-style formula functions used for generating simulation variables. x is not recommended as a name, since it is a formal argument and will be automatically assumed to be the first variable (a message will be displayed if x is used).

.use_names

Whether to use names generated by the lambda function (TRUE, the default), or to overwrite them with supplied names.

.sep

Specify the separator for auto-generating names. See Column naming.

Details

This is always the first command in the simulation process: it specifies the simulated variables. The result is then passed to define to set metaparameters, and then to generate to produce the data.

The ... arguments use an efficient syntax to specify custom functions needed for generating a simulation, based on the purrr package. When producing one variable, one can provide an expression such as specify(a = ~ 3 + runif(10)); the expression is preceded by ~, the tilde operator, and can refer to previous arguments in specify or to metaparameters in define. This is called a lambda function.

Order matters: arguments are evaluated sequentially, so a later argument can refer to an earlier one, e.g. specify(a = ~ rnorm(2), b = ~ a + rnorm(2)).
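As a minimal sketch of this sequential evaluation, the snippet below builds the two-variable specification from the sentence above and inspects the single simulated data set. It assumes simpr is installed; the object names spec, out and sim are illustrative only.

```r
library(simpr)

set.seed(1)

# b's lambda refers to a, which is defined earlier in the same call
spec <- specify(a = ~ rnorm(2),
                b = ~ a + rnorm(2))

# generate one repetition of the simulation
out <- generate(spec, 1)

# the simulated data for that repetition is stored in the sim list-column
sim <- out$sim[[1]]
names(sim)
nrow(sim)
```

Reversing the two arguments would leave a undefined at the point where b is evaluated, so the dependent variable should always come second.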

generate combines results together into a single tibble for each simulation, so all lambda functions should produce the same number of rows. However, a lambda function can produce multiple columns.
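The row-count requirement can be illustrated with a short sketch, assuming simpr is installed. Here a yields a vector of 5 values and m (a hypothetical name) yields a 5-row, two-column matrix, so both contribute 5 rows to the combined tibble.

```r
library(simpr)

set.seed(1)

# a: one column with 5 rows; m: two columns with 5 rows
spec <- specify(a = ~ rnorm(5),
                m = ~ matrix(rnorm(10), ncol = 2))

out <- generate(spec, 1)
sim <- out$sim[[1]]

# one column from a plus two columns from m, all with 5 rows
dim(sim)
```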

Value

A simpr_specify object which contains the functions needed to generate the simulation; to be passed to define for defining metaparameters or, if there are no metaparameters, directly to generate for generating the simulation.

Later arguments can also refer to variables defined in earlier arguments. For example, a second variable b that depends on a can be defined as specify(a = ~ 3 + runif(10), b = ~ 2 * a).

Finally, one can also refer to metaparameters that are to be systematically varied in the simulation study. See define and the examples for more details.

Column naming

Because functions can produce different numbers of columns, there are several options for naming columns. If a provided lambda function produces a single column, the name given to the argument becomes the name of the column. If the lambda function already produces column names, then the output will use these names if .use_names = TRUE, the default. Otherwise, simpr uses the argument name as a base and auto-numbers the columns. For instance, if the argument a generates a two-column matrix and .sep = "_" (the default), the columns will be named a_1 and a_2.

Custom names can also be directly provided by a double-sided formula. The left-hand side must use c or cbind, e.g. specify(c(a, b) ~ MASS::mvrnorm(5, c(0, 0), Sigma = diag(2))).
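The two naming routes described above can be compared side by side. This is a sketch assuming simpr is installed; the names u and v and the use of matrix() as the generating function are illustrative choices, not part of the package documentation.

```r
library(simpr)

set.seed(1)

# Auto-numbered names: the argument name a is the base, joined with .sep
auto <- generate(specify(a = ~ matrix(rnorm(10), ncol = 2), .sep = "."), 1)
auto_names <- names(auto$sim[[1]])
auto_names

# Custom names: a two-sided formula with c() on the left-hand side
named <- generate(specify(c(u, v) ~ matrix(rnorm(10), ncol = 2)), 1)
custom_names <- names(named$sim[[1]])
custom_names
```

A two-sided formula is the more explicit choice when the columns have distinct meanings, while auto-numbering suits many interchangeable columns.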

Note

This function is an S3 method for specify from the generics package. Because x is a formal argument of specify, if you have a variable in your simulation named x it will be automatically moved to be the first variable (with a message). It is therefore safest to use any other variable name besides x.

Examples

library(simpr)

## specify a variable and generate it in the simulation
single_var = specify(a = ~ 1 + rnorm(5)) %>%
  generate(1) # generate a single repetition of the simulation
single_var

## a second variable can depend on the first
two_var = specify(a = ~ 1 + rnorm(5),
                  b = ~ a + 2) %>%
  generate(1)
two_var

## Generates a_01 through a_10
autonumber_var = specify(a = ~ MASS::mvrnorm(5, rep(0, 10), Sigma = diag(10))) %>%
  generate(1)
autonumber_var

# alternatively, you could use a two-sided formula for names
multi_name = specify(cbind(a, b, c) ~ MASS::mvrnorm(5, rep(0, 3), Sigma = diag(3))) %>%
  generate(1)
multi_name

# Simple example of setting a metaparameter
simple_meta = specify(a = ~ 1 + rnorm(n)) %>%
  define(n = c(5, 10)) %>% # without this line you would get an error!
  generate(1)


simple_meta # has two rows now, one for each value of n
simple_meta$sim[[1]] # n = 5
simple_meta$sim[[2]] # n = 10


[Package simpr version 0.2.6 Index]