specify.formula {simpr} | R Documentation |
Specify data-generating mechanisms
Description
Specify the data-generating mechanisms for the simulation using purrr-style lambda functions.
Usage
## S3 method for class 'formula'
specify(x = NULL, ..., .use_names = TRUE, .sep = "_")
Arguments
x |
leave this argument blank (NULL); this argument is a placeholder and can be skipped. |
... |
named purrr-style lambda functions (one-sided formulas), one for each simulated variable. |
.use_names |
Whether to use names generated by the lambda function (TRUE, the default), or to overwrite them with supplied names. |
.sep |
Specify the separator for auto-generating names. See Column naming. |
Details
This is always the first command in the simulation process: it specifies the actual simulated variables. Its result is then passed to define to define metaparameters and then to generate to generate the data.
The ... arguments use an efficient syntax to specify custom functions needed for generating a simulation, based on the purrr package. When producing one variable, one can provide an expression such as specify(a = ~ 3 + runif(10)); the expression is preceded by ~, the tilde operator, and can refer to previous arguments in specify or to metaparameters in define. This is called a lambda function.
Order matters: arguments are evaluated sequentially, so a later argument can refer to an earlier one, e.g. specify(a = ~ rnorm(2), b = ~ a + rnorm(2)).
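A minimal sketch of the ordering rule, assuming simpr is installed and that the generated data sit in the sim list-column of the result (as in the Examples). Here b is a deterministic shift of a, so the dependence is easy to check; the object names ordered_sim and dat are illustrative only.

```r
library(simpr)

# a is listed first, so the lambda for b can refer to it
ordered_sim <- specify(a = ~ rnorm(3),
                       b = ~ a + 10) %>%
  generate(1)

dat <- ordered_sim$sim[[1]]   # data for the single repetition
all.equal(dat$b, dat$a + 10)  # b was built from the a drawn just before it
```

Reversing the two arguments would make b refer to a variable that does not exist yet.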
generate combines results together into a single tibble for each simulation, so all lambda functions should produce the same number of rows. However, a lambda function can produce multiple columns.
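As a hedged illustration, the sketch below pairs a lambda that returns a two-column matrix with one that returns a single vector; both return four rows, so they can be combined. It assumes an unnamed matrix is handled the same way as the MASS::mvrnorm output in the Examples; the names wide_sim, m and y are illustrative only.

```r
library(simpr)

# Both lambdas return 4 rows; m contributes two columns, y contributes one
wide_sim <- specify(m = ~ matrix(rnorm(8), ncol = 2),
                    y = ~ rnorm(4)) %>%
  generate(1)

dat <- wide_sim$sim[[1]]
dim(dat)   # rows and columns of the combined tibble
```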
Value
A simpr_specify object which contains the functions needed to generate the simulation; to be passed to define for defining metaparameters or, if there are no metaparameters, directly to generate for generating the simulation.
Also useful is the fact that one can refer to variables in subsequent arguments. So, one could define another variable b that depends on a very simply, e.g. specify(a = ~ 3 + runif(10), b = ~ 2 * a).
Finally, one can also refer to metaparameters that are to be systematically varied in the simulation study. See define and the examples for more details.
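A short sketch of a lambda that uses two metaparameters at once. It assumes that define crosses all supplied values and that each metaparameter appears as a column of the result; the names meta_sim, y, n and mu are illustrative only.

```r
library(simpr)

meta_sim <- specify(y = ~ mu + rnorm(n)) %>%
  define(n = c(5, 10), mu = c(0, 1)) %>%  # metaparameter values to vary
  generate(2)                             # 2 repetitions per combination

nrow(meta_sim)               # one row per combination and repetition
sapply(meta_sim$sim, nrow)   # each generated data set has n rows
```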
Column naming
Because functions can produce different numbers of columns, there are several options for naming columns. If a provided lambda function produces a single column, the name given to the argument becomes the name of the column. If the lambda function already produces column names, then the output will use these names if .use_names = TRUE, the default. Otherwise, simpr uses the argument name as a base and auto-numbers the columns. For instance, if the argument a generates a two-column matrix and .sep = "_" (the default) the columns will be named a_1 and a_2.
Custom names can also be directly provided by a double-sided formula. The left-hand side must use c or cbind, e.g. specify(c(a, b) ~ MASS::mvrnorm(5, c(0, 0), Sigma = diag(2))).
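The two naming routes can be sketched side by side. This assumes a non-default .sep is applied in the same way as the default "_" described above, and that an unnamed matrix is auto-numbered; the names dot_sim, lhs_sim, u and v are illustrative only.

```r
library(simpr)

# Auto-numbered names built from the argument name and a custom separator
dot_sim <- specify(a = ~ matrix(rnorm(10), ncol = 2), .sep = ".") %>%
  generate(1)
names(dot_sim$sim[[1]])

# Names supplied on the left-hand side of a two-sided formula
lhs_sim <- specify(c(u, v) ~ matrix(rnorm(10), ncol = 2)) %>%
  generate(1)
names(lhs_sim$sim[[1]])
```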
Note
This function is an S3 method for specify from the generics package. Because x is a formal argument of specify, if you have a variable in your simulation named x it will be automatically moved to be the first variable (with a message). It is therefore safest to use any other variable name besides x.
Examples
library(simpr)

## specify a variable and generate it in the simulation
single_var = specify(a = ~ 1 + rnorm(5)) %>%
  generate(1) # generate a single repetition of the simulation
single_var

## a later variable (b) can refer to an earlier one (a)
two_var = specify(a = ~ 1 + rnorm(5),
                  b = ~ a + 2) %>%
  generate(1)
two_var

## Generates a_01 through a_10
autonumber_var = specify(a = ~ MASS::mvrnorm(5, rep(0, 10), Sigma = diag(10))) %>%
  generate(1)
autonumber_var

# alternatively, you could use a two-sided formula for names
multi_name = specify(cbind(a, b, c) ~ MASS::mvrnorm(5, rep(0, 3), Sigma = diag(3))) %>%
  generate(1)
multi_name

# Simple example of setting a metaparameter
simple_meta = specify(a = ~ 1 + rnorm(n)) %>%
  define(n = c(5, 10)) %>% # without this line you would get an error!
  generate(1)
simple_meta # has two rows now, one for each value of n
simple_meta$sim[[1]] # n = 5
simple_meta$sim[[2]] # n = 10