simulate_data {simdata} | R Documentation |
Simulate design matrix
Description
Generate a simulated dataset based on a transformation of an underlying base distribution.
Usage
simulate_data(generator, ...)
## Default S3 method:
simulate_data(
generator = function(n) matrix(rnorm(n)),
n_obs = 1,
transform_initial = base::identity,
names_final = NULL,
prefix_final = NULL,
process_final = list(),
seed = NULL,
...
)
## S3 method for class 'simdesign'
simulate_data(
generator,
n_obs = 1,
seed = NULL,
apply_transformation = TRUE,
apply_processing = TRUE,
...
)
Arguments
generator | Function which generates data from the underlying base distribution. It is assumed to take the number of simulated observations as its first argument.
... | Further arguments passed to the generator function.
n_obs | Number of simulated observations.
transform_initial | Function which specifies the transformation of the underlying dataset Z into the final dataset X. See Details.
names_final | NULL or character vector with variable names for the final dataset.
prefix_final | NULL or prefix attached to variables in the final dataset.
process_final | List of lists specifying post-processing functions applied to the final datamatrix X.
seed | Set random seed to ensure reproducibility of results.
apply_transformation | This argument can be set to FALSE to override the information stored in the passed simdesign object.
apply_processing | This argument can be set to FALSE to override the information stored in the passed simdesign object.
Details
Data is generated using the following procedure:
- An underlying dataset Z is sampled from some distribution. This is done by a call to the generator function.
- Z is then transformed into the final dataset X by applying the transform function to Z.
- X is post-processed if specified (e.g. truncation to avoid outliers).
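The three steps above can be sketched in base R as follows. This is only an illustration of the procedure, not the package's implementation: the helper name sketch_simulate, the example generator and transformation, and the truncation bounds are all assumptions made for this example.

```r
# Hypothetical helper illustrating the three-step procedure (not part of simdata).
sketch_simulate <- function(generator, n_obs, transform = identity,
                            post_process = identity, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  Z <- generator(n_obs)   # step 1: sample the underlying dataset Z
  X <- transform(Z)       # step 2: transform Z into the final dataset X
  post_process(X)         # step 3: optional post-processing of X
}

# Base distribution: two independent standard normal columns.
gen <- function(n) matrix(rnorm(n * 2), ncol = 2)
# Transformation: exponentiate the first column, keep the second as is.
tf <- function(Z) cbind(exp(Z[, 1]), Z[, 2])
# Post-processing: truncate all values to [-2, 2] (bounds chosen arbitrarily).
trunc_fun <- function(X) pmin(pmax(X, -2), 2)

X <- sketch_simulate(gen, n_obs = 5, transform = tf,
                     post_process = trunc_fun, seed = 24)
dim(X)
```

Keeping the three steps as separate functions mirrors the separation between generator, transformation and post-processing described above, so each part can be swapped independently.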
Value
Data.frame or matrix with n_obs rows for simulated dataset X.
Methods (by class)
- simulate_data(default): Function to be used if no simdesign S3 class is used.
- simulate_data(simdesign): Function to be used with simdesign S3 class.
Generators
The generator function, which is either passed directly or via a
simdata::simdesign object, is assumed to provide the same interface
as the random generation functions in the R stats and extraDistr
packages. Specifically, that means it takes the number of observations as
its first argument. All further arguments can be set by passing them as
named arguments to this function. It is expected to return a two-dimensional
array (matrix or data.frame) for which the number of columns can be
determined. Otherwise the check_and_infer step will fail.
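A generator following this interface might look like the sketch below. The function name gen_uniform and its arguments n_col, min and max are hypothetical and chosen only for illustration.

```r
# A generator in the style of stats::runif(): number of observations first,
# further settings as named arguments. The name gen_uniform is hypothetical.
gen_uniform <- function(n, n_col = 3, min = 0, max = 1) {
  # Return a two-dimensional object so the number of columns can be determined.
  matrix(runif(n * n_col, min = min, max = max), nrow = n, ncol = n_col)
}

Z <- gen_uniform(4, n_col = 2, max = 10)
dim(Z)
```

Returning a matrix even for a single column (rather than a plain vector) is what keeps the number of columns well defined.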
Transformations
Transformations should be applicable to the output of the generator
function (i.e. take a data.frame or matrix as input) and output another
data.frame or matrix. A convenience function function_list is
provided by this package to specify transformations as a list of functions,
which take the whole datamatrix Z as single argument and can be used to
apply specific transformations to the columns of that matrix. See the
documentation for function_list for details.
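As a rough illustration, a transformation can be written as an ordinary function that takes the whole datamatrix and returns a new matrix. The function name transform_example and the derived column names are hypothetical; this sketch does not use function_list.

```r
# Hypothetical transformation: takes the whole datamatrix Z, returns a new matrix.
transform_example <- function(Z) {
  cbind(
    log_abs = log(abs(Z[, 1]) + 1),   # transformation of a single column
    product = Z[, 1] * Z[, 2],        # new variable built from two columns
    binary  = as.numeric(Z[, 2] > 0)  # dichotomised version of column 2
  )
}

set.seed(1)
Z <- matrix(rnorm(10), ncol = 2)
X <- transform_example(Z)
dim(X)
```

Note that the output may have a different number of columns than the input, since each output column is defined in terms of the whole matrix Z.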
Post-processing
Post-processing the datamatrix is based on do_processing.
Naming of variables
Variables are named by names_final if it is not NULL and of correct length.
Otherwise, if prefix_final is not NULL, it is used as prefix for variable
numbers. Otherwise, variable names remain as returned by the generator
function.
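The naming precedence described above can be sketched in base R as follows. The helper sketch_names is hypothetical and not part of the package; in particular, joining prefix and column number directly without a separator is an assumption.

```r
# Hypothetical helper mirroring the naming rule described above (not part of simdata).
sketch_names <- function(X, names_final = NULL, prefix_final = NULL) {
  if (!is.null(names_final) && length(names_final) == ncol(X)) {
    colnames(X) <- names_final
  } else if (!is.null(prefix_final)) {
    # Assumption: prefix and column number are concatenated directly.
    colnames(X) <- paste0(prefix_final, seq_len(ncol(X)))
  }
  X  # otherwise names remain as returned by the generator
}

X <- matrix(0, nrow = 2, ncol = 3)
colnames(sketch_names(X, names_final = c("a", "b", "c")))
# names_final has the wrong length here, so the prefix is used instead
colnames(sketch_names(X, names_final = c("a", "b"), prefix_final = "v"))
colnames(sketch_names(X))
```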
Note
This function is best used in conjunction with the simdesign
S3 class or any template based upon it, which facilitates further data
visualization and conveniently stores information as a template for
simulation tasks.
See Also
simdesign, simdesign_mvtnorm, simulate_data_conditional, do_processing
Examples
generator <- function(n) mvtnorm::rmvnorm(n, mean = 0)
simulate_data(generator, 10, seed = 24)