simulate_dataset {WoodSimulatR}    R Documentation

Generate an artificial dataset with correlated variables

Description

Generate an artificial dataset with correlated variables and defined means and standard deviations.
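To illustrate the general idea, here is a generic base-R sketch of drawing normally distributed variables with given means, standard deviations and correlations by using a Cholesky factor. It is not WoodSimulatR's own implementation, which is driven by the simbase. The helper simulate_correlated, the column names f and E, and all numbers are hypothetical.

```r
# Hypothetical targets for two variables (not taken from any simbase).
mu    <- c(f = 30, E = 11000)
sigma <- c(f = 9,  E = 2500)
rho   <- matrix(c(1,   0.7,
                  0.7, 1), nrow = 2, dimnames = list(names(mu), names(mu)))

# Hypothetical helper: n rows of multivariate normal data with means mu,
# standard deviations sigma and correlation matrix rho.
simulate_correlated <- function(n, mu, sigma, rho) {
  covmat <- diag(sigma) %*% rho %*% diag(sigma)  # covariance from sd and correlation
  z <- matrix(rnorm(n * length(mu)), nrow = n)   # independent standard normals
  x <- z %*% chol(covmat)                        # chol() gives R with t(R) %*% R = covmat
  x <- sweep(x, 2, mu, "+")                      # shift each column to its target mean
  colnames(x) <- names(mu)
  as.data.frame(x)
}

set.seed(1)
d <- simulate_correlated(1e5, mu, sigma, rho)
colMeans(d)    # close to mu
cor(d$f, d$E)  # close to 0.7
```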

Usage

simulate_dataset(
  n = 5000,
  subsets = 4,
  random_seed = NULL,
  simbase = WoodSimulatR::ws_t_logf,
  loadtype = NULL,
  ...,
  RNGversion = "3.6.0"
)

Arguments

n

Number of rows in the dataset.

subsets

Either NULL, a data.frame describing the subsets (see Details), or a value suitable for argument country in get_subsample_definitions, such as a character vector or a named numeric vector.

random_seed

An integer seed for the random number generator, to obtain reproducible results (see also set.seed).

simbase

An object of class simbase_covar or simbase_list. In particular, one of the simbases stored in WoodSimulatR may be used – see simbase.

loadtype

Passed on to get_subsample_definitions. A string, either "t" (for material tested in tension) or "be" (for material tested in edgewise bending). It is only used if the simbase has no field loadtype, or if that loadtype is ambiguous or is neither "t" nor "be".

...

Further arguments passed on to get_subsample_definitions.

RNGversion

Version string passed to RNGversion when random_seed is not NULL. In WoodSimulatR 0.5, this was fixed to "3.5.0"; that setting now causes a warning because the sampling algorithm of the random number generator was changed in R version 3.6.0 (see RNGversion). To reproduce results from WoodSimulatR 0.5 exactly, set RNGversion = "3.5.0".
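The warning mentioned above can be observed with base R alone. This minimal sketch uses only base R's RNGversion and RNGkind: it switches to the settings of R 3.5.0, records the warning instead of printing it, inspects the sample kind, and then restores the R 3.6.0 settings that simulate_dataset uses by default.

```r
# Record whether RNGversion("3.5.0") warns, without printing the warning.
warned <- FALSE
withCallingHandlers(
  RNGversion("3.5.0"),
  warning = function(w) {
    warned <<- TRUE
    invokeRestart("muffleWarning")
  }
)
kind_350 <- RNGkind()[3]  # sample.kind under R 3.5.0 settings: "Rounding"

# Restore the settings of R >= 3.6.0 (the default RNGversion of simulate_dataset).
RNGversion("3.6.0")
kind_360 <- RNGkind()[3]  # "Rejection"

c(warned = warned, before = kind_350, after = kind_360)
```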

Details

In the package WoodSimulatR, a number of predefined base values for simulation are stored – see simbase.

Using a character vector for the argument subsets leads to subsets as equal in size as possible.
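One simple way to obtain subsets "as equal in size as possible" is integer division, with the remainder spread over the first subsets. The following is a hypothetical sketch. The helper equal_subset_sizes is not a WoodSimulatR function, and which subsets receive the extra rows in the package is an assumption.

```r
# Hypothetical helper: split n rows into k subsets whose sizes differ by at most one.
equal_subset_sizes <- function(n, k) {
  sizes <- rep(n %/% k, k)                         # base size for every subset
  extra <- n %% k                                  # rows left over
  sizes[seq_len(extra)] <- sizes[seq_len(extra)] + 1L  # one extra row each, from the front
  sizes
}

equal_subset_sizes(10, 4)  # 3 3 2 2
```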

The argument subsets enables different means and standard deviations for different subsamples. There are several possible usages, corresponding to the forms of subsets listed under Arguments.

The argument simbase can be either an object of class simbase_covar or of class simbase_list.

The means and standard deviations in the subsample definitions (see get_subsample_definitions), as well as the values in the simbase, depend on how the destructive testing of the sawn timber was done. If the simbase has a field loadtype (see also simbase_covar), this value is used in the call to get_subsample_definitions. Otherwise, loadtype has to be passed directly to simulate_dataset, unless no call to get_subsample_definitions is necessary (this depends on the value of subsets, see above). If a loadtype has been defined, the resulting dataset also contains a variable loadtype for reference.
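The rules for loadtype stated above and under Arguments can be summarised in a small helper. This is a sketch under assumptions. resolve_loadtype is hypothetical and not a WoodSimulatR function. The warning text is invented, and treating "ambiguous" as "more than one distinct value" is also an assumption.

```r
# Hypothetical helper (not part of WoodSimulatR): choose the loadtype to use.
# simbase_loadtype: the loadtype field of the simbase (may be NULL, or several
# values); loadtype: the argument passed by the user (may be NULL).
resolve_loadtype <- function(simbase_loadtype, loadtype = NULL) {
  valid <- c("t", "be")
  sb <- unique(simbase_loadtype)
  # A single valid loadtype in the simbase takes precedence over the argument.
  if (length(sb) == 1L && sb %in% valid) {
    if (!is.null(loadtype)) {
      warning("argument 'loadtype' ignored; using loadtype from simbase")
    }
    return(sb)
  }
  # Otherwise fall back to the argument (NULL if none was given).
  loadtype
}

resolve_loadtype("t")                # "t"
resolve_loadtype(NULL, "be")         # "be"
resolve_loadtype(c("t", "be"), "t")  # ambiguous simbase loadtype, so "t"
```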

Negative values in any numeric column of the result dataset are forced to zero.
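In base R, forcing negative values to zero in every numeric column could look like the following. Non-numeric columns, such as a subsample label, are left untouched. The helper clamp_negative and the column names are hypothetical.

```r
# Hypothetical helper: set negative values in all numeric columns to zero.
clamp_negative <- function(df) {
  num <- vapply(df, is.numeric, logical(1))             # which columns are numeric
  df[num] <- lapply(df[num], function(x) pmax(x, 0))    # elementwise max with zero
  df
}

d <- data.frame(subsample = c("a", "b", "c"),
                f = c(12.5, -0.3, 4),
                E = c(-10, 8000, 9000))
clamp_negative(d)
```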

If random_seed is not NULL, reproducibility of results is enforced by using set.seed with arguments kind='Mersenne-Twister' and normal.kind='Inversion', and by calling RNGversion with argument RNGversion.

If random_seed is not NULL, the random number generator is reset at the end of the function using set.seed(NULL) and RNGversion(toString(getRversion())).
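The seed handling described in the two paragraphs above can be sketched as a wrapper around the random draws. This is a sketch under assumptions. draw_normals and its argument rng_version are hypothetical stand-ins for the simulation itself. Calling RNGversion before set.seed, so that set.seed seeds the generator that is actually used, is an assumption about the order. Here on.exit performs the reset when the function returns.

```r
# Hypothetical helper (not part of WoodSimulatR): draw n standard normal values,
# applying the seed handling described above when random_seed is given.
draw_normals <- function(n, random_seed = NULL, rng_version = "3.6.0") {
  if (!is.null(random_seed)) {
    # Select the generator settings of the requested R version first.
    RNGversion(rng_version)
    set.seed(random_seed, kind = "Mersenne-Twister", normal.kind = "Inversion")
    # Reset the generator when the function exits.
    on.exit({
      set.seed(NULL)
      RNGversion(toString(getRversion()))
    }, add = TRUE)
  }
  rnorm(n)
}

a <- draw_normals(5, random_seed = 1)
b <- draw_normals(5, random_seed = 1)
identical(a, b)  # TRUE: same seed, same draws
```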

Examples

simulate_dataset(n = 10, subsets = 1, random_seed = 1)

# As the loadtype is defined in the simbase, the argument loadtype is ignored
# with a warning
simulate_dataset(n = 10, subsets = 1, random_seed = 1, loadtype = 'be')

# Two subsamples
simulate_dataset(n = 10, subsets = 2, random_seed = 1)

# Two subsamples from pre-defined countries
simulate_dataset(n = 10, subsets = c('at', 'de'), random_seed = 1)

# Two subsamples from pre-defined countries with different sample sizes
simulate_dataset(n = 10, subsets = c(at = 3, de = 2), random_seed = 1)

[Package WoodSimulatR version 0.6.1 Index]