restricted_simulation {DataFakeR}    R Documentation

Simulate data restricted by extra column parameters

Description

These functions let you define a set of methods for simulating data using additional column-level parameters, such as range or values.

Usage

opt_simul_restricted_character(
  f_key = simul_restricted_character_fkey,
  ...,
  in_set = simul_restricted_character_in_set
)

opt_simul_restricted_numeric(
  f_key = simul_restricted_numeric_fkey,
  ...,
  in_set = simul_restricted_numeric_in_set,
  range = simul_restricted_numeric_range
)

opt_simul_restricted_integer(
  f_key = simul_restricted_integer_fkey,
  ...,
  in_set = simul_restricted_integer_in_set,
  range = simul_restricted_integer_range
)

opt_simul_restricted_logical(f_key = simul_restricted_integer_fkey, ...)

opt_simul_restricted_date(
  f_key = simul_restricted_integer_fkey,
  ...,
  range = simul_restricted_date_range
)

Arguments

f_key

Method for simulating foreign key columns. The function's values parameter receives all the unique values from the parent primary key column.

...

Other methods that can be defined to handle extra parameters.

in_set

Method for simulating columns from a defined set of values. The function's values parameter receives all the values listed under the values parameter in the YAML column definition.

range

Method for simulating columns whose values fit inside a defined range. It takes the special parameter range, a length-2 vector giving the minimum and maximum value for the simulated data.
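As an illustration of the range argument, a range-style method could look roughly like the sketch below. The name my_numeric_range and the argument list (n, range, ...) are assumptions made for this example and are not taken from the package; only the meaning of range (a length-2 vector of minimum and maximum) comes from the description above.

```r
# Hypothetical range method; name and signature are assumptions for illustration.
my_numeric_range <- function(n, range, ...) {
  # No usable range parameter: return NULL so another method can take over.
  if (missing(range) || is.null(range) || length(range) != 2) {
    return(NULL)
  }
  # Draw n values uniformly between the minimum and the maximum.
  stats::runif(n, min = range[1], max = range[2])
}

set.seed(1)
x <- my_numeric_range(5, range = c(10, 20))
```

Under these assumptions, every element of x lies between 10 and 20, and calling my_numeric_range(5) without a range yields NULL.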

Details

In addition to the standard column parameters, custom ones can also be added (either directly in the YAML configuration file or in the opt_default_<column_type> functions).

To make the simulation respect such parameters, we may want to define custom simulation functions.

Such functions should be passed as parameters of the opt_simul_restricted_<column_type> functions, and each of them should take the special parameter it handles as one of its own arguments.

When the parameter condition is not met (for example, the parameter is missing), such a function should return NULL. This allows the simulation workflow to move on to the next defined method. The methods are executed in the order in which the corresponding parameters are defined in the opt_simul_restricted_<column_type> functions.
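The "return NULL and move on" behaviour can be pictured with a small base R sketch. The dispatcher try_methods and the toy methods from_set and from_range are hypothetical helpers written for this example; they are not DataFakeR functions and only mimic the ordering rule described above.

```r
# Hypothetical dispatcher, not part of DataFakeR: tries the methods in order
# and returns the first non-NULL result.
try_methods <- function(methods, n, params) {
  for (method in methods) {
    result <- do.call(method, c(list(n = n), params))
    if (!is.null(result)) {
      return(result)
    }
  }
  NULL  # no method applied; a caller would fall back to a default simulation
}

# Two toy methods, each guarding on its own special parameter.
from_set <- function(n, values = NULL, ...) {
  if (is.null(values)) return(NULL)
  values[sample.int(length(values), n, replace = TRUE)]
}
from_range <- function(n, range = NULL, ...) {
  if (is.null(range)) return(NULL)
  stats::runif(n, min = range[1], max = range[2])
}

# The order of the list defines the priority of the methods.
methods <- list(in_set = from_set, range = from_range)

# Only `range` is supplied, so from_set returns NULL and from_range is used.
out <- try_methods(methods, n = 3, params = list(range = c(0, 1)))
```

If both values and range were supplied, from_set would win in this sketch simply because it comes first in the list.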

This means that f_key always has the highest priority: it is a special method used for foreign key columns, and it simulates only from the values received from the parent primary key.

The second-priority method for character columns is in_set, which looks for the values column parameter and, when it exists, simulates the data from the defined set of values. See the definition of simul_restricted_character_in_set for details.
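A custom in_set-style method for character columns might be sketched as follows. The name my_character_in_set and the signature (n, values, ...) are assumptions for illustration; the actual definition of simul_restricted_character_in_set may differ and should be checked in the package source.

```r
# Hypothetical in_set-style method; name and signature are assumptions.
my_character_in_set <- function(n, values, ...) {
  # Without a `values` column parameter this method does not apply.
  if (missing(values) || is.null(values)) {
    return(NULL)
  }
  # Index-based sampling avoids sample()'s special case for length-one input.
  as.character(values[sample.int(length(values), n, replace = TRUE)])
}

genres <- my_character_in_set(4, values = c("Fantasy", "Crime", "Sci-Fi"))
```

Here genres is a character vector of length 4 drawn with replacement from the three supplied values, while a call without values returns NULL so that a later method can be tried.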


[Package DataFakeR version 0.1.3 Index]