simCategorical {simPop} | R Documentation |
Simulate categorical variables of population data
Description
Simulate categorical variables of population data. The household structure of the population data needs to be simulated beforehand.
Usage
simCategorical(
simPopObj,
additional,
method = c("multinom", "distribution", "ctree", "cforest", "ranger", "xgboost"),
limit = NULL,
censor = NULL,
maxit = 500,
MaxNWts = 1500,
eps = NULL,
nr_cpus = NULL,
regModel = NULL,
seed = 1,
verbose = FALSE,
by = "strata",
model_params = NULL
)
Arguments
simPopObj |
a |
additional |
a character vector specifying additional categorical
variables available in the sample object of |
method |
a character string specifying the method to be used for
simulating the additional categorical variables. Accepted values are
"multinom", "distribution", "ctree", "cforest", "ranger" and "xgboost". |
limit |
if |
censor |
if |
maxit , MaxNWts |
control parameters to be passed to
|
eps |
a small positive numeric value, or |
nr_cpus |
if specified, an integer number defining the number of cpus that should be used for parallel processing. |
regModel |
specifies the variables or model used when simulating additional categorical variables. The following choices are available if different from NULL.
If method 'distribution' is used, it is only possible to specify a vector of length one containing one of the choices described above. If parameter 'regModel' is NULL, only basic household variables are used in any case. |
seed |
optional; an integer value to be used as the seed of the random number generator, or an integer vector containing the state of the random number generator to be restored. |
verbose |
set to TRUE if additional print output should be shown. |
by |
defines the variable by which the data are split for estimation. Defaults to the strata variable. |
model_params |
NULL or a named list of model-specific parameters that are passed on to the function call for the respective model. |
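As a rough illustration of what the "multinom" method and the control parameters maxit and MaxNWts refer to, the sketch below fits a multinomial model on a toy sample and draws categories for a toy population from the predicted probabilities. It assumes that these two parameters end up in nnet::multinom (and from there in nnet::nnet), which this page does not state in full. The iris data and the object names fit, pop and probs are made up for the illustration, and this is not the internal code of simCategorical.

```r
library(nnet)  # recommended package shipped with R

set.seed(1)  # make the random draws reproducible

# Toy "sample": model a categorical response from two predictors.
# maxit and MaxNWts are forwarded by multinom() to nnet().
fit <- multinom(Species ~ Sepal.Length + Sepal.Width, data = iris,
                maxit = 500, MaxNWts = 1500, trace = FALSE)

# Toy "population": predictor values for which the category is unknown.
pop <- data.frame(Sepal.Length = c(5.0, 6.1, 7.2),
                  Sepal.Width  = c(3.4, 2.8, 3.0))

# Predicted class probabilities, one row per population unit.
probs <- predict(fit, newdata = pop, type = "probs")

# Draw one category per unit from its predicted distribution.
drawn <- apply(probs, 1, function(p) sample(colnames(probs), 1, prob = p))
pop$Species <- factor(drawn, levels = levels(iris$Species))
```

Raising MaxNWts matters when the predictors have many categories, because the number of weights in the underlying network grows with the number of dummy variables and response levels.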
Details
The number of cpus is selected automatically as follows. By default, the number of cpus used equals the number of strata. However, if fewer cpus are available than there are strata, the number of available cpus minus one is used. This should be the best strategy, but the user can override it with argument nr_cpus.
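The selection rule described above can be sketched in a few lines of base R. The helper choose_nr_cpus is hypothetical and not part of simPop. The lower bound of one cpu is an added assumption for machines with a single core.

```r
# Hypothetical helper, not part of simPop: mirrors the rule described above.
choose_nr_cpus <- function(n_strata, n_available = parallel::detectCores()) {
  # detectCores() may return NA on some platforms; fall back to one cpu.
  if (is.na(n_available)) n_available <- 1L
  if (n_available >= n_strata) {
    n_strata                    # one cpu per stratum
  } else {
    max(1L, n_available - 1L)   # assumption: never drop below one cpu
  }
}

choose_nr_cpus(n_strata = 9, n_available = 16)  # 9
choose_nr_cpus(n_strata = 9, n_available = 4)   # 3
```

Passing an explicit value for nr_cpus bypasses any such automatic choice.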
Value
An object of class simPopObj containing survey data as well as the simulated population data including the categorical variables specified by argument additional.
Note
The basic household structure needs to be simulated beforehand with the function simStructure.
Author(s)
Bernhard Meindl, Andreas Alfons, Stefan Kraft, Alexander Kowarik, Matthias Templ, Siro Fritzmann
References
B. Meindl, M. Templ, A. Kowarik, O. Dupriez (2017) Simulation of Synthetic Populations for Survey Data Considering Auxiliary Information. Journal of Statistical Software, 79 (10), 1–38. doi:10.18637/jss.v079.i10
A. Alfons, S. Kraft, M. Templ, P. Filzmoser (2011) Simulation of close-to-reality population data for household surveys with application to EU-SILC. Statistical Methods & Applications, 20 (3), 383–407. doi:10.1007/s10260-011-0163-2
See Also
simStructure, simRelation, simContinuous, simComponents
Examples
data(eusilcS) # load sample data
## Not run:
## approx. 20 seconds computation time
inp <- specifyInput(data=eusilcS, hhid="db030", hhsize="hsize", strata="db040", weight="db090")
simPop <- simStructure(data=inp, method="direct", basicHHvars=c("age", "rb090"))
## nr_cpus=1 runs on a single cpu; leave nr_cpus=NULL (the default) to have
## the number of cpus selected automatically
simPop <- simCategorical(simPop, additional=c("pl030", "pb220a"), method="multinom", nr_cpus=1)
simPop
## End(Not run)