generate {simsem} | R Documentation |
Generate data using SimSem template
Description
This function generates random data from (1) SimSem objects created with the model function, (2) lavaan scripts or parameter tables, or (3) an MxModel object from the OpenMx package. Notable features include fine control of misspecification and misspecification optimization (for SimSem objects only), as well as the ability to generate non-normal data. In simsem simulations, the sim function calls this function internally to generate data; it can also be helpful for debugging or for creating data to use with other analysis programs.
Usage
generate(model, n, maxDraw=50, misfitBounds=NULL, misfitType="f0",
averageNumMisspec=FALSE, optMisfit=NULL, optDraws=50,
createOrder = c(1, 2, 3), indDist=NULL, sequential=FALSE,
facDist=NULL, errorDist=NULL, saveLatentVar = FALSE, indLab=NULL,
modelBoot=FALSE, realData=NULL, covData=NULL, params=FALSE, group = NULL,
empirical = FALSE, ...)
Arguments
model |
A |
n |
Integer of sample size. |
maxDraw |
Integer specifying the maximum number of attempts to draw a valid set of parameters (no negative error variances and no standardized coefficients over 1). |
misfitBounds |
Vector that contains upper and lower bounds of the misfit measure. Sets of parameters drawn that are not within these bounds are rejected. |
misfitType |
Character vector indicating the fit measure used to assess the misfit of a set of parameters. Can be "f0", "rmsea", "srmr", or "all". |
averageNumMisspec |
If |
optMisfit |
Character vector of either "min" or "max", indicating whether the minimum or the maximum misfit is selected. If not NULL, the set of parameters with the minimum or maximum misfit of the given misfit type, out of the number of draws in "optDraws", will be returned. |
optDraws |
Number of parameter sets to draw if optMisfit is not null. The set of parameters with the maximum or minimum misfit will be returned. |
createOrder |
The order of 1) applying equality/inequality constraints, 2) applying misspecification, and 3) filling unspecified parameters (e.g., residual variances when total variances are specified). This argument is specified as a vector giving an ordering of 1 (constraint), 2 (misspecification), and 3 (filling parameters). For example, |
indDist |
A |
sequential |
If |
facDist |
A |
errorDist |
An object or list of objects of type |
saveLatentVar |
If |
indLab |
A vector of indicator labels. When not specified, the variable names are |
modelBoot |
When specified, a model-based bootstrap is used for data generation. See details for further information. This argument requires real data to be passed to |
realData |
A data.frame containing real data. The data generated will follow the distribution of this data set. |
covData |
A data.frame containing covariate data, which can have any distributions. This argument is required when users specify |
params |
If |
group |
The label of the grouping variable |
empirical |
Logical. If |
... |
Additional arguments for the |
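The role of maxDraw can be pictured as a rejection loop: draw a parameter set, check that it is valid, and redraw up to a fixed number of attempts. The base R sketch below illustrates only that idea. The helper name draw_valid, its arguments, the toy loading distribution, and the choice to stop with an error when no valid set is found are all assumptions made for illustration and are not taken from the simsem source; a check against misfitBounds would simply be a second condition inside the loop.

```r
set.seed(123)

# Hypothetical helper: redraw until a parameter set passes the validity check
draw_valid <- function(draw_fn, is_valid, max_draw = 50) {
  for (attempt in seq_len(max_draw)) {
    params <- draw_fn()
    if (is_valid(params)) {
      return(list(params = params, attempts = attempt))
    }
  }
  stop("No valid parameter set found in ", max_draw, " draws")
}

# Toy case: a standardized loading drawn from U(0.5, 1.2).
# The implied error variance 1 - loading^2 must not be negative.
res <- draw_valid(
  draw_fn  = function() runif(1, 0.5, 1.2),
  is_valid = function(loading) 1 - loading^2 >= 0
)
res$params    # an accepted loading, no larger than 1
res$attempts  # number of draws that were needed
```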
Details
If a lavaan script or an MxModel object is provided, the model-implied covariance matrix is computed and the createData function is used internally to generate data. The data-generation method depends on whether the indDist argument is specified. For a lavaan script, the data-generation code is modified from the simulateData function.
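As a rough illustration of generating data from a model-implied covariance matrix, the base R sketch below builds the covariance matrix of a two-factor CFA and draws multivariate normal data from it. It does not call simsem and is not the package's implementation. The parameter values (loadings of 0.7, factor correlation of 0.5, error variances of 0.51) and the variable names y1 to y6 are assumptions chosen for the illustration.

```r
set.seed(123)

# Assumed population values for a two-factor CFA with six indicators
Lambda <- matrix(0, 6, 2)
Lambda[1:3, 1] <- 0.7
Lambda[4:6, 2] <- 0.7
Psi <- matrix(c(1, 0.5, 0.5, 1), 2, 2)   # factor covariance matrix
Theta <- diag(1 - 0.7^2, 6)              # error variances giving unit total variance

# Model-implied covariance matrix: Sigma = Lambda Psi Lambda' + Theta
Sigma <- Lambda %*% Psi %*% t(Lambda) + Theta

# Draw n multivariate normal rows with covariance Sigma via the Cholesky factor
n <- 200
Z <- matrix(rnorm(n * 6), n, 6)
dat <- as.data.frame(Z %*% chol(Sigma))
names(dat) <- paste0("y", 1:6)

round(cov(dat), 2)   # sample covariance, close to Sigma up to sampling error
```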
If a SimSem object is specified, the function checks whether the model contains any random parameters or trivial misspecification. If so, real or misspecified parameters are drawn via the draw function. There are then two methods to generate data. In the first, the function calculates the model-implied covariance matrix (including model misspecification) and generates data in the same way as for a lavaan script or an MxModel object. The second is the sequential method, which is selected by setting the sequential argument to TRUE. This method creates data by following the chain of equations in the structural equation model: independent variables and errors are generated and combined to form dependent variables, and those dependent variables are then treated as independent variables in the next equation. For example, in a model where factors A and B are independent variables and factor C is a dependent variable, factors A and B are generated first. Then the residual of factor C is created and combined with factors A and B, at which point all factor scores are available. Finally, measurement errors are created and added to the factor scores to create indicator scores. At each step, independent variables and errors can be nonnormal by setting the facDist or errorDist arguments. The data generation in each step is based on the createData function.
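The chain of equations in the sequential method can be sketched in base R for the A, B, C example. This is an illustration of the idea only, not simsem's code. The path coefficients, loadings, variances, and the use of a rescaled chi-square residual as the nonnormal component are all assumptions.

```r
set.seed(123)
n <- 500

# Step 1: independent factors A and B (standard normal here)
A <- rnorm(n)
B <- rnorm(n)

# Step 2: dependent factor C = paths from A and B plus a residual.
# The residual is a centered, rescaled chi-square(3) draw with variance 0.75,
# standing in for a nonnormal residual distribution.
b_A <- 0.4   # assumed path coefficient A -> C
b_B <- 0.3   # assumed path coefficient B -> C
res_C <- (rchisq(n, df = 3) - 3) / sqrt(6) * sqrt(0.75)
C <- b_A * A + b_B * B + res_C

# Step 3: indicator scores = loadings * factor scores + measurement errors
eta <- cbind(A, B, C)
Lambda <- matrix(0, 9, 3)
Lambda[1:3, 1] <- 0.7
Lambda[4:6, 2] <- 0.7
Lambda[7:9, 3] <- 0.7
E <- matrix(rnorm(n * 9, sd = sqrt(0.51)), n, 9)
Y <- eta %*% t(Lambda) + E

dat <- as.data.frame(Y)
names(dat) <- paste0("y", 1:9)
latent <- as.data.frame(eta)   # factor scores are available as a by-product
```

Because the factor scores are built explicitly in steps 1 and 2, they can be kept alongside the indicators, which is the kind of output the saveLatentVar argument refers to.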
For the model-based bootstrap (providing the realData argument), the transformation proposed by Yung & Bentler (1996) is used. This procedure extends the Bollen and Stine (1992) bootstrap to include a mean structure. If misspec is specified, the model-implied mean vector and covariance matrix with trivial misspecification are used in the model-based bootstrap. See page 133 of Bollen and Stine (1992) for a reference.
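The transformation behind the model-based bootstrap is commonly written as Z = (Y - ybar) S^(-1/2) Sigma^(1/2) + mu, where S is the sample covariance matrix and Sigma and mu are the model-implied moments. The base R sketch below applies that formula to stand-in data. It is a minimal sketch rather than simsem's implementation, and the helper mat_power, the stand-in data, and the values of Sigma and mu are assumptions for illustration.

```r
set.seed(123)

# Symmetric matrix power via eigendecomposition (for positive definite M)
mat_power <- function(M, p) {
  e <- eigen(M, symmetric = TRUE)
  e$vectors %*% diag(e$values^p, nrow = length(e$values)) %*% t(e$vectors)
}

# Stand-in for real data: 100 rows, 3 correlated variables
mix <- matrix(c(1, 0.3, 0.2, 0, 1, 0.4, 0, 0, 1), 3, 3)
Y <- matrix(rnorm(300), 100, 3) %*% mix

# Stand-in for the model-implied covariance matrix and mean vector
Sigma <- matrix(c(1, 0.5, 0.5, 0.5, 1, 0.5, 0.5, 0.5, 1), 3, 3)
mu <- c(0, 1, 2)

# Center the data, remove the sample covariance, impose the model-implied moments
S <- cov(Y)
Yc <- sweep(Y, 2, colMeans(Y))
Z <- Yc %*% mat_power(S, -0.5) %*% mat_power(Sigma, 0.5)
Z <- sweep(Z, 2, mu, "+")

# A bootstrap sample resamples rows of the transformed data with replacement
boot <- Z[sample(nrow(Z), replace = TRUE), ]
```

After the transformation, the sample mean and covariance of Z equal mu and Sigma, so the model holds exactly in the data that are resampled.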
Value
A data.frame containing simulated data from the data generation template. A variable "group" is appended indicating group membership.
Author(s)
Sunthud Pornprasertmanit (psunthud@gmail.com), Patrick Miller (University of Notre Dame; pmille13@nd.edu). The data generation code for lavaan scripts is modified from the simulateData function in lavaan, written by Yves Rosseel.
References
Bollen, K. A., & Stine, R. A. (1992). Bootstrapping goodness-of-fit measures in structural equation models. Sociological Methods and Research, 21, 205-229.
Yung, Y.-F., & Bentler, P. M. (1996). Bootstrapping techniques in analysis of mean and covariance structures. In G. A. Marcoulides & R. E. Schumacker (Eds.), Advanced structural equation modeling: Issues and techniques (pp. 195-226). Mahwah, NJ: Erlbaum.
See Also
- createData: To generate random data using a set of parameters from draw
Examples
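library(simsem)

# Factor loadings: two factors with three indicators each; free (NA) loadings are set to 0.7
loading <- matrix(0, 6, 2)
loading[1:3, 1] <- NA
loading[4:6, 2] <- NA
LY <- bind(loading, 0.7)

# Factor correlation matrix with a population correlation of 0.5
latent.cor <- matrix(NA, 2, 2)
diag(latent.cor) <- 1
RPS <- binds(latent.cor, 0.5)

# Uncorrelated measurement errors
RTE <- binds(diag(6))
# Indicator variances (defined here but not passed to model() below)
VY <- bind(rep(NA,6),2)
CFA.Model <- model(LY = LY, RPS = RPS, RTE = RTE, modelType = "CFA")

# Generate 200 observations from the CFA template
dat <- generate(CFA.Model, 200)
# Get the latent variable scores
dat2 <- generate(CFA.Model, 20, sequential = TRUE, saveLatentVar = TRUE)
dat2
attr(dat2, "latentVar")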