simFA {fungible}

R Documentation

Generate Factor Analysis Models and Data Sets for Simulation Studies

Description

A function to simulate factor loadings matrices and Monte Carlo data sets for common factor models, bifactor models, and IRT models.
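The IRT models mentioned above are tied to the common factor model by well-known conversions. For standardized loadings λ_j with communality h2_j, the normal-ogive slope is a_j = λ_j / sqrt(1 - h2_j), and multiplying by 1.702 gives an approximate logistic-metric slope. In a one-factor model, a threshold τ_j on the latent-response scale corresponds to the difficulty b_j = τ_j / λ_j. The base-R sketch below applies these textbook formulas. `loadingsToIRT` is a hypothetical helper that is not part of fungible, and it is not claimed to reproduce simFA's `IRT` output or its handling of `Dparam`.

```r
## Hypothetical helper (not part of fungible): standard conversion of
## standardized loadings to normal-ogive IRT parameters.
loadingsToIRT <- function(L, Phi = NULL, tau = NULL) {
  L <- as.matrix(L)
  if (is.null(Phi)) Phi <- diag(ncol(L))   # orthogonal factors by default
  h2 <- rowSums((L %*% Phi) * L)           # item communalities
  aNormal <- L / sqrt(1 - h2)              # slopes in the normal-ogive metric
  out <- list(aNormal = aNormal,
              aLogistic = 1.702 * aNormal)  # approximate logistic-metric slopes
  # One-factor case only: difficulty b_j = tau_j / lambda_j
  if (!is.null(tau) && ncol(L) == 1) out$b <- tau / L[, 1]
  out
}

irt <- loadingsToIRT(c(.8, .6, .4), tau = c(0, .3, -.2))
print(irt$aNormal[, 1])
print(irt$b)
```

Dividing each row of the loadings by its residual standard deviation is what moves the parameters from the factor-analytic metric to the IRT metric; the 1.702 constant is only the usual logistic approximation.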

Usage

simFA(
  Model = list(),
  Loadings = list(),
  CrossLoadings = list(),
  Phi = list(),
  ModelError = list(),
  Bifactor = list(),
  MonteCarlo = list(),
  FactorScores = list(),
  Missing = list(),
  Control = list(),
  Seed = NULL
)

Arguments

`Model`	(list)
- `NFac` (scalar) Number of common or group factors; defaults to `NFac = 3`.
- `NItemPerFac` (scalar or vector) If a scalar, all factors have the same number of primary loadings; if a vector of length `NFac`, it specifies the number of primary loadings for each factor; defaults to `NItemPerFac = 3`.
- `Model` (character) `"orthogonal"` or `"oblique"`; defaults to `Model = "orthogonal"`.
`Loadings`	(list)
- `FacPattern` (`NULL` or matrix) If `FacPattern = M`, where `M` is a user-defined factor pattern matrix, that matrix is used. If `FacPattern = NULL`, `simFA` generates a factor pattern from the arguments specified under other keywords (e.g., `Model`, `CrossLoadings`); defaults to `FacPattern = NULL`.
- `FacLoadDist` (character) Specifies the sampling distribution for the common factor loadings. Possible values are `"runif"`, `"rnorm"`, `"sequential"`, and `"fixed"`; defaults to `FacLoadDist = "runif"`.
- `FacLoadRange` (vector of length `NFac`, 2, or 1); defaults to `FacLoadRange = c(.3, .7)`. If `FacLoadDist = "runif"`, the vector defines the bounds of the uniform distribution. If `FacLoadDist = "rnorm"`, the vector defines the mean and standard deviation of the normal distribution from which loadings are sampled. If `FacLoadDist = "sequential"`, the vector specifies the lower and upper bounds of the loadings sequence. If `FacLoadDist = "fixed"` and `FacLoadRange` is of length 1, all common loadings equal that constant; if `FacLoadRange` is of length `NFac`, each factor has fixed loadings equal to the associated element of `FacLoadRange`.
- `h2` (vector) An optional vector of communalities used to constrain the population communalities to user-defined values; defaults to `h2 = NULL`.
`CrossLoadings`	(list)
- `ProbCrossLoad` (scalar) A value in the [0, 1] interval giving the probability that a cross loading will be present in elements of the loadings matrix that do not have salient (primary) loadings. If `ProbCrossLoad = 1`, a single cross loading is added to each factor; defaults to `ProbCrossLoad = 0`.
- `CrossLoadRange` (vector of length 2) Controls the size of the cross loadings; defaults to `CrossLoadRange = c(.20, .25)`.
- `CrossLoadPositions` (matrix) Specifies the row and column positions of (optional) cross loadings; defaults to `CrossLoadPositions = NULL`.
- `CrossLoadValues` (vector) If `CrossLoadPositions` is specified, `CrossLoadValues` is a vector of user-supplied cross loadings; defaults to `CrossLoadValues = NULL`.
- `CrudFactor` (scalar) Controls the size of tertiary factor loadings. If `CrudFactor != 0`, elements of the loadings matrix with neither primary nor secondary (i.e., cross) loadings are sampled from a uniform distribution on [-`CrudFactor`, `CrudFactor`]; defaults to `CrudFactor = 0`.
`Phi`	(list)
- `MaxAbsPhi` (scalar) Upper bound on the absolute factor correlations; defaults to `MaxAbsPhi = .5`.
- `EigenValPower` (scalar) Controls the skewness of the eigenvalues of Phi. Larger values of `EigenValPower` yield a more right-skewed Phi spectrum (and thus a model closer to unidimensionality); defaults to `EigenValPower = 2`.
- `PhiType` (character); defaults to `PhiType = "free"`. If `PhiType = "free"`, factor correlations are randomly generated under the constraints of `MaxAbsPhi` and `EigenValPower`. If `PhiType = "fixed"`, all factor correlations equal the value specified in `MaxAbsPhi`; a fatal error is produced if the resulting Phi is not positive semidefinite. If `PhiType = "user"`, the factor correlations are defined by the matrix specified in `UserPhi` (see below).
- `UserPhi` (matrix) A positive semidefinite (PSD) matrix of user-defined factor correlations; defaults to `UserPhi = NULL`.
`ModelError`	(list)
- `ModelError` (logical) If `ModelError = TRUE`, model error is introduced into the factor pattern via the method described by Tucker, Koopman, and Linn (TKL, 1969); defaults to `ModelError = FALSE`.
- `W` (matrix) An optional user-supplied factor loading matrix for the `NMinorFac` minor common factors; defaults to `W = NULL`.
- `NMinorFac` (scalar) Number of minor factors in the TKL model; defaults to `NMinorFac = 150`.
- `ModelErrorType` (character) If `ModelErrorType = "U"`, `ModelErrorVar` is the proportion of uniqueness variance that is due to model error. If `ModelErrorType = "V"`, `ModelErrorVar` is the proportion of total variance that is due to model error; defaults to `ModelErrorType = "U"`.
- `ModelErrorVar` (scalar in [0, 1]) The proportion of uniqueness (U) or total (V) variance that is due to model error; defaults to `ModelErrorVar = .10`.
- `epsTKL` (scalar in [0, 1]) Controls the size of the factor loadings in successive minor factors; defaults to `epsTKL = .20`.
- `Wattempts` (scalar > 0) Maximum number of tries when attempting to generate a suitable W matrix; defaults to `Wattempts = 10000`.
- `WmaxLoading` (scalar > 0) Threshold value for `NWmaxLoading`; defaults to `WmaxLoading = .30`.
- `NWmaxLoading` (scalar >= 0) Maximum number of absolute loadings >= `WmaxLoading` in any column of W (the matrix of model approximation error factor loadings); defaults to `NWmaxLoading = 2`. Under the defaults, no column of W will have 3 or more loadings with absolute value >= .30.
- `PrintW` (logical) If `PrintW = TRUE`, `simFA` prints the attempt history when searching for a suitable W matrix given the constraints defined in `WmaxLoading` and `NWmaxLoading`; defaults to `PrintW = FALSE`.
- `RSpecific` (matrix) Optional correlation matrix for specific factors; defaults to `RSpecific = NULL`.
`Bifactor`	(list)
- `Bifactor` (logical) If `Bifactor = TRUE`, parameters for the bifactor model are generated; defaults to `Bifactor = FALSE`.
- `Hierarchical` (logical) If `Hierarchical = TRUE`, a hierarchical Schmid-Leiman (1957) bifactor model is generated; defaults to `Hierarchical = FALSE`.
- `F1FactorDist` (character) Specifies the sampling distribution for the general factor loadings. Possible values are `"runif"`, `"rnorm"`, `"sequential"`, and `"fixed"`; defaults to `F1FactorDist = "sequential"`.
- `F1FactorRange` (vector of length 1 or 2) Controls the sizes of the general factor loadings in non-hierarchical bifactor models; defaults to `F1FactorRange = c(.4, .7)`. If `F1FactorDist = "runif"`, the vector of length 2 defines the bounds of the uniform distribution, c(lower, upper). If `F1FactorDist = "rnorm"`, the vector defines the mean and standard deviation of the normal distribution from which loadings are sampled, c(MN, SD). If `F1FactorDist = "sequential"`, the vector specifies the lower and upper bounds of the loadings sequence, c(lower, upper).
`MonteCarlo`	(list)
- `NSamples` (integer) Number of Monte Carlo samples; defaults to `NSamples = 0`.
- `SampleSize` (integer) Sample size for each Monte Carlo sample; defaults to `SampleSize = 250`.
- `Raw` (logical) If `Raw = TRUE`, simulated data sets contain raw data; if `Raw = FALSE`, they contain correlation matrices; defaults to `Raw = FALSE`.
- `Thresholds` (list) List elements contain the thresholds for each item. Thresholds are required when generating Likert variables.
`FactorScores`	(list)
- `FS` (logical) If `FS = TRUE`, factor scores are simulated; defaults to `FS = FALSE`.
- `CFSeed` (integer) Optional starting seed for the common factor scores; defaults to `CFSeed = NULL`, in which case a random seed is used.
- `MCFSeed` (integer) Optional starting seed for the minor common factor scores; defaults to `MCFSeed = NULL`.
- `SFSeed` (integer) Optional starting seed for the specific factor scores; defaults to `SFSeed = NULL`, in which case a random seed is used.
- `EFSeed` (integer) Optional starting seed for the error factor scores; defaults to `EFSeed = NULL`, in which case a random seed is used. Note that `CFSeed`, `MCFSeed`, `SFSeed`, and `EFSeed` must be different numbers (a fatal error is produced when two or more seeds are equal).
- `VarRel` (vector) A vector of manifest variable reliabilities. The specific factor variance for variable i equals `VarRel[i] - h2[i]` (the manifest variable reliability minus its communality). By default, `VarRel = h2`, which yields specific factor variances of zero for all variables.
- `Population` (logical) If `Population = TRUE`, factor scores fit the correlational constraints of the factor model exactly (e.g., the common factors are orthogonal to the unique factors); defaults to `Population = FALSE`.
- `NFacScores` (scalar) Sample size for the factor scores; defaults to `NFacScores = 250`.
- `Thresholds` (list) A list of quantiles used to polychotomize the observed data generated from the factor scores.
`Missing`	(list)
- `Missing` (logical) If `Missing = TRUE`, all data sets will contain missing values; defaults to `Missing = FALSE`.
- `Mechanism` (character) Specifies the missing data mechanism. Currently, the program supports only missing completely at random (MCAR): `Mechanism = "MCAR"`.
- `MSProb` (scalar or vector of length `NVar`) Specifies the probability of missingness for each variable; defaults to `MSProb = 0`.
`Control`	(list)
- `IRT` (logical) If `IRT = TRUE`, user-supplied thresholds are interpreted as item intercepts; defaults to `IRT = FALSE`.
- `Dparam` (scalar) If `Dparam = 1`, item intercepts should be scaled in the logistic metric; if `Dparam = 1.702`, intercepts should be scaled in the probit metric.
- `Maxh2` (scalar) Rows of the loadings matrix are rescaled so that no communality exceeds `Maxh2`; defaults to `Maxh2 = .98`.
- `Reflect` (logical) If `Reflect = TRUE`, loadings on the common factors are randomly reflected; defaults to `Reflect = FALSE`.
`Seed`	(integer) Starting seed for the random number generator; defaults to `Seed = NULL`. When no seed is specified by the user, the program will generate a random seed.
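To illustrate how the `Model`, `Loadings`, `CrossLoadings`, `Phi`, and `Control` keywords interact, here is a minimal base-R sketch. It builds an independent-cluster pattern with uniform primary loadings, optional cross and crud loadings, a fixed Phi, and the implied population correlation matrix. `sketchPattern` is a hypothetical helper written for illustration, not simFA's implementation. It covers only `FacLoadDist = "runif"` and `PhiType = "fixed"` and does not reproduce the `ProbCrossLoad = 1` special case. Enforcing `Maxh2` by shrinking offending rows is an assumed rule.

```r
## Hypothetical sketch (not simFA's source): independent-cluster pattern
## with optional cross and crud loadings, a fixed Phi, and Rpop.
sketchPattern <- function(NFac = 3, NItemPerFac = 3,
                          FacLoadRange = c(.3, .7),
                          ProbCrossLoad = 0, CrossLoadRange = c(.20, .25),
                          CrudFactor = 0, MaxAbsPhi = 0, Maxh2 = .98) {
  NItemPerFac <- rep_len(NItemPerFac, NFac)        # scalar or length-NFac vector
  NVar <- sum(NItemPerFac)
  fac  <- rep(seq_len(NFac), times = NItemPerFac)   # primary factor of each item
  L <- matrix(0, NVar, NFac)
  # FacLoadDist = "runif": primary loadings ~ U(FacLoadRange[1], FacLoadRange[2])
  L[cbind(seq_len(NVar), fac)] <- runif(NVar, FacLoadRange[1], FacLoadRange[2])
  # Non-salient cells get a cross loading with probability ProbCrossLoad ...
  nonsal  <- L == 0
  isCross <- nonsal & matrix(runif(NVar * NFac) < ProbCrossLoad, NVar, NFac)
  L[isCross] <- runif(sum(isCross), CrossLoadRange[1], CrossLoadRange[2])
  # ... and otherwise a "crud" loading drawn from U(-CrudFactor, CrudFactor)
  isCrud <- nonsal & !isCross
  L[isCrud] <- runif(sum(isCrud), -CrudFactor, CrudFactor)
  # PhiType = "fixed": every factor correlation equals MaxAbsPhi
  Phi <- matrix(MaxAbsPhi, NFac, NFac)
  diag(Phi) <- 1
  if (min(eigen(Phi, symmetric = TRUE, only.values = TRUE)$values) < 0)
    stop("Phi is not positive semidefinite")
  # Maxh2 (assumed rule): shrink any row whose communality exceeds Maxh2
  h2 <- rowSums((L %*% Phi) * L)
  L  <- L * ifelse(h2 > Maxh2, sqrt(Maxh2 / h2), 1)
  h2 <- rowSums((L %*% Phi) * L)
  Rpop <- L %*% Phi %*% t(L)
  diag(Rpop) <- 1                                   # unit variances
  list(loadings = L, Phi = Phi, h2 = h2, Rpop = Rpop)
}

set.seed(1)
sk <- sketchPattern(MaxAbsPhi = .3)
print(round(sk$loadings, 2))
```

Communalities are computed as the row sums of (L Phi) * L, which equals diag(L Phi L') without forming the full product. With the defaults (`ProbCrossLoad = 0`, `CrudFactor = 0`) every nonsalient loading is exactly zero.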

Details

For a complete description of simFA's capabilities, users are encouraged to consult the simFABook at http://users.cla.umn.edu/~nwaller/simFA/simFABook.pdf.

simFA is a program for exploring factor analysis models via simulation studies. `simFA` returns a list; after calling it, any of the components listed under Value can be extracted (e.g., `out$loadings`) and saved for further processing.
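The simulation workflow that simFA automates under the `MonteCarlo` and `Missing` keywords can be pictured with the following base-R sketch. It draws `NSamples` multivariate normal samples of size `SampleSize` from a population correlation matrix and, optionally, cuts each variable at item thresholds to obtain Likert-type scores. It then imposes MCAR missingness with per-variable probability `MSProb`. `sketchMonteCarlo` is hypothetical and is not simFA's implementation. Treating thresholds as sorted cut points on the standard normal scale is an assumption.

```r
## Hypothetical sketch (not simFA's code): Monte Carlo samples from a
## population correlation matrix, optional thresholds, MCAR missingness.
sketchMonteCarlo <- function(Rpop, NSamples = 2, SampleSize = 250,
                             Raw = FALSE, Thresholds = NULL, MSProb = 0) {
  p <- ncol(Rpop)
  U <- chol(Rpop)                          # Rpop = t(U) %*% U
  lapply(seq_len(NSamples), function(s) {
    # Rows of Z %*% U are N(0, Rpop) when Z has iid N(0, 1) entries
    X <- matrix(rnorm(SampleSize * p), SampleSize, p) %*% U
    if (!is.null(Thresholds)) {            # one sorted cut-point vector per item
      for (j in seq_len(p)) X[, j] <- findInterval(X[, j], Thresholds[[j]]) + 1
    }
    if (any(MSProb > 0)) {                 # MCAR: independent of the data values
      probs <- matrix(rep_len(MSProb, p), SampleSize, p, byrow = TRUE)
      X[matrix(runif(SampleSize * p), SampleSize, p) < probs] <- NA
    }
    if (Raw) X else cor(X, use = "pairwise.complete.obs")
  })
}

set.seed(2)
R   <- matrix(.4, 4, 4); diag(R) <- 1
thr <- rep(list(qnorm(c(.25, .5, .75))), 4)   # 4-point Likert items
dat <- sketchMonteCarlo(R, NSamples = 3, SampleSize = 500, Raw = TRUE,
                        Thresholds = thr, MSProb = .1)
print(table(dat[[1]][, 1], useNA = "ifany"))
```

Because the missingness indicators are drawn without looking at `X`, the mechanism is MCAR by construction.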

Value

loadings A common factor or bifactor loadings matrix.
Phi A factor correlation matrix.
urloadings The unrotated loadings matrix.
h2 A vector of item communalities.
h2PopME A vector of item communalities that may include model approximation error.
Rpop The model-implied population correlation matrix.
RpopME The model-implied population correlation matrix with model error.
W The factor loadings for the minor factors (when ModelError = TRUE). Default = NULL.
Xm The part of the observed scores that is due to the minor common factors.
SFSvars Variances of the specific factors in the metric of the observed scores.
ModelErrorFitStats A list of model fit indices (for the underlying equations, see: Bentler, 1990; Hu & Bentler, 1999; Marsh, Hau, & Grayson, 2005; Steiger, 2016):
- SRMR_theta Standardized Root Mean Square Residual based on the model that is implied by the error free major factors only (underlying Rpop),
- SRMR_thetahat Standardized Root Mean Square Residual based on an exploratory factor analysis of the population correlation matrix, RpopME,
- CRMR_theta Correlation Root Mean Square Residual based on the model that is implied by the error free major factors only (underlying Rpop),
- CRMR_thetahat Correlation Root Mean Square Residual based on an exploratory factor analysis of the population correlation matrix, RpopME,
- RMSEA_theta Root Mean Square Error of Approximation (Steiger, 2016) based on the model that is implied by the error free major factors only (underlying Rpop),
- RMSEA_thetahat Root Mean Square Error of Approximation (Steiger, 2016) based on an exploratory factor analysis of the population correlation matrix, RpopME,
- CFI_theta Comparative Fit Index (Bentler, 1990) based on the model that is implied by the error free major factors only (underlying Rpop),
- CFI_thetahat Comparative Fit Index (Bentler, 1990) based on an exploratory factor analysis of the population correlation matrix, RpopME.
- Fm Value of the maximum likelihood (ML) fit function for the population target model.
- Fb Value of the maximum likelihood (ML) fit function for the population baseline model.
- DFm Degrees of freedom for population target model.
CovMatrices A list containing:
- CovMajor The model implied covariances from the major factors.
- CovMinor The model implied covariances from the minor factors.
- CovUnique The model implied variances from the uniqueness factors.
Bifactor A list containing:
- loadingsHier Factor loadings of the 1st order solution of a hierarchical bifactor model.
- PhiHier Factor correlations of the 1st order solution of a hierarchical bifactor model.
Scores A list containing:
- FactorScores Factor scores for the common and uniqueness factors.
- FacInd Factor indeterminacy indices for the error free population model.
- FacIndME Factor score indeterminacy indices for the population model with model error.
- ObservedScores A matrix of model-implied observed scores. If `Thresholds` were supplied under keyword `FactorScores`, the observed scores are transformed into Likert scores.
Monte A list containing output from the Monte Carlo simulations if generated.
IRT Factor loadings expressed in the normal ogive IRT metric. If Thresholds were given then IRT difficulty values will also be returned.
Seed The initial seed for the random number generator.
call A copy of the function call.
cn A list of all active and nonactive function arguments.
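The `CovMatrices` and `ModelErrorFitStats` components can be understood through the following sketch. It adds TKL-style minor factors to a major-factor model and then evaluates the error-free major-factor model (the `_theta` indices) against `RpopME` using standard population definitions:

- F_ML = ln|Sigma| - ln|S| + tr(S Sigma^-1) - p
- RMSEA = sqrt(Fm / DFm)
- CFI = 1 - Fm / Fb, with an independence baseline
- SRMR and CRMR as root mean squared residual correlations with and without the diagonal

`tklRpop` and `popFit` are hypothetical helpers, not simFA's code, and several choices in them are assumptions. The geometric `(1 - epsTKL)^(k - 1)` scaling of minor-factor columns is assumed, as are the EFA degrees of freedom `DFm = ((p - m)^2 - (p + m)) / 2`. The `_thetahat` indices (which require an EFA of `RpopME`) are omitted, and the W-screening rules (`WmaxLoading`, `NWmaxLoading`) are not applied.

```r
## Hypothetical sketch (not simFA's code): TKL-style minor factors and
## population fit of the error-free major-factor model ("theta").
tklRpop <- function(L, Phi, NMinorFac = 150, ModelErrorVar = .10,
                    epsTKL = .20) {
  p  <- nrow(L)
  h2 <- rowSums((L %*% Phi) * L)
  # Assumption: minor-factor column k is scaled by (1 - epsTKL)^(k - 1)
  W <- matrix(rnorm(p * NMinorFac), p, NMinorFac) %*%
    diag((1 - epsTKL)^(seq_len(NMinorFac) - 1))
  # ModelErrorType = "U": minor variance of item j = ModelErrorVar * (1 - h2[j])
  W <- W * sqrt(ModelErrorVar * (1 - h2) / rowSums(W^2))
  CovMajor  <- L %*% Phi %*% t(L)
  CovMinor  <- tcrossprod(W)
  CovUnique <- diag(1 - h2 - rowSums(W^2))
  RpopME <- CovMajor + CovMinor + CovUnique
  diag(RpopME) <- 1                    # remove floating-point drift
  Rpop <- CovMajor
  diag(Rpop) <- 1
  list(Rpop = Rpop, RpopME = RpopME, W = W)
}

popFit <- function(S, Sigma, NFac) {
  p   <- ncol(S)
  res <- S - Sigma                                  # residual correlations
  Fm  <- log(det(Sigma)) - log(det(S)) + sum(diag(S %*% solve(Sigma))) - p
  Fb  <- -log(det(S))                               # baseline: Sigma = I
  DFm <- ((p - NFac)^2 - (p + NFac)) / 2            # EFA degrees of freedom
  c(SRMR  = sqrt(mean(res[lower.tri(res, diag = TRUE)]^2)),
    CRMR  = sqrt(mean(res[lower.tri(res)]^2)),
    RMSEA = sqrt(max(Fm, 0) / DFm),
    CFI   = 1 - max(Fm, 0) / max(Fb, Fm),
    Fm = Fm, Fb = Fb, DFm = DFm)
}

set.seed(3)
L   <- kronecker(diag(3), matrix(.5, 3, 1))   # 9 items, fixed loadings of .5
pop <- tklRpop(L, Phi = diag(3))
fit <- popFit(S = pop$RpopME, Sigma = pop$Rpop, NFac = 3)
print(round(fit, 4))
```

Both `Rpop` and `RpopME` have unit diagonals, so the diagonal residuals are zero. SRMR therefore equals CRMR scaled by sqrt((p - 1)/(p + 1)); the two indices differ only in whether those zeros are averaged in.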

Author(s)

Niels G. Waller with contributions by Hoang V. Nguyen

References

Bentler, P. M. (1990). Comparative fit indexes in structural models. Psychological Bulletin, 107(2), 238–246.

Hu, L.-T., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal, 6(1), 1–55.

Marsh, H. W., Hau, K.-T., & Grayson, D. (2005). Goodness of fit in structural equation models. In A. Maydeu-Olivares & J. J. McArdle (Eds.), Multivariate applications book series. Contemporary psychometrics: A festschrift for Roderick P. McDonald (pp. 275–340). Lawrence Erlbaum Associates Publishers.

Schmid, J., & Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1), 53–61.

Steiger, J. H. (2016). Notes on the Steiger–Lind (1980) handout. Structural Equation Modeling: A Multidisciplinary Journal, 23(6), 777–781.

Tucker, L. R., Koopman, R. F., & Linn, R. L. (1969). Evaluation of factor analytic research procedures by means of simulated correlation matrices. Psychometrika, 34(4), 421–459.

Examples


## Not run:
# Ex 1. Three-factor simple structure model (orthogonal factors) with
# ideal (zero) nonsalient loadings; under the defaults there are no
# cross loadings because ProbCrossLoad = 0
out <- simFA(Seed = 1)
print(round(out$loadings, 2))

# Ex 2. Non-hierarchical bifactor model with 3 group factors
# and constant loadings on the general factor
out <- simFA(Bifactor = list(Bifactor = TRUE,
                             Hierarchical = FALSE,
                             F1FactorRange = c(.4, .4),
                             F1FactorDist = "runif"),
             Seed = 1)
print(round(out$loadings, 2))

# Ex 3. Model fit statistics for population data with
# model approximation error. Three-factor model.
out <- simFA(Loadings = list(FacLoadDist = "fixed",
                             FacLoadRange = .5),
             ModelError = list(ModelError = TRUE,
                               NMinorFac = 150,
                               ModelErrorType = "V",
                               ModelErrorVar = .1,
                               Wattempts = 10000,
                               epsTKL = .2),
             Seed = 1)

print(out$loadings)
# Per the order listed under Value, elements 2, 4, 6, and 8 are the
# _thetahat indices (SRMR, CRMR, RMSEA, CFI)
print(out$ModelErrorFitStats[seq(2, 8, 2)])

## End(Not run)
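Example 2 requests a non-hierarchical bifactor model. For the hierarchical case (`Hierarchical = TRUE`), the Schmid and Leiman (1957) transformation takes a first-order pattern L1 and the loadings g of the first-order factors on a single second-order factor. It converts them into general-factor loadings L1 g and group-factor loadings L1 diag(sqrt(1 - g^2)). The sketch below implements that textbook transformation in base R. `schmidLeiman` is a hypothetical helper, not simFA's code; naming its outputs `loadingsHier` and `PhiHier` only mirrors the components documented under Value.

```r
## Hypothetical sketch of the Schmid-Leiman (1957) transformation
## (not simFA's code). L1: first-order pattern (p x m, m > 1);
## g: loadings of the m first-order factors on one general factor.
schmidLeiman <- function(L1, g) {
  PhiHier <- tcrossprod(g)                # first-order factor correlations g g'
  diag(PhiHier) <- 1
  general <- L1 %*% g                     # general-factor loadings
  group   <- L1 %*% diag(sqrt(1 - g^2))   # residualized group-factor loadings
  list(loadings = cbind(general, group),
       loadingsHier = L1, PhiHier = PhiHier)
}

L1 <- kronecker(diag(3), matrix(c(.7, .6, .5), 3, 1))   # 9 items, 3 groups
g  <- c(.8, .7, .6)
sl <- schmidLeiman(L1, g)
print(round(sl$loadings, 3))
```

Since g g' + diag(1 - g^2) equals `PhiHier`, the bifactor loadings B satisfy B B' = L1 PhiHier L1'. The transformation therefore reproduces exactly the common-factor covariances of the hierarchical model.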

[Package fungible version 2.4.4 Index]