simFA {fungible}R Documentation

Generate Factor Analysis Models and Data Sets for Simulation Studies

Description

A function to simulate factor loadings matrices and Monte Carlo data sets for common factor models, bifactor models, and IRT models.

Usage

simFA(
  Model = list(),
  Loadings = list(),
  CrossLoadings = list(),
  Phi = list(),
  ModelError = list(),
  Bifactor = list(),
  MonteCarlo = list(),
  FactorScores = list(),
  Missing = list(),
  Control = list(),
  Seed = NULL
)

Arguments

Model

(list)

  • NFac (scalar) Number of common or group factors; defaults to NFac = 3.

  • NItemPerFac

    • (scalar) All factors have the same number of primary loadings.

    • (vector) A vector of length NFac specifying the number of primary loadings for each factor; defaults to NItemPerFac = 3.

  • Model (character) "orthogonal" or "oblique"; defaults to Model = "orthogonal".

Loadings

(list)

  • FacPattern (NULL or matrix).

    • FacPattern = M where M is a user-defined factor pattern matrix.

    • FacPattern = NULL; simFA will generate a factor pattern based on the arguments specified under other keywords (e.g., Model, CrossLoadings, etc.); defaults to FacPattern = NULL.

  • FacLoadDist (character) Specifies the sampling distribution for the common factor loadings. Possible values are "runif", "rnorm", "sequential", and "fixed"; defaults to FacLoadDist = "runif".

  • FacLoadRange (vector of length NFac, 2, or 1); defaults to FacLoadRange = c(.3, .7).

    • If FacLoadDist = "runif" the vector defines the bounds of the uniform distribution;

    • If FacLoadDist = "rnorm" the vector defines the mean and standard deviation of the normal distribution from which loadings are sampled.

    • If FacLoadDist = "sequential" the vector specifies the lower and upper bound of the loadings sequence.

    • If FacLoadDist = "fixed" and FacLoadRange is a vector of length 1 then all common loadings will equal the constant specified in FacLoadRange. If FacLoadDist = "fixed" and FacLoadRange is a vector of length NFac then each factor will have fixed loadings as specified by the associated element in FacLoadRange.

  • h2 (vector) An optional vector of communalities used to constrain the population communalities to user-defined values; defaults to h2 = NULL.

CrossLoadings

(list)

  • ProbCrossLoad (scalar) A value in the (0,1) interval that determines the probability that a cross loading will be present in elements of the loadings matrix that do not have salient (primary) factor loadings. If set to ProbCrossLoad = 1, a single cross loading will be added to each factor; defaults to ProbCrossLoad = 0.

  • CrossLoadRange (vector of length 2) Controls size of the cross loadings; defaults to CrossLoadRange = c(.20, .25).

  • CrossLoadPositions (matrix) Specifies the row and column positions of (optional) cross loadings; defaults to CrossLoadPositions = NULL.

  • CrossLoadValues (vector) If CrossLoadPositions is specified then CrossLoadValues is a vector of user-supplied cross-loadings; defaults to CrossLoadValues = NULL.

  • CrudFactor (scalar) Controls the size of tertiary factor loadings. If CrudFactor != 0 then elements of the loadings matrix with neither primary nor secondary (i.e., cross) loadings will be sampled from a \[-(CrudFactor), (CrudFactor)\] uniform distribution; defaults to CrudFactor = 0.

Phi

(list)

  • MaxAbsPhi (scalar) Upper (absolute) bound on factor correlations; defaults to MaxAbsPhi = .5.

  • EigenValPower (scalar) Controls the skewness of the eigenvalues of Phi. Larger values of EigenValPower result in a Phi spectrum that is more right-skewed (and thus closer to a unidimensional model); defaults to EigenValPower = 2.

  • PhiType (character); defaults to PhiType = "free".

    • If PhiType = "free" factor correlations will be randomly generated under the constraints of MaxAbsPhi and EigenValPower.

    • If PhiType = "fixed" all factor correlations will equal the value specified in MaxAbsPhi. A fatal error will be produced if Phi is not positive semidefinite.

    • If PhiType = "user" the factor correlations are defined by the matrix specified in UserPhi (see below).

  • UserPhi (matrix) A positive semidefinite (PSD) matrix of user-defined factor correlations; defaults to UserPhi = NULL.

ModelError

(list)

  • ModelError (logical) If ModelError = TRUE model error will be introduced into the factor pattern via the method described by Tucker, Koopman, and Linn (TKL, 1969); defaults to ModelError = FALSE.

  • W (matrix) An optional user-supplied factor loading matrix for the NMinorFac minor common factors; defaults to W = NULL.

  • NMinorFac (scalar) Number of minor factors in the TKL model; defaults to NMinorFac = 150.

  • ModelErrorType (character) If ModelErrorType = "U" then ModelErrorVar is the proportion of uniqueness variance that is due to model error. If ModelErrorType = "V" then ModelErrorVar is the proportion of total variance that is due to model error; defaults to ModelErrorType = "U".

  • ModelErrorVar (scalar \[0,1\]) The proportion of uniqueness (U) or total (V) variance that is due to model error; defaults to ModelErrorVar = .10.

  • epsTKL (scalar \[0,1\]) Controls the size of the factor loadings in successive minor factors; defaults to epsTKL = .20.

  • Wattempts (scalar > 0) Maximum number of tries when attempting to generate a suitable W matrix. Default = 10000.

  • WmaxLoading (scalar > 0) Threshold value for NWmaxLoading. Default WmaxLoading = .30.

  • NWmaxLoading (scalar >= 0) Maximum number of absolute loadings >= WmaxLoading in any column of W (matrix of model approximation error factor loadings). Default NWmaxLoading = 2. Under the defaults, no column of W will have 3 or more loadings > |.30|.

  • PrintW (Boolean) If PrintW = TRUE then simFA will print the attempt history when searching for a suitable W matrix given the constraints defined in WmaxLoading and NWmaxLoading. Default PrintW = FALSE.

  • RSpecific (matrix) Optional correlation matrix for specific factors; defaults to RSpecific = NULL.

Bifactor

(list)

  • Bifactor (logical) If Bifactor = TRUE parameters for the bifactor model will be generated; defaults to Bifactor = FALSE.

  • Hierarchical (logical) If Hierarchical = TRUE then a hierarchical Schmid Leiman (1957) bifactor model will be generated; defaults to Hierarchical = FALSE.

  • F1FactorDist (character) Specifies the sampling distribution for the general factor loadings. Possible values are "runif", "rnorm", "sequential", and "fixed"; defaults to F1FactorDist = "sequential".

  • F1FactorRange (vector of length 1 or 2) Controls the sizes of the general factor loadings in non-hierarchical bifactor models; defaults to F1FactorRange = c(.4, .7).

    • If F1FactorDist = "runif", the vector of length 2 defines the bounds of the uniform distribution, c(lower, upper);

    • If F1FactorDist = "rnorm", the vector defines the mean and standard deviation of the normal distribution from which loadings are sampled, c(MN, SD).

    • If F1FactorDist = "sequential", the vector specifies the lower and upper bound of the loadings sequence, c(lower, upper).

MonteCarlo

(list)

  • NSamples (integer) Defines number of Monte Carlo Samples; defaults to NSamples = 0.

  • SampleSize (integer) Sample size for each Monte Carlo sample; defaults to SampleSize = 250.

  • Raw (logical) If Raw = TRUE, simulated data sets will contain raw data. If Raw = FALSE, simulated data sets will contain correlation matrices; defaults to Raw = FALSE.

  • Thresholds (list) List elements contain thresholds for each item. Thresholds are required when generating Likert variables.

FactorScores

(list)

  • FS (logical) If FS = TRUE (true) factor scores will be simulated; defaults to FS = FALSE.

  • CFSeed (integer) Optional starting seed for the common factor scores; defaults to CFSeed = NULL in which case a random seed is used.

  • MCFSeed (integer) Optional starting seed for the minor common factor scores; defaults to MCFSeed = NULL.

  • SFSeed (integer) Optional starting seed for the specific factor scores; defaults to SFSeed = NULL in which case a random seed is used.

  • EFSeed (integer) Optional starting seed for the error factor scores; defaults to EFSeed = NULL in which case a random seed is used. Note that CFSeed, MCFSeed, SFSeed, and EFSeed must be different numbers (a fatal error is produced when two or more seeds are specified as equal).

  • VarRel (vector) A vector of manifest variable reliabilities. The specific factor variance for variable i will equal VarRel[i] - h^2[i] (the manifest variable reliability minus its commonality). By default, VarRel = h^2 (resulting in uniformly zero specific factor variances).

  • Population (logical) If Population = TRUE, factor scores will fit the correlational constraints of the factor model exactly (e.g., the common factors will be orthogonal to the unique factors); defaults to Population = FALSE.

  • NFacScores (scalar) Sample size for the factor scores; defaults to NFacScores = 250.

  • Thresholds (list) A list of quantiles used to polychotomize the observed data that will be generated from the factor scores.

Missing

(list)

  • Missing (logical) If Missing = TRUE all data sets will contain missing values; defaults to Missing = FALSE.

  • Mechanism (character) Specifies the missing data mechanism. Currently, the program only supports missing completely at random (MCAR): Missing = "MCAR".

  • MSProb (scalar or vector of length NVar) Specifies the probability of missingness for each variable; defaults to MSprob = 0.

Control

(list)

  • IRT (logical) If IRT = TRUE then user-supplied thresholds will be interpreted as item intercepts; defaults to IRT = FALSE.

  • Dparam (scalar). If Dparam = 1 then item intercepts should be scaled in the logistic metric. If Dparam = 1.702 then intercepts should be scaled in the probit metric.

  • Maxh2 (scalar) Rows of the loadings matrix will be rescaled to have a maximum communality of Maxh2; defaults to Maxh2 = .98.

  • Reflect (logical) If Reflect = TRUE loadings on the common factors will be randomly reflected; defaults to Reflect = FALSE.

Seed

(integer) Starting seed for the random number generator; defaults to Seed = NULL. When no seed is specified by the user, the program will generate a random seed.

Details

For a complete description of simFA's capabilities, users are encouraged to consult the simFABook at http://users.cla.umn.edu/~nwaller/simFA/simFABook.pdf.

simFA is a program for exploring factor analysis models via simulation studies. After calling simFA all relevant output can be saved for further processing by calling one or more of the following object names.

Value

Author(s)

Niels G. Waller with contributions by Hoang V. Nguyen

References

Bentler, P. M. (1990). Comparative fit indexes in structural models. Psychological Bulletin, 107(2), 238–246.

Hu, L.-T. & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal, 6(1), 1–55.

Marsh, H. W., Hau, K.-T., & Grayson, D. (2005). Goodness of fit in structural equation models. In A. Maydeu-Olivares & J. J. McArdle (Eds.), Multivariate applications book series. Contemporary psychometrics: A festschrift for Roderick P. McDonald (p. 275–340). Lawrence Erlbaum Associates Publishers.

Schmid, J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1), 53–61.

Steiger, J. H. (2016). Notes on the Steiger–Lind (1980) handout. Structural Equation Modeling: A Multidisciplinary Journal, 23:6, 777-781.

Tucker, L. R., Koopman, R. F., and Linn, R. L. (1969). Evaluation of factor analytic research procedures by means of simulated correlation matrices. Psychometrika, 34(4), 421–459.

Examples


## Not run:
#  Ex 1. Three Factor Simple Structure Model with Cross loadings and
#  Ideal Non salient Loadings
   out <-  simFA(Seed = 1)
   print( round( out$loadings, 2 ) )

# Ex 2. Non Hierarchical bifactor model 3 group factors
# with constant loadings on the general factor
   out <- simFA(Bifactor = list(Bifactor = TRUE,
                                Hierarchical = FALSE,
                                F1FactorRange = c(.4, .4),
                                F1FactorDist = "runif"),
                Seed = 1)
   print( round( out$loadings, 2 ) )

   # Ex 3.  Model Fit Statistics for Population Data with
   # Model Approximation Error. Three Factor model.
       out <- simFA(Loadings = list(FacLoadDist = "fixed",
                                    FacLoadRange = .5),
                    ModelError = list(ModelError = TRUE,
                                      NMinorFac = 150,
                                      ModelErrorType = "V",
                                      ModelErrorVar = .1,
                                      Wattempts = 10000,
                                      epsTKL = .2),
                    Seed = 1)

       print( out$loadings )
       print( out$ModelErrorFitStats[seq(2,8,2)] )

## End(**Not run**)


[Package fungible version 2.4.4 Index]