R: Syntax for measurement equivalence

measEq.syntax {semTools}

R Documentation

Description

Automatically generates lavaan model syntax to specify a confirmatory factor analysis (CFA) model with equality constraints imposed on user-specified measurement (or structural) parameters. Optionally returns the fitted model (if data are provided) representing some chosen level of measurement equivalence/invariance across groups and/or repeated measures.

Usage

measEq.syntax(configural.model, ..., ID.fac = "std.lv",
  ID.cat = "Wu.Estabrook.2016", ID.thr = c(1L, 2L), group = NULL,
  group.equal = "", group.partial = "", longFacNames = list(),
  longIndNames = list(), long.equal = "", long.partial = "",
  auto = "all", warn = TRUE, debug = FALSE, return.fit = FALSE)

Arguments

`configural.model`	A model with no measurement-invariance constraints (i.e., representing only configural invariance), unless required for model identification. `configural.model` can be either: (1) lavaan `model.syntax` or a parameter table (as returned by `parTable`) specifying the configural model. Using this option, the user can also provide either raw `data` or summary statistics via `sample.cov` and (optionally) `sample.mean`. See argument descriptions in `lavaan`. To include thresholds in the generated syntax, users must either provide raw `data` or specify all thresholds in the `configural.model` syntax (see first example). If raw `data` are not provided, the number of blocks (groups, levels, or combination) must be indicated using an arbitrary `sample.nobs` argument (e.g., 3 groups could be specified using `sample.nobs=rep(1, 3)`). (2) A fitted `lavaan` model (e.g., as returned by `cfa`) estimating the configural model. Note that the specified or fitted model must not contain any latent structural parameters (i.e., it must be a CFA model), unless they are higher-order constructs with latent indicators (i.e., a second-order CFA).
`...`	Additional arguments (e.g., `data`, `ordered`, or `parameterization`) passed to the `lavaan` function. See also `lavOptions`.
`ID.fac`	`character`. The method for identifying common-factor variances and (if `meanstructure = TRUE`) means. Three methods are available, which go by different names in the literature: (1) Standardize the common factor (mean = 0, SD = 1) by specifying any of: `"std.lv"`, `"unit.variance"`, `"UV"`, `"fixed.factor"`, `"fixed-factor"`. (2) Choose a reference indicator by specifying any of: `"auto.fix.first"`, `"unit.loading"`, `"UL"`, `"marker"`, `"ref"`, `"ref.indicator"`, `"reference.indicator"`, `"reference-indicator"`, `"marker.variable"`, `"marker-variable"`. (3) Apply effects-code constraints to loadings and intercepts by specifying any of: `"FX"`, `"EC"`, `"effects"`, `"effects.coding"`, `"effects-coding"`, `"effects.code"`, `"effects-code"`. See Kloessner & Klopp (2019) for details about all three methods.
`ID.cat`	`character`. The method for identifying (residual) variances and intercepts of latent item-responses underlying any `ordered` indicators. Four methods are available: (1) To follow Wu & Estabrook's (2016) guidelines (default), specify any of: `"Wu.Estabrook.2016"`, `"Wu.2016"`, `"Wu.Estabrook"`, `"Wu"`, `"Wu2016"`. For consistency, specify `ID.fac = "std.lv"`. (2) To use the default settings of Mplus and `lavaan`, specify any of: `"default"`, `"Mplus"`, `"Muthen"`. Details provided in Millsap & Tein (2004). (3) To use the constraints recommended by Millsap & Tein (2004; see also Liu et al., 2017, for the longitudinal case), specify any of: `"millsap"`, `"millsap.2004"`, `"millsap.tein.2004"`. For consistency, specify `ID.fac = "marker"` and `parameterization = "theta"`. (4) To use the default settings of LISREL, specify `"LISREL"` or `"Joreskog"`. Details provided in Millsap & Tein (2004). For consistency, specify `parameterization = "theta"`. See Details and References for more information.
`ID.thr`	`integer`. Only relevant when `ID.cat = "Millsap.Tein.2004"`. Used to indicate which thresholds should be constrained for identification. The first integer indicates the threshold used for all indicators, the second integer indicates the additional threshold constrained for a reference indicator (ignored if binary).
`group`	optional `character` indicating the name of a grouping variable. See `cfa`.
`group.equal`	optional `character` vector indicating type(s) of parameter to equate across groups. Ignored if `is.null(group)`. See `lavOptions`.
`group.partial`	optional `character` vector or a parameter table indicating exceptions to `group.equal` (see `lavOptions`). Any variables not appearing in the `configural.model` will be ignored, and any parameter constraints needed for identification (e.g., two thresholds per indicator when `ID.cat = "Millsap"`) will be removed.
`longFacNames`	optional named `list` of `character` vectors, each indicating multiple factors in the model that are actually the same construct measured repeatedly. See Details and Examples.
`longIndNames`	optional named `list` of `character` vectors, each indicating multiple indicators in the model that are actually the same indicator measured repeatedly. See Details and Examples.
`long.equal`	optional `character` vector indicating type(s) of parameter to equate across repeated measures. Ignored if no factors are indicated as repeatedly measured in `longFacNames`.
`long.partial`	optional `character` vector or a parameter table indicating exceptions to `long.equal`. Any longitudinal variable names not appearing in `names(longFacNames)` or `names(longIndNames)` will be ignored, and any parameter constraints needed for identification will be removed.
`auto`	Used to automatically include autocorrelated measurement errors among repeatedly measured indicators in `longIndNames`. Specify a single `integer` to set the maximum order (e.g., `auto = 1L` indicates that an indicator's unique factors should only be correlated between adjacently measured occasions). `auto = TRUE` or `"all"` will specify residual covariances among all possible lags per repeatedly measured indicator in `longIndNames`.
`warn`, `debug`	`logical`. Passed to `lavaan` and `lavParseModelString`. See `lavOptions`.
`return.fit`	`logical` indicating whether the generated syntax should be fitted to the provided `data` (or summary statistics, if provided via `sample.cov`). If `configural.model` is a fitted lavaan model, the generated syntax will be fitted using the `update` method (see `lavaan`), and `...` will be passed to `lavaan`. If neither data nor a fitted lavaan model were provided, this must be `FALSE`. If `TRUE`, the generated `measEq.syntax` object will be included in the `lavaan` object's `@external` slot, accessible by `fit@external$measEq.syntax`.

Details

This function is a pedagogical and analytical tool to generate model syntax representing some level of measurement equivalence/invariance across any combination of multiple groups and/or repeated measures. Support is provided for confirmatory factor analysis (CFA) models with simple or complex structure (i.e., cross-loadings and correlated residuals are allowed). For any complexities that exceed the limits of automation, this function is intended to still be useful by providing a means to generate syntax that users can easily edit to accommodate their unique situations.

Limited support is provided for bifactor models and higher-order constructs. Because bifactor models have cross-loadings by definition, the option ID.fac = "effects.code" is unavailable. ID.fac = "UV" is recommended for bifactor models, but ID.fac = "UL" is available on the condition that each factor has a unique first indicator in the configural.model. In order to maintain generality, higher-order factors may include a mix of manifest and latent indicators, but they therefore require ID.fac = "UL" to avoid complications with differentiating lower-order vs. higher-order (or mixed-level) factors. The keyword "loadings" in group.equal or long.equal constrains factor loadings of all manifest indicators (including loadings on higher-order factors that also have latent indicators), whereas the keyword "regressions" constrains factor loadings of latent indicators. Users can edit the model syntax manually to adjust constraints as necessary, or clever use of the group.partial or long.partial arguments could make it possible for users to still automate their model syntax. The keyword "intercepts" constrains the intercepts of all manifest indicators, and the keyword "means" constrains intercepts and means of all latent common factors, regardless of whether they are latent indicators of higher-order factors. To test equivalence of lower-order and higher-order intercepts/means in separate steps, the user can either manually edit their generated syntax or conscientiously exploit the group.partial or long.partial arguments as necessary.

ID.fac: If the configural.model fixes any (e.g., the first) factor loadings, the generated syntax object will retain those fixed values. This allows the user to retain additional constraints that might be necessary (e.g., if there are only 1 or 2 indicators). Some methods must be used in conjunction with other settings:

ID.cat = "Millsap" requires ID.fac = "UL" and parameterization = "theta".
ID.cat = "LISREL" requires parameterization = "theta".
ID.fac = "effects.code" is unavailable when there are any cross-loadings.
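As a rough sketch of how these pairings look in a call, the following generates syntax under the Millsap & Tein (2004) identification without any data. The one-factor model, the five-category threshold syntax, the group name "g", and sample.nobs = c(1, 1) are illustrative assumptions that follow the pattern used in the Examples section; they are not requirements of the method.

```r
library(semTools)  # semTools depends on lavaan

## illustrative model (assumed): 1 factor, 4 five-category indicators
mod <- ' F =~ u1 + u2 + u3 + u4 '
## no data are provided, so every threshold must appear in the syntax
mod.th <- paste0("u", 1:4, " | t1 + t2 + t3 + t4", collapse = "\n")

syntax.millsap <- measEq.syntax(configural.model = c(mod, mod.th),
                                sample.nobs = c(1, 1),      # indicates 2 groups
                                parameterization = "theta", # required pairing
                                ID.fac = "UL",              # required pairing
                                ID.cat = "millsap",
                                group = "g")
## inspect the generated identification constraints
cat(as.character(syntax.millsap))
```

Printing the syntax before fitting anything makes it easier to see which thresholds and intercepts were fixed for identification.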

ID.cat: Wu & Estabrook (2016) recommended constraining thresholds to equality first, and doing so should allow releasing any identification constraints no longer needed. For each ordered indicator, constraining one threshold to equality will allow the item's intercepts to be estimated in all but the first group or repeated measure. Constraining a second threshold (if applicable) will allow the item's (residual) variance to be estimated in all but the first group or repeated measure. For binary data, there is no independent test of threshold, intercept, or residual-variance equality. Equivalence of thresholds must also be assumed for three-category indicators. These guidelines provide the least restrictive assumptions and tests, and are therefore the default.
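The counting argument in the paragraph above can be checked with a little arithmetic. This is a back-of-envelope sketch, not output from measEq.syntax: the helper name net.constraints is hypothetical, and the count applies to one ordered indicator and one additional group or occasion, assuming all of its thresholds are equated.

```r
## Net number of constraints added per ordered indicator (per additional
## group or occasion) when all of its thresholds are equated:
## the 1st equated threshold frees the intercept, the 2nd frees the
## (residual) variance, and any further thresholds are testable constraints.
net.constraints <- function(n.thresholds) {
  freed <- pmin(n.thresholds, 2L)
  n.thresholds - freed
}

data.frame(categories      = 2:5,
           thresholds      = 1:4,
           net.constraints = net.constraints(1:4))
```

Binary and three-category indicators yield zero net constraints, which matches the statement that their threshold equivalence cannot be tested independently and must be assumed.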

The default setting in Mplus is similar to Wu & Estabrook (2016), except that intercepts are always constrained to zero (so they are assumed to be invariant without testing them). Millsap & Tein (2004) recommended parameterization = "theta" and identified an item's residual variance in all but the first group (or occasion; Liu et al., 2017) by constraining its intercept to zero and one of its thresholds to equality. A second threshold for the reference indicator (so ID.fac = "UL") is used to identify the common-factor means in all but the first group/occasion. The LISREL software fixes the first threshold to zero and (if applicable) the second threshold to 1, and assumes any remaining thresholds to be equal across groups / repeated measures; thus, the intercepts are always identified, and residual variances (parameterization = "theta") are identified except for binary data, when they are all fixed to one.

Repeated Measures: If each repeatedly measured factor is measured by the same indicators (specified in the same order in the configural.model) on each occasion, without any cross-loadings, the user can let longIndNames be automatically generated. Generic names for the repeatedly measured indicators are created using the name of the repeatedly measured factors (i.e., names(longFacNames)) and the number of indicators. So the repeatedly measured first indicator ("ind") of a longitudinal construct called "factor" would be generated as "._factor_ind.1".
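The naming pattern described above can be mimicked in base R. This only illustrates the pattern of the generic names; it is not the package's internal code, and the factor names "factor.t1" and "factor.t2" and the indicator count are assumptions.

```r
## hypothetical longitudinal construct called "factor", measured twice
longFacNames <- list(factor = c("factor.t1", "factor.t2"))
n.ind <- 3  # assumed number of indicators per occasion

## generic names: "._" + construct name + "_ind." + indicator position
generic <- paste0("._", names(longFacNames), "_ind.", seq_len(n.ind))
generic
## "._factor_ind.1" "._factor_ind.2" "._factor_ind.3"
```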

The same types of parameter can be specified for long.equal as for group.equal (see lavOptions for a list), except for "residual.covariances" or "lv.covariances". Instead, users can constrain autocovariances using keywords "resid.autocov" or "lv.autocov". Note that group.equal = "lv.covariances" or group.equal = "residual.covariances" will constrain any autocovariances across groups, along with any other covariances the user specified in the configural.model. Note also that autocovariances cannot be specified as exceptions in long.partial, so anything more complex than the auto argument automatically provides should instead be manually specified in the configural.model.
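To make the reach of the auto argument concrete, the base-R sketch below lists which pairs of occasions would receive a residual covariance for a single repeatedly measured indicator. It assumes four occasions, and the helper auto.pairs is hypothetical; it illustrates the lag rule described for auto and does not call semTools.

```r
occasions <- 1:4                    # assumed: 4 measurement occasions
pairs <- t(combn(occasions, 2))     # all 6 pairs of occasions
lag <- pairs[, 2] - pairs[, 1]      # distance between occasions

## keep only pairs whose lag does not exceed the maximum order
auto.pairs <- function(max.lag) pairs[lag <= max.lag, , drop = FALSE]

auto.pairs(1)    # like auto = 1L: adjacent occasions (1,2), (2,3), (3,4)
auto.pairs(Inf)  # like auto = "all": every pair of occasions
```

Any pattern that cannot be expressed as a maximum lag would need to be written into the configural.model by hand.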

When users set orthogonal=TRUE in the configural.model (e.g., in bifactor models of repeatedly measured constructs), autocovariances of each repeatedly measured factor will still be freely estimated in the generated syntax.

Missing Data: If users wish to utilize the auxiliary function to automatically include auxiliary variables in conjunction with missing = "FIML", they should first generate the hypothesized-model syntax, then submit that syntax as the model to auxiliary(). If users utilized runMI to fit their configural.model to multiply imputed data, that model can also be passed to the configural.model argument, and if return.fit = TRUE, the generated model will be fitted to the multiple imputations.

Value

By default, an object of class measEq.syntax. If return.fit = TRUE, a fitted lavaan model, with the measEq.syntax object stored in the @external slot, accessible by fit@external$measEq.syntax.
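A minimal sketch of retrieving the stored syntax from a fitted model follows. It assumes continuous indicators and uses the HolzingerSwineford1939 data shipped with lavaan; the model and the choice of group.equal = "loadings" are illustrative, not taken from this page's Examples.

```r
library(semTools)

HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

## generate the syntax and fit it in one step
fit <- measEq.syntax(configural.model = HS.model,
                     data = lavaan::HolzingerSwineford1939,
                     group = "school", group.equal = "loadings",
                     return.fit = TRUE)

## the returned object is a fitted lavaan model ...
class(fit)
## ... and the generated syntax object travels with it
syntax <- fit@external$measEq.syntax
cat(as.character(syntax))
```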

Author(s)

Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)

References

Kloessner, S., & Klopp, E. (2019). Explaining constraint interaction: How to interpret estimated model parameters under alternative scaling methods. Structural Equation Modeling, 26(1), 143–155. doi:10.1080/10705511.2018.1517356

Liu, Y., Millsap, R. E., West, S. G., Tein, J.-Y., Tanaka, R., & Grimm, K. J. (2017). Testing measurement invariance in longitudinal data with ordered-categorical measures. Psychological Methods, 22(3), 486–506. doi:10.1037/met0000075

Millsap, R. E., & Tein, J.-Y. (2004). Assessing factorial invariance in ordered-categorical measures. Multivariate Behavioral Research, 39(3), 479–515. doi:10.1207/S15327906MBR3903_4

Wu, H., & Estabrook, R. (2016). Identification of confirmatory factor analysis models of different levels of invariance for ordered categorical outcomes. Psychometrika, 81(4), 1014–1045. doi:10.1007/s11336-016-9506-0

Examples

mod.cat <- ' FU1 =~ u1 + u2 + u3 + u4
             FU2 =~ u5 + u6 + u7 + u8 '
## the 2 factors are actually the same factor (FU) measured twice
longFacNames <- list(FU = c("FU1","FU2"))

## CONFIGURAL model: no constraints across groups or repeated measures
syntax.config <- measEq.syntax(configural.model = mod.cat,
                               # NOTE: data provides info about numbers of
                               #       groups and thresholds
                               data = datCat,
                               ordered = paste0("u", 1:8),
                               parameterization = "theta",
                               ID.fac = "std.lv", ID.cat = "Wu.Estabrook.2016",
                               group = "g", longFacNames = longFacNames)
## print lavaan syntax to the Console
cat(as.character(syntax.config))
## print a summary of model features
summary(syntax.config)

## THRESHOLD invariance:
## only necessary to specify thresholds if you have no data
mod.th <- '
  u1 | t1 + t2 + t3 + t4
  u2 | t1 + t2 + t3 + t4
  u3 | t1 + t2 + t3 + t4
  u4 | t1 + t2 + t3 + t4
  u5 | t1 + t2 + t3 + t4
  u6 | t1 + t2 + t3 + t4
  u7 | t1 + t2 + t3 + t4
  u8 | t1 + t2 + t3 + t4
'
syntax.thresh <- measEq.syntax(configural.model = c(mod.cat, mod.th),
                               # NOTE: data not provided, so syntax must
                               #       include thresholds, and number of
                               #       groups == 2 is indicated by:
                               sample.nobs = c(1, 1),
                               parameterization = "theta",
                               ID.fac = "std.lv", ID.cat = "Wu.Estabrook.2016",
                               group = "g", group.equal = "thresholds",
                               longFacNames = longFacNames,
                               long.equal = "thresholds")
## notice that constraining 4 thresholds allows intercepts and residual
## variances to be freely estimated in all but the first group & occasion
cat(as.character(syntax.thresh))
## print a summary of model features
summary(syntax.thresh)


## Fit a model to the data either in a subsequent step (recommended):
mod.config <- as.character(syntax.config)
fit.config <- cfa(mod.config, data = datCat, group = "g",
                  ordered = paste0("u", 1:8), parameterization = "theta")
## or in a single step (not generally recommended):
fit.thresh <- measEq.syntax(configural.model = mod.cat, data = datCat,
                            ordered = paste0("u", 1:8),
                            parameterization = "theta",
                            ID.fac = "std.lv", ID.cat = "Wu.Estabrook.2016",
                            group = "g", group.equal = "thresholds",
                            longFacNames = longFacNames,
                            long.equal = "thresholds", return.fit = TRUE)
## compare their fit to test threshold invariance
anova(fit.config, fit.thresh)


## --------------------------------------------------------
## RECOMMENDED PRACTICE: fit one invariance model at a time
## --------------------------------------------------------

## - A downside of setting return.fit=TRUE is that if the model has trouble
##   converging, you don't have the opportunity to investigate the syntax,
##   or even to know whether an error resulted from the syntax-generator or
##   from lavaan itself.
## - A downside of automatically fitting an entire set of invariance models
##   (like the old measurementInvariance() function did) is that you might
##   end up testing models that shouldn't even be fitted because less
##   restrictive models already fail (e.g., don't test full scalar
##   invariance if metric invariance fails! Establish partial metric
##   invariance first, then test equivalence of intercepts ONLY among the
##   indicators that have invariant loadings.)

## The recommended sequence is to (1) generate and save each syntax object,
## (2) print it to the screen to verify you are fitting the model you expect
## to (and potentially learn which identification constraints should be
## released when equality constraints are imposed), and (3) fit that model
## to the data, as you would if you had written the syntax yourself.

## Continuing from the examples above, after establishing invariance of
## thresholds, we proceed to test equivalence of loadings and intercepts
##   (metric and scalar invariance, respectively)
## simultaneously across groups and repeated measures.

## Not run: 

## metric invariance
syntax.metric <- measEq.syntax(configural.model = mod.cat, data = datCat,
                               ordered = paste0("u", 1:8),
                               parameterization = "theta",
                               ID.fac = "std.lv", ID.cat = "Wu.Estabrook.2016",
                               group = "g", longFacNames = longFacNames,
                               group.equal = c("thresholds","loadings"),
                               long.equal  = c("thresholds","loadings"))
summary(syntax.metric)                    # summarize model features
mod.metric <- as.character(syntax.metric) # save as text
cat(mod.metric)                           # print/view lavaan syntax
## fit model to data
fit.metric <- cfa(mod.metric, data = datCat, group = "g",
                  ordered = paste0("u", 1:8), parameterization = "theta")
## test equivalence of loadings, given equivalence of thresholds
anova(fit.thresh, fit.metric)

## scalar invariance
syntax.scalar <- measEq.syntax(configural.model = mod.cat, data = datCat,
                               ordered = paste0("u", 1:8),
                               parameterization = "theta",
                               ID.fac = "std.lv", ID.cat = "Wu.Estabrook.2016",
                               group = "g", longFacNames = longFacNames,
                               group.equal = c("thresholds","loadings",
                                               "intercepts"),
                               long.equal  = c("thresholds","loadings",
                                               "intercepts"))
summary(syntax.scalar)                    # summarize model features
mod.scalar <- as.character(syntax.scalar) # save as text
cat(mod.scalar)                           # print/view lavaan syntax
## fit model to data
fit.scalar <- cfa(mod.scalar, data = datCat, group = "g",
                  ordered = paste0("u", 1:8), parameterization = "theta")
## test equivalence of intercepts, given equal thresholds & loadings
anova(fit.metric, fit.scalar)


## For a single table with all results, you can pass the models to
## summarize to the compareFit() function
compareFit(fit.config, fit.thresh, fit.metric, fit.scalar)



## ------------------------------------------------------
## NOT RECOMMENDED: fit several invariance models at once
## ------------------------------------------------------
test.seq <- c("thresholds","loadings","intercepts","means","residuals")
meq.list <- list()
for (i in 0:length(test.seq)) {
  if (i == 0L) {
    meq.label <- "configural"
    group.equal <- ""
    long.equal <- ""
  } else {
    meq.label <- test.seq[i]
    group.equal <- test.seq[1:i]
    long.equal <- test.seq[1:i]
  }
  meq.list[[meq.label]] <- measEq.syntax(configural.model = mod.cat,
                                         data = datCat,
                                         ordered = paste0("u", 1:8),
                                         parameterization = "theta",
                                         ID.fac = "std.lv",
                                         ID.cat = "Wu.Estabrook.2016",
                                         group = "g",
                                         group.equal = group.equal,
                                         longFacNames = longFacNames,
                                         long.equal = long.equal,
                                         return.fit = TRUE)
}

compareFit(meq.list)


## -----------------
## Binary indicators
## -----------------

## borrow example data from Mplus user guide
myData <- read.table("http://www.statmodel.com/usersguide/chap5/ex5.16.dat")
names(myData) <- c("u1","u2","u3","u4","u5","u6","x1","x2","x3","g")
bin.mod <- '
  FU1 =~ u1 + u2 + u3
  FU2 =~ u4 + u5 + u6
'
## Must SIMULTANEOUSLY constrain thresholds, loadings, and intercepts
test.seq <- list(strong = c("thresholds","loadings","intercepts"),
                 means = "means",
                 strict = "residuals")
meq.list <- list()
for (i in 0:length(test.seq)) {
  if (i == 0L) {
    meq.label <- "configural"
    group.equal <- ""
    long.equal <- ""
  } else {
    meq.label <- names(test.seq)[i]
    group.equal <- unlist(test.seq[1:i])
    # long.equal <- unlist(test.seq[1:i])
  }
  meq.list[[meq.label]] <- measEq.syntax(configural.model = bin.mod,
                                         data = myData,
                                         ordered = paste0("u", 1:6),
                                         parameterization = "theta",
                                         ID.fac = "std.lv",
                                         ID.cat = "Wu.Estabrook.2016",
                                         group = "g",
                                         group.equal = group.equal,
                                         #longFacNames = longFacNames,
                                         #long.equal = long.equal,
                                         return.fit = TRUE)
}

compareFit(meq.list)


## ---------------------
## Multilevel Invariance
## ---------------------

## To test invariance across levels in a MLSEM, specify syntax as though
## you are fitting to 2 groups instead of 2 levels.

mlsem <- ' f1 =~ y1 + y2 + y3
           f2 =~ y4 + y5 + y6 '
## metric invariance
syntax.metric <- measEq.syntax(configural.model = mlsem, meanstructure = TRUE,
                               ID.fac = "std.lv", sample.nobs = c(1, 1),
                               group = "cluster", group.equal = "loadings")
## by definition, Level-1 means must be zero, so fix them
syntax.metric <- update(syntax.metric,
                        change.syntax = paste0("y", 1:6, " ~ c(0, NA)*1"))
## save as a character string
mod.metric <- as.character(syntax.metric, groups.as.blocks = TRUE)
## convert from multigroup to multilevel
mod.metric <- gsub(pattern = "group:", replacement = "level:",
                   x = mod.metric, fixed = TRUE)
## fit model to data
fit.metric <- lavaan(mod.metric, data = Demo.twolevel, cluster = "cluster")
summary(fit.metric)

## End(Not run)

[Package semTools version 0.5-6 Index]