R: Parcel-Allocation Variability in Model Ranking

PAVranking {semTools}

R Documentation

Parcel-Allocation Variability in Model Ranking

Description

This function quantifies and assesses the consequences of parcel-allocation variability for model ranking of structural equation models (SEMs) that differ in their structural specification but share the same parcel-level measurement specification (see Sterba & Rights, 2017). This function calls parcelAllocation (which can be used with only one SEM in isolation) to fit two (assumed) nested models to each of a specified number of random item-to-parcel allocations. Output includes summary information about the distribution of model selection results (including plots) and the distribution of results for each model individually, across allocations within-sample. Note that this function can also be used when selecting among more than two competing structural models (see the instructions for the iseed argument below).

Usage

PAVranking(model0, model1, data, parcel.names, item.syntax, nAlloc = 100,
  fun = "sem", alpha = 0.05, bic.crit = 10, fit.measures = c("chisq",
  "df", "cfi", "tli", "rmsea", "srmr", "logl", "aic", "bic", "bic2"), ...,
  show.progress = FALSE, iseed = 12345, warn = FALSE)

Arguments

`model0`, `model1`	`lavaan` model syntax specifying nested models (`model0` within `model1`) to be fitted to the same parceled data. Note that there can be a mixture of items and parcels (even within the same factor), in case certain items should never be parceled. Can be a character string or parameter table. Also see `lavaanify` for more details.
`data`	A `data.frame` containing all observed variables appearing in `model0` and `model1`, as well as those in the `item.syntax` used to create parcels. If the data have missing values, multiple imputation before parceling is recommended: submit a stacked data set (with a variable for the imputation number, so the imputations can be separated later) to `parcelAllocation` with `do.fit = FALSE` to return the list of `data.frame`s (one per allocation), each of which is a stacked, imputed data set with parcels.
`parcel.names`	`character` vector containing names of all parcels appearing as indicators in `model0` and `model1`.
`item.syntax`	`lavaan` model syntax specifying the model that would be fit to all of the unparceled items, including items that should be randomly allocated to parcels appearing in `model0` and `model1`.
`nAlloc`	The number of random items-to-parcels allocations to generate.
`fun`	`character` string indicating the name of the `lavaan` function used to fit `model0` and `model1` to `data`. Can only take the values `"lavaan"`, `"sem"`, `"cfa"`, or `"growth"`.
`alpha`	Alpha level used as criterion for significance.
`bic.crit`	Criterion for assessing evidence in favor of one model over another. See Raftery (1995) for guidelines (default is "very strong evidence" in favor of the model with lower BIC).
`fit.measures`	`character` vector containing names of fit measures to request from each fitted `lavaan` model. See the output of `fitMeasures` for a list of available measures.
`...`	Additional arguments to be passed to `lavaanList`. See also `lavOptions`.
`show.progress`	If `TRUE`, show a `txtProgressBar` indicating how fast each model-fitting iterates over allocations.
`iseed`	(Optional) Random seed used for parceling items. When the same random seed is specified and the program is rerun, the same allocations will be generated. The `iseed` argument can be used to assess parcel-allocation variability in model ranking when considering more than two models. For each pair of models under comparison, the program should be rerun using the same random seed. Doing so ensures that multiple model comparisons will employ the same set of parcel data sets. Note: When using parallel options, you must first run `RNGkind("L'Ecuyer-CMRG")` in the R console, so that the seed will be controlled across cores.
`warn`	Whether to print warnings when fitting models to each allocation.
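
The pairwise procedure described for `iseed` might be sketched as follows. This is a minimal illustration under stated assumptions, not a prescribed workflow: `modR` is a hypothetical third model (orthogonal factors with equal loadings within each factor) added only so that there are three nested models to compare, and `nAlloc = 20` is chosen only to keep the run short. The models are listed from most to least restricted so that the first model in each pair is nested within the second.

```r
## Item-level syntax and parcel names, as in the Examples section
item.syntax <- c(paste0("f1 =~ f1item", 1:9),
                 paste0("f2 =~ f2item", 1:9))
parcel.names <- paste0("par", 1:6)

## Candidate models, ordered from most to least restricted.
## modR is HYPOTHETICAL (not part of the original examples).
models <- list(
  modR = ' f1 =~ a*par1 + a*par2 + a*par3
           f2 =~ b*par4 + b*par5 + b*par6
           f1 ~~ 0*f2 ',
  mod0 = ' f1 =~ par1 + par2 + par3
           f2 =~ par4 + par5 + par6
           f1 ~~ 0*f2 ',
  mod1 = ' f1 =~ par1 + par2 + par3
           f2 =~ par4 + par5 + par6 '
)

## All pairs; within each pair the more restricted model comes first
pairs <- combn(names(models), 2, simplify = FALSE)

## Rerun PAVranking for each pair with the SAME seed, so every comparison
## is based on the same set of item-to-parcel allocations
if (requireNamespace("semTools", quietly = TRUE)) {
  library(semTools)
  results <- lapply(pairs, function(p) {
    PAVranking(model0 = models[[p[1]]], model1 = models[[p[2]]],
               data = simParcel, nAlloc = 20,
               parcel.names = parcel.names, item.syntax = item.syntax,
               std.lv = TRUE, iseed = 12345)
  })
  names(results) <- vapply(pairs, paste, character(1), collapse = ".v.")
}
```

Holding `iseed` constant across the calls is the only part taken from the argument description; how the pairwise results are then combined into an overall ranking is left to the analyst.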

Details

This is based on a SAS macro ParcelAlloc (Sterba & MacCallum, 2010). The PAVranking function produces results discussed in Sterba and Rights (2017) relevant to the assessment of parcel-allocation variability in model selection and model ranking. Specifically, the PAVranking function first calls parcelAllocation to generate a given number (nAlloc) of item-to-parcel allocations, fits both specified models to each allocation, and provides summaries of PAV for each model. Additionally, PAVranking provides the following new summaries:

PAV in model selection index values and model ranking between Models model0 and model1.
The proportion of allocations that converged and the proportion of proper solutions (results are summarized only for allocations in which both models converged on proper solutions).

For further details on the benefits of the random allocation of items to parcels, see Sterba (2011) and Sterba and MacCallum (2010).
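
The idea of a single random item-to-parcel allocation can be sketched in base R. This is only an illustration, not the code that parcelAllocation runs: the helper `allocate_once()` and the simulated item data are hypothetical, and the sketch assumes equal-sized parcels scored as the mean of their items.

```r
set.seed(12345)  # hypothetical seed, used only to make the sketch reproducible

## Simulated item-level data (HYPOTHETICAL): 9 items for each of 2 factors
n <- 50
items <- as.data.frame(matrix(rnorm(n * 18), nrow = n, ncol = 18))
names(items) <- c(paste0("f1item", 1:9), paste0("f2item", 1:9))

## Randomly assign one factor's items to its parcels and score each parcel
## as the mean of its items (an assumption of this sketch)
allocate_once <- function(data, item_names, parcel_names) {
  shuffled <- sample(item_names)
  assignment <- split(shuffled,
                      rep(parcel_names, length.out = length(shuffled)))
  assignment <- assignment[parcel_names]  # keep the requested parcel order
  parcels <- vapply(assignment,
                    function(its) rowMeans(data[, its, drop = FALSE]),
                    numeric(nrow(data)))
  list(assignment = assignment, parcels = as.data.frame(parcels))
}

f1 <- allocate_once(items, paste0("f1item", 1:9), paste0("par", 1:3))
f2 <- allocate_once(items, paste0("f2item", 1:9), paste0("par", 4:6))

## One parceled data set; repeating this nAlloc times gives the allocations
parceled <- cbind(f1$parcels, f2$parcels)
f1$assignment  # which items ended up in which parcel
```

Fitting the same two models to many such data sets is what makes it possible to see how much the model-selection results depend on the allocation itself.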

To test whether nested models have equivalent fit, results can be pooled across allocations using the same methods available for pooling results across multiple imputations of missing data (see Examples).

Note: This function requires the lavaan package. Missing data must be coded as NA. If the function returns "Error in plot.new() : figure margins too large", the user may need to increase the size of the plot window (e.g., in RStudio) and rerun the function.
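
If a data set uses a numeric missing-data code, it can be recoded to NA before calling the function. A minimal sketch, assuming a hypothetical code of -999 and a made-up two-item data set:

```r
## HYPOTHETICAL data in which -999 marks a missing response
dat <- data.frame(f1item1 = c(3, -999, 4),
                  f1item2 = c(2, 5, -999))

## Recode the missing-data code to NA in every column
dat[dat == -999] <- NA

colSums(is.na(dat))  # number of missing values per item
```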

Value

model0.results

Results returned by parcelAllocation for model0 (see the Value section).

model1.results

Results returned by parcelAllocation for model1 (see the Value section).

model0.v.model1

A list of model-comparison results, including the following:

LRT_Summary: The average likelihood ratio test across allocations, as well as the SD, minimum, maximum, range, and the proportion of allocations for which the test was significant.
Fit_Index_Differences: Differences in fit indices, organized by what proportion favored each model and among those, what the average difference was.
Favored_by_BIC: The proportion of allocations in which each model met the criterion (bic.crit) for a substantial difference in fit.
Convergence_Summary: The proportion of allocations in which each model (and both models) converged on a solution.

Histograms are also printed to the current plot-output device.
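
To make the LRT_Summary and Favored_by_BIC entries concrete, the following base R sketch computes the same kinds of quantities from per-allocation results. The numbers are made up for illustration and are not output from PAVranking; the exact internal computation (for example, whether the BIC criterion is applied with ">=" or ">") is an assumption of this sketch.

```r
## HYPOTHETICAL per-allocation results for 5 allocations of two nested
## models that differ by 1 degree of freedom
chisq_diff <- c(2.1, 5.3, 3.9, 0.8, 7.2)   # model0 vs. model1 LRT statistics
df_diff <- 1
bic0 <- c(1000, 1012,  995, 1030, 1004)    # BIC of model0 per allocation
bic1 <- c(1011, 1000, 1001, 1015, 1003)    # BIC of model1 per allocation

alpha <- 0.05
bic.crit <- 10

## Summary of the likelihood ratio test across allocations
p_values <- pchisq(chisq_diff, df = df_diff, lower.tail = FALSE)
lrt_summary <- c(mean = mean(chisq_diff),
                 sd = sd(chisq_diff),
                 min = min(chisq_diff),
                 max = max(chisq_diff),
                 range = diff(range(chisq_diff)),
                 prop_sig = mean(p_values < alpha))

## Proportion of allocations in which each model's BIC is lower than the
## other's by at least bic.crit (assumed decision rule)
bic_diff <- bic1 - bic0
favored_by_bic <- c(model0 = mean(bic_diff >= bic.crit),
                    model1 = mean(bic_diff <= -bic.crit))

lrt_summary
favored_by_bic
```

Allocations in which neither model meets the criterion count toward neither proportion, so the two values need not sum to 1.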

Author(s)

Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)

References

Raftery, A. E. (1995). Bayesian model selection in social research. Sociological Methodology, 25, 111–163. doi:10.2307/271063

Sterba, S. K. (2011). Implications of parcel-allocation variability for comparing fit of item-solutions and parcel-solutions. Structural Equation Modeling, 18(4), 554–577. doi:10.1080/10705511.2011.607073

Sterba, S. K., & MacCallum, R. C. (2010). Variability in parameter estimates and model fit across repeated allocations of items to parcels. Multivariate Behavioral Research, 45(2), 322–358. doi:10.1080/00273171003680302

Sterba, S. K., & Rights, J. D. (2016). Accounting for parcel-allocation variability in practice: Combining sources of uncertainty and choosing the number of allocations. Multivariate Behavioral Research, 51(2–3), 296–313. doi:10.1080/00273171.2016.1144502

Sterba, S. K., & Rights, J. D. (2017). Effects of parceling on model selection: Parcel-allocation variability in model ranking. Psychological Methods, 22(1), 47–68. doi:10.1037/met0000067

Examples


## Specify the item-level model (if NO parcels were created)
## This must apply to BOTH competing models

item.syntax <- c(paste0("f1 =~ f1item", 1:9),
                 paste0("f2 =~ f2item", 1:9))
cat(item.syntax, sep = "\n")
## Below, we reduce the size of this same model by
## applying different parceling schemes

## Specify a 2-factor CFA with correlated factors, using 3-indicator parcels
mod1 <- '
f1 =~ par1 + par2 + par3
f2 =~ par4 + par5 + par6
'
## Specify a more restricted model with orthogonal factors
mod0 <- '
f1 =~ par1 + par2 + par3
f2 =~ par4 + par5 + par6
f1 ~~ 0*f2
'
## names of parcels (must apply to BOTH models)
(parcel.names <- paste0("par", 1:6))

## Not run: 
## override default random-number generator to use parallel options
RNGkind("L'Ecuyer-CMRG")

PAVranking(model0 = mod0, model1 = mod1, data = simParcel, nAlloc = 100,
           parcel.names = parcel.names, item.syntax = item.syntax,
           std.lv = TRUE,       # any additional lavaan arguments
           parallel = "snow")   # parallel options



## POOL RESULTS by treating parcel allocations as multiple imputations.
## Details provided in Sterba & Rights (2016); see ?poolMAlloc.

## save list of data sets instead of fitting model yet
## (mod1 supplies the parcel-level syntax; mod0 uses the same parcels)
dataList <- parcelAllocation(mod1, data = simParcel, nAlloc = 100,
                             parcel.names = parcel.names,
                             item.syntax = item.syntax,
                             do.fit = FALSE)
## now fit each model to each data set
fit0 <- cfa.mi(mod0, data = dataList, std.lv = TRUE)
fit1 <- cfa.mi(mod1, data = dataList, std.lv = TRUE)
anova(fit0, fit1)   # pooled test statistic comparing models
class?lavaan.mi     # find more methods for pooling results

## End(Not run)

[Package semTools version 0.5-6 Index]