lavaan_rerun {semfindr}    R Documentation

Rerun a 'lavaan' Analysis Using the Leaving-One-Out Approach

Description

Reruns a lavaan analysis several times, each time with one case removed.

Usage

lavaan_rerun(
  fit,
  case_id = NULL,
  to_rerun,
  md_top,
  resid_md_top,
  allow_inadmissible = FALSE,
  skip_all_checks = FALSE,
  parallel = FALSE,
  makeCluster_args = list(spec = getOption("cl.cores", 2)),
  rerun_method = c("lavaan", "update")
)

Arguments

fit

The output from lavaan::lavaan() or its wrappers (e.g., lavaan::cfa() and lavaan::sem()).

case_id

If it is a character vector with length equal to the number of cases (the number of rows in the data in fit), then it is the vector of case identification values. If it is NULL, the default, then case.idx used by lavaan functions will be used as case identification values. The case identification values will be used to name the list of n outputs.

to_rerun

The cases to be processed. If case_id is specified, this should be a subset of case_id. If case_id is not specified, then this should be a vector of integers indicating the rows to be processed, as they appear in the data in fit. to_rerun cannot be used together with md_top or resid_md_top.

md_top

The number of cases to be processed based on the Mahalanobis distance computed on all observed variables used in the model. The cases will be ranked from the largest to the smallest distance, and the top md_top case(s) will be processed. md_top cannot be used together with to_rerun or resid_md_top.

resid_md_top

The number of cases to be processed based on the Mahalanobis distance computed from the residuals of outcome variables. The cases will be ranked from the largest to the smallest distance, and the top resid_md_top case(s) will be processed. resid_md_top cannot be used together with to_rerun or md_top.

allow_inadmissible

If TRUE, accepts a fit object with inadmissible results (i.e., post.check from lavaan::lavInspect() is FALSE). Default is FALSE.

skip_all_checks

If TRUE, skips all checks and allows users to run this function on any object of lavaan class. This lets users experiment with this and other functions on models that are not officially supported. Default is FALSE.

parallel

Whether parallel processing will be used. If TRUE, will use functions in the parallel package to rerun the analysis. Currently, only "snow" type clusters using local CPU cores are supported. Default is FALSE.

makeCluster_args

A named list of arguments to be passed to parallel::makeCluster(). Default is list(spec = getOption("cl.cores", 2)). If only the number of cores needs to be specified, use list(spec = x), where x is the number of cores to use.

rerun_method

How fit will be rerun. Default is "lavaan". An alternative method is "update". For internal use. If "lavaan" returns an error, try setting this argument to "update".

Details

lavaan_rerun() takes a lavaan::lavaan() output and reruns the analysis n0 times, using the same arguments and options as in the output. n0 is equal to the number of cases selected, which by default is all cases in the analysis. In each run, one case is removed.
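The leaving-one-out idea can be illustrated with a minimal base R sketch. This is not how lavaan_rerun() is implemented: it uses lm() as a stand-in for a lavaan model, and the simulated data and object names (loo_fits, slope_change) are assumptions made up for the illustration.

```r
# Illustration only: lm() stands in for a lavaan model.
set.seed(1234)
n <- 20
dat <- data.frame(x = rnorm(n))
dat$y <- 0.5 * dat$x + rnorm(n)

# Fit the model once on the full data
fit_full <- lm(y ~ x, data = dat)

# Refit the model n times, each time with one row removed
loo_fits <- lapply(seq_len(n), function(i) lm(y ~ x, data = dat[-i, ]))
names(loo_fits) <- rownames(dat)

# Change in the slope when each case is removed
slope_change <- vapply(loo_fits,
                       function(f) unname(coef(f)["x"]),
                       numeric(1)) - coef(fit_full)[["x"]]

# Cases whose removal changes the slope the most
head(sort(abs(slope_change), decreasing = TRUE), 3)
```

Storing the refitted models in a named list mirrors the description above: the expensive refitting is done once, and any case-level statistic can then be computed from the stored fits.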

Optionally, users can rerun the analysis with only selected cases removed. These cases can be specified by case IDs, by Mahalanobis distance computed from all variables used in the model, or by Mahalanobis distance computed from the residuals (observed scores minus implied scores) of observed outcome variables. See the help on the arguments to_rerun, md_top, and resid_md_top.
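The three ways of selecting cases can be sketched as follows, using the same data and model as in the Examples section. The particular row numbers and the value 5 are arbitrary choices for illustration. Only one of to_rerun, md_top, and resid_md_top can be used in a call.

```r
library(lavaan)
library(semfindr)

# Same setup as in the Examples section
dat <- pa_dat[1:50, ]
mod <-
"
m1 ~ iv1 + iv2
dv ~ m1
"
fit <- sem(mod, dat)

# Select cases by row number (case_id is not specified)
rerun_ids <- lavaan_rerun(fit, to_rerun = c(1, 4, 9))

# Select the 5 cases with the largest Mahalanobis distance
# computed on all observed variables in the model
rerun_md <- lavaan_rerun(fit, md_top = 5)

# Select the 5 cases with the largest Mahalanobis distance
# computed on the residuals of the outcome variables
rerun_resid <- lavaan_rerun(fit, resid_md_top = 5)
```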

It is not recommended to use Mahalanobis distance computed from all variables, especially for models with observed variables as predictors (Pek & MacCallum, 2011). Cases that are extreme on predictors may not be influential on the parameter estimates. Nevertheless, this distance is reported in some SEM programs and so this option is provided.
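As a rough illustration of ranking cases by Mahalanobis distance on all variables, here is a base R sketch. It is not necessarily how semfindr computes the distance internally; the simulated data and the variable top_cases are assumptions for the example. Note that stats::mahalanobis() returns squared distances, which give the same ranking as the distances themselves.

```r
# Simulated data standing in for the observed variables in a model
set.seed(5678)
dat <- data.frame(iv1 = rnorm(30), iv2 = rnorm(30), m1 = rnorm(30))

# Squared Mahalanobis distance of each row from the centroid
md <- mahalanobis(dat, center = colMeans(dat), cov = cov(dat))

# Rank from the largest to the smallest distance and keep the top cases
md_top <- 5
top_cases <- order(md, decreasing = TRUE)[seq_len(md_top)]

# Row numbers of the selected cases and their squared distances
top_cases
md[top_cases]
```

A case that is extreme on iv1 or iv2 gets a large distance here whether or not removing it changes any estimate, which is the concern raised above.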

Mahalanobis distance based on residuals is supported for models with no latent factors. The implied scores are computed by implied_scores().

If the sample size is large, it is recommended to use parallel processing. However, it is possible that parallel processing will fail. If this happens, use serial processing instead, either by removing the argument parallel or by setting it to FALSE.
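One way to follow this advice is sketched below: try parallel processing with two local cores and fall back to serial processing on failure. The sketch assumes that a failure in parallel processing surfaces as an R error that tryCatch() can intercept, which may not hold for every kind of failure. The data and model are those of the Examples section.

```r
library(lavaan)
library(semfindr)

# Same setup as in the Examples section
dat <- pa_dat[1:50, ]
mod <-
"
m1 ~ iv1 + iv2
dv ~ m1
"
fit <- sem(mod, dat)

fit_rerun <- tryCatch(
  # Try two local cores first
  lavaan_rerun(fit,
               parallel = TRUE,
               makeCluster_args = list(spec = 2)),
  error = function(e) {
    # Assumption: the failure is raised as an ordinary R error
    message("Parallel processing failed; using serial processing: ",
            conditionMessage(e))
    lavaan_rerun(fit, parallel = FALSE)
  }
)
```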

Many other functions in semfindr use the output from lavaan_rerun(). Instead of rerunning the n analyses every time, users can do this step once and then quickly compute whatever influence statistics they want from the stored output.

If the analysis took a few minutes to run due to the large number of cases or the long processing time in fitting the model, it is recommended to save the output to an external file (e.g., by base::saveRDS()).
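Saving and restoring can be sketched with base R alone. A small placeholder list stands in for the lavaan_rerun() output here, and the file name is hypothetical; in practice a permanent path would replace tempdir().

```r
# Placeholder object standing in for the output of lavaan_rerun()
slow_result <- list(rerun = list(a = 1, b = 2),
                    note = "placeholder object")

# Hypothetical file name; use a permanent location in real work
out_file <- file.path(tempdir(), "fit_rerun.rds")

# Save once after the long run
saveRDS(slow_result, out_file)

# In a later session, restore the object instead of rerunning
restored <- readRDS(out_file)
identical(restored, slow_result)
```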

Supports both single-group and multiple-group models. (Support for multiple-group models is available in version 0.1.4.8 and later.)

Value

A lavaan_rerun-class object, which is a list with the following elements:

Author(s)

Shu Fai Cheung https://orcid.org/0000-0002-9871-9448.

Examples

library(lavaan)
# pa_dat and lavaan_rerun() come from semfindr
library(semfindr)
dat <- pa_dat
# For illustration, select only the first 50 cases
dat <- dat[1:50, ]
# The model
mod <-
"
m1 ~ iv1 + iv2
dv ~ m1
"
# Fit the model
fit <- lavaan::sem(mod, dat)
summary(fit)

# Fit the model n times. Each time with one case removed.
fit_rerun <- lavaan_rerun(fit, parallel = FALSE)

# Print the output for a brief description of the runs
fit_rerun

# Results excluding the first case
fitMeasures(fit_rerun$rerun[[1]], c("chisq", "cfi", "tli", "rmsea"))
# Results by manually excluding the first case
fit_01 <- lavaan::sem(mod, dat[-1, ])
fitMeasures(fit_01, c("chisq", "cfi", "tli", "rmsea"))


[Package semfindr version 0.1.8 Index]