CD {EFAtools}

R Documentation

Comparison Data

Description

Factor retention method introduced by Ruscio and Roche (2012). The code was adapted from the CD code by Auerswald and Moshagen (2017) available at https://osf.io/x5cz2/?view_only=d03efba1fd0f4c849a87db82e6705668

Usage

CD(
  x,
  n_factors_max = NA,
  N_pop = 10000,
  N_samples = 500,
  alpha = 0.3,
  use = c("pairwise.complete.obs", "all.obs", "complete.obs", "everything",
    "na.or.complete"),
  cor_method = c("pearson", "spearman", "kendall"),
  max_iter = 50
)

Arguments

`x`	data.frame or matrix. The raw data.
`n_factors_max`	numeric. The maximum number of factors to test. Larger values test more candidate solutions but increase run time. If left NA (default), the largest number of factors for which the model is still over-identified (df > 0) is used.
`N_pop`	numeric. Size of finite populations of comparison data. Default is 10000.
`N_samples`	numeric. Number of samples drawn from each population. Default is 500.
`alpha`	numeric. The alpha level used to test the significance of the improvement added by an additional factor. Default is .30.
`use`	character. Passed to `stats::cor` when computing the correlation matrix. Default is "pairwise.complete.obs". Note that the comparison data procedure itself always excludes `NA` values using na.omit(), regardless of this setting. To handle missing data differently (e.g., by imputation), do so before passing the data to `CD()`.
`cor_method`	character. Passed to `stats::cor`. Default is "pearson".
`max_iter`	numeric. The maximum number of iterations to perform after which the iterative PAF procedure is halted. Default is 50.
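
As a rough illustration of the `n_factors_max` default, the sketch below finds the largest number of factors that still leaves positive degrees of freedom. It assumes the usual EFA formula df = ((p - k)^2 - (p + k)) / 2 for p variables and k factors. The helper name `max_factors_overidentified` is hypothetical and not part of EFAtools, and the package's internal computation may differ.

```r
# Hypothetical helper (not part of EFAtools): largest number of factors k
# for which an EFA model with p variables is over-identified (df > 0).
# Assumes df = ((p - k)^2 - (p + k)) / 2.
max_factors_overidentified <- function(p) {
  k <- seq_len(p) - 1L                 # candidate numbers of factors: 0, ..., p - 1
  df <- ((p - k)^2 - (p + k)) / 2      # degrees of freedom for each candidate
  ok <- k[df > 0]
  if (length(ok) == 0L) 0L else max(ok)
}

# Largest over-identified number of factors for 8 variables
max_factors_overidentified(8)
```

Under this formula, six variables allow at most two factors, because three factors would give df = 0.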

Details

"Parallel analysis (PA) is an effective stopping rule that compares the eigenvalues of randomly generated data with those for the actual data. PA takes into account sampling error, and at present it is widely considered the best available method. We introduce a variant of PA that goes even further by reproducing the observed correlation matrix rather than generating random data. Comparison data (CD) with known factorial structure are first generated using 1 factor, and then the number of factors is increased until the reproduction of the observed eigenvalues fails to improve significantly" (Ruscio & Roche, 2012, p. 282).

The CD implementation here is based on the code by Ruscio and Roche (2012) but is slightly adapted to increase speed by performing the principal axis factoring with a C++-based function.
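
The stopping rule quoted above can be sketched in a few lines. This is a simplified illustration, not the EFAtools source: it assumes that the improvement from an additional factor is tested with a one-sided Mann-Whitney U test on the RMSE distributions, as in the procedure published by Ruscio and Roche (2012). The function name `retain_by_rmse` and the demo matrix are hypothetical.

```r
# Hypothetical sketch of the sequential stopping rule (not the EFAtools code).
# rmse: matrix with one column per number of factors (1, 2, ...) and one row
#       per comparison sample; alpha: significance level for the improvement.
retain_by_rmse <- function(rmse, alpha = 0.30) {
  n_max <- ncol(rmse)
  n_factors <- 1L
  while (n_factors < n_max) {
    # Are RMSEs with n_factors larger than those with n_factors + 1?
    p <- stats::wilcox.test(rmse[, n_factors], rmse[, n_factors + 1L],
                            alternative = "greater")$p.value
    if (p >= alpha) break              # no significant improvement: stop
    n_factors <- n_factors + 1L
  }
  n_factors
}

# Deterministic demo values: a second factor clearly lowers the RMSE,
# a third factor does not.
col1 <- seq(0.30, 0.40, length.out = 50)
col2 <- seq(0.10, 0.20, length.out = 50)
rmse_demo <- cbind(col1, col2, col2 + 0.001)
n_retained <- retain_by_rmse(rmse_demo)
```

A liberal alpha such as .30 makes it easier to keep adding factors, which guards against stopping too early.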

Note that if the data contain missing values, the affected rows are removed for the comparison data procedure using stats::na.omit. To treat missing data differently (e.g., by imputation), do so outside of CD and then pass the complete data.
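
A minimal sketch of handling missing data before calling CD, using base R only. The simulated data and the mean imputation are purely illustrative assumptions; in practice a principled method such as multiple imputation is usually preferable. The call to CD is left as a comment because it requires EFAtools to be installed.

```r
# Simulated raw data with missing values in the first column (illustrative).
set.seed(1)
raw <- as.data.frame(matrix(rnorm(200 * 6), ncol = 6))
raw[sample(200, 10), 1] <- NA          # 10 distinct rows get an NA

# Option 1: listwise deletion, as CD does internally via stats::na.omit
complete_rows <- stats::na.omit(raw)

# Option 2: simple mean imputation per column (illustrative only)
imputed <- raw
for (j in seq_along(imputed)) {
  miss <- is.na(imputed[[j]])
  imputed[[j]][miss] <- mean(imputed[[j]], na.rm = TRUE)
}

# Pass the complete data to CD (requires the EFAtools package):
# EFAtools::CD(imputed)
```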

The CD function can also be called together with other factor retention criteria in the N_FACTORS function.

Value

A list of class CD containing

`n_factors`	The number of factors to retain according to comparison data results.
`eigenvalues`	A vector containing the eigenvalues of the entered data.
`RMSE_eigenvalues`	A matrix containing the RMSEs between the eigenvalues of the generated data and those of the entered data.
`settings`	A list of the settings used.
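
To make the `RMSE_eigenvalues` element more concrete, the following sketch computes a single RMSE between the eigenvalues of two data sets. It is an assumption-laden illustration, not the package code: the helper `eigen_rmse` is hypothetical, it uses Pearson correlations, and the "comparison" sample is plain random data rather than data generated to reproduce the observed correlation matrix.

```r
# Hypothetical helper: RMSE between the eigenvalues of the correlation
# matrices of an observed data set and one comparison sample.
eigen_rmse <- function(observed, comparison) {
  ev_obs <- eigen(stats::cor(observed), symmetric = TRUE,
                  only.values = TRUE)$values
  ev_cmp <- eigen(stats::cor(comparison), symmetric = TRUE,
                  only.values = TRUE)$values
  sqrt(mean((ev_obs - ev_cmp)^2))
}

set.seed(123)
observed   <- matrix(rnorm(100 * 4), ncol = 4)
comparison <- matrix(rnorm(100 * 4), ncol = 4)

rmse_same <- eigen_rmse(observed, observed)    # identical data
rmse_diff <- eigen_rmse(observed, comparison)  # different data
```

In the matrix returned by CD, each such value corresponds to one comparison sample for one candidate number of factors.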

Source

Auerswald, M., & Moshagen, M. (2019). How to determine the number of factors to retain in exploratory factor analysis: A comparison of extraction methods under realistic conditions. Psychological Methods, 24(4), 468–491. https://doi.org/10.1037/met0000200

Ruscio, J., & Roche, B. (2012). Determining the number of factors to retain in an exploratory factor analysis using comparison data of known factorial structure. Psychological Assessment, 24, 282–292. https://doi.org/10.1037/a0025697

Examples


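# Determine the number of factors to retain for the GRiPS data
# (results can vary slightly between runs because comparison data are sampled)
CD(GRiPS_raw)

# Determine the number of factors to retain for the DOSPERT risk subscale
CD(DOSPERT_raw)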

[Package EFAtools version 0.4.4 Index]