error_loop {SimMultiCorrData}

R Documentation

Error Loop to Correct Final Correlation of Simulated Variables

Description

This function corrects the final correlation of simulated variables to be within a precision value (epsilon) of the target correlation. It updates the pairwise intermediate multivariate normal (MVN) correlation iteratively until either the maximum error is less than epsilon or the number of iterations exceeds the maximum set by the user (maxit). In each iteration it uses error_vars to simulate all variables and calculate their correlations. This function would not ordinarily be called directly by the user. It is a modification of Barbiero & Ferrari's ordcont function in the GenOrd package. ordcont has been modified in the following ways:

1) It works for continuous, ordinal (r >= 2 categories), and count variables.

2) The initial correlation check has been removed because the intermediate correlation matrix Sigma from rcorrvar or rcorrvar2 has already been checked for positive-definiteness and used to generate variables.

3) Eigenvalue decomposition is done on Sigma to impose the correct intermediate correlations on the normal variables. If Sigma is not positive-definite, the negative eigenvalues are replaced with 0.

4) The final positive-definite check has been removed.

5) The intermediate correlation update function was changed to accommodate more situations.

6) A final "fail-safe" check was added at the end of the iteration loop where if the absolute error between the final and target pairwise correlation is still > 0.1, the intermediate correlation is set equal to the target correlation (if extra_correct = "TRUE").

7) It allows the sample size and the seed to be specified for reproducibility.
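
The loop described above can be illustrated with a minimal sketch for ordinal variables only. This is not the package's implementation: `simulate_ordinal` and `toy_error_loop` are hypothetical names, the multiplicative update is a simple stand-in for the package's modified update function, and `niter` is a single count here rather than a matrix per variable pair. The sketch assumes at least two variables and a target matrix with unit diagonal.

```r
# Hypothetical helper standing in for the role error_vars plays:
# simulate normal variables with correlation Sigma, then discretize them.
simulate_ordinal <- function(Sigma, marginal, n, seed) {
  set.seed(seed)
  eig <- eigen(Sigma, symmetric = TRUE)
  vals <- pmax(eig$values, 0)  # negative eigenvalues are replaced with 0
  Z <- matrix(rnorm(n * ncol(Sigma)), nrow = n)
  X <- Z %*% diag(sqrt(vals), nrow = length(vals)) %*% t(eig$vectors)
  # Cut each normal variable at the quantiles of its cumulative probabilities.
  sapply(seq_along(marginal), function(i) {
    as.integer(cut(X[, i], breaks = c(-Inf, qnorm(marginal[[i]]), Inf)))
  })
}

# Hypothetical toy version of the correction loop (assumes k >= 2).
toy_error_loop <- function(rho0, marginal, n = 10000, seed = 1234,
                           epsilon = 0.001, maxit = 50,
                           extra_correct = "TRUE") {
  k <- ncol(rho0)
  Sigma <- rho0  # start from the target correlation
  niter <- 0
  repeat {
    rho_calc <- cor(simulate_ordinal(Sigma, marginal, n, seed))
    maxerr <- max(abs(rho_calc - rho0))
    # Stop when the error is small enough or the iteration limit is reached.
    if (maxerr < epsilon || niter >= maxit) break
    for (q in 1:(k - 1)) {
      for (r in (q + 1):k) {
        # Assumed update rule: rescale by the ratio of target to achieved.
        new <- if (rho0[q, r] == 0) 0 else
          Sigma[q, r] * rho0[q, r] / rho_calc[q, r]
        if (!is.finite(new)) new <- rho0[q, r]
        new <- max(min(new, 1), -1)  # keep a valid correlation
        Sigma[q, r] <- new
        Sigma[r, q] <- new
      }
    }
    niter <- niter + 1
  }
  # Fail-safe: pairs still off by more than 0.1 fall back to the target.
  if (extra_correct == "TRUE") {
    bad <- abs(rho_calc - rho0) > 0.1
    if (any(bad)) {
      Sigma[bad] <- rho0[bad]
      rho_calc <- cor(simulate_ordinal(Sigma, marginal, n, seed))
    }
  }
  list(Sigma = Sigma, rho_calc = rho_calc, niter = niter)
}

rho0 <- matrix(c(1, 0.5, 0.5, 1), 2, 2)
res <- toy_error_loop(rho0, marginal = list(c(0.3, 0.7), 0.5))
```

Because the same seed is reused in every iteration, the achieved correlation is a deterministic function of Sigma, which is what lets the update home in on the target.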

Usage

error_loop(k_cat, k_cont, k_pois, k_nb, Y_cat, Y, Yb, Y_pois, Y_nb, marginal,
  support, method, means, vars, constants, lam, size, prob, mu, n, seed,
  epsilon, maxit, rho0, Sigma, rho_calc, extra_correct)

Arguments

`k_cat`	the number of ordinal (r >= 2 categories) variables
`k_cont`	the number of continuous variables
`k_pois`	the number of Poisson variables
`k_nb`	the number of Negative Binomial variables
`Y_cat`	the ordinal variables generated from `rcorrvar` or `rcorrvar2`
`Y`	the continuous (mean 0, variance 1) variables
`Yb`	the continuous variables with desired mean and variance
`Y_pois`	the Poisson variables
`Y_nb`	the Negative Binomial variables
`marginal`	a list of length equal to `k_cat`; the i-th element is a vector of the cumulative probabilities defining the marginal distribution of the i-th variable; if the variable can take r values, the vector will contain r - 1 probabilities (the r-th is assumed to be 1)
`support`	a list of length equal to `k_cat`; the i-th element is a vector containing the r ordered support values; if not provided, the default is for the i-th element to be the vector 1, ..., r
`method`	the method used to generate the continuous variables. "Fleishman" uses a third-order polynomial transformation and "Polynomial" uses Headrick's fifth-order transformation.
`means`	a vector of means for the continuous variables
`vars`	a vector of variances for the continuous variables
`constants`	a matrix with `k_cont` rows, each a vector of constants c0, c1, c2, c3 (if `method` = "Fleishman") or c0, c1, c2, c3, c4, c5 (if `method` = "Polynomial"), like that returned by `find_constants`
`lam`	a vector of lambda (> 0) constants for the Poisson variables (see `Poisson`)
`size`	a vector of size parameters for the Negative Binomial variables (see `NegBinomial`)
`prob`	a vector of success probability parameters for the Negative Binomial variables
`mu`	a vector of mean parameters for the Negative Binomial variables (Note: either `prob` or `mu` should be supplied for all Negative Binomial variables, not a mixture)
`n`	the sample size
`seed`	the seed value for random number generation
`epsilon`	the maximum acceptable error between the final and target correlation matrices; smaller epsilons take more time
`maxit`	the maximum number of iterations to use to find the intermediate correlation; the correction loop stops when either the iteration number passes `maxit` or `epsilon` is reached
`rho0`	the target correlation matrix
`Sigma`	the intermediate correlation matrix previously used in `rcorrvar` or `rcorrvar2`
`rho_calc`	the final correlation matrix calculated in `rcorrvar` or `rcorrvar2`
`extra_correct`	if "TRUE", a final "fail-safe" check is used at the end of the iteration loop where if the absolute error between the final and target pairwise correlation is still > 0.1, the intermediate correlation is set equal to the target correlation
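
As a rough illustration of the shapes some of these arguments take, the sketch below builds hypothetical `marginal`, `support`, and `constants` inputs. The values are made up for the example, and `poly_transform` is a hypothetical helper, not a package function; in practice the constants would come from `find_constants`. The constants used here are the trivial case that leaves a standard normal variable unchanged.

```r
# Hypothetical inputs for two ordinal variables (k_cat = 2).
# Variable 1 has r = 3 categories, so 2 cumulative probabilities are given;
# variable 2 has r = 2 categories, so 1 cumulative probability is given.
marginal <- list(c(0.3, 0.7), 0.5)

# Default support when none is supplied: 1, ..., r for each variable.
support <- lapply(marginal, function(p) seq_len(length(p) + 1))

# Category probabilities implied by the cumulative probabilities
# (the r-th cumulative probability is taken to be 1).
cat_probs <- lapply(marginal, function(p) diff(c(0, p, 1)))

# One row of constants per continuous variable (k_cont = 1 here).
# c0 = 0, c1 = 1, c2 = c3 = 0 maps a standard normal variable to itself.
constants <- matrix(c(0, 1, 0, 0), nrow = 1,
                    dimnames = list(NULL, c("c0", "c1", "c2", "c3")))

# Polynomial transformation y = c0 + c1 * z + c2 * z^2 + ... of standard
# normal z; works for 4 constants (third-order) or 6 (fifth-order).
poly_transform <- function(z, cons) {
  powers <- outer(z, seq_along(cons) - 1, `^`)  # columns z^0, z^1, ...
  as.vector(powers %*% cons)
}

set.seed(1234)
z <- rnorm(5)
y <- poly_transform(z, constants[1, ])
```

With these trivial constants `y` reproduces `z`; nonzero c2 and c3 (and c4, c5 for the fifth-order case) are what introduce skewness and kurtosis.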

Value

A list with the following components:

`Sigma`	the intermediate MVN correlation matrix resulting from the error loop
`rho_calc`	the calculated final correlation matrix generated from `Sigma`
`Y_cat`	the ordinal variables
`Y`	the continuous (mean 0, variance 1) variables
`Yb`	the continuous variables with desired mean and variance
`Y_pois`	the Poisson variables
`Y_nb`	the Negative Binomial variables
`niter`	a matrix containing the number of iterations required for each variable pair

References

Barbiero A, Ferrari PA (2015). GenOrd: Simulation of Discrete Random Variables with Given Correlation Matrix and Marginal Distributions. R package version 1.4.0. https://CRAN.R-project.org/package=GenOrd

Ferrari PA, Barbiero A (2012). Simulating ordinal data. Multivariate Behavioral Research, 47(4): 566-589. doi: 10.1080/00273171.2012.692630.

Fleishman AI (1978). A Method for Simulating Non-normal Distributions. Psychometrika, 43, 521-532. doi: 10.1007/BF02293811.

Headrick TC (2002). Fast Fifth-order Polynomial Transforms for Generating Univariate and Multivariate Non-normal Distributions. Computational Statistics & Data Analysis, 40(4): 685-711. doi: 10.1016/S0167-9473(02)00072-5.

Headrick TC (2004). On Polynomial Transformations for Simulating Multivariate Nonnormal Distributions. Journal of Modern Applied Statistical Methods, 3(1), 65-71. doi: 10.22237/jmasm/1083370080.

Headrick TC, Kowalchuk RK (2007). The Power Method Transformation: Its Probability Density Function, Distribution Function, and Its Further Use for Fitting Data. Journal of Statistical Computation and Simulation, 77, 229-249. doi: 10.1080/10629360600605065.

Headrick TC, Sawilowsky SS (1999). Simulating Correlated Non-normal Distributions: Extending the Fleishman Power Method. Psychometrika, 64, 25-35. doi: 10.1007/BF02294317.

Headrick TC, Sheng Y, Hodis FA (2007). Numerical Computing and Graphics for the Power Method Transformation Using Mathematica. Journal of Statistical Software, 19(3), 1-17. doi: 10.18637/jss.v019.i03.