
corr_error {SimCorrMix}

R Documentation

Error Loop to Correct Final Correlation of Simulated Variables

Description

This function attempts to correct the final pairwise correlations of simulated variables so that they are within epsilon of the target correlations. It updates the intermediate normal correlation iteratively in a loop until either the maximum error is less than epsilon or the number of iterations exceeds maxit. This function would not ordinarily be called directly by the user. The function is a modification of Barbiero & Ferrari's ordcont function in the GenOrd package. The ordcont function has been modified in the following ways:

1) It works for continuous, ordinal (r >= 2 categories), and count (regular or zero-inflated, Poisson or Negative Binomial) variables.

2) The initial correlation check has been removed because the intermediate correlation matrix Sigma from corrvar or corrvar2 has already been checked for positive-definiteness and used to generate variables.

3) Eigenvalue decomposition is done on Sigma to impose the correct intermediate correlations on the normal variables. If Sigma is not positive-definite, the negative eigenvalues are replaced with 0.

4) The final positive-definite check has been removed.

5) The intermediate correlation update function was changed to accommodate more situations.

6) It allows specification of the sample size and of the seed for reproducibility.
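
As a minimal base-R sketch of modification 3, the following imposes a correlation matrix on standard normal draws through an eigenvalue decomposition, with negative eigenvalues replaced by 0. The helper name `impose_corr`, the matrix `Sigma_ex`, and the sample size are illustrative assumptions and are not taken from the package source.

```r
# Hypothetical helper (not part of SimCorrMix): impose the correlation
# matrix Sigma on n standard normal draws via eigenvalue decomposition.
impose_corr <- function(n, Sigma, seed = 1234) {
  k <- ncol(Sigma)
  eig <- eigen(Sigma, symmetric = TRUE)
  vals <- pmax(eig$values, 0)                    # negative eigenvalues -> 0
  root <- eig$vectors %*% diag(sqrt(vals), k, k) # root %*% t(root) = repaired matrix
  set.seed(seed)
  Z <- matrix(rnorm(n * k), n, k)                # independent standard normals
  list(X = Z %*% t(root),                        # correlated normal variables
       Sigma_psd = root %*% t(root),
       values = eig$values)
}

# Illustrative matrix that is not positive-definite
# (its eigenvalues are 1.9, 1.9, and -0.8)
Sigma_ex <- matrix(c( 1.0, 0.9, -0.9,
                      0.9, 1.0,  0.9,
                     -0.9, 0.9,  1.0), 3, 3, byrow = TRUE)

out <- impose_corr(1000, Sigma_ex)
round(out$values, 3)                                # original eigenvalues
min(eigen(out$Sigma_psd, symmetric = TRUE)$values)  # non-negative up to rounding
```

After the negative eigenvalue is set to 0, the repaired matrix no longer has an exact unit diagonal; this sketch does not rescale it.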

The vignette Variable Types describes the algorithm used in the error loop.
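
The general shape of such an error loop can be sketched in base R for a single pair of ordinal variables. This is a simplified illustration, not the package's implementation: the helpers `sim_corr` and `error_loop`, the multiplicative update rule (target divided by achieved correlation), and the bound of 0.999 on the intermediate correlation are all assumptions made for the example.

```r
# Hypothetical helper: build two ordinal variables from fixed standard
# normal draws Z with intermediate correlation s; return their correlation.
sim_corr <- function(s, Z, cum1, cum2) {
  x1 <- Z[, 1]
  x2 <- s * Z[, 1] + sqrt(1 - s^2) * Z[, 2]  # cor(x1, x2) is s in expectation
  # discretize at the normal quantiles of the cumulative probabilities
  y1 <- findInterval(x1, qnorm(cum1)) + 1
  y2 <- findInterval(x2, qnorm(cum2)) + 1
  cor(y1, y2)
}

# Hypothetical error loop: adjust s until the final correlation is within
# epsilon of the target rho0 or maxit iterations have been used.
error_loop <- function(rho0, cum1, cum2, n = 10000, seed = 1234,
                       epsilon = 0.001, maxit = 1000) {
  set.seed(seed)
  Z <- matrix(rnorm(n * 2), n, 2)  # the same draws are reused every iteration
  s <- rho0                        # start from the target correlation
  rho_calc <- sim_corr(s, Z, cum1, cum2)
  niter <- 0
  while (abs(rho_calc - rho0) > epsilon && niter < maxit) {
    # assumed update rule: rescale by target / achieved, keeping |s| < 1
    s <- max(min(s * rho0 / rho_calc, 0.999), -0.999)
    rho_calc <- sim_corr(s, Z, cum1, cum2)
    niter <- niter + 1
  }
  list(Sigma = s, rho_calc = rho_calc, niter = niter)
}

res <- error_loop(rho0 = 0.4, cum1 = c(0.3, 0.7), cum2 = 0.5)
res$Sigma     # intermediate correlation after the loop
res$rho_calc  # final correlation of the ordinal pair
res$niter     # iterations used
```

Because discretizing normal variables attenuates their correlation, the intermediate correlation returned here ends up larger than the target of 0.4.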

Usage

corr_error(n = 10000, k_cat = 0, k_cont = 0, k_pois = 0, k_nb = 0,
  method = c("Fleishman", "Polynomial"), means = NULL, vars = NULL,
  constants = NULL, marginal = list(), support = list(), lam = NULL,
  p_zip = 0, size = NULL, mu = NULL, p_zinb = 0, seed = 1234,
  epsilon = 0.001, maxit = 1000, rho0 = NULL, Sigma = NULL,
  rho_calc = NULL)

Arguments

`n`	the sample size
`k_cat`	the number of ordinal (r >= 2 categories) variables
`k_cont`	the number of continuous variables (these may be regular continuous variables or components of continuous mixture variables)
`k_pois`	the number of Poisson (regular or zero-inflated) variables
`k_nb`	the number of Negative Binomial (regular or zero-inflated) variables
`method`	the method used to generate the continuous variables. "Fleishman" uses a third-order polynomial transformation and "Polynomial" uses Headrick's fifth-order transformation.
`means`	a vector of means for the continuous variables
`vars`	a vector of variances for the continuous variables
`constants`	a matrix with `k_cont` rows, each a vector of constants c0, c1, c2, c3 (if `method` = "Fleishman") or c0, c1, c2, c3, c4, c5 (if `method` = "Polynomial"), like that returned by `find_constants`
`marginal`	a list of length equal to `k_cat`; the i-th element is a vector of the cumulative probabilities defining the marginal distribution of the i-th variable; if the variable can take r values, the vector will contain r - 1 probabilities (the r-th is assumed to be 1)
`support`	a list of length equal to `k_cat`; the i-th element is a vector containing the r ordered support values; if not provided, the default is for the i-th element to be the vector 1, ..., r
`lam`	a vector of lambda (mean > 0) constants for the Poisson variables (see `stats::dpois`); the order should be 1st regular Poisson variables, 2nd zero-inflated Poisson variables
`p_zip`	a vector of probabilities of structural zeros (not including zeros from the Poisson distribution) for the zero-inflated Poisson variables (see `VGAM::dzipois`)
`size`	a vector of size parameters for the Negative Binomial variables (see `stats::dnbinom`); the order should be 1st regular NB variables, 2nd zero-inflated NB variables
`mu`	a vector of mean parameters for the NB variables; order the same as in `size`; for zero-inflated NB this refers to the mean of the NB distribution (see `VGAM::dzinegbin`)
`p_zinb`	a vector of probabilities of structural zeros (not including zeros from the NB distribution) for the zero-inflated NB variables (see `VGAM::dzinegbin`)
`seed`	the seed value for random number generation
`epsilon`	the maximum acceptable error between the final and target pairwise correlation; smaller epsilons take more time
`maxit`	the maximum number of iterations to use to find the intermediate correlation; the correction loop stops when either the iteration number exceeds `maxit` or the maximum error falls below `epsilon`
`rho0`	the target correlation matrix
`Sigma`	the intermediate correlation matrix previously used in `corrvar` or `corrvar2`
`rho_calc`	the final correlation matrix calculated in `corrvar` or `corrvar2` before execution of `corr_error`
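
As a hedged illustration of the list and vector arguments described above, the following base-R sketch builds `marginal`, a default-style `support`, and Poisson inputs for a made-up case. All numeric values are assumptions chosen for the example, and `probs` is a helper object that is not an argument of `corr_error`.

```r
# Illustrative inputs for k_cat = 2 ordinal variables
marginal <- list(c(0.3, 0.7),  # 3 categories: P(Y <= 1) = 0.3, P(Y <= 2) = 0.7
                 0.5)          # 2 categories: P(Y <= 1) = 0.5

# Support values matching the documented default 1, ..., r
support <- lapply(marginal, function(p) seq_len(length(p) + 1))

# Category probabilities implied by the cumulative probabilities
# (the r-th cumulative probability is taken to be 1)
probs <- lapply(marginal, function(p) diff(c(0, p, 1)))

# Poisson inputs: regular variables first, then zero-inflated variables
lam   <- c(2, 5)  # one regular and one zero-inflated Poisson variable
p_zip <- 0.2      # structural-zero probability for the zero-inflated one

probs    # 0.3, 0.4, 0.3 for the first variable; 0.5, 0.5 for the second
support  # 1, 2, 3 for the first variable; 1, 2 for the second
```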

Value

A list with the following components:

`Sigma`	the intermediate MVN correlation matrix resulting from the error loop

`rho_calc`	the calculated final correlation matrix generated from `Sigma`

`Y_cat`	the ordinal variables

`Y`	the continuous (mean 0, variance 1) variables

`Y_cont`	the continuous variables with desired mean and variance

`Y_pois`	the Poisson variables

`Y_nb`	the Negative Binomial variables

`niter`	a matrix containing the number of iterations required for each variable pair

References