R: Generate simulation data.

SimulateCVR {CVR}

R Documentation

Generate simulation data.

Description

Generate two sets of covariates and a univariate response driven by several latent factors.

Usage

SimulateCVR(family = c("gaussian", "binomial", "poisson"), n = 100,  
         rank = 4, p1 = 50, p2 = 70, pnz = 10, sigmax = 0.2,   
         sigmay = 0.5, beta = c(2, 1, 0, 0), standardization = TRUE)

Arguments

`family`	Type of response. `"gaussian"` for continuous response, `"binomial"` for binary response, and `"poisson"` for Poisson response. The default is `"gaussian"`.
`n`	Number of rows. The default is 100.
`rank`	Number of latent factors generating the covariates. The default is 4.
`p1`	Number of variables in X1. The default is 50.
`p2`	Number of variables in X2. The default is 70.
`pnz`	Number of variables in each of X1 and X2 that are related to the signal (the first `pnz` rows of V1 and V2 are nonzero). The default is 10.
`sigmax`	Standard deviation of normal noise in X1 and X2. The default is 0.2.
`sigmay`	Standard deviation of normal noise in Y. Only used when the response is Gaussian. The default is 0.5.
`beta`	Numeric vector, the coefficients used to generate the response from the latent factors. The default is c(2, 1, 0, 0).
`standardization`	Logical. If TRUE, standardize X1 and X2 before output. The default is TRUE.

Details

The latent factors in U are randomly generated normal vectors,

X_1 = U*V_1 + \sigma_x*E_1, X_2 = U*V_2 + \sigma_x*E_2, E_1, E_2 are N(0,1) noise matrices.

The nonzero entries of V_1 and V_2 are generated from Uniform([-1,-0.5]U[0.5,1]).
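
The covariate construction above can be sketched in base R without the package. This is an illustrative reading of the formulas, not the package source: it assumes the latent factors are standard normal, and it transposes V_1 so that the product conforms with the documented dimensions (U is n*rank, V1 is p1*rank). The helper `runif_two_sided` is a hypothetical name introduced here.

```r
set.seed(1)
n <- 100; rank <- 4; p1 <- 50; pnz <- 10; sigmax <- 0.2

# Latent factors: n x rank matrix of normal draws (standard normal assumed).
U <- matrix(rnorm(n * rank), n, rank)

# Hypothetical helper: draw from Uniform([-1, -0.5] U [0.5, 1]) as a
# magnitude in [0.5, 1] multiplied by a random sign.
runif_two_sided <- function(k) {
  sample(c(-1, 1), k, replace = TRUE) * runif(k, 0.5, 1)
}

# Loading matrix (p1 x rank): the first pnz rows are nonzero, the rest are zero.
V1 <- matrix(0, p1, rank)
V1[seq_len(pnz), ] <- runif_two_sided(pnz * rank)

# N(0, 1) noise matrix scaled by sigmax; t(V1) is an assumption made so that
# the matrix dimensions conform.
E1 <- matrix(rnorm(n * p1), n, p1)
X1 <- U %*% t(V1) + sigmax * E1
```

X2 would follow the same pattern with `p2` columns and its own loading and noise matrices.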

For Gaussian response,

y = U*\beta + \sigma_y*e_y, e_y is N(0,1) noise vector,

for binary response,

y \sim rbinom(n, 1, 1/(1 + \exp(-U*\beta))),

and for Poisson response,

y \sim rpois(n, \exp(U*\beta)).

See the reference for more details.
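
As a rough illustration of the three response models, the following base R sketch generates each response type from a stand-in latent factor matrix. It is not taken from the package source; the matrix `U` here is simply drawn as standard normal for the purpose of the example, and the variable names are illustrative.

```r
set.seed(2)
n <- 100
beta <- c(2, 1, 0, 0)
sigmay <- 0.5

# Stand-in latent factors (assumed standard normal for this sketch).
U <- matrix(rnorm(n * length(beta)), n, length(beta))

# Linear predictor U * beta, as a plain numeric vector of length n.
eta <- drop(U %*% beta)

# Gaussian response: linear predictor plus N(0, 1) noise scaled by sigmay.
y_gaussian <- eta + sigmay * rnorm(n)

# Binary response: plogis(eta) equals 1 / (1 + exp(-eta)).
y_binomial <- rbinom(n, 1, plogis(eta))

# Poisson response with mean exp(eta).
y_poisson <- rpois(n, exp(eta))
```

With the default `beta`, only the first two latent factors influence the response; the last two affect the covariates alone.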

Value

`X1`, `X2`	The two sets of covariates with dimensions n*p1 and n*p2, respectively.
`y`	The response vector with length n.
`U`	The true latent factor matrix with dimension n*rank.
`beta`	The coefficients used to generate response from `U`. The length is rank.
`V1`, `V2`	The true loading matrices for X1 and X2 with dimensions p1*rank and p2*rank, respectively. The first `pnz` rows are nonzero.
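
One way to confirm the shapes listed above is to print the dimensions of the returned components and compare them with the documented ones. The sketch below assumes the CVR package is installed and skips the call otherwise; `expected_dims` is a name introduced here for illustration.

```r
n <- 100; rank <- 4; p1 <- 50; p2 <- 70

# Dimensions documented in the Value section.
expected_dims <- list(X1 = c(n, p1), X2 = c(n, p2), U = c(n, rank),
                      V1 = c(p1, rank), V2 = c(p2, rank))

# Only call the package if it is available.
if (requireNamespace("CVR", quietly = TRUE)) {
  sim <- CVR::SimulateCVR(family = "gaussian", n = n, rank = rank,
                          p1 = p1, p2 = p2, pnz = 10, beta = c(2, 1, 0, 0))
  # Print the actual dimensions next to the documented ones.
  print(lapply(sim[names(expected_dims)], dim))
  print(expected_dims)
  print(length(sim$y))  # documented as n
}
```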

Author(s)

Chongliang Luo, Kun Chen.

References

Chongliang Luo, Jin Liu, Dipak D. Dey and Kun Chen (2016) Canonical variate regression. Biostatistics, doi: 10.1093/biostatistics/kxw001.

Examples

 set.seed(42)
 mydata <- SimulateCVR(family = "g", n = 100, rank = 4, p1 = 50, p2 = 70, 
                   pnz = 10, beta = c(2, 1, 0, 0))
 X1 <- mydata$X1
 X2 <- mydata$X2
 Xlist <- list(X1 = X1, X2 = X2)
 Y <- mydata$y
 opts <- list(standardization = FALSE, maxIters = 300, tol = 0.005)
 ## use sparse CCA solution as initial values, see SparseCCA()
 Wini <- SparseCCA(X1, X2, 4, 0.7, 0.7) 
 ## perform CVR with fixed eta and lambda, see cvrsolver()
 fit <- cvrsolver(Y, Xlist, rank = 4, eta = 0.5, Lam = c(1, 1), 
                 family = "gaussian", Wini, penalty = "GL1", opts)
 ## check sparsity recovery
 fit$W[[1]]
 fit$W[[2]]
 ## check orthogonality
 X1W1 <- X1 %*% fit$W[[1]]
 t(X1W1) %*% X1W1

[Package CVR version 0.1.1 Index]