SimulateCVR {CVR}    R Documentation

Generate simulation data.

Description

Generate two sets of covariates and a univariate response driven by several latent factors.

Usage

SimulateCVR(family = c("gaussian", "binomial", "poisson"), n = 100,  
         rank = 4, p1 = 50, p2 = 70, pnz = 10, sigmax = 0.2,   
         sigmay = 0.5, beta = c(2, 1, 0, 0), standardization = TRUE)

Arguments

family

Type of response. "gaussian" for continuous response, "binomial" for binary response, and "poisson" for Poisson response. The default is "gaussian".

n

Number of observations, i.e. the number of rows of X1 and X2 and the length of y. The default is 100.

rank

Number of latent factors generating the covariates. The default is 4.

p1

Number of variables in X1. The default is 50.

p2

Number of variables in X2. The default is 70.

pnz

Number of variables in each of X1 and X2 that are related to the signal (the first pnz rows of V1 and V2 are nonzero). The default is 10.

sigmax

Standard deviation of normal noise in X1 and X2. The default is 0.2.

sigmay

Standard deviation of normal noise in Y. Only used when the response is Gaussian. The default is 0.5.

beta

Numeric vector of length rank, the coefficients used to generate the response from the latent factors. The default is c(2, 1, 0, 0).

standardization

Logical. If TRUE, standardize X1 and X2 before output. The default is TRUE.
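
The vector-valued default for family in the Usage section resembles the common base R match.arg() idiom, in which the first element acts as the default and unambiguous abbreviations are accepted. Whether SimulateCVR resolves family in exactly this way is an assumption; the sketch below uses a hypothetical function, pick_family, purely to illustrate the idiom.

```r
# Hypothetical helper illustrating the match.arg() idiom; not part of CVR.
pick_family <- function(family = c("gaussian", "binomial", "poisson")) {
  # With no argument supplied, match.arg() returns the first choice;
  # otherwise it partially matches the supplied string against the choices.
  match.arg(family)
}

default_family <- pick_family()     # no argument: first choice is used
abbrev_family  <- pick_family("g")  # unambiguous abbreviation of "gaussian"
```

Under this idiom, a string that matches none of the choices raises an error rather than silently falling back to the default.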

Details

The latent factors in U are randomly generated normal vectors,

X1 = U*V1' + sigmax*E1, X2 = U*V2' + sigmax*E2, where ' denotes the transpose (V1 and V2 have dimensions p1*rank and p2*rank) and E1, E2 are noise matrices with N(0,1) entries.

The nonzero entries of V1 and V2 are generated from Uniform([-1,-0.5] ∪ [0.5,1]).

For Gaussian response,

y = U*β + sigmay*ey, where ey is an N(0,1) noise vector;

for binary response,

y ~ rbinom(n, 1, 1/(1 + exp(-U*β))),

and for Poisson response,

y ~ rpois(n, exp(U*β)).

See the reference for more details.
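
The model above can be sketched in base R as follows. This is an illustrative re-implementation, not the package source: the name simulate_sketch is hypothetical, U is assumed to have i.i.d. N(0,1) entries, only the first pnz rows of each loading matrix are filled, and standardization is assumed to mean column-wise centering and scaling via scale().

```r
# Illustrative sketch of the model in Details; not the CVR package code.
simulate_sketch <- function(family = c("gaussian", "binomial", "poisson"),
                            n = 100, rank = 4, p1 = 50, p2 = 70, pnz = 10,
                            sigmax = 0.2, sigmay = 0.5, beta = c(2, 1, 0, 0),
                            standardization = TRUE) {
  family <- match.arg(family)
  stopifnot(length(beta) == rank, pnz <= min(p1, p2))

  # Draw from Uniform([-1,-0.5] U [0.5,1]): a magnitude times a random sign.
  runif_signed <- function(k) {
    runif(k, 0.5, 1) * sample(c(-1, 1), k, replace = TRUE)
  }

  # Loading matrix (p x rank) whose first pnz rows are nonzero.
  make_loading <- function(p) {
    V <- matrix(0, p, rank)
    V[seq_len(pnz), ] <- runif_signed(pnz * rank)
    V
  }

  U  <- matrix(rnorm(n * rank), n, rank)
  V1 <- make_loading(p1)
  V2 <- make_loading(p2)

  # V1 is p1 x rank, so the n x p1 signal uses its transpose.
  X1 <- U %*% t(V1) + sigmax * matrix(rnorm(n * p1), n, p1)
  X2 <- U %*% t(V2) + sigmax * matrix(rnorm(n * p2), n, p2)
  if (standardization) {
    # Assumed meaning of standardization: center and scale each column.
    X1 <- scale(X1)
    X2 <- scale(X2)
  }

  eta <- drop(U %*% beta)  # linear predictor U * beta
  y <- switch(family,
              gaussian = eta + sigmay * rnorm(n),
              binomial = rbinom(n, 1, 1 / (1 + exp(-eta))),
              poisson  = rpois(n, exp(eta)))

  list(X1 = X1, X2 = X2, y = y, U = U, beta = beta, V1 = V1, V2 = V2)
}

set.seed(1)
sim <- simulate_sketch(family = "binomial", n = 60)
```

With the default beta = c(2, 1, 0, 0), only the first two latent factors drive the response in this sketch, while all four contribute to X1 and X2.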

Value

X1, X2

The two sets of covariates with dimensions n*p1 and n*p2 respectively.

y

The response vector with length n.

U

The true latent factor matrix with dimension n*rank.

beta

The coefficients used to generate response from U. The length is rank.

V1, V2

The true loading matrices for X1 and X2 with dimensions p1*rank and p2*rank. The first pnz rows are nonzero.
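
The dimensions listed above can be checked programmatically. The helper below, check_sim_shapes, is hypothetical and not part of CVR; it assumes a list with the element names given in this section and is exercised here on a small hand-built list so that it runs without the package.

```r
# Hypothetical checker for a list shaped like the value described above.
check_sim_shapes <- function(sim, pnz) {
  n <- nrow(sim$X1)
  r <- ncol(sim$U)
  stopifnot(
    nrow(sim$X2) == n, length(sim$y) == n, nrow(sim$U) == n,
    length(sim$beta) == r,
    identical(dim(sim$V1), c(ncol(sim$X1), r)),
    identical(dim(sim$V2), c(ncol(sim$X2), r))
  )
  # Rows after the first pnz should be entirely zero in both loading matrices.
  zero_tail <- function(V) {
    all(V[seq_len(nrow(V)) > pnz, , drop = FALSE] == 0)
  }
  zero_tail(sim$V1) && zero_tail(sim$V2)
}

# Toy input with n = 5, rank = 2, p1 = 3, p2 = 4 and pnz = 1.
toy <- list(
  X1 = matrix(0, 5, 3), X2 = matrix(0, 5, 4), y = numeric(5),
  U = matrix(0, 5, 2), beta = c(2, 1),
  V1 = rbind(c(1, -1), matrix(0, 2, 2)),
  V2 = rbind(c(0.5, 1), matrix(0, 3, 2))
)
shapes_ok <- check_sim_shapes(toy, pnz = 1)
```

Assuming the package is installed, the same helper could be applied to real output, for example check_sim_shapes(SimulateCVR(), pnz = 10).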

Author(s)

Chongliang Luo, Kun Chen.

References

Chongliang Luo, Jin Liu, Dipak D. Dey and Kun Chen (2016) Canonical variate regression. Biostatistics, doi: 10.1093/biostatistics/kxw001.

See Also

CVR, cvrsolver.

Examples

 library(CVR)
 set.seed(42)
 ## simulate Gaussian data driven by 4 latent factors
 mydata <- SimulateCVR(family = "gaussian", n = 100, rank = 4, p1 = 50, p2 = 70,
                       pnz = 10, beta = c(2, 1, 0, 0))
 X1 <- mydata$X1
 X2 <- mydata$X2
 Xlist <- list(X1 = X1, X2 = X2)
 Y <- mydata$y
 opts <- list(standardization = FALSE, maxIters = 300, tol = 0.005)
 ## use sparse CCA solution as initial values, see SparseCCA()
 Wini <- SparseCCA(X1, X2, 4, 0.7, 0.7)
 ## perform CVR with fixed eta and lambda, see cvrsolver()
 fit <- cvrsolver(Y, Xlist, rank = 4, eta = 0.5, Lam = c(1, 1),
                  family = "gaussian", Wini, penalty = "GL1", opts)
 ## check sparsity recovery
 fit$W[[1]]
 fit$W[[2]]
 ## check orthogonality
 X1W1 <- X1 %*% fit$W[[1]]
 t(X1W1) %*% X1W1

[Package CVR version 0.1.1 Index]