SimulateCVR {CVR} | R Documentation |
Generate simulation data.
Description
Generate two sets of covariates and a univariate response driven by several latent factors.
Usage
SimulateCVR(family = c("gaussian", "binomial", "poisson"), n = 100,
rank = 4, p1 = 50, p2 = 70, pnz = 10, sigmax = 0.2,
sigmay = 0.5, beta = c(2, 1, 0, 0), standardization = TRUE)
Arguments
family |
Type of response. |
n |
Number of rows. The default is 100. |
rank |
Number of latent factors generating the covariates. The default is 4. |
p1 |
Number of variables in X1. The default is 50. |
p2 |
Number of variables in X2. The default is 70. |
pnz |
Number of variables in X1 and X2 related to the signal. The default is 10. |
sigmax |
Standard deviation of normal noise in X1 and X2. The default is 0.2. |
sigmay |
Standard deviation of normal noise in Y. Only used when the response is Gaussian. The default is 0.5. |
beta |
Numeric vector of coefficients used to generate the response from the latent factors. The default is c(2, 1, 0, 0). |
standardization |
Logical. If TRUE, standardize X1 and X2 before output. The default is TRUE. |
Details
The latent factors in U are randomly generated normal vectors. The covariates are generated as
X_1 = U V_1^T + \sigma_x E_1, X_2 = U V_2^T + \sigma_x E_2,
where E_1 and E_2 are N(0,1) noise matrices.
The nonzero entries of V_1 and V_2 are generated from Uniform([-1,-0.5] \cup [0.5,1]).
For a Gaussian response,
y = U \beta + \sigma_y e_y,
where e_y is an N(0,1) noise vector; for a binary response,
y \sim rbinom(n, 1, 1/(1 + \exp(-U \beta)));
and for a Poisson response,
y \sim rpois(n, \exp(U \beta)).
See the reference for more details.
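The data-generating model described in Details can be sketched in base R as follows. This is an illustration only, not the package's implementation: the function name simulate_sketch and the helper runif_signed are hypothetical, placing the nonzero loadings in the first pnz rows of V1 and V2 is an assumption, and the standardization step is left out. The loadings are stored as p1*rank and p2*rank matrices, as listed under Value, so the products are written with a transpose.

```r
# Hypothetical sketch of the model in Details; not the CVR package code.
simulate_sketch <- function(n = 100, rank = 4, p1 = 50, p2 = 70, pnz = 10,
                            sigmax = 0.2, sigmay = 0.5, beta = c(2, 1, 0, 0),
                            family = c("gaussian", "binomial", "poisson")) {
  family <- match.arg(family)
  stopifnot(length(beta) == rank, pnz <= min(p1, p2))

  # Uniform on [-1, -0.5] U [0.5, 1]: a random sign times Uniform(0.5, 1).
  runif_signed <- function(m) {
    sample(c(-1, 1), m, replace = TRUE) * runif(m, 0.5, 1)
  }

  # Latent factors, n x rank.
  U <- matrix(rnorm(n * rank), n, rank)

  # Sparse loadings. Assumption: only the first pnz rows are nonzero.
  V1 <- matrix(0, p1, rank)
  V2 <- matrix(0, p2, rank)
  V1[seq_len(pnz), ] <- runif_signed(pnz * rank)
  V2[seq_len(pnz), ] <- runif_signed(pnz * rank)

  # Covariates: low-rank signal plus N(0, 1) noise scaled by sigmax.
  X1 <- U %*% t(V1) + sigmax * matrix(rnorm(n * p1), n, p1)
  X2 <- U %*% t(V2) + sigmax * matrix(rnorm(n * p2), n, p2)

  # Response: the linear predictor U %*% beta passed through each family.
  eta <- drop(U %*% beta)
  y <- switch(family,
    gaussian = eta + sigmay * rnorm(n),
    binomial = rbinom(n, 1, 1 / (1 + exp(-eta))),
    poisson  = rpois(n, exp(eta))
  )

  list(X1 = X1, X2 = X2, y = y, U = U, beta = beta, V1 = V1, V2 = V2)
}

set.seed(1)
sim <- simulate_sketch(family = "binomial")
dim(sim$X1)
table(sim$y)
```

With the default beta = c(2, 1, 0, 0), only the first two latent factors drive the response, while all four contribute to X1 and X2.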
Value
X1 , X2 |
The two sets of covariates with dimensions n*p1 and n*p2 respectively. |
y |
The response vector with length n. |
U |
The true latent factor matrix with dimension n*rank. |
beta |
The coefficients used to generate the response from the latent factors U. |
V1 , V2 |
The true loading matrices for X1 and X2 with dimensions p1*rank and p2*rank. The first |
Author(s)
Chongliang Luo, Kun Chen.
References
Chongliang Luo, Jin Liu, Dipak D. Dey and Kun Chen (2016) Canonical variate regression. Biostatistics, doi: 10.1093/biostatistics/kxw001.
See Also
Examples
set.seed(42)
mydata <- SimulateCVR(family = "gaussian", n = 100, rank = 4, p1 = 50, p2 = 70,
                      pnz = 10, beta = c(2, 1, 0, 0))
X1 <- mydata$X1
X2 <- mydata$X2
Xlist <- list(X1 = X1, X2 = X2)
Y <- mydata$y
opts <- list(standardization = FALSE, maxIters = 300, tol = 0.005)
## use sparse CCA solution as initial values, see SparseCCA()
Wini <- SparseCCA(X1, X2, 4, 0.7, 0.7)
## perform CVR with fixed eta and lambda, see cvrsolver()
fit <- cvrsolver(Y, Xlist, rank = 4, eta = 0.5, Lam = c(1, 1),
                 family = "gaussian", Wini, penalty = "GL1", opts)
## check sparsity recovery
fit$W[[1]]
fit$W[[2]]
## check orthogonality
X1W1 <- X1 %*% fit$W[[1]]
t(X1W1) %*% X1W1
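
The orthogonality check above prints a rank*rank matrix that has to be inspected by eye. A single number can be easier to read. The sketch below assumes the intent of the check is that t(X1W1) %*% X1W1 should be close to the identity matrix. The helper orth_error is hypothetical and not part of the CVR package, and a stand-in matrix with orthonormal columns is used in place of X1 %*% fit$W[[1]] so that the snippet runs on its own.

```r
# Hypothetical helper: largest absolute deviation of Z'Z from the identity.
orth_error <- function(Z) {
  max(abs(crossprod(Z) - diag(ncol(Z))))
}

# Stand-in for X1 %*% fit$W[[1]]: orthonormal columns from a QR decomposition.
set.seed(42)
Z <- qr.Q(qr(matrix(rnorm(100 * 4), 100, 4)))
orth_error(Z)
```

Applied to the fitted model, the call would be orth_error(X1 %*% fit$W[[1]]). A value near zero would indicate that the check passes under the assumption stated above.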