SimulateCVR {CVR} R Documentation

## Generate simulation data.

### Description

Generate two sets of covariates and a univariate response driven by several latent factors.

### Usage

```
SimulateCVR(family = c("gaussian", "binomial", "poisson"), n = 100,
            rank = 4, p1 = 50, p2 = 70, pnz = 10, sigmax = 0.2,
            sigmay = 0.5, beta = c(2, 1, 0, 0), standardization = TRUE)
```

### Arguments

| Argument | Description |
|---|---|
| `family` | Type of response. `"gaussian"` for continuous response, `"binomial"` for binary response, and `"poisson"` for Poisson response. The default is `"gaussian"`. |
| `n` | Number of rows. The default is 100. |
| `rank` | Number of latent factors generating the covariates. The default is 4. |
| `p1` | Number of variables in X1. The default is 50. |
| `p2` | Number of variables in X2. The default is 70. |
| `pnz` | Number of variables in X1 and X2 related to the signal. The default is 10. |
| `sigmax` | Standard deviation of normal noise in X1 and X2. The default is 0.2. |
| `sigmay` | Standard deviation of normal noise in Y. Only used when the response is Gaussian. The default is 0.5. |
| `beta` | Numeric vector, the coefficients used to generate the response from the latent factors. The default is `c(2, 1, 0, 0)`. |
| `standardization` | Logical. If `TRUE`, standardize X1 and X2 before output. The default is `TRUE`. |

### Details

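The latent factors in U are randomly generated normal vectors. The two sets of covariates are generated as

$$X_1 = U V_1^\top + \mathtt{sigmax}\, E_1, \qquad X_2 = U V_2^\top + \mathtt{sigmax}\, E_2,$$

where E1 and E2 are N(0,1) noise matrices. The transposes follow from the dimensions of V1 and V2 (p1*rank and p2*rank).

The nonzero entries of V1 and V2 are generated from Uniform([-1,-0.5] ∪ [0.5,1]).

For Gaussian response,

$$y = U\beta + \mathtt{sigmay}\, e_y,$$

where $e_y$ is an N(0,1) noise vector; for binary response,

$$y \sim \mathrm{rbinom}\big(n,\ 1,\ 1/(1 + \exp(-U\beta))\big);$$

and for Poisson response,

$$y \sim \mathrm{rpois}\big(n,\ \exp(U\beta)\big).$$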

See the reference for more details.
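The generating model above can be sketched in base R. This is an illustrative re-implementation written from the description on this page, not the package source: the names `simulate_cvr_sketch` and `rsignal` are hypothetical, the argument checks are an added assumption, and the `standardization` step is left out.

```r
# Illustrative sketch of the generating model; NOT the CVR package source.
# `simulate_cvr_sketch` and `rsignal` are hypothetical names.
simulate_cvr_sketch <- function(family = c("gaussian", "binomial", "poisson"),
                                n = 100, rank = 4, p1 = 50, p2 = 70, pnz = 10,
                                sigmax = 0.2, sigmay = 0.5,
                                beta = c(2, 1, 0, 0)) {
  family <- match.arg(family)
  # Assumed sanity checks (not documented for the real function).
  stopifnot(length(beta) == rank, pnz <= min(p1, p2))

  # Draw k values from Uniform([-1, -0.5] U [0.5, 1]):
  # a magnitude in [0.5, 1] multiplied by a random sign.
  rsignal <- function(k) runif(k, 0.5, 1) * sample(c(-1, 1), k, replace = TRUE)

  # Loading matrices: only the first pnz rows are nonzero.
  V1 <- matrix(0, p1, rank)
  V2 <- matrix(0, p2, rank)
  V1[seq_len(pnz), ] <- rsignal(pnz * rank)
  V2[seq_len(pnz), ] <- rsignal(pnz * rank)

  # Latent factors and noisy covariates.
  U  <- matrix(rnorm(n * rank), n, rank)
  X1 <- U %*% t(V1) + sigmax * matrix(rnorm(n * p1), n, p1)
  X2 <- U %*% t(V2) + sigmax * matrix(rnorm(n * p2), n, p2)

  # Response generated from the linear predictor U %*% beta.
  eta <- drop(U %*% beta)
  y <- switch(family,
    gaussian = eta + sigmay * rnorm(n),
    binomial = rbinom(n, 1, 1 / (1 + exp(-eta))),
    poisson  = rpois(n, exp(eta))
  )

  list(X1 = X1, X2 = X2, y = y, U = U, beta = beta, V1 = V1, V2 = V2)
}

set.seed(1)
sim <- simulate_cvr_sketch("binomial", n = 20)
table(sim$y)  # counts of 0/1 responses
```

Sampling a magnitude and a sign separately is one simple way to draw from the two-interval uniform distribution; the package may do this differently.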

### Value

| Value | Description |
|---|---|
| `X1, X2` | The two sets of covariates with dimensions n*p1 and n*p2 respectively. |
| `y` | The response vector with length n. |
| `U` | The true latent factor matrix with dimension n*rank. |
| `beta` | The coefficients used to generate response from `U`. The length is rank. |
| `V1, V2` | The true loading matrices for X1 and X2 with dimensions p1*rank and p2*rank. The first `pnz` rows are nonzero. |
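The returned structure can be inspected directly on a simulated data set. The sketch below assumes the CVR package is installed and relies only on the element names and dimensions documented above; `sim` and `support` are illustrative variable names.

```r
# Guarded so the snippet does nothing when CVR is not installed.
if (requireNamespace("CVR", quietly = TRUE)) {
  set.seed(42)
  sim <- CVR::SimulateCVR(family = "gaussian", n = 100, rank = 4,
                          p1 = 50, p2 = 70, pnz = 10, beta = c(2, 1, 0, 0))

  # Dimensions of the matrix-valued components.
  print(lapply(sim[c("X1", "X2", "U", "V1", "V2")], dim))

  # Rows of V1 that carry signal; documented to be the first pnz rows.
  support <- which(rowSums(sim$V1 != 0) > 0)
  print(support)
}
```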

### Author(s)

Chongliang Luo, Kun Chen.

### References

Chongliang Luo, Jin Liu, Dipak D. Dey and Kun Chen (2016) Canonical variate regression. Biostatistics, doi: 10.1093/biostatistics/kxw001.

### See Also

`CVR`, `cvrsolver`.

### Examples

```
library(CVR)

set.seed(42)
mydata <- SimulateCVR(family = "g", n = 100, rank = 4, p1 = 50, p2 = 70,
                      pnz = 10, beta = c(2, 1, 0, 0))
X1 <- mydata$X1
X2 <- mydata$X2
Xlist <- list(X1 = X1, X2 = X2)
Y <- mydata$y
opts <- list(standardization = FALSE, maxIters = 300, tol = 0.005)
## use sparse CCA solution as initial values, see SparseCCA()
Wini <- SparseCCA(X1, X2, 4, 0.7, 0.7)
## perform CVR with fixed eta and lambda, see cvrsolver()
fit <- cvrsolver(Y, Xlist, rank = 4, eta = 0.5, Lam = c(1, 1),
                 family = "gaussian", Wini, penalty = "GL1", opts)
## check sparsity recovery (one loading matrix per covariate set)
fit$W[[1]]
fit$W[[2]]
## check orthogonality
X1W1 <- X1 %*% fit$W[[1]]
t(X1W1) %*% X1W1
```

[Package CVR version 0.1.1 Index]