GenerateData {mixedCCA} | R Documentation |
Mixed type simulation data generator for sparse CCA
Description
GenerateData is used to generate two sets of data of mixed types for sparse CCA under the Gaussian copula model.
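The idea of mixed-type data under a latent Gaussian copula model can be sketched in base R as follows. This is an illustrative sketch only, not the mixedCCA implementation: the correlation structure, the thresholds `c_thr`, and the order of the steps (transform first, then threshold) are assumptions made for the example.

```r
# Illustrative sketch only; not the code used by GenerateData.
set.seed(1)
n <- 200; p <- 4

# Assumed latent correlation matrix with AR(1)-style decay
Sigma <- 0.5^abs(outer(1:p, 1:p, "-"))

# Latent Gaussian draws whose columns have correlation Sigma
# (chol() returns an upper-triangular R with t(R) %*% R == Sigma)
Z <- matrix(rnorm(n * p), n, p) %*% chol(Sigma)

# Monotone transformation, analogous to a copula choice such as "exp"
U <- exp(Z)

# Hypothetical per-variable thresholds, recycled across rows
c_thr <- rep(1, p)
C <- matrix(c_thr, n, p, byrow = TRUE)

# Truncated type: values at or below the threshold are recorded as 0
X_trunc <- ifelse(U > C, U, 0)

# Binary type: indicator of exceeding the threshold
X_bin <- (U > C) * 1

# Per-variable proportion of zeros in the truncated data
colMeans(X_trunc == 0)
```

Raising a threshold increases the proportion of zeros in the corresponding column, which is what the range checks at the end of the Examples section inspect.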
Usage
GenerateData(
  n,
  trueidx1,
  trueidx2,
  Sigma1,
  Sigma2,
  maxcancor,
  copula1 = "no",
  copula2 = "no",
  type1 = "continuous",
  type2 = "continuous",
  muZ = NULL,
  c1 = NULL,
  c2 = NULL
)
Arguments
n | Sample size.
trueidx1 | True canonical direction of length p1 for X1.
trueidx2 | True canonical direction of length p2 for X2.
Sigma1 | True correlation matrix of latent variable Z1 (p1 by p1).
Sigma2 | True correlation matrix of latent variable Z2 (p2 by p2).
maxcancor | True canonical correlation between Z1 and Z2.
copula1 | Copula type for the first dataset, U1 = f(Z1): either "exp" or "cube". The default is "no".
copula2 | Copula type for the second dataset, U2 = f(Z2): either "exp" or "cube". The default is "no".
type1 | Type of the first dataset. The default is "continuous".
type2 | Type of the second dataset. The default is "continuous".
muZ | Mean of the latent multivariate normal.
c1 | Constant threshold for X1, for the "trunc" and "binary" data types.
c2 | Constant threshold for X2, for the "trunc" and "binary" data types.
Value
GenerateData returns a list containing:
Z1: latent numeric data matrix (n by p1).
Z2: latent numeric data matrix (n by p2).
X1: observed numeric data matrix (n by p1).
X2: observed numeric data matrix (n by p2).
true_w1: normalized true canonical direction of length p1 for X1.
true_w2: normalized true canonical direction of length p2 for X2.
type: a vector containing types of two datasets.
maxcancor: true canonical correlation between Z1 and Z2.
c1: constant threshold for X1 for "trunc" and "binary" data type.
c2: constant threshold for X2 for "trunc" and "binary" data type.
Sigma: true latent correlation matrix of Z1 and Z2 ((p1+p2) by (p1+p2)).
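To make the relationship between the normalized directions, maxcancor, and the joint matrix Sigma concrete, here is a minimal base R sketch of one standard rank-one construction. It is an assumption that this matches what GenerateData does internally; the normalization (w' Sigma w = 1), the helper `inv_sqrt`, and all variable names are hypothetical and chosen for illustration.

```r
# Sketch of a standard rank-one construction; assumed, not taken from the package.
p1 <- 5; p2 <- 4
rho <- 0.9                                    # target canonical correlation

# Assumed within-set correlation matrices
Sigma1 <- 0.7^abs(outer(1:p1, 1:p1, "-"))
Sigma2 <- 0.3^abs(outer(1:p2, 1:p2, "-"))

# Indicator vectors marking the variables that carry the signal
idx1 <- c(1, 1, 0, 0, 0)
idx2 <- c(1, 0, 0, 0)

# Normalize so that t(w) %*% Sigma %*% w equals 1
w1 <- idx1 / sqrt(drop(t(idx1) %*% Sigma1 %*% idx1))
w2 <- idx2 / sqrt(drop(t(idx2) %*% Sigma2 %*% idx2))

# Rank-one cross-correlation block and the joint (p1+p2) by (p1+p2) matrix
Sigma12 <- rho * Sigma1 %*% w1 %*% t(w2) %*% Sigma2
Sigma <- rbind(cbind(Sigma1, Sigma12), cbind(t(Sigma12), Sigma2))

# Hypothetical helper: inverse symmetric square root via eigendecomposition
inv_sqrt <- function(S) {
  e <- eigen(S, symmetric = TRUE)
  e$vectors %*% diag(1 / sqrt(e$values)) %*% t(e$vectors)
}

# The leading singular value of the whitened cross block is the
# largest canonical correlation implied by Sigma
K <- inv_sqrt(Sigma1) %*% Sigma12 %*% inv_sqrt(Sigma2)
top_cancor <- svd(K)$d[1]
```

Under this construction the whitened cross block is rho times an outer product of two unit vectors, so `top_cancor` agrees with `rho` up to floating-point error and the remaining canonical correlations are zero.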
Examples
### Simple example
library(mixedCCA) # provides GenerateData, autocor and blockcor
# Data setting
n <- 100; p1 <- 15; p2 <- 10 # sample size and dimensions for two datasets.
maxcancor <- 0.9 # true canonical correlation
# Correlation structure within each data set
set.seed(0)
perm1 <- sample(1:p1, size = p1)
Sigma1 <- autocor(p1, 0.7)[perm1, perm1]
blockind <- sample(1:3, size = p2, replace = TRUE)
Sigma2 <- blockcor(blockind, 0.7)
mu <- rbinom(p1+p2, 1, 0.5)
# true variable indices for each dataset
trueidx1 <- c(rep(1, 3), rep(0, p1-3))
trueidx2 <- c(rep(1, 2), rep(0, p2-2))
# Data generation
simdata <- GenerateData(n = n, trueidx1 = trueidx1, trueidx2 = trueidx2, maxcancor = maxcancor,
                        Sigma1 = Sigma1, Sigma2 = Sigma2,
                        copula1 = "exp", copula2 = "cube",
                        muZ = mu,
                        type1 = "trunc", type2 = "trunc",
                        c1 = rep(1, p1), c2 = rep(0, p2)
)
X1 <- simdata$X1
X2 <- simdata$X2
# Check the range of truncation levels of variables
range(colMeans(X1 == 0))
range(colMeans(X2 == 0))