mixedCCA {mixedCCA} | R Documentation |
Sparse CCA for data of mixed types with BIC criterion
Description
Applies sparse canonical correlation analysis (CCA) to high-dimensional data of mixed types (continuous, binary, or truncated continuous). A rank-based estimator derived from Kendall's τ is used in place of the sample correlation matrix. Two types of BIC criteria are available for variable selection. We found that BIC1 works best for variable selection, whereas BIC2 works best for prediction.
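The rank-based idea can be illustrated for the simplest case with base R alone. This is a sketch, not the package's own estimator: it covers only two continuous variables under a Gaussian copula, where Kendall's τ relates to the latent correlation through r = sin(π τ / 2). The binary and truncated cases need different bridge functions, which are described in the reference. All variable names below are illustrative.

```r
set.seed(1)
n <- 2000
rho <- 0.6                                    # latent correlation, chosen for illustration
z1 <- rnorm(n)
z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n)   # latent Gaussian pair with correlation rho

# Strictly increasing transformations distort Pearson correlation
# but leave the ranks, and hence Kendall's tau, unchanged.
x1 <- exp(z1)
x2 <- z2^3

tau <- cor(x1, x2, method = "kendall")        # rank-based association
r_hat <- sin(pi * tau / 2)                    # bridge back to the latent correlation
r_pearson <- cor(x1, x2)                      # sample correlation on the observed scale

print(c(rank_based = r_hat, pearson = r_pearson, truth = rho))
```

The rank-based estimate stays close to the latent correlation because it depends on the data only through their ordering, which is the motivation for replacing the sample correlation matrix.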
Usage
mixedCCA(
  X1,
  X2,
  type1,
  type2,
  lamseq1 = NULL,
  lamseq2 = NULL,
  nlamseq = 20,
  lam.eps = 0.01,
  w1init = NULL,
  w2init = NULL,
  BICtype,
  KendallR = NULL,
  maxiter = 100,
  tol = 0.01,
  trace = FALSE,
  lassoverbose = FALSE
)
Arguments
X1 |
A numeric data matrix (n by p1). |
X2 |
A numeric data matrix (n by p2). |
type1 |
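The type of data in X1 (continuous, binary, or truncated continuous), given as a string such as "trunc". |
type2 |
The type of data in X2 (continuous, binary, or truncated continuous), given as a string such as "trunc". |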
lamseq1 |
A tuning parameter sequence for X1. The default is NULL. |
lamseq2 |
A tuning parameter sequence for X2. The default is NULL. |
nlamseq |
The number of values in the tuning parameter sequence lambda. The default is 20. |
lam.eps |
The ratio of the smallest value of lambda to the largest value of lambda. The default is 0.01. |
w1init |
An initial vector of length p1 for canonical direction w1. The default is NULL. |
w2init |
An initial vector of length p2 for canonical direction w2. The default is NULL. |
BICtype |
Either 1 or 2, selecting the BIC criterion (BIC1 or BIC2). See the reference for details on the two options. |
KendallR |
An estimated Kendall's τ matrix, such as the KendallR element returned by a previous call. The default is NULL. |
maxiter |
The maximum number of iterations allowed. |
tol |
The desired accuracy (convergence tolerance). |
trace |
Logical flag; the default is FALSE. |
lassoverbose |
Logical flag; the default is FALSE. |
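How nlamseq and lam.eps might define a default grid can be sketched as follows. This is an assumption about a typical construction (log-spaced values running from a maximum lambda down to lam.eps times that maximum), not a transcription of the package's code. The helper make_lamseq and the value lam_max are hypothetical.

```r
# Hypothetical helper: log-spaced tuning parameter grid.
# lam_max is an assumed upper end of the grid; how the package
# determines it is not documented on this page.
make_lamseq <- function(lam_max, nlamseq = 20, lam.eps = 0.01) {
  stopifnot(lam_max > 0, nlamseq >= 2, lam.eps > 0, lam.eps < 1)
  exp(seq(log(lam_max), log(lam.eps * lam_max), length.out = nlamseq))
}

lamseq <- make_lamseq(lam_max = 0.5, nlamseq = 10)
print(signif(lamseq, 3))
```

A grid built this way could be passed as lamseq1 or lamseq2 in place of the default NULL.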
Value
mixedCCA returns a data.frame containing
KendallR: the estimated Kendall's τ matrix.
lambda_seq: the values of lamseq used for sparse CCA.
w1: estimated canonical direction w1.
w2: estimated canonical direction w2.
cancor: estimated canonical correlation.
fitresult: more details regarding the progress at each iteration.
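For orientation, the canonical correlation associated with a pair of directions is conventionally w1' S12 w2 / sqrt(w1' S1 w1 * w2' S2 w2), where S1, S2 and S12 are blocks of a joint correlation matrix. The base-R sketch below evaluates that formula on a small made-up matrix. The helper cancor_from_directions and the matrix Rtoy are hypothetical, and whether mixedCCA returns directions already scaled so that the denominator equals 1 is not stated on this page.

```r
# Hypothetical helper: canonical correlation implied by directions w1, w2
# and a joint (p1 + p2) x (p1 + p2) correlation matrix R.
cancor_from_directions <- function(R, w1, w2) {
  p1 <- length(w1)
  p2 <- length(w2)
  stopifnot(nrow(R) == p1 + p2, ncol(R) == p1 + p2)
  i1 <- seq_len(p1)
  i2 <- p1 + seq_len(p2)
  S1  <- R[i1, i1, drop = FALSE]   # correlations within the first set
  S2  <- R[i2, i2, drop = FALSE]   # correlations within the second set
  S12 <- R[i1, i2, drop = FALSE]   # cross-correlations between the sets
  num <- drop(t(w1) %*% S12 %*% w2)
  den <- sqrt(drop(t(w1) %*% S1 %*% w1) * drop(t(w2) %*% S2 %*% w2))
  num / den
}

# Toy matrix: two variables in the first set, one in the second
Rtoy <- matrix(c(1.0, 0.2, 0.5,
                 0.2, 1.0, 0.1,
                 0.5, 0.1, 1.0), nrow = 3, byrow = TRUE)
rho_hat <- cancor_from_directions(Rtoy, w1 = c(1, 0), w2 = 1)
print(rho_hat)
```

Because of the normalization in the denominator, rescaling either direction leaves the result unchanged.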
References
Yoon G., Carroll R.J. and Gaynanova I. (2020) "Sparse semiparametric canonical correlation analysis for data of mixed types" <doi:10.1093/biomet/asaa007>.
Examples
### Simple example
# Data setting
n <- 100; p1 <- 15; p2 <- 10 # sample size and dimensions for two datasets.
maxcancor <- 0.9 # true canonical correlation
# Correlation structure within each data set
set.seed(0)
perm1 <- sample(1:p1, size = p1)
Sigma1 <- autocor(p1, 0.7)[perm1, perm1]
blockind <- sample(1:3, size = p2, replace = TRUE)
Sigma2 <- blockcor(blockind, 0.7)
mu <- rbinom(p1+p2, 1, 0.5)
# true variable indices for each dataset
trueidx1 <- c(rep(1, 3), rep(0, p1-3))
trueidx2 <- c(rep(1, 2), rep(0, p2-2))
# Data generation
simdata <- GenerateData(n = n, trueidx1 = trueidx1, trueidx2 = trueidx2, maxcancor = maxcancor,
                        Sigma1 = Sigma1, Sigma2 = Sigma2,
                        copula1 = "exp", copula2 = "cube",
                        muZ = mu,
                        type1 = "trunc", type2 = "trunc",
                        c1 = rep(1, p1), c2 = rep(0, p2)
)
X1 <- simdata$X1
X2 <- simdata$X2
# Check the range of truncation levels of variables
range(colMeans(X1 == 0))
range(colMeans(X2 == 0))
# Kendall CCA with BIC1
kendallcca1 <- mixedCCA(X1, X2, type1 = "trunc", type2 = "trunc", BICtype = 1, nlamseq = 10)
# Kendall CCA with BIC2. Estimated correlation matrix is plugged in from the above result.
R <- kendallcca1$KendallR
kendallcca2 <- mixedCCA(X1, X2, type1 = "trunc", type2 = "trunc",
                        KendallR = R, BICtype = 2, nlamseq = 10)
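
Because the simulation fixes the true supports in trueidx1 and trueidx2, a natural follow-up is to compare them with the nonzero entries of the estimated directions. The helper below is a hypothetical base-R sketch, shown with a made-up direction vector so that it runs without the package. With the fits above one might call it as selection_summary(kendallcca1$w1, trueidx1), assuming w1 is returned as a numeric vector of length p1.

```r
# Hypothetical helper: true and false positive rates of variable selection
selection_summary <- function(w, trueidx) {
  stopifnot(length(w) == length(trueidx))
  selected <- as.numeric(w) != 0   # variables with a nonzero loading
  truth <- trueidx == 1            # variables that truly contribute
  c(TPR = sum(selected & truth) / sum(truth),
    FPR = sum(selected & !truth) / sum(!truth))
}

# Made-up direction of length 15: two of three true variables recovered,
# plus one spurious nonzero entry
w_demo <- c(0.8, 0.6, 0, 0.1, rep(0, 11))
trueidx_demo <- c(rep(1, 3), rep(0, 12))
res <- selection_summary(w_demo, trueidx_demo)
print(res)
```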