R: Sparse CCA for data of mixed types with BIC criterion

mixedCCA {mixedCCA}

R Documentation

Sparse CCA for data of mixed types with BIC criterion

Description

Applies sparse canonical correlation analysis (CCA) to high-dimensional data of mixed types (continuous, binary, or truncated continuous). A rank-based estimator of the correlation matrix is used in place of the sample correlation matrix. Two BIC criteria are available for selecting the tuning parameters. We found that BIC1 works best for variable selection, whereas BIC2 works best for prediction.
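
As a rough illustration of why a rank-based estimator can be preferable to the sample correlation, the base R sketch below covers only the continuous case. It assumes the standard relation sin(pi * tau / 2) between Kendall's tau and the latent correlation for continuous pairs. Binary and truncated variables use different bridge functions (see the reference), and this is not the package's internal code. The value of `rho` and the two transformations are chosen purely for illustration.

```r
# Base R only; continuous/continuous case, for illustration.
set.seed(1)
n <- 500
rho <- 0.6                                   # latent correlation (illustrative value)
z1 <- rnorm(n)
z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n)  # Gaussian pair with correlation rho

# Strictly increasing transformations change the Pearson correlation
# of the observed data but leave the ranks untouched.
x1 <- exp(z1)
x2 <- z2^3

tau <- cor(x1, x2, method = "kendall")       # rank-based, unaffected by the transformations
rho_rank <- sin(pi / 2 * tau)                # bridge back to the latent correlation
rho_pearson <- cor(x1, x2)                   # sample correlation on the observed scale

c(rank_based = rho_rank, pearson = rho_pearson)
```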

Usage

mixedCCA(
  X1,
  X2,
  type1,
  type2,
  lamseq1 = NULL,
  lamseq2 = NULL,
  nlamseq = 20,
  lam.eps = 0.01,
  w1init = NULL,
  w2init = NULL,
  BICtype,
  KendallR = NULL,
  maxiter = 100,
  tol = 0.01,
  trace = FALSE,
  lassoverbose = FALSE
)

Arguments

`X1`	A numeric data matrix (n by p1).
`X2`	A numeric data matrix (n by p2).
`type1`	The data type of `X1`: one of "continuous", "binary", or "trunc".
`type2`	The data type of `X2`: one of "continuous", "binary", or "trunc".
`lamseq1`	A tuning parameter sequence for `X1`. The length should be the same as `lamseq2`.
`lamseq2`	A tuning parameter sequence for `X2`. The length should be the same as `lamseq1`.
`nlamseq`	The number of tuning parameter values in the lambda sequence. The default is 20.
`lam.eps`	The ratio of the smallest lambda value to the largest lambda value. The default is 0.01.
`w1init`	An initial vector of length p1 for canonical direction `w1`.
`w2init`	An initial vector of length p2 for canonical direction `w2`.
`BICtype`	Either 1 or 2. For details on the two options, see the reference.
`KendallR`	An estimated Kendall `\tau` matrix. The default is NULL, in which case the matrix is estimated automatically with Kendall's `\tau` estimator; supply a matrix to skip that step.
`maxiter`	The maximum number of iterations allowed.
`tol`	The desired accuracy (convergence tolerance).
`trace`	If `trace = TRUE`, progress at each iteration will be printed. The default value is `FALSE`.
`lassoverbose`	If `lassoverbose = TRUE`, all convergence warnings from the lassobic optimization will be printed. The default value is `FALSE`.
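
To make the roles of `nlamseq` and `lam.eps` concrete, the sketch below builds one candidate tuning sequence in base R. The log-scale spacing and the value of `lam_max` are assumptions made for illustration; the sequence the package generates when `lamseq1` and `lamseq2` are `NULL` may be constructed differently.

```r
nlamseq <- 20     # number of lambda values (the documented default)
lam.eps <- 0.01   # ratio of smallest to largest lambda (the documented default)
lam_max <- 0.5    # hypothetical largest lambda, not taken from the package

# Decreasing sequence from lam_max down to lam.eps * lam_max,
# equally spaced on the log scale (an assumed spacing).
lamseq <- exp(seq(log(lam_max), log(lam.eps * lam_max), length.out = nlamseq))

range(lamseq)
```

A sequence built this way could be passed as `lamseq1`, together with a sequence of the same length as `lamseq2`.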

Value

mixedCCA returns a data.frame containing

KendallR: estimated Kendall's \tau matrix.
lambda_seq: the values of lamseq used for sparse CCA.
w1: estimated canonical direction w1.
w2: estimated canonical direction w2.
cancor: estimated canonical correlation.
fitresult: more details regarding the progress at each iteration.
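
The components above are accessed by name. The sketch below uses a small stand-in object with made-up values, not output from `mixedCCA`, to show one way of reading off which variables were selected: those with nonzero entries in `w1` and `w2`. The object `fit` and its contents are hypothetical.

```r
# Stand-in for a fitted object; all values are invented for illustration.
fit <- list(
  w1 = c(0.8, 0, -0.6, 0, 0),   # canonical direction for the first data set
  w2 = c(0, 1, 0),              # canonical direction for the second data set
  cancor = 0.7                  # canonical correlation
)

selected1 <- which(fit$w1 != 0)  # indices of selected variables in X1
selected2 <- which(fit$w2 != 0)  # indices of selected variables in X2
```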

References

Yoon G., Carroll R.J. and Gaynanova I. (2020) "Sparse semiparametric canonical correlation analysis for data of mixed types" <doi:10.1093/biomet/asaa007>.

Examples

### Simple example

library(mixedCCA)

# Data setting
n <- 100; p1 <- 15; p2 <- 10 # sample size and dimensions for two datasets.
maxcancor <- 0.9 # true canonical correlation

# Correlation structure within each data set
set.seed(0)
perm1 <- sample(1:p1, size = p1)
Sigma1 <- autocor(p1, 0.7)[perm1, perm1]
blockind <- sample(1:3, size = p2, replace = TRUE)
Sigma2 <- blockcor(blockind, 0.7)
mu <- rbinom(p1+p2, 1, 0.5)

# true variable indices for each dataset
trueidx1 <- c(rep(1, 3), rep(0, p1-3))
trueidx2 <- c(rep(1, 2), rep(0, p2-2))

# Data generation
simdata <- GenerateData(n=n, trueidx1 = trueidx1, trueidx2 = trueidx2, maxcancor = maxcancor,
                        Sigma1 = Sigma1, Sigma2 = Sigma2,
                        copula1 = "exp", copula2 = "cube",
                        muZ = mu,
                        type1 = "trunc", type2 = "trunc",
                        c1 = rep(1, p1), c2 = rep(0, p2)
)
X1 <- simdata$X1
X2 <- simdata$X2

# Check the range of truncation levels of variables
range(colMeans(X1 == 0))
range(colMeans(X2 == 0))

# Kendall CCA with BIC1
kendallcca1 <- mixedCCA(X1, X2, type1 = "trunc", type2 = "trunc", BICtype = 1, nlamseq = 10)

# Kendall CCA with BIC2. The correlation matrix estimated above is reused instead of being re-estimated.
R <- kendallcca1$KendallR
kendallcca2 <- mixedCCA(X1, X2, type1 = "trunc", type2 = "trunc",
                        KendallR = R, BICtype = 2, nlamseq = 10)
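
Because the simulation records which variables are truly relevant, the selections made under the two criteria can be compared against the truth. The helper below is a sketch of one way to do that; `selection_rates` is a hypothetical function written for this illustration and is not part of the package. It is demonstrated on a toy vector so that it runs on its own.

```r
# Hypothetical helper: true and false positive rates of a sparse direction.
# `w` is an estimated canonical direction; `trueidx` is a 0/1 indicator of
# the truly relevant variables (as in trueidx1 and trueidx2).
selection_rates <- function(w, trueidx) {
  selected <- w != 0
  truth <- trueidx == 1
  c(TPR = sum(selected & truth) / sum(truth),
    FPR = sum(selected & !truth) / sum(!truth))
}

# Toy check with invented values: 2 of 3 true variables are selected,
# and 1 of 2 irrelevant variables is selected.
rates <- selection_rates(c(0.5, 0, 0.2, 0.1, 0), c(1, 1, 1, 0, 0))
rates
```

With fits such as `kendallcca1` and `kendallcca2` in the workspace, the same helper could be applied as `selection_rates(kendallcca1$w1, trueidx1)` and `selection_rates(kendallcca2$w1, trueidx1)`.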

[Package mixedCCA version 1.6.2 Index]