R: Sparse Canonical Correlation analysis

sCCA {sRDA}

R Documentation

Sparse Canonical Correlation analysis

Description

Sparse canonical correlation analysis for high-dimensional (biomedical) data. The function takes two data sets and returns maximally correlated canonical variate pairs, each variate being a linear combination of the variables in its data set. Elastic net penalization (with its variants: UST, ridge and lasso penalization) is implemented for sparsity and smoothness, with a built-in cross-validation procedure to obtain the optimal penalization parameters. It is possible to obtain multiple canonical variate pairs that are orthogonal to each other.
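
As a point of reference for what a canonical variate pair is, the following sketch uses the unpenalized `cancor` function from base R's stats package on simulated data. It is not the algorithm used by `sCCA`; the simulated variables and the shared signal `z` are assumptions made only for illustration.

```r
set.seed(1)
n <- 200
z <- rnorm(n)                                  # shared signal (simulated, hypothetical)
X <- cbind(z + rnorm(n, sd = 0.5), rnorm(n), rnorm(n))
Y <- cbind(z + rnorm(n, sd = 0.5), rnorm(n))

cc <- cancor(X, Y)                             # classical, unpenalized CCA

# Scores of the first canonical variate pair: centered data times the weights
xi  <- scale(X, center = cc$xcenter, scale = FALSE) %*% cc$xcoef[, 1]
eta <- scale(Y, center = cc$ycenter, scale = FALSE) %*% cc$ycoef[, 1]

# The correlation of the pair matches the first canonical correlation
first_cor <- cor(xi, eta)[1, 1]
```

The sparse variant documented here additionally penalizes the weights, so that only a subset of the predictor variables contributes to each variate.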

Usage

sCCA(predictor, predicted, penalization = "enet", ridge_penalty = 1,
  nonzero = 1, max_iterations = 100, tolerance = 1 * 10^-20,
  cross_validate = FALSE, parallel_CV = TRUE, nr_subsets = 10,
  multiple_LV = FALSE, nr_LVs = 1)

Arguments

`predictor`	The n*p matrix of the predictor data set
`predicted`	The n*q matrix of the predicted data set
`penalization`	The penalization method applied during the analysis (none, enet or ust)
`ridge_penalty`	The ridge penalty parameter of the predictor set's latent variable used for enet or ust (a single number if cross_validate = FALSE, a list of candidate values otherwise)
`nonzero`	The number of non-zero weights of the predictor set's latent variable (an integer if cross_validate = FALSE, a list otherwise)
`max_iterations`	The maximum number of iterations of the algorithm
`tolerance`	Convergence criterion
`cross_validate`	K-fold cross-validation to find the optimal penalty parameters (TRUE or FALSE)
`parallel_CV`	Run the cross-validation in parallel (TRUE or FALSE)
`nr_subsets`	Number of subsets for k-fold cross validation
`multiple_LV`	Obtain multiple latent variable pairs (TRUE or FALSE)
`nr_LVs`	Number of latent variables to be obtained
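
To make the role of `nonzero` concrete, here is a minimal sketch of soft thresholding a weight vector so that a requested number of weights stays non-zero. The helper `soft_threshold_top_k` is hypothetical and written only for illustration; it is an assumption about the general idea behind thresholding-based sparsity, not the package's internal code.

```r
# Hypothetical helper: shrink all weights towards zero by a common amount,
# chosen so that (barring ties) exactly k weights remain non-zero.
soft_threshold_top_k <- function(w, k) {
  stopifnot(k >= 1, k <= length(w))
  a <- sort(abs(w), decreasing = TRUE)
  # Threshold is the (k+1)-th largest absolute weight, or 0 if all are kept
  lambda <- if (k < length(w)) a[k + 1] else 0
  sign(w) * pmax(abs(w) - lambda, 0)
}

w <- c(0.9, -0.1, 0.4, 0.05, -0.7)            # toy weight vector
w_sparse <- soft_threshold_top_k(w, k = 2)    # keep the two largest weights
nr_kept <- sum(w_sparse != 0)
```

The surviving weights keep their sign but are shrunk in magnitude, which is the usual trade-off of soft thresholding compared with simply zeroing the small weights.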

Value

An object of class "sRDA".

`XI`	Predictor set's latent variable scores
`ETA`	Predicted set's latent variable scores
`ALPHA`	Weights of the predictor set's latent variable
`BETA`	Weights of the predicted set's latent variable
`nr_iterations`	Number of iterations run before convergence (or the maximum number of iterations)
`SOLVE_XIXI`	Inverse of the predictor set's latent variable variance matrix
`iterations_crts`	The convergence criterion value (a small positive tolerance)
`sum_absolute_betas`	Sum of the absolute values of beta weights
`ridge_penalty`	The ridge penalty parameter used for the model
`nr_nonzeros`	The number of nonzero alpha weights in the model
`nr_latent_variables`	The number of latent variable pairs in the model
`CV_results`	The detailed results of cross validations (if cross_validate is TRUE)
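
The search that cross-validation performs over candidate penalty values can be pictured as a grid of (ridge penalty, number of non-zero weights) combinations, each scored on held-out folds. The sketch below is a self-contained toy in base R and only illustrates that idea: the scoring function `score_fold`, the simulated data and the column name `mean_abs_cor` are assumptions, and the actual layout of `CV_results` is not described in this page.

```r
set.seed(42)
n <- 120; p <- 30
X <- matrix(rnorm(n * p), n, p)
y <- X[, 1] - X[, 2] + rnorm(n, sd = 0.5)     # two informative columns (simulated)

nr_subsets <- 5                                # number of folds
grid <- expand.grid(ridge_penalty = c(0.1, 1), nonzero = c(2, 10, 20))
fold_id <- sample(rep(seq_len(nr_subsets), length.out = n))

# Hypothetical scorer: ridge weights on the training folds, keep the
# `nonzero` largest weights, then absolute correlation on the held-out fold
score_fold <- function(fold, ridge, nonzero) {
  tr <- fold_id != fold
  w <- solve(crossprod(X[tr, ]) + ridge * diag(p), crossprod(X[tr, ], y[tr]))
  w[rank(-abs(w), ties.method = "first") > nonzero] <- 0
  abs(cor(X[!tr, ] %*% w, y[!tr]))[1, 1]
}

# Average the held-out score over folds for every candidate combination
grid$mean_abs_cor <- mapply(function(r, nz) {
  mean(sapply(seq_len(nr_subsets), score_fold, ridge = r, nonzero = nz))
}, grid$ridge_penalty, grid$nonzero)

best <- grid[which.max(grid$mean_abs_cor), ]   # best-scoring combination
```

With `cross_validate = TRUE`, the selected values are reported in `ridge_penalty` and `nr_nonzeros` of the returned object.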

Author(s)

Attila Csala

Examples


# generate data with a few highly correlated variables
dataXY <- generate_data(nr_LVs = 2,
                           n = 250,
                           nr_correlated_Xs = c(5,20),
                           nr_uncorrelated_Xs = 250,
                           mean_reg_weights_assoc_X =
                             c(0.9,0.5),
                           sd_reg_weights_assoc_X =
                             c(0.05, 0.05),
                           Xnoise_min = -0.3,
                           Xnoise_max = 0.3,
                           nr_correlated_Ys = c(10,15),
                           nr_uncorrelated_Ys = 350,
                           mean_reg_weights_assoc_Y =
                             c(0.9,0.6),
                           sd_reg_weights_assoc_Y =
                             c(0.05, 0.05),
                           Ynoise_min = -0.3,
                           Ynoise_max = 0.3)

# separate predictor and predicted sets
X <- dataXY$X
Y <- dataXY$Y

# run sCCA
CCA.res <- sCCA(predictor = X, predicted = Y, nonzero = 5,
ridge_penalty = 1, penalization = "ust")


# check first 10 weights of X
CCA.res$ALPHA[1:10]

## Not run: 
# run sCCA with cross-validation to determine best penalization parameters
CCA.res <- sCCA(predictor = X, predicted = Y, nonzero = c(5,10,15),
ridge_penalty = c(0.1,1), penalization = "enet", cross_validate = TRUE,
parallel_CV = TRUE)

# check first 10 weights of X
CCA.res$ALPHA[1:10]
CCA.res$ridge_penalty
CCA.res$nr_nonzeros

# obtain multiple latent variables
CCA.res <- sCCA(predictor = X, predicted = Y, nonzero = c(5,10,15),
ridge_penalty = c(0.1,1), penalization = "enet", cross_validate = TRUE,
parallel_CV = TRUE, multiple_LV = TRUE, nr_LVs = 2, max_iterations = 5)

# check first 10 weights of X in the first two components
CCA.res$ALPHA[[1]][1:10]
CCA.res$ALPHA[[2]][1:10]

# latent variables are orthogonal to each other
t(CCA.res$XI[[1]]) %*% CCA.res$XI[[2]]


## End(Not run)
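
The orthogonality check in the last example can be understood through deflation: once the part of the predictor data explained by the first latent variable is removed, any latent variable built from the remainder is orthogonal to the first. The sketch below shows this in base R with random, hypothetical weights `w1` and `w2`. Whether `sCCA` obtains its multiple latent variables exactly this way is an assumption, not something stated in this page.

```r
set.seed(7)
n <- 50; p <- 6
X <- scale(matrix(rnorm(n * p), n, p), scale = FALSE)   # centered toy data

w1  <- rnorm(p)                     # hypothetical weights, first latent variable
xi1 <- X %*% w1                     # first latent variable scores

# Deflation: subtract the projection of X onto xi1
X_deflated <- X - xi1 %*% solve(crossprod(xi1)) %*% crossprod(xi1, X)

w2  <- rnorm(p)                     # hypothetical weights, second latent variable
xi2 <- X_deflated %*% w2            # second latent variable scores

# Inner product of the two score vectors, zero up to rounding error
inner <- drop(crossprod(xi1, xi2))
```

The term `solve(crossprod(xi1))` is the inverse of the latent variable's cross-product, the same kind of quantity the returned `SOLVE_XIXI` element describes.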

[Package sRDA version 1.0.0 Index]