R: sRDA.

sRDA {sRDA}

R Documentation

sRDA.

Description

Sparse Redundancy Analysis (sRDA) expresses the maximum variance in the predicted data set by a linear combination of variables (a latent variable) of the predictor data set. Elastic net penalization (with its variants UST, Ridge and Lasso penalization) is implemented for sparsity and smoothness, with a built-in cross-validation procedure to obtain the optimal penalization parameters. Multiple latent variables can be obtained; they are orthogonal to each other, so each explains a different portion of the variance in the predicted data set. sRDA is implemented in a Partial Least Squares framework; for more details see Csala et al. (2017).
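To make "variance in the predicted data set explained by a latent variable" concrete, the base R sketch below scores a weight vector by the mean squared correlation between the latent variable and the columns of the predicted set. The simulated data and the helper `redundancy()` are hypothetical illustrations, not part of the sRDA package, and the package's internal criterion may differ in detail.

```r
# Sketch only: base R, simulated data; not the sRDA package's internal code.
set.seed(1)
n <- 100
X <- matrix(rnorm(n * 6), n, 6)            # predictor set (n x p)

# Two columns of Y depend on the first two X columns; the third is pure noise
signal <- X[, 1] + X[, 2]
Y <- cbind(signal + rnorm(n, sd = 0.5),
           signal + rnorm(n, sd = 0.5),
           rnorm(n))                       # predicted set (n x q)

# Hypothetical helper: mean squared correlation between the latent
# variable xi = X %*% alpha and each column of Y.
redundancy <- function(X, Y, alpha) {
  xi <- X %*% alpha
  mean(cor(xi, Y)^2)
}

alpha_good <- c(1, 1, 0, 0, 0, 0)   # weights on the informative columns
alpha_poor <- c(0, 0, 0, 0, 0, 1)   # weight on an uninformative column

redundancy(X, Y, alpha_good)
redundancy(X, Y, alpha_poor)
```

A sparse weight vector that puts its non-zero entries on the informative predictors scores much higher than one that does not, which is the behaviour the penalized analysis is meant to find.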

Usage

sRDA(predictor, predicted, penalization = "enet", ridge_penalty = 1,
  nonzero = 1, max_iterations = 100, tolerance = 1 * 10^-20,
  cross_validate = FALSE, parallel_CV = FALSE, nr_subsets = 10,
  multiple_LV = FALSE, nr_LVs = 1)

Arguments

`predictor`	The n*p matrix of the predictor data set
`predicted`	The n*q matrix of the predicted data set
`penalization`	The penalization method applied during the analysis (none, enet or ust)
`ridge_penalty`	The ridge penalty parameter of the predictor set's latent variable used for enet (a single number if cross_validate = FALSE, a vector of candidate values otherwise)
`nonzero`	The number of non-zero weights of the predictor set's latent variable used for enet or ust (a single integer if cross_validate = FALSE, a vector of candidate values otherwise)
`max_iterations`	The maximum number of iterations of the algorithm (integer)
`tolerance`	Convergence criterion (number, a small positive tolerance)
`cross_validate`	K-fold cross-validation to find the optimal penalty parameters (TRUE or FALSE)
`parallel_CV`	Run the cross-validation in parallel (TRUE or FALSE)
`nr_subsets`	Number of subsets for k-fold cross validation (integer, the value for k)
`multiple_LV`	Obtain multiple latent variable pairs (TRUE or FALSE)
`nr_LVs`	Number of latent variable pairs (components) to be obtained (integer)
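The `nonzero` argument fixes how many predictor weights stay non-zero. One common way to achieve this is soft thresholding with the threshold set at the (k+1)-th largest absolute weight. The sketch below shows that idea in base R; `soft_threshold_top_k()` is a hypothetical helper, it assumes no ties among the absolute weights, and it is not necessarily how the package implements the penalty.

```r
# Hypothetical helper, not part of the sRDA package.
# Shrinks all weights towards zero and keeps the k largest (by magnitude).
soft_threshold_top_k <- function(w, k) {
  stopifnot(k >= 1, k <= length(w))
  abs_sorted <- sort(abs(w), decreasing = TRUE)
  # Threshold at the (k+1)-th largest magnitude, or 0 if all weights are kept
  lambda <- if (k < length(w)) abs_sorted[k + 1] else 0
  sign(w) * pmax(abs(w) - lambda, 0)
}

w <- c(0.9, -0.1, 0.4, 0.05, -0.7)
w_sparse <- soft_threshold_top_k(w, k = 2)
w_sparse
sum(w_sparse != 0)   # number of non-zero weights that remain
```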

Value

An object of class "sRDA".

`XI`	Predictor set's latent variable scores
`ETA`	Predicted set's latent variable scores
`ALPHA`	Weights of the predictor set's latent variable
`BETA`	Weights of the predicted set's latent variable
`nr_iterations`	Number of iterations run before convergence (or the maximum number of iterations)
`SOLVE_XIXI`	Inverse of the predictor set's latent variable variance matrix
`iterations_crts`	The convergence criterion value (a small positive tolerance)
`sum_absolute_betas`	Sum of the absolute values of beta weights
`ridge_penalty`	The ridge penalty parameter used for the model
`nr_nonzeros`	The number of nonzero alpha weights in the model
`nr_latent_variables`	The number of latent variable pairs (components) in the model
`CV_results`	The detailed results of cross validations (if cross_validate is TRUE)
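As a rough picture of what a k-fold search over penalty parameters involves, the base R sketch below builds the grid of candidate `nonzero` and `ridge_penalty` values and assigns observations to folds. The candidate values mirror those in the Examples section; the fold assignment scheme and how each combination is scored are assumptions, not the package's documented behaviour.

```r
# Sketch only: illustrates the search space and fold split, not sRDA internals.
set.seed(42)
n <- 250            # number of observations
nr_subsets <- 10    # the value of k in k-fold cross-validation

# Every combination of candidate penalty parameters
grid <- expand.grid(nonzero = c(5, 10, 15), ridge_penalty = c(0.1, 1))
nrow(grid)

# Assign each observation to one of nr_subsets folds of roughly equal size
fold_id <- sample(rep(seq_len(nr_subsets), length.out = n))
table(fold_id)

# Row indices for training and testing when fold 1 is held out
test_idx  <- which(fold_id == 1)
train_idx <- which(fold_id != 1)
```

Each row of `grid` would be fitted on the training rows and evaluated on the held-out rows for every fold, so the cost grows with the product of the grid size and `nr_subsets`.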

Author(s)

Attila Csala

References

Csala A., Voorbraak F.P.J.M., Zwinderman A.H., and Hof M.H. (2017) Sparse redundancy analysis of high-dimensional genetic and genomic data. Bioinformatics, 33, pp.3228-3234. https://doi.org/10.1093/bioinformatics/btx374

Examples

# generate data with a few highly correlated variables
dataXY <- generate_data(nr_LVs = 2,
                           n = 250,
                           nr_correlated_Xs = c(5,20),
                           nr_uncorrelated_Xs = 250,
                           mean_reg_weights_assoc_X =
                             c(0.9,0.5),
                           sd_reg_weights_assoc_X =
                             c(0.05, 0.05),
                           Xnoise_min = -0.3,
                           Xnoise_max = 0.3,
                           nr_correlated_Ys = c(10,15),
                           nr_uncorrelated_Ys = 350,
                           mean_reg_weights_assoc_Y =
                             c(0.9,0.6),
                           sd_reg_weights_assoc_Y =
                             c(0.05, 0.05),
                           Ynoise_min = -0.3,
                           Ynoise_max = 0.3)



# separate predictor and predicted sets
X <- dataXY$X
Y <- dataXY$Y

# run sRDA
RDA.res <- sRDA(predictor = X, predicted = Y, nonzero = 5,
                ridge_penalty = 1, penalization = "ust")


# check first 10 weights of X
RDA.res$ALPHA[1:10]

## Not run: 
# run sRDA with cross-validation to determine best penalization parameters
RDA.res <- sRDA(predictor = X, predicted = Y, nonzero = c(5,10,15),
                ridge_penalty = c(0.1,1), penalization = "enet",
                cross_validate = TRUE, parallel_CV = TRUE)

# check first 10 weights of X
RDA.res$ALPHA[1:10]

# check the Ridge parameter and the number of nonzeros included in the model
RDA.res$ridge_penalty
RDA.res$nr_nonzeros

# check how much time the cross validation took
RDA.res$CV_results$stime

# obtain multiple latent variables (components)
RDA.res <- sRDA(predictor = X, predicted = Y, nonzero = c(5,10,15),
                ridge_penalty = c(0.1,1), penalization = "enet",
                cross_validate = TRUE, parallel_CV = TRUE,
                multiple_LV = TRUE, nr_LVs = 2, max_iterations = 5)

# check first 20 weights of X in the first two components
RDA.res$ALPHA[[1]][1:20]
RDA.res$ALPHA[[2]][1:20]

# components are orthogonal to each other
t(RDA.res$XI[[1]]) %*% RDA.res$XI[[2]]


## End(Not run)
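A cross-product of two orthogonal score vectors is computed in floating point, so it is usually a very small number rather than exactly zero. The base R sketch below shows one way to check orthogonality against a tolerance. It uses simulated vectors rather than sRDA output, and the names `xi1`, `xi2` and the tolerance `1e-8` are illustrative assumptions.

```r
# Sketch only: simulated score vectors, not output of sRDA().
set.seed(7)
n <- 50
xi1 <- rnorm(n)
raw <- rnorm(n)

# Remove the part of `raw` that lies along xi1 (projection residual),
# which makes xi2 orthogonal to xi1 up to rounding error
xi2 <- raw - xi1 * sum(xi1 * raw) / sum(xi1^2)

cp <- drop(crossprod(xi1, xi2))

# Compare against a tolerance scaled by the vector norms, not against exact 0
is_orthogonal <- abs(cp) < 1e-8 * sqrt(sum(xi1^2) * sum(xi2^2))
is_orthogonal
```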

[Package sRDA version 1.0.0 Index]