rsc {RSC}    R Documentation

Robust and Sparse Correlation Matrix Estimator

Description

Compute the Robust and Sparse Correlation Matrix (RSC) estimator proposed in Serra et al. (2018).

Usage

  rsc(cv, threshold = "minimum") 

Arguments

cv

An S3 object of class "rsc_cv" (see rsc_cv).

threshold

Threshold parameter used to compute the RSC estimate. Either a numeric value in the interval (0,1), or one of the strings "minimum" or "minimum1se", which select the optimal threshold according to the criteria computed by rsc_cv.

Details

The settings threshold = "minimum" and threshold = "minimum1se" apply thresholding according to the criteria discussed in the Details section of rsc_cv. When cv is obtained from rsc_cv with cv.type = "random", the default settings of rsc implement exactly the RSC estimator proposed in Serra et al. (2018).
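The exact selection criteria are defined in rsc_cv and are not reproduced here. As a rough illustration only, the sketch below applies a generic "minimum" rule and a generic one-standard-error rule to a hypothetical grid of thresholds with made-up cross-validated losses and standard errors (thr, loss and se are illustrative names, not fields of an "rsc_cv" object). It assumes that "minimum1se" picks the largest threshold whose loss lies within one standard error of the minimum, which is the common convention but may differ in detail from the package's implementation.

```r
# Hypothetical cross-validated losses and standard errors over a grid of
# thresholds (made-up numbers, not output of rsc_cv)
thr  <- c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6)
loss <- c(5.0, 4.2, 3.9, 4.0, 4.4, 5.1)
se   <- c(0.3, 0.3, 0.2, 0.2, 0.3, 0.4)

# "minimum": the threshold with the smallest average loss
i_min <- which.min(loss)
t_min <- thr[i_min]

# Generic one-standard-error rule (assumed): the largest threshold whose
# loss does not exceed the minimum loss plus its standard error
t_1se <- max(thr[loss <= loss[i_min] + se[i_min]])

c(t_min, t_1se)
```

Because a larger threshold zeroes more entries, the one-standard-error choice (0.4 here, against 0.3 for the minimum) trades a small increase in loss for a sparser estimate.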

Although threshold = "minimum" is the default, in high-dimensional settings threshold = "minimum1se" usually provides a more parsimonious representation of the correlation structure. Since the underlying RMAD matrix is stored in the cv input, any other hand-tuned threshold can be applied to it at little additional computational cost, by setting threshold to any value in the interval (0,1).
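A minimal sketch of why re-thresholding is cheap, assuming plain hard thresholding: off-diagonal entries whose absolute value falls below the threshold are set to zero and the diagonal is kept at 1. The 4 x 4 matrix rmad and the helper hard_threshold() are hypothetical stand-ins, not objects or functions exported by RSC, and the package's exact rule may differ; in practice rsc(cv, threshold = t) is the documented way to threshold the RMAD matrix stored in cv.

```r
# Hypothetical robust correlation matrix standing in for the RMAD
# estimate stored in an "rsc_cv" object (values made up for illustration)
rmad <- matrix(c(1.00,  0.72, 0.15,  0.05,
                 0.72,  1.00, 0.48, -0.30,
                 0.15,  0.48, 1.00,  0.61,
                 0.05, -0.30, 0.61,  1.00), nrow = 4, byrow = TRUE)

# Hypothetical helper: hard thresholding that zeroes every entry whose
# absolute value is below t, then restores the unit diagonal
hard_threshold <- function(r, t) {
  out <- r
  out[abs(out) < t] <- 0
  diag(out) <- 1
  out
}

# The same stored matrix is thresholded at several levels in (0,1)
# without re-estimating it; count the surviving off-diagonal pairs
thr <- c(0.1, 0.4, 0.7)
kept <- sapply(thr, function(t) {
  r_t <- hard_threshold(rmad, t)
  sum(r_t[upper.tri(r_t)] != 0)
})
kept
```

Each call only compares stored entries against t, so its cost is negligible next to the cross-validation performed by rsc_cv.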

The software is optimized for high-dimensional data sets; therefore, the output RSC matrix is stored in a memory-efficient sparse format using the "dsCMatrix" S4 class from the Matrix package, which is specifically designed for sparse real symmetric matrices.
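The following sketch, which uses only the Matrix package, illustrates what the "dsCMatrix" format buys. A hypothetical 200 x 200 correlation matrix with only two nonzero off-diagonal pairs (made up for illustration, not an RSC estimate) is packed into a sparse symmetric matrix, its memory footprint is compared with the dense version, and it is converted back with as.matrix(). The same as.matrix() conversion can be applied to the object returned by rsc when an ordinary dense matrix is needed.

```r
library(Matrix)

# Hypothetical sparse correlation matrix: identity plus two
# off-diagonal pairs, standing in for a thresholded estimate
p <- 200
r <- diag(p)
r[1, 2] <- r[2, 1] <- 0.8
r[3, 4] <- r[4, 3] <- -0.6

# Pack into a sparse symmetric matrix; forceSymmetric() makes sure the
# result is stored as a symmetric (one-triangle) "dsCMatrix"
s <- forceSymmetric(Matrix(r, sparse = TRUE))
class(s)

# Storage comparison: dense numeric matrix vs sparse symmetric format
object.size(r)
object.size(s)

# Convert back to an ordinary dense matrix when needed
r_back <- as.matrix(s)
all.equal(r, r_back, check.attributes = FALSE)
```

The dense matrix stores all p^2 entries, whereas the sparse symmetric format stores only the nonzero entries of one triangle plus index vectors, so the saving grows quickly with p.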

Value

Returns a sparse correlation matrix of class "dsCMatrix" (an S4 class object) as defined in the Matrix package.

References

Serra, A., Coretto, P., Fratello, M., and Tagliaferri, R. (2018). Robust and sparse correlation matrix estimation for the analysis of high-dimensional genomics data. Bioinformatics, 34(4), 625-634. doi:10.1093/bioinformatics/btx642

See Also

rsc_cv

Examples

## simulate a random sample from a multivariate Cauchy distribution
## note: high-dimensional examples are obtained by increasing p
set.seed(1)
n   <- 100  # sample size
p   <- 10   # dimension
dat <- matrix(rt(n*p, df = 1), nrow = n, ncol = p)
colnames(dat) <- paste0("Var", 1:p)

   
## perform 10-fold cross-validation repeated R=10 times
## note: for multi-core machines experiment with 'ncores'
set.seed(2)
a <- rsc_cv(x = dat, R = 10, K = 10, ncores = 1)
a

## obtain the RSC matrix using the "minimum" threshold selected by rsc_cv
b <- rsc(cv = a, threshold = "minimum")
b
   
## obtain the RSC matrix using the "minimum1se" threshold selected by rsc_cv
d <- rsc(cv = a, threshold = "minimum1se")
d

## since the object 'a' stores the underlying RMAD estimator, we can
## apply thresholding at any level without re-estimating the RMAD
## matrix
e <- rsc(cv = a, threshold = 0.5)
e

[Package RSC version 2.0.4 Index]