rsc {RSC}    R Documentation
Robust and Sparse Correlation Matrix Estimator
Description
Compute the Robust and Sparse Correlation Matrix (RSC) estimator proposed in Serra et al. (2018).
Usage
rsc(cv, threshold = "minimum")
Arguments
cv
An S3 object of the class returned by rsc_cv (see Details).
threshold
Threshold parameter used to compute the RSC estimate. This is either a numeric value in the open interval (0, 1), or one of the strings "minimum" or "minimum1se" (see Details).
Details
The setting threshold = "minimum" or threshold = "minimum1se" applies thresholding according to the criteria discussed in the Details section of rsc_cv.
When cv is obtained using rsc_cv with cv.type = "random", the default settings of rsc implement exactly the RSC estimator proposed in Serra et al. (2018).
Although threshold = "minimum" is the default choice, in high-dimensional settings threshold = "minimum1se" usually provides a more parsimonious representation of the correlation structure. Since the underlying RMAD matrix is passed through the cv input, any other hand-tuned threshold can be applied to the RMAD matrix without significant additional computational cost. To do so, set threshold to any value in the (0, 1) interval.
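Reusing a stored matrix for several thresholds can be illustrated with a minimal base-R sketch. It assumes simple hard thresholding: off-diagonal entries whose absolute value falls below the threshold are set to zero and the diagonal is kept. The helper hard_threshold() is hypothetical and not part of the RSC package, and Spearman correlation stands in for the RMAD matrix purely for illustration; this is not the package's implementation.

```r
## Hypothetical helper (NOT part of the RSC package): hard-threshold a
## square correlation-like matrix 'r' at level 't' in (0, 1).
## Assumption: off-diagonal entries with |r_ij| < t are set to zero.
hard_threshold <- function(r, t) {
  stopifnot(is.matrix(r), nrow(r) == ncol(r), is.numeric(t), t > 0, t < 1)
  out <- r
  small <- abs(out) < t     # entries below the threshold in absolute value
  diag(small) <- FALSE      # never zero the diagonal
  out[small] <- 0
  out
}

## Stand-in for a precomputed robust correlation matrix: Spearman
## correlations of simulated Cauchy data (NOT the RMAD estimator).
set.seed(1)
x <- matrix(rt(100 * 5, df = 1), nrow = 100, ncol = 5)
r <- cor(x, method = "spearman")

## The same matrix is reused for two thresholds; nothing is re-estimated.
r_03 <- hard_threshold(r, 0.3)
r_05 <- hard_threshold(r, 0.5)
r_05
```

A higher threshold can only remove entries, so r_05 is at least as sparse as r_03.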
The software is optimized to handle high-dimensional data sets; therefore, the output RSC matrix is packed into a storage-efficient sparse format using the "dsCMatrix" S4 class from the Matrix package, which is specifically designed for sparse real symmetric matrices.
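The benefit of the "dsCMatrix" format can be seen with the Matrix package alone. The sketch below builds a hypothetical banded correlation matrix (made-up values, not RSC output), converts it to compressed sparse column storage with as(., "CsparseMatrix"), declares it symmetric with forceSymmetric() so that only one triangle is stored, and compares memory footprints with object.size().

```r
library(Matrix)

## Hypothetical thresholded correlation matrix (illustration only):
## only neighbouring variables are correlated, everything else is zero.
p <- 200
r <- diag(p)
for (j in 1:(p - 1)) r[j, j + 1] <- r[j + 1, j] <- 0.6

## Compressed sparse column storage, then declare symmetry so that only
## one triangle is kept; the result is of class "dsCMatrix".
s <- forceSymmetric(as(r, "CsparseMatrix"))
class(s)

## Dense versus sparse memory footprint.
object.size(r)
object.size(s)
```

Only the nonzero entries of one triangle are stored, so the saving grows with p when the matrix is sparse.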
Value
Returns a sparse correlation matrix of class "dsCMatrix" (S4 class object) as defined in the Matrix package.
References
Serra, A., Coretto, P., Fratello, M., and Tagliaferri, R. (2018). Robust and sparse correlation matrix estimation for the analysis of high-dimensional genomics data. Bioinformatics, 34(4), 625-634. doi:10.1093/bioinformatics/btx642
See Also
Examples
library(RSC)
## simulate a random sample from a multivariate Cauchy distribution
## note: high-dimensional examples are obtained by increasing p
set.seed(1)
n <- 100 # sample size
p <- 10 # dimension
dat <- matrix(rt(n*p, df = 1), nrow = n, ncol = p)
colnames(dat) <- paste0("Var", 1:p)
## perform 10-fold cross-validation repeated R=10 times
## note: for multi-core machines experiment with 'ncores'
set.seed(2)
a <- rsc_cv(x = dat, R = 10, K = 10, ncores = 1)
a
## obtain the RSC matrix with "minimum" flagged solution
b <- rsc(cv = a, threshold = "minimum")
b
## obtain the RSC matrix with "minimum1se" flagged solution
d <- rsc(cv = a, threshold = "minimum1se")
d
## since the object 'a' stores the RMAD underlying estimator, we can
## apply thresholding at any level without re-estimating the RMAD
## matrix
e <- rsc(cv = a, threshold = 0.5)
e