cosci_is_select {fusionclust} | R Documentation |
Use a data-driven approach to select the features
Description
Once you have the feature scores from cosci_is, you can select the features in one of three ways:
1. based on a pre-defined threshold,
2. using Table A.10 in the paper [1] to determine an appropriate threshold, or
3. using the data-driven approach described in the references to select the features and obtain an implicit threshold value.
cosci_is_select implements option 3.
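For contrast with the data-driven route, the first option (a pre-defined threshold) can be sketched in base R. This is only an illustration: the score values and the cut-off are made up, and it assumes that higher scores indicate signal features, which matches the statement under Details that features with lower scores are screened out.

```r
# Hypothetical scores for six features (made-up values, not cosci_is output).
score <- c(0.02, 0.85, 0.10, 0.47, 0.03, 0.91)

# Pre-defined cut-off; an assumption chosen only for illustration.
threshold <- 0.4

# Keep the indices of the features whose score exceeds the cut-off.
selected <- which(score > threshold)
print(selected)
```

With these made-up values, features 2, 4 and 6 are kept. The drawback of this route is that the threshold must be chosen in advance, which is what cosci_is_select avoids.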
Usage
cosci_is_select(score, gamma)
Arguments
score |
a vector of p feature scores, as returned by cosci_is |
gamma |
the proportion of the p features assumed to be noise. If the sample size n is smaller than 100, gamma = 0.85 is recommended; otherwise set gamma = 0.9. |
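The recommendation for gamma can be written as a small helper. The function choose_gamma below is hypothetical and not part of fusionclust; it only encodes the rule of thumb stated for the gamma argument.

```r
# Hypothetical helper (not part of fusionclust): choose gamma from the
# sample size n, following the recommendation for the gamma argument.
choose_gamma <- function(n) {
  stopifnot(is.numeric(n), length(n) == 1, n > 0)
  if (n < 100) 0.85 else 0.9
}

choose_gamma(50)    # small sample: 0.85
choose_gamma(1000)  # larger sample: 0.9
```

The result could then be passed as the second argument, for example cosci_is_select(score, choose_gamma(nrow(x))), where x is assumed to be the n by p data matrix.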
Details
Converts the problem of screening out features with lower scores into a large-scale multiple testing problem and uses the procedure described in reference [2] to determine the signal features.
Value
a vector of selected features
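A minimal sketch of how the returned vector might be used downstream, assuming it holds the column indices of the selected features (the page does not state this explicitly). The vector features below is a hand-written stand-in, not actual output of cosci_is_select.

```r
# Toy data: 20 observations on 5 features.
set.seed(1)
x <- matrix(rnorm(20 * 5), nrow = 20, ncol = 5)

# Hypothetical stand-in for the output of cosci_is_select
# (assumed here to be column indices).
features <- c(1L, 4L)

# Keep only the selected columns; drop = FALSE preserves the matrix
# structure even when a single feature is selected.
x_selected <- x[, features, drop = FALSE]
dim(x_selected)
```

The reduced matrix x_selected could then be handed to any clustering routine.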
References
[1] Banerjee, T., Mukherjee, G. and Radchenko, P., Feature Screening in Large Scale Cluster Analysis, Journal of Multivariate Analysis, Volume 161, 2017, Pages 191-212
[2] Cai, T. and Sun, W., Optimal screening and discovery of sparse signals with applications to multistage high throughput studies, J. Roy. Statist. Soc. Ser. B (Statistical Methodology) 79, no. 1 (2017) 197-223
See Also
Examples
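library(fusionclust)
# 49 pure-noise features, 1000 observations each
set.seed(42)
noise <- matrix(rnorm(49000), nrow = 1000, ncol = 49)
# one signal feature: a two-component normal mixture with means -1.5 and 1.5
set.seed(42)
signal <- c(rnorm(500, -1.5, 1), rnorm(500, 1.5, 1))
# the signal is column 1 of the 1000 x 50 data matrix
x <- cbind(signal, noise)
# score every feature, then select with gamma = 0.9 (n = 1000 is not below 100)
scores <- cosci_is(x, 0)
features <- cosci_is_select(scores, 0.9)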