FeaLect-package {FeaLect} | R Documentation |
Scores Features for Feature Selection
Description
Suppose you have a feature matrix with 200 features and only 20 samples, and your goal is to build a classifier. You can run the FeaLect() function to compute a score for each feature. Only the features with relatively high scores (say, the top 20) are recommended for further analysis. Reducing the number of features in this way helps prevent overfitting.
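The selection step described above can be sketched in base R. This is only an illustration under stated assumptions: the `scores` vector, the feature names, and the cutoff `top_k` are hypothetical placeholders, not output of FeaLect().

```r
# Hypothetical per-feature scores (higher = more relevant). In practice
# these would come from a scoring procedure; here they are random numbers.
set.seed(42)
num_samples  <- 20
num_features <- 200
scores <- setNames(runif(num_features), paste0("feature_", seq_len(num_features)))

# A toy feature matrix with one column per scored feature.
X <- matrix(rnorm(num_samples * num_features),
            nrow = num_samples,
            dimnames = list(NULL, names(scores)))

# Keep only the top_k highest-scoring features (top_k is an arbitrary choice).
top_k <- 20
top_features <- names(sort(scores, decreasing = TRUE))[seq_len(top_k)]
X_reduced <- X[, top_features, drop = FALSE]

dim(X_reduced)
```

The reduced matrix has the same samples but only the selected columns, so any downstream classifier is fitted on far fewer features than samples would otherwise allow.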
Details
The DESCRIPTION file:
Package: | FeaLect |
Type: | Package |
Title: | Scores Features for Feature Selection |
Version: | 1.20 |
Date: | 2020-02-25 |
Author: | Habil Zare |
Maintainer: | Habil Zare <zare@u.washington.edu> |
Depends: | lars, rms |
Description: | For each feature, a score is computed that can be useful for feature selection. Several random subsets are sampled from the input data, and for each random subset, various linear models are fitted using the lars method. A score is assigned to each feature based on the tendency of LASSO to include that feature in the models. Finally, the average score and the models are returned as the output. Features with relatively low scores should be ignored because they can lead to overfitting of the model to the training data. Moreover, for each random subset, the best set of features in terms of global error is returned. These sets are useful for applying Bolasso, an alternative feature selection method that recommends the intersection of the feature subsets. |
License: | GPL (>= 2) |
LazyLoad: | yes |
Repository: | CRAN |
Date/Publication: | 2018-06-01 13:13:46 UTC |
Packaged: | 2018-06-01 00:07:37 UTC; habil |
NeedsCompilation: | no |
RoxygenNote: | 6.0.1 |
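The idea in the Description field (random subsets, LASSO fits, scores based on how often a feature is included) can be illustrated with a deliberately simplified sketch. It assumes the lars package is installed and uses simulated data. The inclusion-frequency score, the 75% subset size, and the variable names are assumptions made for this illustration. This is not the scoring formula used by FeaLect(), which also combines the linear models with logistic regression. The final intersection mimics the Bolasso idea mentioned above.

```r
library(lars)

# Simulated data: only f1 and f2 carry signal (an assumption of this sketch).
set.seed(1)
n <- 40
p <- 15
X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("f", seq_len(p))))
y <- 2 * X[, 1] - 3 * X[, 2] + rnorm(n, sd = 0.5)

num_models   <- 50   # number of random subsets
max_features <- 3    # largest model size considered per subset
counts <- setNames(numeric(p), colnames(X))
selected_sets <- vector("list", num_models)

for (b in seq_len(num_models)) {
  # Draw a random subset of the samples (75% is an arbitrary choice).
  idx <- sample(n, size = round(0.75 * n))
  fit <- lars(X[idx, ], y[idx], type = "lasso")

  # fit$beta holds one row of coefficients per step of the LASSO path.
  nonzero_per_step <- rowSums(fit$beta != 0)
  step <- max(which(nonzero_per_step <= max_features))

  selected <- colnames(X)[fit$beta[step, ] != 0]
  counts[selected] <- counts[selected] + 1
  selected_sets[[b]] <- selected
}

# Simplified score: fraction of subsets in which each feature was selected.
inclusion_frequency <- counts / num_models

# Bolasso-style selection: features chosen in every subset.
bolasso_like <- Reduce(intersect, selected_sets)

sort(inclusion_frequency, decreasing = TRUE)
bolasso_like
```

A frequency-based score degrades gracefully when a relevant feature is missed in a few subsets, whereas a strict intersection drops it entirely.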
Index of help topics:
FeaLect                  Computes the scores of the features.
FeaLect-package          Scores Features for Feature Selection
compute.balanced         Balances between negative and positive samples by oversampling.
compute.logistic.score   Fits a logistic regression model using the linear scores
doctor.validate          Validates a model using validating samples.
ignore.redundant         Refines a feature matrix
input.check.FeaLect      Checks the inputs to the FeaLect() function.
mcl_sll                  MCL and SLL lymphoma subtypes
random.subset            Selects a random subset of the input.
train.doctor             Fits various models based on a combination of penalized linear models and logistic regression.
Author(s)
Habil Zare
Maintainer: Habil Zare <zare@u.washington.edu>
References
Zare, Habil, et al. "Scoring relevancy of features based on combinatorial analysis of Lasso with application to lymphoma diagnosis." BMC genomics. Vol. 14. No. 1. BioMed Central, 2013.
See Also
FeaLect, train.doctor, doctor.validate, random.subset, compute.balanced, compute.logistic.score, ignore.redundant, input.check.FeaLect, lars-package, and SparseLearner-package
Examples
library(FeaLect)
data(mcl_sll)
F <- as.matrix(mcl_sll[ ,-1]) # The Feature matrix
L <- as.numeric(mcl_sll[ ,1]) # The labels
names(L) <- rownames(F)
message(nrow(F), " samples and ", ncol(F), " features.")
## For this data, total.num.of.models is suggested to be at least 100;
## 20 is used below.
FeaLect.result.1 <- FeaLect(F = F, L = L, maximum.features.num = 10,
                            total.num.of.models = 20, talk = TRUE)