R: Scores Features for Feature Selection

FeaLect-package {FeaLect}

R Documentation

Scores Features for Feature Selection

Description

Suppose you have a feature matrix with 200 features and only 20 samples, and your goal is to build a classifier. Run the FeaLect() function to compute a score for each feature. Only features with relatively high scores (say, the top 20) are recommended for further analysis. Reducing the number of features this substantially helps prevent overfitting.

Details

The DESCRIPTION file:

Package:	FeaLect
Type:	Package
Title:	Scores Features for Feature Selection
Version:	1.20
Date:	2020-02-25
Author:	Habil Zare
Maintainer:	Habil Zare <zare@u.washington.edu>
Depends:	lars, rms
Description:	For each feature, a score is computed that can be useful for feature selection. Several random subsets are sampled from the input data and, for each random subset, various linear models are fitted using the lars method. A score is assigned to each feature based on the tendency of LASSO to include that feature in the models. Finally, the average score and the models are returned as the output. Features with relatively low scores should be ignored because they can lead to overfitting of the model to the training data. Moreover, for each random subset, the best set of features in terms of global error is returned. These sets are useful for applying Bolasso, an alternative feature selection method that recommends the intersection of the feature subsets.
License:	GPL (>= 2)
LazyLoad:	yes
Repository:	CRAN
Date/Publication:	2018-06-01 13:13:46 UTC
Packaged:	2018-06-01 00:07:37 UTC; habil
NeedsCompilation:	no
RoxygenNote:	6.0.1
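
The Description field above contrasts two ways of combining the feature sets chosen on different random subsets: scoring features by how often LASSO includes them, and Bolasso, which keeps only the intersection. The following is a minimal base R sketch of that contrast. The feature names and the `selected` list are made-up toy data, and the inclusion frequency shown is a simplified stand-in, not the scoring formula the package actually uses.

```r
# Hypothetical feature sets chosen by LASSO on four random subsets (toy data).
selected <- list(
  c("f1", "f2", "f5"),
  c("f1", "f2", "f3"),
  c("f1", "f2", "f5", "f7"),
  c("f1", "f4", "f2")
)

# Bolasso-style selection: keep only the features chosen in every subset.
bolasso.features <- Reduce(intersect, selected)

# Softer alternative: fraction of subsets in which each feature was chosen.
inclusion.freq <- sort(table(unlist(selected)) / length(selected),
                       decreasing = TRUE)

print(bolasso.features)
print(inclusion.freq)
```

The intersection discards a feature as soon as a single subset misses it, whereas a frequency-like score degrades gradually, which is the motivation for ranking features by score rather than by strict intersection.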

Index of help topics:

FeaLect                 Computes the scores of the features.
FeaLect-package         Scores Features for Feature Selection
compute.balanced        Balances between negative and positive samples
                        by oversampling.
compute.logistic.score
                        Fits a logistic regression model using the
                        linear scores
doctor.validate         Validates a model using validating samples.
ignore.redundant        Refines a feature matrix
input.check.FeaLect     Checks the inputs to the FeaLect() function.
mcl_sll                 MCL and SLL lymphoma subtypes
random.subset           Selects a random subset of the input.
train.doctor            Fits various models based on a combination of
                        penalized linear models and logistic
                        regression.

Author(s)

Habil Zare

Maintainer: Habil Zare <zare@u.washington.edu>

References

Zare, Habil, et al. "Scoring relevancy of features based on combinatorial analysis of Lasso with application to lymphoma diagnosis." BMC Genomics, Vol. 14, No. 1. BioMed Central, 2013.

Examples

library(FeaLect)
data(mcl_sll)
F <- as.matrix(mcl_sll[ ,-1])	# The Feature matrix
L <- as.numeric(mcl_sll[ ,1])	# The labels
names(L) <- rownames(F)
message(dim(F)[1], " samples and ",dim(F)[2], " features.")

## For this data, total.num.of.models is suggested to be at least 100.
## The call below uses only 20, so increase it for a real analysis.
FeaLect.result.1 <- FeaLect(F = F, L = L, maximum.features.num = 10,
                            total.num.of.models = 20, talk = TRUE)
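
Once scores are available, the Description suggests keeping only the top-scoring features. The sketch below shows one way to do that in base R. It uses a toy named vector `scores` so that it runs on its own. With a real result, the scores would presumably come from a component of the returned list such as `log.scores`; that name is an assumption here, so check `names(FeaLect.result.1)` first. The matrix `X` and the cutoff `k` are also illustrative.

```r
# Toy stand-in for per-feature scores; replace with the scores from your own run.
scores <- c(f1 = -0.4, f2 = -2.3, f3 = -0.1, f4 = -5.0, f5 = -1.2)

# Keep the k highest-scoring features (k is an arbitrary illustrative cutoff).
k <- 3
ranked <- names(sort(scores, decreasing = TRUE))
top.features <- ranked[seq_len(min(k, length(ranked)))]
print(top.features)

# Restrict a feature matrix to those columns before fitting a classifier.
X <- matrix(rnorm(4 * length(scores)), nrow = 4,
            dimnames = list(NULL, names(scores)))
X.reduced <- X[, top.features, drop = FALSE]
print(dim(X.reduced))
```

Using `drop = FALSE` keeps `X.reduced` a matrix even when only one feature survives the cutoff.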