R: Multiple Feature Selection

feat.mfs {mt}

R Documentation

Multiple Feature Selection

Description

Multiple feature selection with or without resampling procedures.

Usage

feat.mfs(x,y,method,pars = valipars(),is.resam = TRUE, ...)
         
feat.mfs.stab(fs.res,rank.cutoff = 20,freq.cutoff = 0.5)

feat.mfs.stats(fs.stats,cumu.plot=FALSE, main="Stats Plot", 
               ylab="Values", xlab="Index of variable", ...)

Arguments

`x`	A matrix or data frame containing the explanatory variables.
`y`	A factor specifying the class for each observation.
`method`	Multiple feature selection/ranking method to be used.
`pars`	A list of resampling scheme parameters. See `valipars` for details.
`is.resam`	A logical value indicating whether resampling should be applied.
`fs.res`	A list obtained by running `feat.mfs`.
`rank.cutoff`	Number of top-ranked features used when calculating feature frequencies.
`freq.cutoff`	Cutoff of feature frequency; features with a frequency larger than this value are retained.
`fs.stats`	A matrix of feature statistics or values returned by `feat.mfs`.
`cumu.plot`	A logical value indicating whether the cumulative scores should be plotted.
`main`, `xlab`, `ylab`	Plot parameters
`...`	Additional parameters.

Details

feat.mfs.stab summarises the results of multiple feature selection. It applies only when a resampling strategy was employed (i.e. is.resam was TRUE when calling feat.mfs), because it works on the component all of the list returned by feat.mfs.
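The role of rank.cutoff and freq.cutoff can be illustrated with a minimal base R sketch. It is not the internal code of feat.mfs.stab: the simulated feature orders and the object names orders, top, freq and subs are assumptions made for illustration only.

```r
## Hypothetical input: each column holds the feature order (best to worst)
## found in one resampling iteration. Simulated here, not produced by feat.mfs.
set.seed(1)
feats  <- paste0("V", 1:10)
orders <- replicate(20, sample(feats, prob = 10:1))  # 10 x 20 character matrix

rank.cutoff <- 3     # only the top 3 features of each iteration are counted
freq.cutoff <- 0.5   # keep features selected in more than half of the iterations

## How often does each feature appear among the top rank.cutoff features?
top  <- orders[seq_len(rank.cutoff), , drop = FALSE]
freq <- table(factor(top, levels = feats)) / ncol(orders)
freq <- sort(freq, decreasing = TRUE)

## Features whose selection frequency exceeds freq.cutoff
subs <- names(freq)[freq > freq.cutoff]
print(round(freq, 2))
print(subs)
```

A larger rank.cutoff makes more features eligible in each iteration, while a larger freq.cutoff keeps only the features that are selected most consistently.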

feat.mfs.stats handles the statistical values or scores. Its purpose is to provide guidance in selecting the best number of features by spotting the elbow point. It should be used in conjunction with plots of p-values and their corresponding adjusted values, such as FDR and Bonferroni, from multiple hypothesis testing.
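As a rough illustration of comparing raw and adjusted p-values, the sketch below uses p.adjust from the base stats package. The p-values are invented for the example and are not output of feat.mfs; the 0.05 threshold is likewise an assumption.

```r
## Hypothetical p-values for eight features (not produced by feat.mfs)
p <- c(V1 = 0.0001, V2 = 0.0004, V3 = 0.002, V4 = 0.03,
       V5 = 0.04,   V6 = 0.2,    V7 = 0.5,   V8 = 0.9)

## Adjust for multiple testing with two standard corrections
p.fdr  <- p.adjust(p, method = "fdr")         # Benjamini-Hochberg
p.bonf <- p.adjust(p, method = "bonferroni")

## Side-by-side table: raw, FDR-adjusted and Bonferroni-adjusted values
tab <- cbind(p = p, fdr = p.fdr, bonferroni = p.bonf)
print(signif(tab, 3))

## Number of features below the (assumed) 0.05 threshold under each criterion
n.sig <- colSums(tab < 0.05)
print(n.sig)
```

Plotting the columns of tab against the feature index, for example with matplot(tab, type = "b", log = "y"), gives a view that can be read alongside the statistics plot when choosing the number of features.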

Value

feat.mfs returns a list with components:

`fs.order`	A data frame of feature order from best to worst.
`fs.rank`	A matrix of feature ranking scores.
`fs.stats`	A matrix of feature statistics or values.
`all`	A list of the outputs of `feat.rank.re`, one for each feature selection method.

feat.mfs.stab returns a list with components:

`fs.freq`	Feature frequencies larger than `freq.cutoff`.
`fs.subs`	Features with frequencies larger than `freq.cutoff`.
`fs.stab`	Stability rate of feature ranking.
`fs.cons`	A feature consensus table (a matrix) based on feature frequency.

feat.mfs.stats returns a list with components:

`stats.tab`	Statistical values with their corresponding names.
`stats.long`	Long format of the statistical values, used for plotting.
`stats.p`	An object of class "trellis".

Note

The feature order can be computed directly from the overall statistics fs.stats. It is, however, slightly different from fs.order obtained by rank aggregation when resampling is employed.

Both fs.cons and fs.freq are computed from fs.order.
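The difference between ordering by an overall statistic and ordering by rank aggregation can be seen in a small base R sketch. The score matrix is invented, and the mean-rank (Borda-style) aggregation used here is only a stand-in; it is not claimed to be the aggregation method that feat.mfs or feat.agg applies.

```r
## Hypothetical scores: rows are features, columns are resampling iterations.
## Feature A has one extreme score and is otherwise the weakest feature.
stats <- matrix(c(30, 1, 1, 1,
                   4, 4, 4, 4,
                   3, 3, 3, 3),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("A", "B", "C"), paste0("iter", 1:4)))

## (1) Order taken directly from the overall (mean) statistic
ord.stats <- names(sort(rowMeans(stats), decreasing = TRUE))

## (2) Rank the features within each iteration (1 = best),
##     then aggregate by the mean rank across iterations
ranks   <- apply(-stats, 2, rank)
ord.agg <- names(sort(rowMeans(ranks)))

print(ord.stats)
print(ord.agg)
```

The single large score puts A first in the statistic-based order, whereas the rank-based order is driven by how consistently each feature ranks across iterations.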

Author(s)

Wanchang Lin

Examples

## Not run: 
library(mt)        ## provides abr1, preproc, dat.sel and feat.mfs
library(lattice)   ## provides dotplot and barchart
data(abr1)
dat <- preproc(abr1$pos[,200:400], method="log10")
cls <- factor(abr1$fact$class)

tmp <- dat.sel(dat, cls, choices=c("1","2"))
x   <- tmp[[1]]$dat
y   <- tmp[[1]]$cls

fs.method <- c("fs.anova","fs.rf","fs.rfe")
fs.pars   <- valipars(sampling="cv",niter=10,nreps=5)
fs <- feat.mfs(x, y, fs.method, fs.pars)   ## with resampling
names(fs)

## frequency, consensus and stabilities of feature selection 
fs.stab <- feat.mfs.stab(fs)
print(fs.stab$fs.cons,digits=2,na.print="")

## plot feature selection frequency
freq <- fs.stab$fs.freq
dotplot(freq$fs.anova, type="o", main="Feature Selection Frequencies")
barchart(freq$fs.anova)

## rank aggregation 
fs.agg <- feat.agg(fs$fs.rank)

## stats table and plotting
fs.stats <- fs$fs.stats
tmp <- feat.mfs.stats(fs.stats, cumu.plot = TRUE)
tmp$stats.p
fs.tab <- tmp$stats.tab
## convert to matrix
fs.tab <- list2df(un.list(fs.tab))

## without resampling
fs.1 <- feat.mfs(x, y, method=fs.method, is.resam = FALSE)

## End(Not run)

[Package mt version 2.0-1.20 Index]