feat.mfs {mt}    R Documentation
Multiple Feature Selection
Description
Multiple feature selection with or without resampling procedures.
Usage
feat.mfs(x, y, method, pars = valipars(), is.resam = TRUE, ...)
feat.mfs.stab(fs.res, rank.cutoff = 20, freq.cutoff = 0.5)
feat.mfs.stats(fs.stats, cumu.plot = FALSE, main = "Stats Plot",
               ylab = "Values", xlab = "Index of variable", ...)
Arguments
x | A matrix or data frame containing the explanatory variables.
y | A factor specifying the class of each observation.
method | A character vector naming the feature selection/ranking methods to be used.
pars | A list specifying the resampling scheme. See valipars for details.
is.resam | A logical value indicating whether resampling should be applied.
fs.res | A list obtained by running feat.mfs with resampling (is.resam = TRUE).
rank.cutoff | Number of top-ranked features used when calculating feature frequencies.
freq.cutoff | Cutoff on feature frequency; only features with a frequency larger than this value are retained.
fs.stats | A matrix of feature statistics or values, such as the fs.stats component returned by feat.mfs.
cumu.plot | A logical value indicating whether the cumulative scores should be plotted.
main, xlab, ylab | Plot parameters: the main title and the axis labels.
... | Additional parameters.
Details
feat.mfs.stab summarises the results of multiple feature selection only when a resampling strategy is employed (i.e. is.resam is TRUE when calling feat.mfs). It derives these results from the component all of the value returned by feat.mfs.
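To make the frequency summary concrete, the base-R sketch below counts how often each feature falls within the top rank.cutoff positions across resampling runs and keeps those above freq.cutoff. The helper sel_freq is hypothetical and not part of mt; it assumes that a feature's frequency is the proportion of runs in which it appears among the top rank.cutoff features, which may not match mt's internal computation exactly.

```r
## Hypothetical helper, NOT part of mt: assumes a feature's frequency is
## the proportion of resampling runs in which it appears among the top
## `rank.cutoff` positions of that run's feature order.
sel_freq <- function(orders, rank.cutoff = 20, freq.cutoff = 0.5) {
  ## orders: list of character vectors, one feature order (best first) per run
  top <- unlist(lapply(orders, function(o) head(o, rank.cutoff)))
  freq <- table(top) / length(orders)   # proportion of runs
  freq <- sort(freq, decreasing = TRUE)
  freq <- setNames(as.numeric(freq), names(freq))
  freq[freq > freq.cutoff]              # keep frequent features only
}

## Toy orders (made up) from three resampling runs over five features
orders <- list(c("V1", "V2", "V3", "V4", "V5"),
               c("V2", "V1", "V4", "V3", "V5"),
               c("V1", "V3", "V2", "V5", "V4"))
f <- sel_freq(orders, rank.cutoff = 2, freq.cutoff = 0.5)
print(f)
```

Here V1 is in the top two of every run (frequency 1), V2 in two of three runs, and V3 in only one, so V3 is dropped by the 0.5 cutoff.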
feat.mfs.stats handles the statistical values or scores. Its purpose is to provide guidance in selecting the best number of features by spotting the elbow point. It should be used in conjunction with plots of the p-values and their adjusted values, such as FDR and Bonferroni, in multiple hypothesis testing.
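As a rough illustration of the elbow idea together with adjusted p-values, the sketch below uses base R's p.adjust for the FDR (Benjamini-Hochberg) and Bonferroni adjustments, and locates the elbow of the sorted p-values as the point farthest from the straight line joining the first and last points. The helper elbow_index and this particular elbow rule are assumptions for illustration only; feat.mfs.stats itself leaves the elbow to visual inspection of the plot.

```r
## Toy p-values, made up for illustration (one per feature), sorted
p <- sort(c(0.35, 1e-6, 0.01, 0.80, 5e-5, 0.20, 0.95, 2e-4, 0.50, 0.62))

## Adjusted values for multiple testing, via base R stats::p.adjust
fdr  <- p.adjust(p, method = "BH")          # false discovery rate
bonf <- p.adjust(p, method = "bonferroni")  # family-wise error rate

## Hypothetical helper, NOT part of mt: the elbow is taken as the point
## farthest from the straight line joining the first and last points.
elbow_index <- function(v) {
  n <- length(v)
  x <- seq_len(n)
  dx <- n - 1
  dy <- v[n] - v[1]
  ## perpendicular distance from each (x, v) to the chord (1, v[1])-(n, v[n])
  d <- abs(dy * x - dx * v + n * v[1] - v[n]) / sqrt(dx^2 + dy^2)
  which.max(d)
}

k <- elbow_index(p)   # suggested number of features to keep
```

With these toy values the elbow falls at the fourth feature, just before the p-values jump from 0.01 to 0.20.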
Value
feat.mfs returns a list with components:
fs.order | A data frame of feature orders, from best to worst.
fs.rank | A matrix of feature ranking scores.
fs.stats | A matrix of feature statistics or values.
all | A list of the full output of each feature selection method, on which feat.mfs.stab is based.
feat.mfs.stab returns a list with components:
fs.freq | Feature frequencies larger than freq.cutoff.
fs.subs | Features with frequencies larger than freq.cutoff.
fs.stab | Stability rate of the feature ranking.
fs.cons | A feature consensus table (matrix) based on feature frequency.
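One way to picture a frequency-based consensus table is a matrix with one row per feature and one column per method, left empty (NA) where a method did not retain the feature; the Examples section prints fs.cons with na.print = "" in the same spirit. The helper cons_table below is hypothetical and not part of mt, and its row ordering (number of methods, then mean frequency) is an assumption chosen for illustration.

```r
## Hypothetical helper, NOT part of mt: one row per feature, one column
## per method, NA where a method did not retain the feature.
cons_table <- function(freq.list) {
  feats <- unique(unlist(lapply(freq.list, names)))
  tab <- matrix(NA_real_, nrow = length(feats), ncol = length(freq.list),
                dimnames = list(feats, names(freq.list)))
  for (m in names(freq.list)) {
    f <- freq.list[[m]]
    tab[names(f), m] <- f
  }
  ## features retained by more methods first, ties broken by mean frequency
  n.meth <- rowSums(!is.na(tab))
  avg <- rowMeans(tab, na.rm = TRUE)
  tab[order(-n.meth, -avg), , drop = FALSE]
}

## Toy per-method frequencies (made up), each already above freq.cutoff
freq.list <- list(fs.anova = c(V1 = 1.0, V2 = 0.8),
                  fs.rf    = c(V2 = 0.9, V3 = 0.6),
                  fs.rfe   = c(V1 = 0.7, V2 = 0.6))
cons <- cons_table(freq.list)
print(cons, digits = 2, na.print = "")
```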
feat.mfs.stats returns a list with components:
stats.tab | Statistical values with their corresponding names.
stats.long | Statistical values in long format, for plotting.
stats.p | An object of class "trellis" containing the plot.
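The relation between a wide statistics matrix, its long format and the optional cumulative scores can be sketched in base R as follows. The layout (rows are features, columns are methods), the decreasing sort and the helper stats_long are assumptions for illustration and are not taken from mt; the plot uses lattice::xyplot, which returns a "trellis" object.

```r
## Toy statistics (made up): rows are features, columns are methods
stats <- matrix(c(9, 4, 2, 1,
                  0.50, 0.30, 0.15, 0.05),
                ncol = 2,
                dimnames = list(paste0("V", 1:4), c("fs.anova", "fs.rf")))

## Hypothetical helper, NOT part of mt: wide matrix -> long data frame.
## Each method's values are sorted decreasingly (assuming larger = better)
## and optionally replaced by their cumulative sums.
stats_long <- function(stats, cumu = FALSE) {
  parts <- lapply(colnames(stats), function(m) {
    v <- sort(stats[, m], decreasing = TRUE)
    if (cumu) v <- cumsum(v)
    data.frame(method = m, index = seq_along(v), value = unname(v))
  })
  do.call(rbind, parts)
}

long <- stats_long(stats, cumu = TRUE)

## One panel per method, each with its own y scale, if lattice is available
if (requireNamespace("lattice", quietly = TRUE)) {
  stats.plot <- lattice::xyplot(value ~ index | method, data = long,
                                type = "b",
                                scales = list(y = list(relation = "free")))
}
```

The long format keeps one row per (method, index) pair, which is the shape lattice expects for conditioning on method.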
Note
The feature order can be computed directly from the overall statistics fs.stats. When resampling is employed, however, this order can differ slightly from fs.order, which is obtained by rank aggregation.
Both fs.cons and fs.freq are computed from fs.order.
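A small toy case shows why ordering by an overall statistic and ordering by rank aggregation need not agree. The sketch assumes mean score over runs as the overall statistic and mean rank over runs as the aggregation rule; mt's own aggregation (e.g. feat.agg in the Examples) may use a different rule.

```r
## Toy per-run scores (made up): rows are resampling runs, columns are
## features; larger score = more important feature.
scores <- rbind(c(V1 = 5.0, V2 = 4.9, V3 = 1.0),
                c(V1 = 1.2, V2 = 4.0, V3 = 0.9),
                c(V1 = 6.0, V2 = 3.8, V3 = 1.1))

## Order from an overall statistic (assumed here: mean score over runs)
order.stats <- names(sort(colMeans(scores), decreasing = TRUE))

## Order from rank aggregation (assumed here: mean rank over runs, 1 = best)
ranks <- t(apply(-scores, 1, rank))
order.agg <- names(sort(colMeans(ranks)))

order.stats  # "V2" "V1" "V3"
order.agg    # "V1" "V2" "V3"
```

V1 wins two of three runs but has one poor score, which drags its mean below V2's; the mean rank is less sensitive to that single run.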
Author(s)
Wanchang Lin
Examples
## Not run:
library(mt)       ## feat.mfs, valipars, preproc, dat.sel and the abr1 data
library(lattice)  ## dotplot, barchart
data(abr1)
dat <- preproc(abr1$pos[,200:400], method="log10")
cls <- factor(abr1$fact$class)
tmp <- dat.sel(dat, cls, choices=c("1","2"))
x <- tmp[[1]]$dat
y <- tmp[[1]]$cls
fs.method <- c("fs.anova","fs.rf","fs.rfe")
fs.pars <- valipars(sampling="cv",niter=10,nreps=5)
fs <- feat.mfs(x, y, fs.method, fs.pars) ## with resampling
names(fs)
## frequency, consensus and stabilities of feature selection
fs.stab <- feat.mfs.stab(fs)
print(fs.stab$fs.cons,digits=2,na.print="")
## plot feature selection frequency
freq <- fs.stab$fs.freq
dotplot(freq$fs.anova, type="o", main="Feature Selection Frequencies")
barchart(freq$fs.anova)
## rank aggregation
fs.agg <- feat.agg(fs$fs.rank)
## stats table and plotting
fs.stats <- fs$fs.stats
tmp <- feat.mfs.stats(fs.stats, cumu.plot = TRUE)
tmp$stats.p
fs.tab <- tmp$stats.tab
## convert to a data frame
fs.tab <- list2df(un.list(fs.tab))
## without resampling
fs.1 <- feat.mfs(x, y, method=fs.method, is.resam = FALSE)
## End(Not run)