fs.rf {mt}    R Documentation

Feature Selection Using Random Forests (RF)

Description

Feature selection using Random Forests (RF).

Usage

  fs.rf(x,y,...)
  fs.rf.1(x,y,fs.len="power2",...)

Arguments

x

A data frame or matrix containing the data set, with samples in rows and features in columns.

y

A factor or vector of class labels, one per row of x.

fs.len

Method name or numeric sequence giving the feature lengths. For details, see get.fs.len.

...

Arguments passed on to randomForest.

Details

fs.rf.1 selects features by successively eliminating the least important variables.
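The successive elimination described here can be sketched in a few lines of R. This is only an illustration of the general idea, not the code used inside fs.rf.1: the helpers backward_rank and rf_score, the toy data, and the subset sizes c(4, 2, 1) are all made up for this example, and rf_score assumes the randomForest package is installed.

```r
## Hypothetical helper: generic backward elimination.
## 'lens' is a decreasing sequence of subset sizes; 'score_fn' returns one
## named importance score per column of its first argument.
backward_rank <- function(x, y, lens, score_fn) {
  x <- as.matrix(x)
  keep <- colnames(x)
  dropped <- character(0)
  for (k in lens) {
    if (k >= length(keep)) next                 # nothing to eliminate at this size
    s <- score_fn(x[, keep, drop = FALSE], y)
    ord <- names(s)[order(s, decreasing = TRUE)]
    ## features eliminated later are more important, so prepend them
    dropped <- c(ord[(k + 1):length(ord)], dropped)
    keep <- ord[seq_len(k)]
  }
  c(keep, dropped)                              # feature names, best to worst
}

## Hypothetical scorer: permutation importance from a random forest.
rf_score <- function(x, y) {
  fit <- randomForest::randomForest(x, y, importance = TRUE)
  imp <- randomForest::importance(fit, type = 1)   # mean decrease in accuracy
  structure(as.vector(imp), names = colnames(x))
}

## Toy data: 8 features, of which only f1 and f2 separate the classes.
set.seed(1)
y <- factor(rep(c("a", "b"), each = 20))
x <- matrix(rnorm(40 * 8), 40, 8, dimnames = list(NULL, paste0("f", 1:8)))
x[y == "b", 1:2] <- x[y == "b", 1:2] + 3

## Run only when randomForest is available.
if (requireNamespace("randomForest", quietly = TRUE)) {
  print(backward_rank(x, y, lens = c(4, 2, 1), score_fn = rf_score))
}
```

Because importance is re-estimated on the surviving features at each step, the resulting order can differ from a single-pass ranking such as the one returned by fs.rf.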

Value

A list with components:

fs.rank

A vector of feature ranking scores.

fs.order

A vector of feature order from best to worst.

stats

A vector of measurements. For fs.rf, it is the Random Forest importance score. For fs.rf.1, it is a dummy variable (currently ignored).

Author(s)

Wanchang Lin

Examples

data(abr1)
cls <- factor(abr1$fact$class)
dat <- abr1$pos

## replace zeros with NAs
dat <- mv.zene(dat)

## missing values summary
mv <- mv.stats(dat, grp=cls)
mv    ## View the missing value pattern

## remove variables with 15% or more missing values
dat <- dat[,mv$mv.var < 0.15]

## fill remaining NAs with the mean
dat <- mv.fill(dat,method="mean")

## log transformation
dat <- preproc(dat, method="log10")

## select classes "1" and "2" for feature ranking
ind <- grepl("1|2", cls)
mat <- dat[ind,,drop=FALSE]
mat <- as.matrix(mat)
grp <- cls[ind, drop=TRUE]

## apply random forests for feature selection/ranking
res   <- fs.rf(mat,grp)
res.1 <- fs.rf.1(mat,grp)

## compare the feature orders side by side
fs <- cbind(fs.rf=res$fs.order, fs.rf.1=res.1$fs.order)
head(fs)

## plot the importance scores of 'fs.rf' (not 'fs.rf.1')
score <- res$stats
score <- sort(score, decreasing = TRUE)
plot(score)


[Package mt version 2.0-1.20 Index]