R: Naive feature selection method utilising the rFerns shadow...

naiveWrapper {rFerns}

R Documentation

Naive feature selection method utilising the rFerns shadow importance

Description

Proof-of-concept ensemble of rFerns models, built to stabilise and improve selection based on shadow importance. It employs a super-ensemble of `iterations` small rFerns forests, each built on a subspace of `size` attributes. The subspace is selected randomly, but with a higher selection probability for attributes claimed important by previous sub-models. The final selection is the group of attributes which hold a substantial weight at the end of the procedure.
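The weighted subspace sampling described above can be pictured with a small base-R sketch. It is only an illustration under stated assumptions, not the rFerns implementation: the attribute names, the "always important" set and the additive weight update are all hypothetical stand-ins, and the real re-weighting is driven by the shadow importance and `lambda`.

```r
# Illustrative sketch only; NOT the rFerns implementation.
set.seed(1)
attrs <- sprintf("A%d", 1:10)                      # hypothetical attribute names
weights <- setNames(rep(1, length(attrs)), attrs)  # start from uniform weights
size <- 4                                          # attributes per sub-model

for (i in 1:20) {
  # Draw a subspace without replacement; heavier attributes are picked more often
  subspace <- sample(attrs, size, prob = weights)
  # Hypothetical stand-in for "claimed important by the sub-model":
  # pretend that A1 and A2 are found important whenever they are drawn
  important <- intersect(subspace, c("A1", "A2"))
  # Hypothetical update rule: boost the weight of important attributes
  weights[important] <- weights[important] + 1
}

# Attributes that were repeatedly claimed important end up with larger weights
print(weights)
```

In this toy run only `A1` and `A2` can gain weight, so they are drawn increasingly often in later iterations, which is the feedback effect the description refers to.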

Usage

naiveWrapper(
  x,
  y,
  iterations = 1000,
  depth = 5,
  ferns = 100,
  size = 30,
  lambda = 5,
  threads = 0,
  saveHistory = FALSE
)

Arguments

`x`	Data frame containing attributes; must have unique names and contain only numeric, integer or (ordered) factor columns. Factors must have fewer than 31 levels. No `NA` values are permitted.
`y`	A decision vector. Must be a factor of the same length as `nrow(x)` for ordinary many-label classification, or a logical matrix with each column corresponding to a class for multi-label classification.
`iterations`	Number of iterations, i.e., the number of sub-models built.
`depth`	The depth of the ferns; must be in 1–16 range. Note that time and memory requirements scale with `2^depth`.
`ferns`	Number of ferns to be built in each sub-model. This should be a small number, around 3-5 times `size`.
`size`	Number of attributes considered by each sub-model.
`lambda`	Lambda parameter driving the re-weighting step of the method.
`threads`	Number of parallel threads, copied to the underlying `rFerns` call.
`saveHistory`	Should weight history be stored.
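The constraints on `x`, `y` and `depth` listed above can be checked before a long run. The following is a minimal sketch using base R only; `checkWrapperInput` is a hypothetical helper that is not part of rFerns, and it only mirrors the documented requirements rather than the package's own validation.

```r
# Hypothetical helper, not part of rFerns: collects violations of the
# documented input requirements and returns them as a character vector.
checkWrapperInput <- function(x, y, depth = 5) {
  stopifnot(is.data.frame(x))
  problems <- character(0)
  # Attribute names must be unique
  if (anyDuplicated(names(x)) > 0)
    problems <- c(problems, "attribute names are not unique")
  # Only numeric, integer or factor columns are allowed
  okType <- vapply(x, function(col) is.numeric(col) || is.factor(col), logical(1))
  if (!all(okType))
    problems <- c(problems, "unsupported column type present")
  # Factors must have fewer than 31 levels
  bigFactor <- vapply(x, function(col) is.factor(col) && nlevels(col) >= 31, logical(1))
  if (any(bigFactor))
    problems <- c(problems, "factor with 31 or more levels present")
  # No NA values are permitted
  if (any(vapply(x, anyNA, logical(1))))
    problems <- c(problems, "NA values present")
  # y: factor of length nrow(x), or a logical matrix (multi-label case)
  yOk <- (is.factor(y) && length(y) == nrow(x)) || (is.matrix(y) && is.logical(y))
  if (!yOk)
    problems <- c(problems, "y is neither a matching factor nor a logical matrix")
  # depth must be in the 1-16 range
  if (depth < 1 || depth > 16)
    problems <- c(problems, "depth outside the 1-16 range")
  problems
}

data(iris)
# An empty result means no documented requirement is violated
print(checkWrapperInput(iris[, -5], iris$Species))
```

When choosing `depth`, keep in mind that the scaling factor `2^depth` is 32 for the default `depth = 5` but 65536 for `depth = 16`.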

Value

An object of class naiveWrapper, which is a list with the following components:

`found`	Names of all selected attributes.
`weights`	Vector of weights indicating the confidence that a given feature is relevant.
`timeTaken`	Time of computation.
`weightHistory`	History of weights over all iterations, present if `saveHistory` was `TRUE`.
`params`	Copies of algorithm parameters, `iterations`, `depth`, `ferns` and `size`, as a named vector.

References

Kursa MB (2017). Efficient all relevant feature selection with random ferns. In: Kryszkiewicz M., Appice A., Slezak D., Rybinski H., Skowron A., Ras Z. (eds) Foundations of Intelligent Systems. ISMIS 2017. Lecture Notes in Computer Science, vol 10352. Springer, Cham.

Examples

set.seed(77)
#Fetch Iris data
data(iris)
#Extend with random noise
noisyIris<-cbind(iris[,-5],apply(iris[,-5],2,sample))
names(noisyIris)[5:8]<-sprintf("Nonsense%d",1:4)
#Execute selection
naiveWrapper(noisyIris,iris$Species,iterations=50,ferns=20,size=8)
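The call above only prints the result. To look at the components listed under Value, the returned object can be stored and inspected, as in the sketch below. It rebuilds the same noisy data so that it runs on its own, and it assumes the rFerns package is installed; the `requireNamespace` guard simply skips the run otherwise.

```r
set.seed(77)
data(iris)
# Same construction as above: four real attributes plus four shuffled copies
noisyIris <- cbind(iris[, -5], apply(iris[, -5], 2, sample))
names(noisyIris)[5:8] <- sprintf("Nonsense%d", 1:4)

if (requireNamespace("rFerns", quietly = TRUE)) {
  # saveHistory = TRUE so that the weightHistory component is present
  res <- rFerns::naiveWrapper(noisyIris, iris$Species,
                              iterations = 50, ferns = 20, size = 8,
                              saveHistory = TRUE)
  print(res$found)          # names of the selected attributes
  print(res$weights)        # final weight of each attribute
  print(res$params)         # iterations, depth, ferns and size
  str(res$weightHistory)    # weights recorded over the iterations
}
```

With only 50 iterations the selection may vary between seeds; the weight history is a convenient way to see whether the weights have settled.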

[Package rFerns version 5.0.0 Index]