R: randomForest importance adapters

getImpLegacyRf {Boruta}

R Documentation

randomForest importance adapters

Description

These functions are intended to be passed as the getImp argument of the Boruta function, so that the Boruta algorithm calls them as its importance source. getImpLegacyRfZ generates the default, normalized permutation importance, getImpLegacyRfRaw generates raw permutation importance, and getImpLegacyRfGini generates Gini index importance; all three use randomForest as the Random Forest implementation.

Usage

getImpLegacyRfZ(x, y, ...)

getImpLegacyRfRaw(x, y, ...)

getImpLegacyRfGini(x, y, ...)

Arguments

`x`	data frame of predictors including shadows.
`y`	response vector.
`...`	parameters passed to the underlying `randomForest` call; they are relayed from `...` of `Boruta`.
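
To make the calling convention concrete, here is a minimal sketch of an adapter that follows the same contract: it takes `x`, `y` and `...`, and returns one importance value per column of `x`. The name `myRfZ` is hypothetical and the body is an illustration built on the public randomForest API, not necessarily the package's own source.

```r
# Hypothetical adapter (illustration only): fits a forest on the predictors
# and returns a named numeric vector with one importance value per column.
myRfZ <- function(x, y, ...) {
  rf <- randomForest::randomForest(
    x, y,
    importance = TRUE,    # required to compute permutation importance
    keep.forest = FALSE,  # the trees themselves are not needed afterwards
    ...                   # e.g. ntree or mtry
  )
  # type = 1 selects permutation importance; scale = TRUE normalizes it
  randomForest::importance(rf, type = 1, scale = TRUE)[, 1]
}

set.seed(1)
imp <- myRfZ(iris[, -5], iris$Species, ntree = 100)
print(imp)
```

A function of this shape could then be supplied as `getImp = myRfZ` in a `Boruta` call, in the same way as the adapters documented here.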

Note

The getImpLegacyRfZ function was the default importance source in Boruta versions prior to 5.0; since then, the ranger Random Forest implementation has been used instead of randomForest, for speed, lower memory use and the ability to utilise multithreading. Both importance sources should generally lead to the same results, yet there are differences.

Most notably, ranger by default treats factor attributes as ordered (and works very slowly if instructed otherwise with respect.unordered.factors=TRUE); on the other hand, it lifts the 32-level limit specific to randomForest. As a result, Boruta decisions for factor attributes may differ.

Random Forest methods have two main parameters: the number of attributes tried at each split and the number of trees in the forest. The first is called mtry in both implementations, but the second is called ntree in randomForest and num.trees in ranger. To maintain compatibility, the getImpRf* functions still accept the ntree parameter and relay it to num.trees. Both parameters have the same defaults in both implementations (the square root of the number of all attributes and 500, respectively).
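
The parameter naming described above can be illustrated by calling both adapters directly with the same `ntree` value. This is a sketch that assumes the Boruta, randomForest and ranger packages are installed; the object names `imp_legacy` and `imp_ranger` are made up for the example.

```r
library(Boruta)

x <- iris[, -5]
y <- iris$Species

set.seed(777)
# Legacy adapter: ntree travels through ... to randomForest
imp_legacy <- getImpLegacyRfZ(x, y, ntree = 200)

set.seed(777)
# ranger-based adapter: ntree is accepted and relayed to num.trees
imp_ranger <- getImpRfZ(x, y, ntree = 200)

# One row per importance source; the values are not expected to match,
# even though the same seed was set before each call
print(rbind(legacy = imp_legacy, ranger = imp_ranger))
```

The same `ntree` argument can also be given to `Boruta` itself, which relays it through `...` to whichever adapter is set as `getImp`.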

Moreover, ranger brings some additional capabilities to Boruta, such as analysis of survival problems or sticky variables, which are always considered on splits.

Finally, the two implementations will produce different results for the same PRNG seed.

Examples

set.seed(777)
#Add some nonsense attributes to iris dataset by shuffling original attributes
iris.extended<-data.frame(iris,apply(iris[,-5],2,sample))
names(iris.extended)[6:9]<-paste("Nonsense",1:4,sep="")
#Run Boruta on this data
Boruta(Species~.,getImp=getImpLegacyRfZ,
 data=iris.extended,doTrace=2)->Boruta.iris.extended
#Nonsense attributes should be rejected
print(Boruta.iris.extended)

[Package Boruta version 8.0.0 Index]