getImpLegacyRf {Boruta} | R Documentation |
randomForest importance adapters
Description
These functions are intended to be passed as the getImp argument of the Boruta function, to be called by the Boruta algorithm as an importance source.
getImpLegacyRfZ generates default, normalized permutation importance, getImpLegacyRfRaw generates raw permutation importance, and getImpLegacyRfGini generates Gini index importance, all using randomForest as the Random Forest algorithm implementation.
Usage
getImpLegacyRfZ(x, y, ...)
getImpLegacyRfRaw(x, y, ...)
getImpLegacyRfGini(x, y, ...)
Arguments
x | data frame of predictors including shadows.
y | response vector.
... | parameters passed to the underlying randomForest call.
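The adapters can also be called directly, outside of Boruta, to see what an importance source returns. The following is a minimal sketch, assuming the Boruta and randomForest packages are installed; the variable names (x, y, imp_z and so on) are illustrative only, and no shadow attributes are added here for brevity.

```r
library(Boruta)

set.seed(1)
x <- iris[, -5]    # data frame of predictors (no shadows in this sketch)
y <- iris$Species  # response vector

# Each adapter returns one importance score per column of x
imp_z    <- getImpLegacyRfZ(x, y)     # normalized permutation importance
imp_raw  <- getImpLegacyRfRaw(x, y)   # raw permutation importance
imp_gini <- getImpLegacyRfGini(x, y)  # Gini index importance

# Extra arguments travel through `...` to randomForest,
# for example a smaller forest
imp_small <- getImpLegacyRfZ(x, y, ntree = 100)
```

Within a Boruta run these functions are not called by hand; they are supplied through the getImp argument, and Boruta passes the predictors together with their shadows.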
Note
The getImpLegacyRfZ function was the default importance source in Boruta versions prior to 5.0; since then, the ranger Random Forest implementation has been used instead of randomForest, for speed, memory conservation and the ability to utilise multithreading.
Both importance sources should generally lead to the same results, yet there are differences.
Most notably, ranger by default treats factor attributes as ordered (and works very slowly if instructed otherwise with respect.unordered.factors=TRUE); on the other hand, it lifts the 32-level limit specific to randomForest.
Consequently, Boruta decisions for factor attributes may differ.
Random Forest methods have two main parameters: the number of attributes tried at each split and the number of trees in the forest. The first is called mtry in both implementations, but the second is called ntree in randomForest and num.trees in ranger.
To maintain compatibility, getImpRf* functions still accept the ntree parameter, relaying it to num.trees.
Still, both parameters take the same defaults in both implementations (square root of the number of all attributes and 500, respectively).
Moreover, ranger brings some additional capabilities to Boruta, like analysis of survival problems or sticky variables, which are always considered on splits.
Finally, the results for the same PRNG seed will be different.
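To illustrate the ntree compatibility described in this note, the sketch below runs Boruta once with the legacy source and once with the ranger-based getImpRfZ, passing ntree in both cases. It assumes the Boruta, randomForest and ranger packages are installed; the object names res_legacy, res_ranger and decisions are hypothetical, and the decisions are not guaranteed to agree between the two sources.

```r
library(Boruta)

set.seed(777)
# Legacy source: ntree is passed on to randomForest
res_legacy <- Boruta(Species ~ ., data = iris,
                     getImp = getImpLegacyRfZ, ntree = 200)

# ranger-based source: ntree is still accepted and relayed to num.trees
res_ranger <- Boruta(Species ~ ., data = iris,
                     getImp = getImpRfZ, ntree = 200)

# Put the per-attribute decisions side by side for comparison
decisions <- data.frame(legacy = res_legacy$finalDecision,
                        ranger = res_ranger$finalDecision)
print(decisions)
```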
Examples
library(Boruta)
set.seed(777)
# Add some nonsense attributes to the iris dataset by shuffling the original attributes
iris.extended <- data.frame(iris, apply(iris[, -5], 2, sample))
names(iris.extended)[6:9] <- paste("Nonsense", 1:4, sep = "")
# Run Boruta on this data, using the legacy randomForest importance source
Boruta.iris.extended <- Boruta(Species ~ ., getImp = getImpLegacyRfZ,
                               data = iris.extended, doTrace = 2)
# Nonsense attributes should be rejected
print(Boruta.iris.extended)