my_ranger_PIMP {RFlocalfdr}    R Documentation

my_ranger_PIMP is based on the PIMP function from the vita package. ‘PIMP’ implements the test approach of Altmann et al. (2010) for the permutation variable importance measure ‘VarImp’ returned by the randomForest package (Liaw and Wiener, 2002) for classification and regression.

Description

my_PIMP applies the same method as PIMP, but to the MDI (mean decrease in impurity) variable importance: mean decrease in Gini index for classification and mean decrease in MSE for regression. my_ranger_PIMP applies the same method to random forests fitted with the ranger package.
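The permutation idea described above can be illustrated with a minimal sketch. It is not the package's implementation: the toy data, the variable names (obs, perm, p_emp) and the simple empirical p-value are assumptions made for illustration, and vita::PimpTest may compute its p-values differently. Only ranger() and its variable.importance element are taken from the ranger package.

```r
library(ranger)

set.seed(1)
n <- 100
p <- 5
X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("V", 1:p)))
# Toy response: only V1 carries signal.
y <- factor(ifelse(X[, 1] + rnorm(n, sd = 0.5) > 0, "a", "b"))

# Observed MDI importance from a forest fitted to the real response.
fit <- ranger(x = X, y = y, num.trees = 200, importance = "impurity")
obs <- fit$variable.importance

# Null importances: refit on S permuted copies of the response.
S <- 20
perm <- sapply(seq_len(S), function(s) {
  ranger(x = X, y = sample(y), num.trees = 200,
         importance = "impurity")$variable.importance
})

# perm is a p x S matrix; compare each row with the observed importance.
# Simple empirical p-value (an assumption, not necessarily what PimpTest does).
p_emp <- (rowSums(perm >= obs) + 1) / (S + 1)
```

Here S plays the same role as the S argument of my_ranger_PIMP: a larger value gives a finer null distribution at the cost of more forest fits.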

Usage

my_ranger_PIMP(
  X,
  y,
  rForest,
  S = 100,
  parallel = FALSE,
  ncores = 0,
  seed = 123,
  ...
)

Arguments

X

data matrix of size n by p

y

class labels for classification (a factor) or real values for regression, of length n.

rForest

an object of class ranger; importance must be set to "impurity".

S

The number of permutations of the response vector ‘y’. Default is ‘S=100’.

parallel

Should the PIMP algorithm run in parallel? Default is ‘parallel=FALSE’, in which case the number of cores is set to one. The parallelized version of the PIMP algorithm is based on mclapply and so is not available on Windows.

ncores

The number of cores to use, i.e. at most how many child processes will be run simultaneously. Must be at least one, and parallelization requires at least two cores. If ‘ncores=0’, then half of the CPU cores on the current host are used.
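As a rough sketch of the ‘ncores=0’ rule, the helper below resolves a core count using parallel::detectCores(). The function name resolve_ncores and the rounding and fallback choices are hypothetical; the package may handle these cases differently.

```r
library(parallel)

# Hypothetical helper: turn ncores = 0 into "half of the detected cores".
resolve_ncores <- function(ncores = 0) {
  if (ncores == 0) {
    detected <- detectCores()
    # detectCores() can return NA; fall back to one core (an assumption).
    if (is.na(detected)) detected <- 1
    ncores <- max(1, floor(detected / 2))
  }
  ncores
}

resolve_ncores(0)  # half of the detected cores, at least 1
resolve_ncores(2)  # an explicit value is returned unchanged
```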

seed

a single integer value used to set the seed. The "combined multiple-recursive generator" from L'Ecuyer (1999) is set as the random number generator for the parallelized version of the PIMP algorithm. Default is ‘seed = 123’.
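The following sketch shows, under stated assumptions, how the L'Ecuyer generator is typically combined with mclapply to make parallel random draws repeatable. The helper draw() is hypothetical and is not taken from the package; on Windows the sketch falls back to a single core, because mclapply cannot fork there.

```r
library(parallel)

# Hypothetical helper: draw one uniform random number in each of four tasks.
draw <- function(seed, cores) {
  RNGkind("L'Ecuyer-CMRG")  # combined multiple-recursive generator
  set.seed(seed)
  unlist(mclapply(1:4, function(i) runif(1),
                  mc.cores = cores, mc.set.seed = TRUE))
}

old_kind <- RNGkind()[1]  # remember the current generator
cores <- if (.Platform$OS.type == "windows") 1L else 2L
a <- draw(123, cores)
b <- draw(123, cores)
RNGkind(old_kind)         # restore the previous generator

# The two runs used the same seed, so they can be compared directly.
identical(a, b)
```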

...

additional arguments passed to ranger

Value

an object of class PIMP

Examples

 
library(RFlocalfdr)
library(RFlocalfdr.data)
library(ranger)
library(vita) # vita: Variable Importance Testing Approaches
data(smoking)
?smoking
y <- factor(smoking$y)
smoking_data <- smoking$rma

# Fit the forest with impurity (MDI) importance, as my_ranger_PIMP requires.
cl.ranger <- ranger::ranger(y = y, x = smoking_data, mtry = 3, num.trees = 1000,
                            importance = "impurity")
# CRAN limits the number of cores available to packages to 2, for performance reasons.
system.time(pimp.varImp.cl <- my_ranger_PIMP(smoking_data, y, cl.ranger, S = 10,
                                             parallel = TRUE, ncores = 2))
pimp.t.cl <- vita::PimpTest(pimp.varImp.cl, para = FALSE)
aa <- summary(pimp.t.cl, pless = 0.05)
# Count the variables with a p-value below 0.05, then plot the p-value distribution.
length(which(aa$cmat2[, "p-value"] < 0.05))
hist(aa$cmat2[, "p-value"], breaks = 20)


[Package RFlocalfdr version 0.8.5 Index]