PIMP {vita} | R Documentation |
PIMP-algorithm for the permutation variable importance measure
Description
PIMP implements the test approach of Altmann et al. (2010) for the permutation variable importance measure VarImp in a random forest for classification and regression.
Usage
## Default S3 method:
PIMP(X, y, rForest, S = 100, parallel = FALSE, ncores=0, seed = 123, ...)
## S3 method for class 'PIMP'
print(x, ...)
Arguments
X |
a data frame or a matrix of predictors |
y |
a response vector. If a factor, classification is assumed, otherwise regression is assumed. |
rForest |
an object of class randomForest |
S |
the number of permutations of the response vector y. Default is 100. |
parallel |
Should the PIMP-algorithm run in parallel? Default is FALSE. |
ncores |
The number of cores to use, i.e. at most how many child processes will be run
simultaneously. Must be at least one, and parallelization requires at least two cores.
If |
seed |
a single integer value to specify seeds. The "combined multiple-recursive generator"
from L'Ecuyer (1999) is set as random number generator for the parallelized version of
the PIMP-algorithm. Default is 123. |
... |
optional parameters for randomForest |
x |
for the print method, an object of class PIMP |
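The seed argument refers to the L'Ecuyer generator for parallel runs. As a rough illustration of why that generator is used, the sketch below shows reproducible parallel random draws in base R. It does not call PIMP and is not taken from the package source; the helper name draw and the variable n_cores are hypothetical.

```r
library("parallel")

# Assumption of this sketch: fall back to one core on Windows,
# where mclapply cannot fork child processes.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Hypothetical helper: draw four random numbers in parallel from a given seed.
# Note that RNGkind() changes the generator for the whole R session.
draw <- function(seed) {
  RNGkind("L'Ecuyer-CMRG")  # combined multiple-recursive generator
  set.seed(seed)
  unlist(mclapply(1:4, function(i) rnorm(1), mc.cores = n_cores))
}

first  <- draw(123)
second <- draw(123)
identical(first, second)  # the two runs are expected to agree
```

With the default generator, child processes are not guaranteed to produce the same streams from run to run, which is why a fixed seed alone is not enough in the parallel case.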
Details
The PIMP-algorithm by Altmann et al. (2010) permutes the response variable y S times. For each permutation of the response vector y^{*s}, a new forest is grown and the permutation variable importance measure (VarImp^{*s}) is computed for all predictor variables X. The vector PerVarImp of S VarImp measures for every predictor variable is used to approximate the null importance distributions (see PimpTest).
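The steps above can be sketched directly with randomForest. This is a minimal illustration of the idea, not the code used inside PIMP: the object names (var_imp, per_var_imp), the small simulated data set, and the choice of unscaled importance (type = 1, scale = FALSE) are assumptions of this sketch.

```r
library("randomForest")

set.seed(123)
X <- data.frame(replicate(5, rnorm(100)))
y <- with(X, 2 * X1 + X2 - 2 * X3)

# Permutation importance on the observed response
# (type = 1 selects the permutation measure; scale = FALSE is an assumption)
rf <- randomForest(X, y, ntree = 100, importance = TRUE)
var_imp <- importance(rf, type = 1, scale = FALSE)[, 1]

S <- 10
# One row per predictor, one column per permutation s
per_var_imp <- sapply(seq_len(S), function(s) {
  y_perm <- sample(y)  # permute the response, breaking its link to X
  rf_perm <- randomForest(X, y_perm, ntree = 100, importance = TRUE)
  importance(rf_perm, type = 1, scale = FALSE)[, 1]
})

dim(per_var_imp)  # predictors by permutations
```

Each row of per_var_imp then holds S importance values obtained under a permuted response, which is the empirical null sample for that predictor.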
Value
VarImp |
the original permutation variable importance measures of the random forest. |
PerVarImp |
a matrix, where each row is a vector containing the S permuted VarImp measures for one predictor variable |
type |
one of regression, classification |
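The components listed above can be inspected on a fitted object. The short sketch below assumes the returned PIMP object can be accessed like a list with the `$` operator; the simulated data and the object name pimp are illustrative only.

```r
library("vita")
library("randomForest")

set.seed(123)
X <- data.frame(replicate(5, rnorm(100)))
y <- with(X, 2 * X1 + X2 - 2 * X3)
rf <- randomForest(X, y, ntree = 100, importance = TRUE)

pimp <- PIMP(X, y, rf, S = 10, parallel = FALSE)

pimp$type            # regression or classification
pimp$VarImp          # importances from the original forest
dim(pimp$PerVarImp)  # layout of the permuted importances
```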
References
Breiman L. (2001), Random Forests, Machine Learning 45(1), 5-32, <doi:10.1023/A:1010933404324>
Altmann A., Tolosi L., Sander O. and Lengauer T. (2010), Permutation importance: a corrected feature importance measure, Bioinformatics 26(10), 1340-1347, <doi:10.1093/bioinformatics/btq134>
See Also
PimpTest, importance, randomForest, mclapply
Examples
###############################
# Regression #
##############################
##############################
## Simulating data
X = replicate(12,rnorm(100))
X = data.frame(X) #"X" can also be a matrix
y = with(X,2*X1 + 1*X2 + 2*X3 + 1*X4 - 2*X5 - 1*X6 - 1*X7 + 2*X8 )
##############################
## Regression with Random Forest:
library("randomForest")
reg.rf = randomForest(X,y,mtry = 3,ntree=500,importance=TRUE)
##############################
## PIMP-Permutation variable importance measure
# the parallelized version of the PIMP-algorithm
system.time(pimp.varImp.reg<-PIMP(X,y,reg.rf,S=10, parallel=TRUE, ncores=2))
# the non parallelized version of the PIMP-algorithm
system.time(pimp.varImp.reg<-PIMP(X,y,reg.rf,S=10, parallel=FALSE))
##############################
# Classification #
##############################
## Simulating data
X = replicate(12,rnorm(100))
X = data.frame(X) #"X" can also be a matrix
z = with(X,2*X1 + 3*X2 + 2*X3 + 1*X4 -
2*X5 - 2*X6 - 2*X7 + 1*X8 )
pr = 1/(1+exp(-z)) # pass through an inv-logit function
y = as.factor(rbinom(100,1,pr))
##############################
## Classification with Random Forest:
cl.rf = randomForest(X,y,mtry = 3,ntree = 500, importance = TRUE)
##############################
## PIMP-Permutation variable importance measure
# the parallelized version of the PIMP-algorithm
system.time(pimp.varImp.cl<-PIMP(X,y,cl.rf,S=10, parallel=TRUE, ncores=2))
# the non parallelized version of the PIMP-algorithm
system.time(pimp.varImp.cl<-PIMP(X,y,cl.rf,S=10, parallel=FALSE))