permimp {permimp} | R Documentation |
Permutation Importance for Random Forests
Description
Standard and partial/conditional permutation importance for
random forest objects fit using the party or randomForest
packages, following the permutation principle of the 'mean decrease in
accuracy' importance in randomForest. The partial/conditional permutation
importance is implemented differently from the original varimp function in
party: the predictors to condition on are selected in each tree using Pearson
Chi-squared tests applied to the by-split point-categorized predictors. In
general, the new implementation yields results similar to those of the original
varimp function. With asParty = TRUE, the partial/conditional permutation
importance is fully backward-compatible with, but faster than, the original
varimp function in party.
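The permutation principle mentioned above can be illustrated with a minimal base-R sketch. It is not permimp's internal code: a linear model stands in for a single tree, the error is computed in-sample rather than on out-of-bag observations, and the names `mse` and `perm_importance` are hypothetical helpers introduced only for this illustration.

```r
# Minimal base-R sketch of the permutation principle (assumption: a linear
# model stands in for a tree; permimp itself works on random forest objects).
set.seed(1)
n  <- 200
x1 <- rnorm(n)              # informative predictor
x2 <- rnorm(n)              # pure-noise predictor
y  <- 2 * x1 + rnorm(n)
dat <- data.frame(y, x1, x2)

fit <- lm(y ~ x1 + x2, data = dat)

# Hypothetical helper: mean squared prediction error of a model on a data set.
mse <- function(model, data) {
  mean((data$y - predict(model, newdata = data))^2)
}

# Hypothetical helper: increase in error after permuting one predictor.
perm_importance <- function(model, data, xname) {
  permuted <- data
  permuted[[xname]] <- sample(permuted[[xname]])  # break the link with y
  mse(model, permuted) - mse(model, data)
}

imp <- sapply(c("x1", "x2"), function(v) perm_importance(fit, dat, v))
imp
```

Permuting the informative predictor `x1` should increase the error markedly, while permuting the noise predictor `x2` should leave it nearly unchanged.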
Usage
permimp(object, ...)
## S3 method for class 'randomForest'
permimp(object, nperm = 1, OOB = TRUE, scaled = FALSE,
conditional = FALSE, threshold = .95, whichxnames = NULL,
thresholdDiagnostics = FALSE, progressBar = TRUE, do_check = TRUE, ...)
## S3 method for class 'RandomForest'
permimp(object, nperm = 1, OOB = TRUE, scaled = FALSE,
conditional = FALSE, threshold = .95, whichxnames = NULL,
thresholdDiagnostics = FALSE, progressBar = TRUE,
pre1.0_0 = conditional, AUC = FALSE, asParty = FALSE, mincriterion = 0, ...)
Arguments
object |
an object as returned by |
mincriterion |
the value of the test statistic or 1 - p-value that
must be exceeded in order to include a split in the
computation of the importance. The default
|
conditional |
a logical that determines whether unconditional or conditional permutation is performed. |
threshold |
the threshold value for (1 - p-value) of the association
between the predictor of interest and another predictor, which
must be exceeded in order to include the other predictor in
the conditioning scheme for the predictor of interest (only
relevant if |
nperm |
the number of permutations performed. |
OOB |
a logical that determines whether the importance is computed from the out-of-bag sample or the learning sample (not recommended). |
pre1.0_0 |
Prior to party version 1.0-0, the actual data values were permuted according to the original permutation importance suggested by Breiman (2001). Now the assignments to child nodes of splits in the variable of interest are permuted as described by Hapfelmeier et al. (2012), which allows for missing values in the predictors and is more efficient with respect to memory consumption and computing time. This method does not apply to the conditional permutation importance, nor to random forests that were not fit using the party package. |
scaled |
a logical that determines whether the differences in prediction accuracy should be scaled by the total (null-model) error. |
AUC |
a logical that determines whether the Area Under the Curve (AUC) instead of the accuracy is used to compute the permutation importance (cf. Janitza et al., 2013). The AUC-based permutation importance is more robust towards class imbalance, but it is only applicable to binary classification. |
asParty |
a logical that determines whether or not exactly the same
values as the original |
whichxnames |
a character vector containing the predictor variable names for which the permutation importance should be computed. Only use when aware of the implications, see section 'Details'. |
thresholdDiagnostics |
a logical that specifies whether diagnostics with respect to the threshold-value should be prompted as warnings. |
progressBar |
a logical that determines whether a progress bar should be displayed. |
do_check |
a logical that determines whether a check requiring user input should be included. |
... |
additional arguments to be passed to the methods |
Details
Function permimp is highly comparable to varimp in party,
but the partial/conditional variable importance has a different, more efficient
implementation. Compared to the original varimp in party, permimp
applies a different strategy to select the predictors to condition
on (see Debeer and Strobl, 2020).
With asParty = TRUE, permimp returns exactly the same values as
varimp in party, but the computation is done more efficiently.
If conditional = TRUE, the importance of each variable is computed by
permuting within a grid defined by the predictors that are associated
with the variable of interest (with 1 - p-value greater than threshold).
The threshold can be interpreted as a parameter that moves the permutation
importance along a dimension from fully conditional (threshold = 0) to
completely unconditional (threshold = 1), see Debeer and Strobl (2020).
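The role of threshold and of the permutation grid can be sketched in base R. This is an illustration under stated assumptions, not permimp's implementation: the cutpoint 0 is a made-up split point (permimp takes split points from the fitted trees), and `categorize` and `permute_within` are hypothetical helpers.

```r
# Sketch: choose the predictors to condition on, then permute within the grid.
set.seed(2)
n <- 500
z <- rnorm(n)
x <- z + rnorm(n, sd = 0.5)   # x is strongly associated with z
w <- rnorm(n)                 # w is generated independently of x
dat <- data.frame(x, z, w)

# Hypothetical helper: categorize a predictor at given split points.
categorize <- function(v, cutpoints) cut(v, breaks = c(-Inf, cutpoints, Inf))

threshold <- 0.95
x_cat <- categorize(dat$x, 0)

# 1 - p-value of a Pearson Chi-squared test between x and each other predictor.
one_minus_p <- sapply(c("z", "w"), function(v) {
  other_cat <- categorize(dat[[v]], 0)
  1 - chisq.test(table(x_cat, other_cat), correct = FALSE)$p.value
})

# Condition only on predictors whose association exceeds the threshold.
cond_on <- names(one_minus_p)[one_minus_p > threshold]

# Grid cells: combinations of the categorized conditioning predictors.
grid <- if (length(cond_on) > 0) {
  interaction(lapply(dat[cond_on], categorize, cutpoints = 0), drop = TRUE)
} else {
  factor(rep(1, n))           # no conditioning: one cell, plain permutation
}

# Hypothetical helper: shuffle values only within each grid cell.
permute_within <- function(v, cells) {
  ave(v, cells, FUN = function(g) g[sample.int(length(g))])
}
x_perm <- permute_within(dat$x, grid)
```

With a lower threshold more predictors enter `cond_on` and the grid becomes finer; with a threshold close to 1 fewer predictors qualify and the permutation approaches the unconditional case.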
Using the whichxnames argument, the computation of the permutation importance
can be limited to a smaller number of specified predictors. Note, however, that when
conditional = TRUE, the (other) predictors to condition on are also
limited to this selection of predictors. Only use when fully aware of the
implications.
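A hedged usage sketch of this restriction, using only arguments listed under Usage and the readingSkills data from the Examples. It assumes that the party and permimp packages are installed and that the $values vector is named by predictor; the block is skipped otherwise.

```r
# Predictors of interest (column names of party::readingSkills).
vars <- c("age", "shoeSize")

# Guarded so that the sketch is skipped when the packages are unavailable.
if (requireNamespace("party", quietly = TRUE) &&
    requireNamespace("permimp", quietly = TRUE)) {
  set.seed(290875)
  cf <- party::cforest(score ~ ., data = party::readingSkills,
                       control = party::cforest_unbiased(mtry = 2, ntree = 25))

  # All predictors are evaluated and are candidates for the conditioning grid.
  set.seed(290875)
  imp_all <- permimp::permimp(cf, conditional = TRUE, progressBar = FALSE)

  # Only 'age' and 'shoeSize' are evaluated, and each can only be
  # conditioned on the other; 'nativeSpeaker' is left out of the grid.
  set.seed(290875)
  imp_sub <- permimp::permimp(cf, conditional = TRUE, whichxnames = vars,
                              progressBar = FALSE)

  # Assumption: $values is a vector named by predictor.
  print(imp_all$values[vars])
  print(imp_sub$values)
}
```

The two sets of values need not agree, because the conditioning scheme differs between the two calls.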
For further details, please refer to the documentation of varimp.
Value
An object of class VarImp, with the mean decrease in accuracy
as its $values.
References
Leo Breiman (2001). Random Forests. Machine Learning, 45(1), 5–32.
Alexander Hapfelmeier, Torsten Hothorn, Kurt Ulm, and Carolin Strobl (2012). A New Variable Importance Measure for Random Forests with Missing Data. Statistics and Computing, https://link.springer.com/article/10.1007/s11222-012-9349-1
Torsten Hothorn, Kurt Hornik, and Achim Zeileis (2006). Unbiased Recursive Partitioning: A Conditional Inference Framework. Journal of Computational and Graphical Statistics, 15(3), 651–674. Preprint available from https://www.zeileis.org/papers/Hothorn+Hornik+Zeileis-2006.pdf
Silke Janitza, Carolin Strobl, and Anne-Laure Boulesteix (2013). An AUC-based Permutation Variable Importance Measure for Random Forests. BMC Bioinformatics, 14, 119. https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-14-119
Carolin Strobl, Anne-Laure Boulesteix, Thomas Kneib, Thomas Augustin, and Achim Zeileis (2008). Conditional Variable Importance for Random Forests. BMC Bioinformatics, 9, 307. https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-9-307
Dries Debeer and Carolin Strobl (2020). Conditional Permutation Importance Revisited. BMC Bioinformatics, 21, 307. https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-020-03622-2
See Also
Examples
### for RandomForest-objects, by party::cforest()
set.seed(290875)
readingSkills.cf <- party::cforest(score ~ ., data = party::readingSkills,
    control = party::cforest_unbiased(mtry = 2, ntree = 25))
### conditional importance, may take a while...
# party implementation:
set.seed(290875)
party::varimp(readingSkills.cf, conditional = TRUE)
# faster implementation but same results
set.seed(290875)
permimp(readingSkills.cf, conditional = TRUE, asParty = TRUE)
# different implementation with similar results
set.seed(290875)
permimp(readingSkills.cf, conditional = TRUE, asParty = FALSE)
### standard (unconditional) importance is unchanged
set.seed(290875)
party::varimp(readingSkills.cf)
set.seed(290875)
permimp(readingSkills.cf)
### for randomForest-objects, by randomForest::randomForest()
set.seed(290875)
readingSkills.rf <- randomForest::randomForest(score ~ ., data = party::readingSkills,
    mtry = 2, ntree = 25, importance = TRUE,
    keep.forest = TRUE, keep.inbag = TRUE)
### (unconditional) Permutation Importance
set.seed(290875)
permimp(readingSkills.rf, do_check = FALSE)
# very close to
readingSkills.rf$importance[,1]
### Conditional Permutation Importance
set.seed(290875)
permimp(readingSkills.rf, conditional = TRUE, threshold = .8, do_check = FALSE)