allinone {rfvimptest} | R Documentation |
Apply all available (sequential) permutation testing approaches of variable importance measures with one function call
Description
This helper function performs all (sequential) permutation testing approaches for variable importance measures described in rfvimptest
with a single function call. This may be useful for comparing the results obtained with the different approaches.
Importantly, the function is computationally efficient: the permuted variable importance values obtained
for the conventional permutation test (which performs all Mmax permutations) are re-used for the other approaches.
For details on the different approaches, see rfvimptest.
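The re-use idea can be illustrated with a minimal sketch in base R. It does not call rfvimptest and is not claimed to mirror its internals: the observed importance `vim_obs`, the simulated permuted importances `vim_perm`, and the p-value definition are assumptions made purely for illustration. The point is that the permuted values are computed once, and every approach can then be evaluated on the same sequence.

```r
set.seed(1)
Mmax <- 500

# Hypothetical observed importance of one variable (assumption).
vim_obs <- 0.02
# Stand-in for the Mmax importances obtained after permuting that variable
# (assumption: simulated here instead of being computed from random forests).
vim_perm <- rnorm(Mmax, mean = 0, sd = 0.01)

# TRUE where a permuted importance is at least as large as the observed one.
exceed <- vim_perm >= vim_obs

# Running number of exceedances after m = 1, ..., Mmax permutations.
# A sequential approach only needs the pairs (m, d_running[m]), so it can be
# evaluated on this vector without drawing any new permutations.
d_running <- cumsum(exceed)

# Conventional test using all Mmax permutations (one common p-value
# definition; the package may use a different one).
p_complete <- d_running[Mmax] / Mmax

head(data.frame(m = seq_len(Mmax), d = d_running))
```

Because the expensive step is obtaining `vim_perm`, evaluating several stopping rules on `d_running` adds almost no cost.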
Usage
allinone(
data,
yname,
Mmax = 500,
varnames = NULL,
p0 = 0.06,
p1 = 0.04,
alpha = 0.05,
beta = 0.2,
A = 0.1,
B = 10,
h = 8,
nperm = 1,
ntree = 500,
progressbar = TRUE,
condinf = FALSE,
...
)
Arguments
data |
A |
yname |
Name of outcome variable. |
Mmax |
Maximum number of permutations used in each permutation test. Default is 500. |
varnames |
Optional. Names of the variables for which testing should be performed. By default all variables in |
p0 |
The value of the p-value in the null hypothesis (H0: p = p0) of SPRT and SAPT. Default is 0.06. |
p1 |
The value of the p-value in the alternative hypothesis (H1: p = p1) of SPRT and SAPT. Default is 0.04. |
alpha |
The significance level of SPRT when p = p0, that is, the type I error probability. Default is 0.05. |
beta |
One minus the power of SPRT when p = p1, that is, the type II error probability. Default is 0.2. |
A |
The quantity A in the formula of SAPT. Default is 0.1 for a type I error of 0.05. Usually not changed by the user. |
B |
The quantity B in the formula of SAPT. Default is 10 (1/A) for a type I error of 0.05. Usually not changed by the user. |
h |
The quantity h in the formula for the sequential Monte Carlo p-value. The default value for h is 8. Larger values lead to more precise p-value estimates, but are computationally more expensive. |
nperm |
The number of permutations of the out-of-bag observations over which the results are averaged when calculating the variable importance measure values. Default is 1. Values larger than 1 can only be considered when |
ntree |
Number of trees per forest. Default is 500. |
progressbar |
Output the current progress of the calculations for each variable to the console? Default is TRUE. |
condinf |
Set this value to |
... |
Further arguments passed to |
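To make the roles of p0, p1, alpha and beta more concrete, the following sketch applies Wald's classical sequential probability ratio test to a sequence of exceedance indicators. It is an illustration under assumptions, not the package's implementation: the function name `sprt_sketch` is hypothetical, the thresholds are Wald's standard approximations, and returning "keep H0" when no boundary is reached is a choice made here. SAPT and its quantities A and B are not covered.

```r
# exceed: logical vector, TRUE where a permuted importance is at least as
# large as the observed importance (one entry per permutation).
sprt_sketch <- function(exceed, p0 = 0.06, p1 = 0.04, alpha = 0.05, beta = 0.2) {
  lower <- log(beta / (1 - alpha))   # at or below this value: keep H0
  upper <- log((1 - beta) / alpha)   # at or above this value: accept H1

  # Log likelihood ratio of H1 (p = p1) against H0 (p = p0) after each permutation.
  llr <- cumsum(ifelse(exceed, log(p1 / p0), log((1 - p1) / (1 - p0))))

  # First permutation at which one of the two boundaries is reached.
  hit <- which(llr <= lower | llr >= upper)[1]
  if (is.na(hit)) {
    # Assumption: without a decision after all permutations, H0 is kept.
    return(list(decision = "keep H0", perms = length(exceed), stoppedearly = "no"))
  }
  list(decision = if (llr[hit] >= upper) "accept H1" else "keep H0",
       perms = hit,
       stoppedearly = "yes")
}

# No exceedances at all: evidence for a small p-value accumulates slowly.
sprt_sketch(rep(FALSE, 500))
# Only exceedances: H0 is kept after very few permutations.
sprt_sketch(rep(TRUE, 500))
```

Since p1 < p0, each exceedance lowers the log likelihood ratio and each non-exceedance raises it slightly, which is why a clearly unimportant variable tends to be dismissed after few permutations.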
Value
Object of class allinone with elements
varimp |
Variable importance for each considered independent variable. |
testres |
The results ("keep H0" vs. "accept H1") of the tests for each considered independent variable. |
pvalues |
The p-values of the tests for each considered independent variable. Note that p-values are only obtained for the method types "pval" and "complete". |
stoppedearly |
For each independent variable, whether the calculations stopped early ("yes") or the maximum of |
perms |
The number of permutations performed for each independent variable. |
Mmax |
Maximum number of permutations used in each permutation test. |
ntree |
Number of trees per forest. |
comptime |
The time the computations needed. |
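How the elements pvalues, stoppedearly and perms can relate to each other may be easier to see with a sketch of a sequential Monte Carlo p-value in the style of Besag and Clifford. This is an illustration under assumptions rather than the package's code: the function name `seq_mc_pvalue` is hypothetical, and the rule shown (stop once h permuted importances reach the observed one) is the textbook version, which may differ in detail from what rfvimptest does.

```r
# exceed: logical vector, TRUE where a permuted importance is at least as
# large as the observed importance; its length plays the role of Mmax.
seq_mc_pvalue <- function(exceed, h = 8) {
  Mmax <- length(exceed)
  stopifnot(Mmax >= 1)

  d <- cumsum(exceed)           # running number of exceedances
  stop_at <- which(d >= h)[1]   # permutation at which the h-th exceedance occurs

  if (!is.na(stop_at)) {
    # h exceedances were reached after stop_at permutations.
    list(pvalue = h / stop_at,
         perms = stop_at,
         stoppedearly = if (stop_at < Mmax) "yes" else "no")
  } else {
    # Fewer than h exceedances among all Mmax permutations.
    list(pvalue = (d[Mmax] + 1) / (Mmax + 1),
         perms = Mmax,
         stoppedearly = "no")
  }
}

# A variable whose permuted importances often reach the observed one
# stops after few permutations and receives a large p-value.
seq_mc_pvalue(c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE), h = 8)
```

A larger h postpones stopping, which uses more permutations but yields a finer p-value estimate, in line with the description of the argument h.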
Author(s)
Alexander Hapfelmeier, Roman Hornung
References
Breiman, L. (2001). Random forests. Mach Learn, 45:5-32, <doi: 10.1023/A:1010933404324>.
Coleman, T., Peng, W., Mentch, L. (2019). Scalable and efficient hypothesis testing with random forests. arXiv preprint arXiv:1904.07830, <doi: 10.48550/arXiv.1904.07830>.
Hapfelmeier, A., Hornung, R., Haller, B. (2022). Sequential Permutation Testing of Random Forest Variable Importance Measures. arXiv preprint arXiv:2206.01284, <doi: 10.48550/arXiv.2206.01284>.
Hapfelmeier, A., Ulm, K. (2013). A new variable selection approach using Random Forests. CSDA 60:50–69, <doi: 10.1016/j.csda.2012.09.020>.
Hapfelmeier, A., Hothorn, T., Ulm, K., Strobl, C. (2014). A new variable importance measure for random forests with missing data. Stat Comput 24:21–34, <doi: 10.1007/s11222-012-9349-1>.
Hothorn, T., Hornik, K., Zeileis, A. (2006). Unbiased Recursive Partitioning: A Conditional Inference Framework. J Comput Graph Stat 15(3):651–674, <doi: 10.1198/106186006X133933>.
Wright, M. N., Ziegler, A. (2017). ranger: A fast implementation of random forests for high dimensional data in C++ and R. J Stat Softw 77:1-17, <doi: 10.18637/jss.v077.i01>.
See Also
Examples
# Load package:
library("rfvimptest")
# Set seed to obtain reproducible results:
set.seed(1234)
# Load example data:
data(hearth2)
# NOTE: For illustration purposes, a very small maximum number of
# permutations is used in the examples below.
# This number would be much too small for actual applications.
# The default is Mmax=500.
# When using condinf=FALSE (default) the results for the two-sample
# permutation tests are not obtained:
(ptest <- allinone(data=hearth2, yname="Class", Mmax=20))
# Variable importance values with p-values from the Monte Carlo p-value
# and the complete approach:
ptest$varimp
ptest$pvalues$pval
ptest$pvalues$complete
# When setting condinf=TRUE the results are obtained for all approaches,
# that is, including those for the two-sample permutation tests
# (in this illustration a very small number of trees, ntree=30, is used;
# in practice much larger numbers should be used; the default is ntree=500):
(ptest_ci <- allinone(data=hearth2, yname="Class", condinf=TRUE, ntree=30, Mmax=10))
ptest_ci$testres