| allinone {rfvimptest} | R Documentation |
Apply all available (sequential) permutation testing approaches of variable importance measures with one function call
Description
This helper function performs all (sequential) permutation testing approaches for variable importance measures described in rfvimptest
with a single function call. This may be useful for comparing the results obtained with the different approaches.
The function is computationally efficient because it reuses the permuted variable importance values obtained
for the conventional permutation test (which performs all Mmax permutations) in the other approaches. For details
on the different approaches, see rfvimptest.
Usage
allinone(
data,
yname,
Mmax = 500,
varnames = NULL,
p0 = 0.06,
p1 = 0.04,
alpha = 0.05,
beta = 0.2,
A = 0.1,
B = 10,
h = 8,
nperm = 1,
ntree = 500,
progressbar = TRUE,
condinf = FALSE,
...
)
Arguments
data |
A |
yname |
Name of outcome variable. |
Mmax |
Maximum number of permutations used in each permutation test. Default is 500. |
varnames |
Optional. Names of the variables for which testing should be performed. By default all variables in |
p0 |
The value of the p-value in the null hypothesis (H0: p = p0) of SPRT and SAPT. Default is 0.06. |
p1 |
The value of the p-value in the alternative hypothesis (H1: p = p1) of SPRT and SAPT. Default is 0.04. |
alpha |
The significance level of SPRT when p = p0, that is, the type I error rate. Default is 0.05. |
beta |
One minus the power of SPRT when p = p1, that is, the type II error rate. Default is 0.2. |
A |
The quantity A in the formula of SAPT. Default is 0.1 for a type I error of 0.05. Usually not changed by the user. |
B |
The quantity B in the formula of SAPT. Default is 10 (1/A) for a type I error of 0.05. Usually not changed by the user. |
h |
The quantity h in the formula for the sequential Monte Carlo p-value. The default value for h is 8. Larger values lead to more precise p-value estimates, but are computationally more expensive. |
nperm |
The number of permutations of the out-of-bag observations over which the results are averaged when calculating the variable importance measure values. Default is 1. Values larger than 1 can only be considered when |
ntree |
Number of trees per forest. Default is 500. |
progressbar |
Output the current progress of the calculations for each variable to the console? Default is TRUE. |
condinf |
Set this value to |
... |
Further arguments passed to |
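The roles of p0, p1, alpha, beta, A and B can be illustrated with a minimal sketch of a sequential likelihood ratio rule. It assumes Wald's classical test for a Bernoulli proportion, applied to indicators of whether a permuted importance value reached the observed one. The helpers sprt_bounds and sequential_decision are hypothetical and not part of rfvimptest. Whether the package matches this sketch in every detail is an assumption, in particular the decision taken when no boundary is crossed.

```r
# Hypothetical helper: Wald's boundaries derived from the two error rates.
sprt_bounds <- function(alpha = 0.05, beta = 0.2) {
  list(A = beta / (1 - alpha), B = (1 - beta) / alpha)
}

# Hypothetical helper: sequential decision from a logical vector that marks,
# for each permutation, whether the permuted importance reached the observed one.
sequential_decision <- function(exceed, p0 = 0.06, p1 = 0.04, A, B) {
  m <- seq_along(exceed)   # permutations performed so far
  d <- cumsum(exceed)      # exceedances observed so far
  # Log likelihood ratio of H1 (p = p1) against H0 (p = p0).
  log_lr <- d * log(p1 / p0) + (m - d) * log((1 - p1) / (1 - p0))
  stop_at <- which(log_lr <= log(A) | log_lr >= log(B))
  if (length(stop_at) == 0) {
    # Assumption of this sketch: without a boundary crossing, H0 is kept.
    return(list(decision = "keep H0", perms = length(exceed),
                stoppedearly = "no"))
  }
  k <- stop_at[1]
  decision <- if (log_lr[k] >= log(B)) "accept H1" else "keep H0"
  list(decision = decision, perms = k, stoppedearly = "yes")
}

b <- sprt_bounds()                 # boundaries from alpha = 0.05, beta = 0.2
never  <- rep(FALSE, 200)          # no permuted value reaches the observed one
always <- rep(TRUE, 200)           # every permuted value reaches it

res_h1   <- sequential_decision(never,  A = b$A, B = b$B)
res_h0   <- sequential_decision(always, A = b$A, B = b$B)
res_sapt <- sequential_decision(never,  A = 0.1, B = 10)  # SAPT-style defaults
```

Because p1 < p0, every permutation without an exceedance moves the ratio towards B (evidence for a small p-value), while every exceedance moves it towards A.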
Value
Object of class allinone with elements
varimp |
Variable importance for each considered independent variable. |
testres |
The results ("keep H0" vs. "accept H1") of the tests for each considered independent variable. |
pvalues |
The p-values of the tests for each considered independent variable. Note that p-values are only obtained for the method types "pval" and "complete". |
stoppedearly |
For each independent variable, whether the calculations stopped early ("yes") or the maximum of |
perms |
The number of permutations performed for each independent variable. |
Mmax |
Maximum number of permutations used in each permutation test. |
ntree |
Number of trees per forest. |
comptime |
The time the computations needed. |
Author(s)
Alexander Hapfelmeier, Roman Hornung
References
Breiman, L. (2001). Random forests. Mach Learn 45:5–32, <doi: 10.1023/A:1010933404324>.
Coleman, T., Peng, W., Mentch, L. (2019). Scalable and efficient hypothesis testing with random forests. arXiv preprint arXiv:1904.07830, <doi: 10.48550/arXiv.1904.07830>.
Hapfelmeier, A., Hornung, R., Haller, B. (2022). Sequential Permutation Testing of Random Forest Variable Importance Measures. arXiv preprint arXiv:2206.01284, <doi: 10.48550/arXiv.2206.01284>.
Hapfelmeier, A., Ulm, K. (2013). A new variable selection approach using Random Forests. CSDA 60:50–69, <doi: 10.1016/j.csda.2012.09.020>.
Hapfelmeier, A., Hothorn, T., Ulm, K., Strobl, C. (2014). A new variable importance measure for random forests with missing data. Stat Comput 24:21–34, <doi: 10.1007/s11222-012-9349-1>.
Hothorn, T., Hornik, K., Zeileis, A. (2006). Unbiased Recursive Partitioning: A Conditional Inference Framework. J Comput Graph Stat 15(3):651–674, <doi: 10.1198/106186006X133933>.
Wright, M. N., Ziegler, A. (2017). ranger: A fast implementation of random forests for high dimensional data in C++ and R. J Stat Softw 77:1–17, <doi: 10.18637/jss.v077.i01>.
See Also
Examples
# Load package:
library("rfvimptest")
# Set seed to obtain reproducible results:
set.seed(1234)
# Load example data:
data(hearth2)
# NOTE: For illustration purposes, a very small maximum number of
# permutations is used in the examples below.
# This number would be much too small for actual applications.
# The default is Mmax=500.
# When using condinf=FALSE (default) the results for the two-sample
# permutation tests are not obtained:
(ptest <- allinone(data=hearth2, yname="Class", Mmax=20))
# Variable importance values with p-values from the Monte Carlo p-value
# and the complete approach:
ptest$varimp
ptest$pvalues$pval
ptest$pvalues$complete
# When setting condinf=TRUE, the results are obtained for all approaches,
# that is, including those for the two-sample permutation tests
# (in this illustration a very small number of trees, ntree=30, is used;
# in practice much larger numbers should be used; the default is ntree=500):
(ptest_ci <- allinone(data=hearth2, yname="Class", condinf=TRUE, ntree=30, Mmax=10))
ptest_ci$testres
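The reuse of permuted importance values described above can be sketched without fitting any forest. The example below is an illustration under stated assumptions, not the package's implementation: the importance values are made up, the function names complete_p and sequential_p are hypothetical, the complete p-value is taken as the plain proportion of exceedances, and the sequential Monte Carlo p-value follows the common rule of stopping once h permuted values reach the observed one.

```r
# Made-up inputs: one observed importance and 20 permuted importance values.
obs_imp   <- 0.030
perm_imps <- c(0.004, 0.031, -0.002, 0.010, 0.035, 0.001, 0.008, 0.040,
               0.002, 0.006, 0.033, 0.005, 0.000, 0.007, 0.003, 0.009,
               0.011, 0.002, 0.001, 0.004)

# Conventional test: uses all permuted values.
complete_p <- function(obs, perms) mean(perms >= obs)

# Sequential Monte Carlo p-value computed on the SAME permuted values:
# stop at the first permutation where h exceedances have accumulated.
sequential_p <- function(obs, perms, h) {
  exceed <- cumsum(perms >= obs)
  hit <- which(exceed >= h)
  if (length(hit) > 0) {
    m <- hit[1]
    list(pvalue = h / m, perms = m, stoppedearly = "yes")
  } else {
    M <- length(perms)
    list(pvalue = (exceed[M] + 1) / (M + 1), perms = M, stoppedearly = "no")
  }
}

p_complete <- complete_p(obs_imp, perm_imps)
res_seq    <- sequential_p(obs_imp, perm_imps, h = 3)  # small h for illustration
res_seq_h8 <- sequential_p(obs_imp, perm_imps, h = 8)  # h = 8, as in the default
```

Both results are derived from one vector of permuted values, so the sequential approach costs no additional forest fits once the conventional test has been run. With only 20 permutations and h = 8, the sequential rule never stops early here, which is one reason such small values of Mmax are unsuitable in practice.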