stat.random_forest {knockoff}    R Documentation
Importance statistics based on random forests
Description
Computes the difference statistic
W_j = |Z_j| - |\tilde{Z}_j|
where Z_j and \tilde{Z}_j are the random forest feature importances of the jth variable and its knockoff, respectively.
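As a minimal sketch of the formula above (not the package's internal code), suppose a forest has been fit on the 2p columns of cbind(X, X_k) and returns one importance per column. The helper name importance_to_W and the importance values below are hypothetical and made up for illustration; the sketch also assumes the first p entries belong to X and the last p to X_k.

```r
# Hypothetical helper: turn 2p importances into p difference statistics.
# Assumes the first p entries belong to X and the last p to X_k.
importance_to_W <- function(imp) {
  p <- length(imp) / 2
  stopifnot(p == floor(p))          # need an even number of importances
  Z   <- imp[1:p]                   # importances of the original variables
  Z_k <- imp[(p + 1):(2 * p)]       # importances of the knockoffs
  abs(Z) - abs(Z_k)
}

# Made-up importances for p = 3 variables followed by their 3 knockoffs
imp <- c(4.0, 0.5, 1.2,   # originals
         1.0, 0.7, 1.2)   # knockoffs
W <- importance_to_W(imp)
print(W)
```

A large positive W_j indicates that the forest found the original variable more useful than its knockoff; values near zero or below zero provide no such evidence.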
Usage
stat.random_forest(X, X_k, y, ...)
Arguments
X | n-by-p matrix of original variables.
X_k | n-by-p matrix of knockoff variables.
y | vector of length n containing the response variable. If a factor, classification is assumed; otherwise, regression is assumed.
... | additional arguments specific to ranger (see Details).
Details
This function uses the ranger package to compute variable importance measures. The importance of a variable is measured as the total decrease in node impurities from splitting on that variable, averaged over all trees. For regression, the node impurity is measured by residual sum of squares. For classification, it is measured by the Gini index.
For a complete list of the available additional arguments, see ranger.
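To make the two impurity measures concrete, the following base-R sketch computes the impurity decrease for a single hypothetical split. The helper names (node_rss, node_gini, impurity_decrease) and the toy data are illustrative assumptions; ranger's internal bookkeeping, such as weighting and scaling, may differ.

```r
# Residual sum of squares within a node (regression impurity)
node_rss <- function(y) sum((y - mean(y))^2)

# Gini index within a node, weighted by node size (classification impurity)
node_gini <- function(y) {
  prop <- as.numeric(table(y)) / length(y)
  length(y) * (1 - sum(prop^2))
}

# Impurity decrease from sending the rows flagged in `left` to the left child
impurity_decrease <- function(y, left, impurity) {
  impurity(y) - impurity(y[left]) - impurity(y[!left])
}

# Toy split: first three observations go left, last three go right
left <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)

y_reg <- c(1, 2, 3, 10, 11, 12)
dec_reg <- impurity_decrease(y_reg, left, node_rss)

y_cls <- factor(c("a", "a", "a", "b", "b", "b"))
dec_cls <- impurity_decrease(y_cls, left, node_gini)

print(c(regression = dec_reg, classification = dec_cls))
```

Summing such decreases over every split that uses a given variable, and averaging over trees, gives the kind of importance described above.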
Value
A vector of statistics W of length p.
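For illustration, a vector of this shape could be produced directly with ranger along the following lines. This is a sketch under stated assumptions, not necessarily the package's exact implementation: rf_knockoff_stat is a hypothetical name, the ranger package must be installed, and the "knockoffs" here are simply independent N(0,1) draws used as a stand-in.

```r
# Hypothetical re-implementation sketch; assumes the 'ranger' package is installed.
rf_knockoff_stat <- function(X, X_k, y, ...) {
  p <- ncol(X)
  # Augmented design: columns V.1 ... V.p are originals, V.(p+1) ... V.2p knockoffs
  df <- data.frame(y = y, V = cbind(X, X_k))
  fit <- ranger::ranger(y ~ ., data = df, importance = "impurity", ...)
  imp <- unname(fit$variable.importance)
  abs(imp[1:p]) - abs(imp[(p + 1):(2 * p)])
}

if (requireNamespace("ranger", quietly = TRUE)) {
  set.seed(1)
  n <- 60; p <- 5
  X   <- matrix(rnorm(n * p), n)
  X_k <- matrix(rnorm(n * p), n)   # stand-in knockoffs, for illustration only
  y   <- 2 * X[, 1] + rnorm(n)     # numeric response, so regression is used
  W <- rf_knockoff_stat(X, X_k, y, num.trees = 200)
  print(length(W))                 # one statistic per original variable
}
```

Extra arguments such as num.trees are forwarded to ranger through `...`, mirroring how additional arguments are described in Arguments.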
See Also
Other statistics: stat.forward_selection(), stat.glmnet_coefdiff(), stat.glmnet_lambdadiff(), stat.lasso_coefdiff_bin(), stat.lasso_coefdiff(), stat.lasso_lambdadiff_bin(), stat.lasso_lambdadiff(), stat.sqrt_lasso(), stat.stability_selection()
Examples
library(knockoff)
set.seed(2022)
p=200; n=100; k=15
mu = rep(0,p); Sigma = diag(p)
X = matrix(rnorm(n*p),n)
nonzero = sample(p, k)
beta = 3.5 * (1:p %in% nonzero)
y = X %*% beta + rnorm(n)
knockoffs = function(X) create.gaussian(X, mu, Sigma)
# Basic usage with default arguments
result = knockoff.filter(X, y, knockoffs=knockoffs,
                         statistic=stat.random_forest)
print(result$selected)
# Advanced usage with custom arguments
foo = stat.random_forest
# Extra arguments are passed on to ranger, which names this option min.node.size
k_stat = function(X, X_k, y) foo(X, X_k, y, min.node.size=5)
result = knockoff.filter(X, y, knockoffs=knockoffs, statistic=k_stat)
print(result$selected)