k_binomialRF {binomialRF} | R Documentation
Random forest feature selection based on a binomial exact test
Description
k_binomialRF is the R implementation of the interaction feature selection algorithm of Zaim et al. (2019). It extends the binomialRF algorithm by searching for k-way interactions.
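To make the "binomial exact test" in the title concrete, the following is a minimal sketch using only base R. It assumes independent trees and made-up numbers (`p0` and `observed` are hypothetical); the package itself can use a correlated binomial distribution (see `cbinom_dist`), so this illustrates the idea rather than the package's internal computation.

```r
# Hypothetical numbers, for illustration only
ntrees   <- 250    # number of trees in the forest
p0       <- 0.02   # assumed null probability that an interaction is selected in one tree
observed <- 12     # assumed number of trees in which the interaction was selected

# One-sided exact binomial p-value: P(X >= observed) under Binomial(ntrees, p0)
p_value <- pbinom(observed - 1, size = ntrees, prob = p0, lower.tail = FALSE)

# The same test through stats::binom.test
bt <- binom.test(observed, ntrees, p = p0, alternative = "greater")
```

A small p-value suggests the interaction is selected more often than chance alone would explain, which is the quantity that is then adjusted for multiple comparisons.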
Usage
k_binomialRF(X, y, fdr.threshold = 0.05, fdr.method = "BY",
ntrees = 2000, percent_features = 0.3, K = 2, cbinom_dist = NULL,
sampsize = nrow(X) * 0.4)
Arguments
X | design matrix
y | class label
fdr.threshold | FDR threshold for determining which features are significant (default 0.05)
fdr.method | method used to adjust for multiple comparisons (default "BY")
ntrees | number of trees to grow in the random forest (default 2000)
percent_features | proportion of the L features subsampled at each tree; must lie in (0,1) (default 0.3)
K | order of the interactions to search for, e.g. K = 2 for pairwise interactions (default 2)
cbinom_dist | user-supplied correlated binomial distribution (default NULL)
sampsize | user-supplied sample size for the random forest (default nrow(X) * 0.4)
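As a rough illustration of what `fdr.threshold` and `fdr.method` control, the sketch below applies base R's `p.adjust` to a handful of made-up p-values. It assumes the adjustment behaves like `stats::p.adjust` with the named method; the raw p-values are hypothetical.

```r
# Hypothetical raw p-values, one per candidate interaction
raw_p <- c(0.0001, 0.004, 0.03, 0.2, 0.6)

# Benjamini-Yekutieli ("BY") adjustment, valid under arbitrary dependence
adj_by <- p.adjust(raw_p, method = "BY")

# Benjamini-Hochberg ("BH") for comparison; never larger than BY
adj_bh <- p.adjust(raw_p, method = "BH")

# Flag candidates whose adjusted p-value falls at or below the threshold
fdr_threshold <- 0.05
significant <- adj_by <= fdr_threshold
```

"BY" is the more conservative of the two, which may be preferable when the tests are correlated, as selections across trees of the same forest typically are.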
Value
A data.frame with 4 columns: feature name, frequency selected, probability of selecting it at random, and adjusted p-value based on fdr.method.
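A returned data.frame of this shape can be filtered and ranked with ordinary subsetting. The sketch below uses a mock result; the column names (`feature`, `freq`, `p_value`, `adj_p_value`) and all values are hypothetical stand-ins, so check `names()` on the real output before adapting it.

```r
# Mock result with hypothetical column names and values
res <- data.frame(
  feature     = c("X1 | X2", "X3 | X4", "X7 | X9"),
  freq        = c(40, 22, 3),
  p_value     = c(1e-6, 0.002, 0.4),
  adj_p_value = c(1e-5, 0.01, 0.9),
  stringsAsFactors = FALSE
)

# Keep interactions that pass the FDR threshold, most significant first
selected <- res[res$adj_p_value <= 0.05, ]
selected <- selected[order(selected$adj_p_value), ]
```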
References
Zaim, SZ; Kenost, C; Lussier, YA; Zhang, HH. binomialRF: Scalable Feature Selection and Screening for Random Forests to Identify Biomarkers and Their Interactions. bioRxiv, 2019.
Examples
set.seed(324)
###############################
### Generate simulation data
###############################
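X = matrix(rnorm(1000), ncol = 10)     # 100 observations, 10 features
trueBeta = c(rep(10, 5), rep(0, 5))    # only the first 5 features carry signal
z = 1 + X %*% trueBeta
pr = 1 / (1 + exp(-z))                 # logistic link
y = rbinom(100, 1, pr)                 # binary class labels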
###############################
### Run interaction model
###############################
library(binomialRF)
library(correlbinom)
rho = 0.33
ntrees = 250
# Correlated binomial null distribution for the per-tree selection counts
cbinom = correlbinom(rho,
                     successprob = calculateBinomialP_Interaction(10, .5, 2),
                     trials = ntrees, precision = 1024, model = 'kuk')
# Search for 2-way interactions (K = 2 by default)
k.binom.rf <- k_binomialRF(X, y, fdr.threshold = .05, fdr.method = 'BY',
                           ntrees = ntrees, percent_features = .5,
                           cbinom_dist = cbinom,
                           sampsize = round(nrow(X) * rho))