binomialRF {binomialRF} | R Documentation

## Random Forest Feature Selection Based on a Binomial Exact Test

### Description

`binomialRF` is the R implementation of the feature selection algorithm described by Zaim et al. (2019).
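The core idea can be sketched in base R. This is not the package's internal code. It assumes that each tree contributes one root split, that under the null hypothesis every one of the L features is equally likely to be chosen (p0 = 1/L), and that each feature's root-split count is compared against Binomial(ntrees, p0) with a one-sided exact test. The function name `naive_binomial_pvals` and the counts are hypothetical.

```r
# Sketch only (not the package's internal code): a one-sided binomial
# exact test per feature on how often it was chosen as a tree's root split.
# Assumption: under the null each of the L features is chosen with p0 = 1 / L.
naive_binomial_pvals <- function(root_counts, ntrees) {
  p0 <- 1 / length(root_counts)
  vapply(root_counts, function(k) {
    binom.test(k, ntrees, p = p0, alternative = "greater")$p.value
  }, numeric(1))
}

# Hypothetical root-split counts for 5 features over 100 trees
counts <- c(X1 = 40, X2 = 35, X3 = 10, X4 = 8, X5 = 7)
round(naive_binomial_pvals(counts, ntrees = 100), 4)
```

Features chosen far more often than ntrees/L get small p-values; features chosen at or below the chance rate do not.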

### Usage

```
binomialRF(X, y, fdr.threshold = .05, fdr.method = 'BY',
           ntrees = 2000, percent_features = .5,
           keep.both = FALSE, user_cbinom_dist = NULL,
           sampsize = round(nrow(X) * .63))
```

### Arguments

| Argument | Description |
|---|---|
| `X` | design matrix, one row per sample and one column per feature (L denotes the number of features) |
| `y` | class labels, one per row of `X` |
| `fdr.threshold` | FDR threshold used to decide which features are significant |
| `fdr.method` | method used to adjust the p-values for multiple comparisons (default `'BY'`) |
| `ntrees` | number of trees to grow in the random forest |
| `percent_features` | proportion of the L features subsampled for each tree; must lie in (0, 1) |
| `keep.both` | logical; whether to keep the naive binomialRF results as well as the correlation-adjusted ones |
| `user_cbinom_dist` | a pre-specified correlated binomial distribution, or one calculated with the correlbinom R package as in the Examples |
| `sampsize` | number of samples drawn to grow each tree in the random forest |
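How `percent_features` enters the null selection probability can be checked by simulation. The sketch assumes a simple null model, not necessarily the one implemented by `calculateBinomialP()`: each tree draws m = ceiling(L * percent_features) of the L features and, with no informative features, splits the root on one of them uniformly at random. A given feature is then drawn with probability m/L and chosen with probability 1/m, so its root-selection probability is 1/L for any `percent_features`. The function `simulate_null_root_prob` is hypothetical.

```r
# Sketch of an ASSUMED null model for a single tree's root split:
# draw m = ceiling(L * percent_features) of the L features, then pick one
# of them uniformly at random (no feature carries any signal).
simulate_null_root_prob <- function(L, percent_features, ntrees, feature = 1) {
  m <- ceiling(L * percent_features)
  roots <- replicate(ntrees, {
    drawn <- sample.int(L, m)     # features available to this tree
    drawn[sample.int(m, 1)]       # root split chosen uniformly among them
  })
  mean(roots == feature)          # empirical selection frequency of `feature`
}

set.seed(1)
simulate_null_root_prob(L = 10, percent_features = 0.5, ntrees = 20000)
# should be close to 1 / L = 0.1 under this model
```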

### Value

A data.frame with four columns: the feature name, the frequency with which it was selected, the probability of selecting it at random, and the p-value adjusted with `fdr.method`.
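A table with the shape described above can be sketched from root-split counts. The column names (`feature`, `freq`, `prob_random`, `adj_pvalue`) and the function `binomial_rf_table` are hypothetical, the null probability is assumed to be 1/L, and `fdr.method` is assumed to be a method name accepted by `stats::p.adjust()`, which does include `'BY'`.

```r
# Sketch only: build a four-column table like the one described in Value.
# Assumptions: hypothetical column names, null probability p0 = 1 / L,
# and fdr.method is any method name accepted by stats::p.adjust().
binomial_rf_table <- function(root_counts, ntrees, fdr.method = "BY") {
  p0 <- 1 / length(root_counts)
  raw <- vapply(root_counts, function(k) {
    binom.test(k, ntrees, p = p0, alternative = "greater")$p.value
  }, numeric(1))
  out <- data.frame(
    feature     = names(root_counts),                  # feature name
    freq        = as.numeric(root_counts),             # times chosen as root
    prob_random = p0,                                  # chance of random selection
    adj_pvalue  = p.adjust(raw, method = fdr.method),  # multiplicity-adjusted p-value
    stringsAsFactors = FALSE
  )
  out <- out[order(out$adj_pvalue), ]  # most significant features first
  rownames(out) <- NULL
  out
}

# Hypothetical root-split counts for 5 features over 100 trees
counts <- c(X1 = 40, X2 = 35, X3 = 10, X4 = 8, X5 = 7)
tab <- binomial_rf_table(counts, ntrees = 100)
tab[tab$adj_pvalue < 0.05, "feature"]
```

Filtering on the adjusted p-value, as in the last line, mirrors the role of `fdr.threshold`.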

### References

Zaim SZ, Kenost C, Lussier YA, Zhang HH. binomialRF: Scalable Feature Selection and Screening for Random Forests to Identify Biomarkers and Their Interactions. bioRxiv, 2019.

### Examples

```
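set.seed(324)
###############################
### Generate simulation data
###############################
# 100 samples x 10 features; only the first 5 features affect the outcome
X = matrix(rnorm(1000), ncol = 10)
trueBeta = c(rep(10, 5), rep(0, 5))
z = 1 + X %*% trueBeta
pr = 1 / (1 + exp(-z))             # logistic link
y = as.factor(rbinom(100, 1, pr))  # binary class labels
###############################
### Run binomialRF
###############################
library(binomialRF)   # binomialRF(), calculateBinomialP()
library(correlbinom)  # correlbinom()
rho = 0.33            # correlation between trials (trees) in the correlated binomial
ntrees = 250
# Correlated binomial distribution over ntrees trials, with the success
# probability for 10 features and percent_features = .5
cbinom = correlbinom(rho, successprob = calculateBinomialP(10, .5), trials = ntrees,
                     precision = 1024, model = 'kuk')
binom.rf <- binomialRF(X, y, fdr.threshold = .05, fdr.method = 'BY',
                       ntrees = ntrees, percent_features = .5,
                       keep.both = FALSE, user_cbinom_dist = cbinom,
                       sampsize = round(nrow(X) * rho))  # 33% of rows per tree (default is 63%)
print(binom.rf)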
```

*binomialRF* version 0.1.0