epx {EPX}    R Documentation
Fitting an Ensemble of Phalanxes
Description
epx forms phalanxes of variables from training data for binary classification with a rare class. The phalanxes are disjoint subsets of variables, each of which is fit with a base classifier; together the fitted phalanxes form an ensemble.
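The ensemble idea can be sketched in a few lines of base R. This is a minimal illustration, not how epx is implemented: it uses logistic regression (glm) as a stand-in base classifier, a hypothetical helper fit_phalanx_ensemble(), and it assumes that the phalanx probabilities are combined by simple averaging.

```r
# Hypothetical helper, for illustration only (not part of EPX).
# x: data frame of predictors; y: 0/1 response;
# groups: phalanx index of each column of x (1, 2, ...).
fit_phalanx_ensemble <- function(x, y, groups) {
  k <- max(groups)
  # One column of fitted class-1 probabilities per phalanx,
  # using logistic regression as a stand-in base classifier
  fits <- sapply(seq_len(k), function(g) {
    d <- data.frame(y = y, x[, groups == g, drop = FALSE])
    m <- glm(y ~ ., data = d, family = binomial)
    fitted(m)
  })
  # Combine the phalanxes by averaging their probabilities (assumed)
  list(fits = fits, ensembled = rowMeans(fits))
}

set.seed(1)
n <- 200
x <- data.frame(a = rnorm(n), b = rnorm(n), c = rnorm(n), d = rnorm(n))
y <- rbinom(n, 1, plogis(-3 + x$a + x$c))  # rare class 1
# Two disjoint phalanxes: {a, c} and {b, d}
res <- fit_phalanx_ensemble(x, y, groups = c(1, 2, 1, 2))
```

Each column of res$fits plays the role of one phalanx's fitted probabilities, and res$ensembled is their average.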
Usage
epx(
x,
y,
phalanxes.initial = c(1:ncol(x)),
alpha = 0.95,
nsim = 1000,
rmin.target = 1,
classifier = "random forest",
classifier.args = list(),
performance = "AHR",
performance.args = list(),
computing = "sequential",
...
)
Arguments
x |
Explanatory variables (predictors, features) contained in a data frame. |
y |
Binary response variable vector (numeric or integer): 1 for the rare class, 0 for the majority class. |
phalanxes.initial |
Initial variable group indices; by default each variable is its own group. Example: the vector c(1, 1, 2, 2, 3, ...) puts variables 1 and 2 in group 1, variables 3 and 4 in group 2, and so on. Indices cannot be skipped; for example, c(1, 3, 3, 4, 4, 3, 1) skips group 2 and is invalid. |
alpha |
Lower-tail probability for the critical quantile of the reference
distribution of the performance measure; default is 0.95. |
nsim |
Number of simulations for the reference empirical distribution of the performance measure; default is 1000. |
rmin.target |
To merge the pair of groups with the
minimum ratio of performance measures (ensemble of models to single model)
into a single group, their ratio must be less than
rmin.target; default is 1. |
classifier |
Base classifier, given as a character string; default is "random forest". |
classifier.args |
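Arguments for the base classifier, given as a named list, e.g., list(ntree = 50) for the default random forest, as in the Examples; default is list(). |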
performance |
Performance assessment metric, given as a character string; default is "AHR" (average hit rate). |
performance.args |
Arguments for the performance measure, given as a named list; default is list(). |
computing |
Whether to compute sequentially or in parallel. Input is one of "sequential" (the default) or "parallel"; see the Examples for registering a parallel backend. |
... |
Further arguments passed to or from other methods. |
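The alpha and nsim arguments describe a critical quantile of a simulated reference distribution. The sketch below is a hedged illustration, not EPX's code: it assumes the reference distribution comes from scoring nsim random rankings of the response, and it uses a hypothetical average-precision-style score, hit_rate_score(), as a stand-in for the performance measure. Both helper names are hypothetical.

```r
# Hypothetical stand-in for a performance measure (not EPX's code):
# mean precision at the ranks where class-1 cases appear, when cases
# are ranked by decreasing score.
hit_rate_score <- function(score, y) {
  y_sorted <- y[order(score, decreasing = TRUE)]
  hits <- cumsum(y_sorted)   # number of class-1 cases found so far
  pos <- which(y_sorted == 1)
  mean(hits[pos] / pos)
}

# Hypothetical reference distribution: scores of nsim random rankings
# of y; the critical value is its alpha quantile.
reference_quantile <- function(y, nsim = 1000, alpha = 0.95) {
  sims <- replicate(nsim, hit_rate_score(runif(length(y)), y))
  unname(quantile(sims, probs = alpha))
}

set.seed(761)
y <- c(rep(1, 10), rep(0, 190))  # 10 rare-class cases among 200
crit <- reference_quantile(y, nsim = 1000, alpha = 0.95)
```

A score from a real ranking could then be compared with crit; how epx uses this quantile internally is described in Tomal et al. (2015).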
Details
See Tomal et al. (2015) for a fuller description of how phalanxes are formed.
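One merge step of the rule described under rmin.target can be sketched as follows. This is an illustration under stated assumptions, not the EPX implementation: ratio is a hypothetical symmetric matrix whose (i, j) entry is the performance of the ensemble of the models for groups i and j divided by the performance of a single model fit to both groups together, and merge_step() is a hypothetical helper.

```r
# Hypothetical illustration of one merge decision (not EPX internals).
# groups: current group index of each variable (no skipped indices).
# ratio[i, j]: performance of the ensemble of the models for groups i and j
# divided by the performance of one model fit to groups i and j combined.
merge_step <- function(groups, ratio, rmin.target = 1) {
  r <- ratio
  r[lower.tri(r, diag = TRUE)] <- NA  # consider each pair once
  rmin <- min(r, na.rm = TRUE)
  if (rmin >= rmin.target) return(groups)  # no pair qualifies
  ij <- which(r == rmin, arr.ind = TRUE)[1, ]
  i <- min(ij); j <- max(ij)
  groups[groups == j] <- i                       # merge group j into group i
  groups[groups > j] <- groups[groups > j] - 1   # keep indices unskipped
  groups
}

ratio <- matrix(c(NA,   0.90, 1.20,
                  0.90, NA,   1.05,
                  1.20, 1.05, NA), nrow = 3, byrow = TRUE)
merge_step(c(1, 1, 2, 3), ratio)  # c(1, 1, 1, 2)
```

Here the pair of groups 1 and 2 has the minimum ratio, 0.90, which is below rmin.target = 1, so they merge and the old group 3 is renumbered to 2, keeping the indices unskipped as phalanxes.initial requires.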
Value
Returns an object of class epx, which is a list containing the following components:
PHALANXES |
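List of four vectors, each the same length as the number of
explanatory variables (columns in x). Each vector records the phalanx membership of every variable at one of the four stages of phalanx formation; 0 means the variable is not in a phalanx. |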
PHALANXES.FINAL.PERFORMANCE |
Vector of performance values, one for each final phalanx. |
PHALANXES.FINAL.FITS |
A matrix with number of rows equal to the number
of observations in the training data and number of columns equal to the
number of final phalanxes. Column j holds the predicted probabilities of class 1 from the base classifier fit to final phalanx j. |
ENSEMBLED.FITS |
The predicted probabilities of class 1 from the
ensemble of phalanxes, based on PHALANXES.FINAL.FITS. |
BASE.CLASSIFIER.ARGS |
(Parsed) record of user-specified arguments for
classifier. |
PERFORMANCE.ARGS |
(Parsed) record of user-specified arguments for
performance. |
X |
User-provided data frame of explanatory variables. |
Y |
User-provided binary response vector. |
References
Tomal, J. H., Welch, W. J., & Zamar, R. H. (2015). Ensembling classification models based on phalanxes of variables with applications in drug discovery. The Annals of Applied Statistics, 9(1), 69-93. doi: 10.1214/14-AOAS778
See Also
summary.epx prints a summary of the results, and cv.epx assesses performance via cross-validation.
Examples
# Example with data(harvest)
## Phalanx-formation using a base classifier with 50 trees (default = 500)
set.seed(761)
model <- epx(x = harvest[, -4], y = harvest[, 4],
classifier.args = list(ntree = 50))
## Phalanx-membership of explanatory variables at the four stages
## of phalanx formation (0 means not in a phalanx)
model$PHALANXES
## Summary of the final phalanxes (matches above)
summary(model)
## Not run:
## Parallel computing
clusters <- parallel::detectCores()
cl <- parallel::makeCluster(clusters)
doParallel::registerDoParallel(cl)
set.seed(761)
model.par <- epx(x = harvest[, -4], y = harvest[, 4],
computing = "parallel")
parallel::stopCluster(cl)
## End(Not run)