cv.epx {EPX} | R Documentation |
Balanced K-fold cross-validation for an "epx" object
Description
Balanced K-fold cross-validation based on an existing "epx" object.
The phalanxes in that object are reused: the phalanx-formation algorithm
is not re-run for each fold, so the cross-validation is biased.
Usage
cv.epx(
  epx,
  folds = NULL,
  K = 10,
  folds.out = FALSE,
  classifier.args = list(),
  performance.args = list(),
  ...
)
Arguments
epx |
Object of class "epx". |
folds |
Optional vector specifying to which fold each observation belongs. Must be an |
K |
Number of folds; default is 10. |
folds.out |
Logical; whether to also return a vector giving the fold membership of
each observation; default is FALSE. |
classifier.args |
Arguments for the base classifier specified by
|
performance.args |
Arguments for the performance measure specified by
|
... |
Further arguments passed to or from other methods. |
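
The folds argument lets the caller supply the fold assignment instead of having cv.epx generate one. As a rough sketch only, the base R code below builds such a vector. It rests on two assumptions that this page does not confirm: that "balanced" means each class of the response is spread about evenly across the folds, and that fold labels are integers from 1 to K. The helper make_balanced_folds is hypothetical and not part of EPX.

```r
# Hypothetical helper (NOT part of EPX): assign each observation to one of
# K folds so that every class of y is spread as evenly as possible.
make_balanced_folds <- function(y, K = 10) {
  folds <- integer(length(y))
  for (cls in unique(y)) {
    idx <- which(y == cls)
    # Recycle the labels 1..K to the class size, then shuffle them
    folds[idx] <- sample(rep(seq_len(K), length.out = length(idx)))
  }
  folds
}

set.seed(761)
y <- rbinom(200, 1, 0.1)                 # toy binary response, mostly 0s
folds <- make_balanced_folds(y, K = 10)
table(folds, y)                          # class counts within each fold

# Assumed usage with an "epx" object `model` trained on the same observations:
# cv <- cv.epx(model, folds = folds)
```

Fixing the folds this way makes runs with different classifier.args directly comparable, since every run sees the same split.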
Value
An (n + 1) by (p + 1) matrix, where n is the number of observations used
to train epx and p is the number of (final) phalanxes. The first p columns
correspond to the individual phalanxes, and column p + 1 contains the
predicted probabilities of relevance from the ensemble of phalanxes.
Row n + 1 is the performance (choice of performance measure determined by
the "epx" object) of the corresponding column.
Setting folds.out to TRUE changes the output of cv.epx into a list of two elements:
EPX.CV |
The |
FOLDS.USED |
A vector of length |
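
The examples on this page read the ensemble's performance as cv.i[nrow(cv.i), ncol(cv.i)], which suggests that the last row holds the performance values and the last column holds the ensemble. The sketch below shows that indexing on a toy matrix of made-up numbers standing in for a real cv.epx result. The object names (cv, pred, perf) are illustrative only.

```r
# Toy stand-in for a cv.epx result (values are random, not real output):
# 5 observations and 2 phalanxes, plus one performance row and one
# ensemble column.
set.seed(1)
n <- 5
p <- 2
cv <- matrix(runif((n + 1) * (p + 1)), nrow = n + 1, ncol = p + 1)

pred <- cv[-nrow(cv), , drop = FALSE]   # predictions: one row per observation
perf <- cv[nrow(cv), ]                  # last row: performance of each column

ensemble.pred <- pred[, ncol(pred)]     # last column: ensemble predictions
ensemble.perf <- perf[length(perf)]     # equals cv[nrow(cv), ncol(cv)]
```

When folds.out = TRUE, the same indexing would apply to the first list element (cv.folds[[1]] in the examples).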
Examples
# Example with data(harvest)
## Phalanx-formation using a base classifier with 50 trees (default = 500)
set.seed(761)
model <- epx(x = harvest[, -4], y = harvest[, 4],
             classifier.args = list(ntree = 50))
## 10-fold balanced cross-validation (different base classifier settings)
## Not run:
set.seed(761)
cv.100 <- cv.epx(model, classifier.args = list(ntree = 100))
tail(cv.100) # see performance (here, AHR) for all phalanxes and the ensemble
## Option to output the vector assigning observations to the K folds
## (Commented out for speed.)
set.seed(761)
cv.folds <- cv.epx(model, folds.out = TRUE)
tail(cv.folds[[1]]) # same as first example
table(cv.folds[[2]]) # number of observations in each of the 10 folds
## 10 runs of 10-fold balanced cross-validation (using default settings)
set.seed(761)
cv.ahr <- NULL # store AHR of each ensemble
for (i in 1:10) {
  cv.i <- cv.epx(model)
  cv.ahr <- c(cv.ahr, cv.i[nrow(cv.i), ncol(cv.i)])
}
boxplot(cv.ahr) # to see variation in AHR
## End(Not run)