cv.epx {EPX} | R Documentation |
Balanced K-fold cross-validation for an "epx" object
Description
Balanced K-fold cross-validation based on an "epx" object.
Cross-validation reuses the phalanxes already formed in the "epx" object;
the phalanx-formation algorithm is not re-run for each fold, so the
resulting performance estimates are biased.
Usage
cv.epx(
  epx,
  folds = NULL,
  K = 10,
  folds.out = FALSE,
  classifier.args = list(),
  performance.args = list(),
  ...
)
Arguments
epx |
Object of class "epx". |
folds |
Optional vector specifying to which fold each observation belongs. Must be an |
K |
Number of folds; default is 10. |
folds.out |
Logical; indicates whether a vector of fold memberships for the observations is returned along with the results; default is FALSE. |
classifier.args |
Arguments for the base classifier specified by epx. |
performance.args |
Arguments for the performance measure specified by epx. |
... |
Further arguments passed to or from other methods. |
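The following base R sketch illustrates one way a fold-membership vector could be built by hand. It assumes that "balanced" means each class of the response is spread as evenly as possible across the K folds, and that integer labels 1 to K are an acceptable encoding for folds; neither assumption is confirmed by this page. The helper make_balanced_folds is hypothetical and not part of EPX, and the response is simulated.

```r
# Hypothetical helper (not part of EPX): assign each observation to one of
# K folds so that every class is spread as evenly as possible across folds.
make_balanced_folds <- function(y, K = 10) {
  folds <- integer(length(y))
  for (cls in unique(y)) {
    idx <- which(y == cls)
    # Shuffle within the class, then deal fold labels 1..K round-robin
    idx <- idx[sample.int(length(idx))]
    folds[idx] <- rep_len(seq_len(K), length(idx))
  }
  folds
}

set.seed(761)
y <- rbinom(200, size = 1, prob = 0.1)  # simulated rare-class response
folds <- make_balanced_folds(y, K = 10)
table(folds, y)                          # class counts per fold
```

Within each class the per-fold counts differ by at most one, which is the property the sketch aims for.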
Value
An (n + 1) by (p + 1) matrix, where n is the number of observations used
to train epx and p is the number of (final) phalanxes. Column p + 1 of the
matrix contains the predicted probabilities of relevance from the ensemble
of phalanxes, and row n + 1 is the performance (choice of performance
measure determined by the "epx" object) of the corresponding column.
Setting folds.out as TRUE changes the output of cv.epx into a list of two
elements:
EPX.CV |
The (n + 1) by (p + 1) matrix described above. |
FOLDS.USED |
A vector of length |
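As a rough illustration of the layout described above, the sketch below indexes a mock matrix with the same shape as the documented return value. The matrix is filled with random placeholder numbers rather than real EPX output, and get_cv_matrix is a hypothetical helper that simply accepts either return shape (plain matrix, or list with an EPX.CV element).

```r
# Mock stand-in for cv.epx() output: n = 5 observations, p = 2 phalanxes.
# All values are random placeholders, not real EPX results.
set.seed(1)
n <- 5
p <- 2
cv.mock <- matrix(runif((n + 1) * (p + 1)), nrow = n + 1, ncol = p + 1)

# Hypothetical helper: accept either return shape of cv.epx()
get_cv_matrix <- function(res) {
  if (is.list(res)) res[["EPX.CV"]] else res
}

m <- get_cv_matrix(list(EPX.CV = cv.mock, FOLDS.USED = rep_len(1:2, n)))

perf <- m[nrow(m), ]                   # row n + 1: performance of each column
ensemble.perf <- perf[ncol(m)]         # column p + 1: the ensemble
ensemble.prob <- m[-nrow(m), ncol(m)]  # first n rows: predicted probabilities
```

Indexing by nrow() and ncol() rather than hard-coded positions keeps the code valid whatever the number of observations or phalanxes.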
Examples
# Example with data(harvest)
## Phalanx-formation using a base classifier with 50 trees (default = 500)
set.seed(761)
model <- epx(x = harvest[, -4], y = harvest[, 4],
             classifier.args = list(ntree = 50))
## 10-fold balanced cross-validation (different base classifier settings)
## Not run:
set.seed(761)
cv.100 <- cv.epx(model, classifier.args = list(ntree = 100))
tail(cv.100) # see performance (here, AHR) for all phalanxes and the ensemble
## Option to output the vector assigning observations to the K folds
## (Commented out for speed.)
set.seed(761)
cv.folds <- cv.epx(model, folds.out = TRUE)
tail(cv.folds[[1]]) # same as first example
table(cv.folds[[2]]) # number of observations in each of the 10 folds
## 10 runs of 10-fold balanced cross-validation (using default settings)
set.seed(761)
cv.ahr <- NULL # store AHR of each ensemble
for (i in 1:10) {
  cv.i <- cv.epx(model)
  # last row, last column: performance of the ensemble
  cv.ahr <- c(cv.ahr, cv.i[nrow(cv.i), ncol(cv.i)])
}
boxplot(cv.ahr) # to see variation in AHR
## End(Not run)