cv.epx {EPX}    R Documentation

Balanced K-fold cross-validation for an "epx" object

Description

Performs balanced K-fold cross-validation for an "epx" object. The phalanxes formed when epx was fitted are reused in every fold; because the phalanx-formation algorithm is not re-run for each fold, the resulting cross-validation estimates are biased.

Usage

cv.epx(
  epx,
  folds = NULL,
  K = 10,
  folds.out = FALSE,
  classifier.args = list(),
  performance.args = list(),
  ...
)

Arguments

epx

Object of class "epx".

folds

Optional vector specifying the fold to which each observation belongs. Must be a vector of length n (n being the number of observations) containing only integer values from 1 to K. If NULL (the default), fold membership is assigned at random.

K

Number of folds; default is 10.

folds.out

Logical; if TRUE, the vector giving the fold membership of each observation is returned along with the cross-validation results. Default is FALSE.

classifier.args

Arguments for the base classifier specified by epx; by default, the arguments used when epx was formed.

performance.args

Arguments for the performance measure specified by epx; by default, the arguments used when epx was formed.

...

Further arguments passed to or from other methods.
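
A folds vector can be built by hand when the same partition needs to be reused across runs. The following is a minimal sketch in base R, not the package's own procedure. It assumes that "balanced" means each fold receives roughly the same number of observations from each class, which this page does not state. make_folds and the toy response y are hypothetical and are not part of EPX.

```r
# Hypothetical helper (not part of EPX): assign each observation to one of
# K folds so that every class is spread as evenly as possible across folds.
make_folds <- function(y, K = 10) {
  folds <- integer(length(y))
  for (cls in unique(y)) {
    idx <- which(y == cls)
    # Fold labels 1..K recycled to the class size, then shuffled
    folds[idx] <- sample(rep_len(seq_len(K), length(idx)))
  }
  folds
}

set.seed(761)
y <- rep(c(0, 1), times = c(180, 20))  # toy response: 20 rare-class cases
folds <- make_folds(y, K = 10)

table(folds)     # 20 observations in every fold
table(folds, y)  # 18 of class 0 and 2 of class 1 in every fold
```

A vector built this way from the response used to train the model could then be supplied as cv.epx(model, folds = folds, K = 10).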

Value

An (n + 1) by (p + 1) matrix, where n is the number of observations used to train epx and p is the number of (final) phalanxes. Columns 1 to p correspond to the individual phalanxes, and column p + 1 contains the predicted probabilities of relevance from the ensemble of phalanxes. Row n + 1 holds the performance of each column, using the performance measure determined by the "epx" object.

Setting folds.out = TRUE makes cv.epx return a list of two elements instead:

EPX.CV

The (n + 1) by (p + 1) matrix returned by default when folds.out = FALSE.

FOLDS.USED

A vector of length n with integer values only in the range from 1 to K indicating to which fold each observation was randomly assigned for cross-validation.
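
The layout of the returned matrix can be illustrated with a small mock-up. The sketch below builds a stand-in matrix by hand and shows how the prediction rows and the performance row might be separated. Every number in it is invented for illustration and none is output from cv.epx. The names cv.mock, pred and perf are hypothetical.

```r
# Stand-in for a cv.epx result with n = 4 observations and p = 2 phalanxes.
# All values are invented for illustration only.
n <- 4
p <- 2
cv.mock <- rbind(
  c(0.10, 0.20, 0.25),
  c(0.80, 0.60, 0.90),
  c(0.05, 0.10, 0.12),
  c(0.40, 0.30, 0.55),
  c(0.70, 0.65, 0.85)  # row n + 1: performance of each column
)

pred <- cv.mock[seq_len(n), , drop = FALSE]  # n by (p + 1) predictions
perf <- cv.mock[n + 1, ]                     # one performance value per column

ensemble.pred <- pred[, p + 1]     # ensemble predictions (last column)
ensemble.perf <- perf[p + 1]       # ensemble performance (bottom-right cell)
phalanx.perf  <- perf[seq_len(p)]  # performance of each individual phalanx
```

With folds.out = TRUE, the same indexing would apply to the EPX.CV element of the returned list.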

Examples

# Example with data(harvest)

## Phalanx-formation using a base classifier with 50 trees (default = 500)
 
set.seed(761)
model <- epx(x = harvest[, -4], y = harvest[, 4],
            classifier.args = list(ntree = 50))

## 10-fold balanced cross-validation (different base classifier settings)
## Not run: 
set.seed(761)
cv.100 <- cv.epx(model, classifier.args = list(ntree = 100))
tail(cv.100) # see performance (here, AHR) for all phalanxes and the ensemble


## Option to output the vector assigning observations to the K folds
## (Not run by default, for speed.)
set.seed(761)
cv.folds <- cv.epx(model, folds.out = TRUE)
tail(cv.folds[[1]])  # EPX.CV: the cross-validation matrix (default classifier settings)
table(cv.folds[[2]])  # number of observations in each of the 10 folds

## 10 runs of 10-fold balanced cross-validation (using default settings)
set.seed(761)
cv.ahr <- numeric(10)  # store AHR of each ensemble
for (i in 1:10) {
  cv.i <- cv.epx(model)
  cv.ahr[i] <- cv.i[nrow(cv.i), ncol(cv.i)]  # bottom-right cell: ensemble AHR
}
boxplot(cv.ahr)  # to see variation in AHR

## End(Not run)


[Package EPX version 1.0.4 Index]