GAMens.cv {GAMens} | R Documentation |
Runs v-fold cross validation with GAMbag, GAMrsm or GAMens ensemble classifier
Description
In v-fold cross validation, the data are divided into v subsets of
approximately equal size. Subsequently, one of the v data parts is
excluded while the remainder of the data is used to create a GAMens
object. Predictions are generated for the excluded data part. The process
is repeated v times.
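The procedure described above can be sketched in base R. This is only an illustration of the fold logic, not the package's internal code: the toy data, the variable names (toy, fold, cv_prob) and the use of a logistic regression (glm) as a stand-in for the GAMens ensemble are all assumptions made for the example.

```r
# Sketch of v-fold cross validation in base R (illustrative only).
# A logistic regression (glm) stands in for the GAMens ensemble.
set.seed(1)
n <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y <- factor(ifelse(toy$x1 + rnorm(n) > 0, "pos", "neg"))

v <- 5
# Randomly assign each observation to one of v folds of near-equal size
fold <- sample(rep(seq_len(v), length.out = n))

# Out-of-fold predicted probabilities, filled in fold by fold
cv_prob <- numeric(n)
for (k in seq_len(v)) {
  held_out <- fold == k
  # Fit on the v - 1 remaining parts ...
  fit <- glm(y ~ x1 + x2, data = toy[!held_out, ], family = binomial)
  # ... and predict the excluded part
  cv_prob[held_out] <- predict(fit, newdata = toy[held_out, ],
                               type = "response")
}
```

After the loop, every observation has exactly one prediction, produced by a model that never saw that observation during fitting.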
Usage
GAMens.cv(formula, data, cv, rsm_size = 2, autoform = FALSE, iter = 10,
df = 4, bagging = TRUE, rsm = TRUE, fusion = "avgagg")
Arguments
formula |
a formula, as in the |
data |
a data frame in which to interpret the variables named in
|
cv |
An integer specifying the number of folds in the cross-validation. |
rsm_size |
an integer, the number of variables to use for random
feature subsets used in the Random Subspace Method. Default is 2. If
|
autoform |
if |
iter |
an integer, the number of base (member) classifiers (GAMs) in
the ensemble. Defaults to 10. |
df |
an integer, the number of degrees of freedom (df) used for
smoothing spline estimation. Its value is only used when |
bagging |
enables Bagging if value is TRUE (the default). |
rsm |
enables Random Subspace Method (RSM) if value is TRUE (the default). |
fusion |
specifies the fusion rule for the aggregation of member
classifier outputs in the ensemble. Possible values are |
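To make the roles of iter, rsm_size, bagging and rsm more concrete, the following base R sketch draws the training rows and predictor subset for each ensemble member. It is a hedged illustration of Bagging (bootstrap sampling of rows) and the Random Subspace Method (random selection of rsm_size predictors) as generally described in Breiman (1996) and Ho (1998); the toy data and the names members, rows and vars are hypothetical, and the package may implement these steps differently.

```r
# Illustrative sketch: per-member sampling for Bagging and RSM.
set.seed(42)
toy <- data.frame(matrix(rnorm(100 * 6), ncol = 6))  # columns X1..X6
predictors <- names(toy)

rsm_size <- 2   # predictors per member (RSM)
iter <- 10      # number of ensemble members

members <- lapply(seq_len(iter), function(i) {
  list(
    # Bagging: bootstrap sample of the rows, drawn with replacement
    rows = sample(nrow(toy), replace = TRUE),
    # RSM: random subset of rsm_size predictors, without replacement
    vars = sample(predictors, rsm_size)
  )
})

# Predictor subset assigned to the first member
members[[1]]$vars
```

Each member would then be fitted on toy[rows, vars]; with bagging disabled all rows would be used, and with rsm disabled all predictors would be used.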
Value
An object of class GAMens.cv, which is a list with the
following components:
foldpred |
a data frame with, per fold, predicted class membership probabilities for the left-out observations. |
pred |
a data frame with predicted class membership probabilities. |
foldclass |
a data frame with, per fold, predicted classes for the left-out observations. |
class |
a data frame with predicted classes. |
conf |
the confusion matrix which compares the real versus predicted
class memberships, based on the |
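As a rough illustration of how a confusion matrix relates predicted probabilities to real classes, the sketch below builds one with table() in base R. The six observations, the class labels and the 0.5 cut-off are assumptions made for the example; they are not taken from the package.

```r
# Illustrative sketch: confusion matrix from predicted probabilities.
observed <- factor(c("M", "R", "R", "M", "R", "M"), levels = c("M", "R"))
prob_R   <- c(0.2, 0.8, 0.4, 0.6, 0.9, 0.1)  # hypothetical P(class == "R")

# Assumed cut-off of 0.5 to turn probabilities into class labels
predicted <- factor(ifelse(prob_R > 0.5, "R", "M"), levels = c("M", "R"))

# Rows are predicted classes, columns are observed classes
conf <- table(Predicted = predicted, Observed = observed)

# Share of correctly classified observations (diagonal of the table)
accuracy <- sum(diag(conf)) / sum(conf)
```

Fixing the factor levels on both vectors keeps the table square even when one class is never predicted.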
Author(s)
Koen W. De Bock kdebock@audencia.com, Kristof Coussement K.Coussement@ieseg.fr and Dirk Van den Poel Dirk.VandenPoel@ugent.be
References
De Bock, K. W. and Van den Poel, D. (2012): "Reconciling Performance and Interpretability in Customer Churn Prediction Modeling Using Ensemble Learning Based on Generalized Additive Models". Expert Systems With Applications, Vol 39, 8, pp. 6816–6826.
De Bock, K. W., Coussement, K. and Van den Poel, D. (2010): "Ensemble Classification based on generalized additive models". Computational Statistics & Data Analysis, Vol 54, 6, pp. 1535–1546.
Breiman, L. (1996): "Bagging predictors". Machine Learning, Vol 24, 2, pp. 123–140.
Hastie, T. and Tibshirani, R. (1990): "Generalized Additive Models", Chapman and Hall, London.
Ho, T. K. (1998): "The random subspace method for constructing decision forests". IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol 20, 8, pp. 832–844.
See Also
Examples
## Load the data (requires the mlbench package)
library(GAMens)
library(mlbench)
data(Sonar)
SonarSub <- Sonar[, c("V1", "V2", "V3", "V4", "V5", "V6", "Class")]
## Obtain cross-validated classification performance of a GAMrsm
## ensemble (RSM only, no bagging) on the six selected Sonar variables,
## based on 5-fold cross validation
Sonar.cv.GAMrsm <- GAMens.cv(Class ~ s(V1, 4) + s(V2, 3) + s(V3, 4) + V4 + V5 + V6,
  data = SonarSub, cv = 5, rsm_size = 4, autoform = FALSE, iter = 10,
  bagging = FALSE, rsm = TRUE)
## Calculate the AUC from the cross-validated probabilities in the
## pred component (colAUC is provided by the caTools package)
library(caTools)
GAMrsm.cv.auc <- colAUC(Sonar.cv.GAMrsm$pred, SonarSub$Class == "R",
  plotROC = FALSE)