CSMES.ensNomCurve {CSMES}    R Documentation

CSMES Training Stage 2: Extract an ensemble nomination curve (cost curve- or Brier curve-based) from a set of Pareto-optimal ensemble classifiers

Description

Generates an ensemble nomination curve from a set of Pareto-optimal ensemble definitions, as identified through CSMES.ensSel.
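The idea behind a cost-curve-based nomination curve can be sketched in base R. This is a minimal illustration under stated assumptions, not the CSMES implementation. It assumes a 0/1 outcome and takes the hard class predictions of each candidate ensemble as the columns of a matrix, roughly the `method = "classPreds"` setting. It also uses the Drummond and Holte cost curve, on which a classifier is the straight line running from its FPR at probability cost 0 to its FNR at probability cost 1. The helpers `costLine` and `nominationCurve` are hypothetical names.

```r
# Hypothetical sketch (not CSMES internals): a cost-curve-based ensemble
# nomination curve built from hard class predictions of candidate ensembles.
# Assumes y holds 0/1 labels with both classes present.

# Normalised expected cost of one classifier over a grid of probability
# costs pc: a straight line from FPR (pc = 0) to FNR (pc = 1).
costLine <- function(fpr, fnr, pc) fnr * pc + fpr * (1 - pc)

nominationCurve <- function(classPreds, y, intervals = 100) {
  pc <- seq(0, 1, length.out = intervals + 1)
  # One column of 0/1 predictions per candidate ensemble; the result has
  # one row per grid point and one column per candidate
  lines <- apply(classPreds, 2, function(pred) {
    fpr <- sum(pred == 1 & y == 0) / sum(y == 0)
    fnr <- sum(pred == 0 & y == 1) / sum(y == 1)
    costLine(fpr, fnr, pc)
  })
  # The nomination curve is the lower envelope of the candidates' lines;
  # at each operating point the candidate attaining it is nominated
  list(pc = pc,
       cost = apply(lines, 1, min),
       nominated = apply(lines, 1, which.min))
}
```

At each grid point the curve takes the lowest normalised expected cost among the candidates, and `nominated` records which candidate attains it. Ties go to the first candidate because `which.min` returns the first minimum.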

Usage

CSMES.ensNomCurve(
  ensSelModel,
  memberPreds,
  y,
  curveType = c("costCurve", "brierSkew", "brierCost"),
  method = c("classPreds", "probPreds"),
  plotting = FALSE,
  nrBootstraps = 1
)

Arguments

ensSelModel

ensemble selection model (output of CSMES.ensSel)

memberPreds

matrix containing the predictions of the ensemble member library, with one column per member model and one row per instance

y

Vector with true class labels. Currently, only a dichotomous outcome variable is supported.

curveType

the type of cost curve used to construct the ensemble nomination curve. Should be "brierCost", "brierSkew" or "costCurve" (default).

method

how candidate ensemble predictions are used to generate the ensemble nomination front: "classPreds" for class predictions (default), "probPreds" for probability predictions.

plotting

TRUE or FALSE: Should a plot be generated showing the Brier curve? Defaults to FALSE.

nrBootstraps

optionally, the ensemble nomination curve can be generated through bootstrapping. This argument specifies the number of iterations/bootstrap samples. Default is 1.
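When `method = "probPreds"` is combined with a Brier curve, the loss at each operating condition depends on a threshold derived from that condition. The sketch below shows one such curve, the cost-based Brier curve of Hernández-Orallo, Flach and Ferri (2011), for a single probabilistic classifier. This page does not confirm that `curveType = "brierCost"` follows exactly this definition, so treat that as an assumption. Setting the threshold equal to the cost proportion makes the area under this curve equal to the Brier score, which provides a convenient check. `brierCostCurve` and `trapz` are hypothetical helpers.

```r
# Hypothetical sketch of a cost-based Brier curve for one probabilistic
# classifier (after Hernandez-Orallo, Flach and Ferri, 2011). It is an
# assumption that CSMES's curveType = "brierCost" follows this definition.
# y: 0/1 labels (both classes present); p: predicted P(class 1).
brierCostCurve <- function(p, y, intervals = 1000) {
  costProp <- seq(0, 1, length.out = intervals + 1)  # operating conditions
  pi0 <- mean(y == 0)
  pi1 <- mean(y == 1)
  loss <- sapply(costProp, function(k) {
    # k is the cost proportion of a false positive; threshold choice:
    # predict class 1 when p exceeds k
    fpr <- mean(p[y == 0] > k)
    fnr <- mean(p[y == 1] <= k)
    2 * (k * pi0 * fpr + (1 - k) * pi1 * fnr)
  })
  list(costProp = costProp, loss = loss)
}

# Trapezoidal area under a curve f sampled on the grid x
trapz <- function(x, f) {
  n <- length(x)
  sum(diff(x) * (f[-n] + f[-1]) / 2)
}
```

For any 0/1 labels `y` and probabilities `p`, `trapz(bc$costProp, bc$loss)` with `bc <- brierCostCurve(p, y)` should come out close to `mean((p - y)^2)`. The only discrepancies come from the grid resolution at the step points of the curve.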

Value

An object of class CSMES.ensNomCurve, which is a list with the following components:

nomcurve

the ensemble nomination curve

curves

individual cost curves or Brier curves of ensemble members

intervals

resolution of the ensemble nomination curve

incidence

incidence (positive rate) of the outcome variable

area_under_curve

area under the ensemble nomination curve

method

method used to generate the ensemble nomination front: "classPreds" for class predictions (default), "probPreds" for probability predictions

curveType

the type of cost curve used to construct the ensemble nomination curve

nrBootstraps

number of bootstrap samples over which the ensemble nomination curve was estimated
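The `nrBootstraps`, `incidence` and `area_under_curve` components can be illustrated with a bootstrap variant of a cost-curve nomination curve. This is a hypothetical sketch, and each of its steps is an assumption rather than a description of CSMES: resampling validation instances with replacement, taking the lower envelope of the candidates' cost lines on each resample, averaging the envelopes, and integrating with the trapezoidal rule. The function name `bootNomCurve` is illustrative. Only the returned component names mirror those documented in this section.

```r
# Hypothetical sketch (not CSMES internals): bootstrap estimate of a
# cost-curve nomination curve plus incidence and area summaries.
# classPreds: one column of 0/1 predictions per candidate; y: 0/1 labels.
bootNomCurve <- function(classPreds, y, nrBootstraps = 25, intervals = 100) {
  pc <- seq(0, 1, length.out = intervals + 1)   # probability-cost grid
  envelopes <- replicate(nrBootstraps, {
    idx <- sample(length(y), replace = TRUE)   # resample validation instances
    yb <- y[idx]
    pb <- classPreds[idx, , drop = FALSE]
    fpr <- colSums(pb == 1 & yb == 0) / sum(yb == 0)
    fnr <- colSums(pb == 0 & yb == 1) / sum(yb == 1)
    # Lower envelope of the candidates' cost lines at each grid point
    sapply(pc, function(x) min(fnr * x + fpr * (1 - x)))
  })
  nomcurve <- rowMeans(envelopes)   # average envelope over the resamples
  n <- length(pc)
  auc <- sum(diff(pc) * (nomcurve[-n] + nomcurve[-1]) / 2)  # trapezoidal rule
  list(nomcurve = nomcurve, intervals = intervals, incidence = mean(y == 1),
       area_under_curve = auc, nrBootstraps = nrBootstraps)
}
```

Averaging envelopes across resamples smooths out the sampling noise of any single validation split. The price is that the nominated ensemble can differ from one resample to the next.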

Author(s)

Koen W. De Bock, kdebock@audencia.com

References

De Bock, K.W., Lessmann, S. and Coussement, K., Cost-sensitive business failure prediction when misclassification costs are uncertain: A heterogeneous ensemble selection approach, European Journal of Operational Research (2020), doi: 10.1016/j.ejor.2020.01.052.

See Also

CSMES.ensSel, CSMES.predictPareto, CSMES.predict

Examples

##load packages and data
library(CSMES)
library(rpart)
library(zoo)
library(ROCR)
library(mco)
data(BFP)
##randomly shuffle the rows
BFP_r<-BFP[sample(nrow(BFP),nrow(BFP)),]
size<-nrow(BFP_r)
##size<-300
##split into non-overlapping training, validation and test thirds
train<-BFP_r[1:floor(size/3),]
val<-BFP_r[(floor(size/3)+1):floor(2*size/3),]
test<-BFP_r[(floor(2*size/3)+1):size,]
##generate a list containing model specifications for 100 CART decision trees varying in the cp
##and minsplit parameters, and trained on bootstrap samples (bagging)
rpartSpecs<-list()
for (i in 1:100){
  ##draw a bootstrap sample of the training rows
  bootSample<-train[sample(nrow(train),replace=TRUE),]
  rpartSpecs[[paste0("rpart",i)]]<-rpart(Class~.,data=bootSample,method="class",
    control=rpart.control(minsplit=round(runif(1,min=1,max=20)),
    cp=runif(1,min=0.05,max=0.4)))
}
##predict the probability of the second class level for the validation set
hillclimb<-matrix(0,nrow=nrow(val),ncol=100)
for (i in 1:100){
  hillclimb[,i]<-predict(rpartSpecs[[i]],newdata=val)[,2]
}
##run ensemble selection on the validation set predictions
ESmodel<-CSMES.ensSel(hillclimb,val$Class,obj1="FNR",obj2="FPR",selType="selection",
generations=10,popsize=12,plot=TRUE)
## Create Ensemble nomination curve
enc<-CSMES.ensNomCurve(ESmodel,hillclimb,val$Class,curveType="costCurve",method="classPreds",
plotting=FALSE)

[Package CSMES version 1.0.1 Index]