
CSMES.ensNomCurve {CSMES}

R Documentation

CSMES Training Stage 2: Extract an ensemble nomination curve (cost curve- or Brier curve-based) from a set of Pareto-optimal ensemble classifiers

Description

Generates an ensemble nomination curve from a set of Pareto-optimal ensemble definitions, as identified through `CSMES.ensSel`.
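For the default setting (`curveType = "costCurve"`, `method = "classPreds"`), the idea can be illustrated with a minimal sketch. It assumes that each Pareto-optimal ensemble is summarized by its false negative rate (FNR) and false positive rate (FPR), that its cost curve is the Drummond-Holte cost line NEC(pc) = FNR * pc + FPR * (1 - pc) over the probability-cost axis `pc`, and that the nomination curve is the pointwise lower envelope of these lines. The functions `costLines()` and `nominationCurve()` are hypothetical helpers written for illustration only; they are not part of the CSMES API, and the package's internal computation may differ.

```r
## Hypothetical helpers (not part of the CSMES API): nominate, at each
## operating condition, the candidate ensemble with the lowest cost.
## Assumes 0/1 class predictions and 0/1 true labels (1 = positive class).
costLines <- function(classPreds, y, pc) {
  # one cost line per candidate ensemble (one column of classPreds each)
  sapply(seq_len(ncol(classPreds)), function(j) {
    pred <- classPreds[, j]
    fnr <- mean(pred[y == 1] == 0)   # false negative rate
    fpr <- mean(pred[y == 0] == 1)   # false positive rate
    fnr * pc + fpr * (1 - pc)        # normalized expected cost at each pc
  })
}

nominationCurve <- function(classPreds, y, intervals = 100) {
  pc <- seq(0, 1, length.out = intervals + 1)   # probability-cost grid
  costMat <- costLines(classPreds, y, pc)        # rows = pc, cols = ensembles
  nomcurve <- apply(costMat, 1, min)             # lower envelope
  nominated <- apply(costMat, 1, which.min)      # index of nominated ensemble
  # trapezoidal area under the lower envelope
  auc <- sum(diff(pc) * (head(nomcurve, -1) + tail(nomcurve, -1)) / 2)
  list(pc = pc, nomcurve = nomcurve, nominated = nominated,
       area_under_curve = auc)
}

## Toy usage: two candidate ensembles with different error trade-offs
y <- c(1, 1, 1, 0, 0, 0, 0, 0)
classPreds <- cbind(
  aggressive   = c(1, 1, 1, 1, 1, 0, 0, 0),  # FNR = 0,   FPR = 2/5
  conservative = c(1, 0, 0, 0, 0, 0, 0, 0)   # FNR = 2/3, FPR = 0
)
res <- nominationCurve(classPreds, y, intervals = 10)
print(res$nominated)         # which ensemble is nominated at each pc
print(res$area_under_curve)  # area under the lower envelope
```

In this sketch the conservative ensemble is nominated at low `pc` and the aggressive one at high `pc`, and the area under the envelope is smaller than the area under either individual cost line.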

Usage

CSMES.ensNomCurve(
  ensSelModel,
  memberPreds,
  y,
  curveType = c("costCurve", "brierSkew", "brierCost"),
  method = c("classPreds", "probPreds"),
  plotting = FALSE,
  nrBootstraps = 1
)

Arguments

`ensSelModel`	ensemble selection model (output of `CSMES.ensSel`)
`memberPreds`	matrix containing the predictions of the ensemble member library (one column per member)
`y`	vector with true class labels. Currently, only a dichotomous outcome variable is supported.
`curveType`	the type of curve used to construct the ensemble nomination curve. Should be "costCurve" (default), "brierSkew" or "brierCost".
`method`	specifies how the predictions of the candidate ensembles are used to generate the ensemble nomination front: "classPreds" for class predictions (default), "probPreds" for probability predictions.
`plotting`	`TRUE` or `FALSE`: should a plot be generated showing the curves (cost curves or Brier curves, depending on `curveType`)? Defaults to `FALSE`.
`nrBootstraps`	optionally, the ensemble nomination curve can be generated through bootstrapping. This argument specifies the number of iterations/bootstrap samples. Default is 1.
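One plausible reading of `method = "probPreds"`, assumed here rather than taken from the CSMES source, is that a candidate ensemble's probability predictions are turned into a cost curve by thresholding them at every observed score and keeping the cheapest cost line at each operating point, which is the standard construction of a cost curve for a scoring classifier. The helper `probCostCurve()` below is hypothetical and only sketches that construction for a single classifier, with labels coded 0/1 and `prob` taken to be the predicted probability of class 1.

```r
## Hypothetical helper (not part of CSMES): cost curve of one probabilistic
## classifier as the lower envelope of the cost lines of all its thresholds.
## Assumes y is coded 0/1 with 1 = positive class and prob = P(y = 1).
probCostCurve <- function(prob, y, intervals = 100) {
  pc <- seq(0, 1, length.out = intervals + 1)   # probability-cost grid
  # candidate thresholds: every observed score, plus Inf (predict all negative)
  thresholds <- c(sort(unique(prob)), Inf)
  costMat <- sapply(thresholds, function(t) {
    pred <- as.integer(prob >= t)               # positive if score >= t
    fnr <- mean(pred[y == 1] == 0)              # false negative rate
    fpr <- mean(pred[y == 0] == 1)              # false positive rate
    fnr * pc + fpr * (1 - pc)                   # cost line for this threshold
  })
  # rows = operating points, columns = thresholds; keep the cheapest threshold
  apply(costMat, 1, min)
}

## Toy usage: imperfect scores for four instances
y    <- c(1, 1, 0, 0)
prob <- c(0.9, 0.4, 0.6, 0.3)
cc   <- probCostCurve(prob, y, intervals = 10)
print(round(cc, 3))
```

Because the lowest threshold (everything predicted positive) and `Inf` (everything predicted negative) are always included, this curve never rises above the cost of the trivial classifiers, min(pc, 1 - pc).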

Value

An object of class `CSMES.ensNomCurve`, which is a list with the following components:

`nomcurve`	the ensemble nomination curve
`curves`	individual cost curves or Brier curves of the ensemble members
`intervals`	resolution of the ensemble nomination curve
`incidence`	incidence (positive rate) of the outcome variable
`area_under_curve`	area under the ensemble nomination curve
`method`	method used to generate the ensemble nomination front: "classPreds" for class predictions (default), "probPreds" for probability predictions
`curveType`	the type of cost curve used to construct the ensemble nomination curve
`nrBootstraps`	number of bootstrap samples over which the ensemble nomination curve was estimated

Author(s)

Koen W. De Bock, kdebock@audencia.com

References

De Bock, K.W., Lessmann, S. and Coussement, K., Cost-sensitive business failure prediction when misclassification costs are uncertain: A heterogeneous ensemble selection approach, European Journal of Operational Research (2020), doi: 10.1016/j.ejor.2020.01.052.

Examples

##load packages and data
library(CSMES)
library(rpart)
library(zoo)
library(ROCR)
library(mco)
data(BFP)
##shuffle the rows of the data set into a random order
BFP_r<-BFP[sample(nrow(BFP),nrow(BFP)),]
size<-nrow(BFP_r)
##size<-300
##split into non-overlapping training, validation and test thirds
train<-BFP_r[1:floor(size/3),]
val<-BFP_r[(floor(size/3)+1):floor(2*size/3),]
test<-BFP_r[(floor(2*size/3)+1):size,]
##generate a list containing model specifications for 100 CART decision trees varying in the cp
##and minsplit parameters, and trained on bootstrap samples of the training rows (bagging)
rpartSpecs<-list()
for (i in 1:100){
  ##draw a bootstrap sample of rows (not columns) from the training set
  bootData<-train[sample(nrow(train),size=nrow(train),replace=TRUE),]
  rpartSpecs[[paste0("rpart",i)]]<-rpart(Class~.,data=bootData,method="class",
    control=rpart.control(minsplit=round(runif(1,min=1,max=20)),
    cp=runif(1,min=0.05,max=0.4)))
}
##generate predictions for these models: the predicted probability of the second
##class level for each validation instance, one column per tree
hillclimb<-mat.or.vec(nrow(val),100)
for (i in 1:100){
  hillclimb[,i]<-predict(rpartSpecs[[i]],newdata=val)[,2]
}
##run ensemble selection on the validation set predictions
ESmodel<-CSMES.ensSel(hillclimb,val$Class,obj1="FNR",obj2="FPR",selType="selection",
generations=10,popsize=12,plot=TRUE)
##create the ensemble nomination curve
enc<-CSMES.ensNomCurve(ESmodel,hillclimb,val$Class,curveType="costCurve",method="classPreds",
plotting=FALSE)

[Package CSMES version 1.0.1 Index]