CSMES.ensSel {CSMES}    R Documentation

CSMES Training Stage 1: Cost-Sensitive Multicriteria Ensemble Selection resulting in a Pareto frontier of candidate ensemble classifiers

Description

This function applies the first stage in the learning process of CSMES: cost-sensitive multicriteria ensemble selection, which results in a Pareto frontier of equivalent (mutually non-dominated) candidate ensemble classifiers along two objective functions. By default, cost space is optimized by minimizing the false negative rate (FNR) and the false positive rate (FPR) simultaneously. This yields a set of Pareto-optimal ensemble classifiers that differ in their tradeoff between FNR and FPR. Optionally, other objective metrics can be specified. Currently, only binary classification is supported.
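To make the notion of a Pareto frontier over FNR and FPR concrete, the following base R sketch enumerates all subsets of a tiny, made-up member library and keeps the non-dominated ones. It is an illustration only and not the implementation used by CSMES.ensSel: the toy data, the 0.5 cutoff, the simple averaging of member predictions, and the helper names ensembleRates and isDominated are all assumptions introduced here.

```r
# Toy data (assumed): true labels and probability predictions of three members
y <- c(1, 1, 1, 0, 0, 0, 0, 1)
memberPreds <- cbind(
  m1 = c(0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7),
  m2 = c(0.6, 0.3, 0.7, 0.1, 0.2, 0.6, 0.3, 0.4),
  m3 = c(0.8, 0.6, 0.6, 0.7, 0.4, 0.3, 0.6, 0.9)
)

# FNR and FPR of an ensemble that averages the members selected by a 0/1 mask
# (hypothetical helper; the cutoff of 0.5 is an assumption)
ensembleRates <- function(mask, preds, y, cutoff = 0.5) {
  p <- rowMeans(preds[, mask == 1, drop = FALSE])
  cls <- as.integer(p >= cutoff)
  c(FNR = mean(cls[y == 1] == 0), FPR = mean(cls[y == 0] == 1))
}

# Enumerate every non-empty subset of members (the first row is the empty set)
masks <- as.matrix(expand.grid(rep(list(0:1), ncol(memberPreds))))[-1, , drop = FALSE]
rates <- t(apply(masks, 1, ensembleRates, preds = memberPreds, y = y))

# A candidate is dominated if another candidate is no worse on both
# objectives and strictly better on at least one of them
isDominated <- function(i, obj) {
  any(obj[, 1] <= obj[i, 1] & obj[, 2] <= obj[i, 2] &
      (obj[, 1] < obj[i, 1] | obj[, 2] < obj[i, 2]))
}
dominated <- vapply(seq_len(nrow(rates)), isDominated, logical(1), obj = rates)

# Distinct (FNR, FPR) pairs on the Pareto frontier
paretoFront <- unique(rates[!dominated, , drop = FALSE])
print(paretoFront)
```

Exhaustive enumeration is only feasible for a handful of members; for a library of 100 members the search space is far too large, which is why an evolutionary multi-objective optimizer is used instead.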

Usage

CSMES.ensSel(
  memberPreds,
  y,
  obj1 = c("FNR", "AUCC", "MSE", "AUC"),
  obj2 = c("FPR", "ensSize", "ensSizeSq", "clAmb"),
  selType = c("selection", "selectionWeighted", "weighted"),
  plotting = TRUE,
  generations = 30,
  popsize = 100
)

Arguments

memberPreds

Matrix containing ensemble member library predictions, with one row per observation and one column per ensemble member

y

Vector with true class labels. Currently, only a dichotomous outcome variable is supported.

obj1

Specifies the first objective metric to be minimized: one of "FNR", "AUCC", "MSE" or "AUC"

obj2

Specifies the second objective metric to be minimized: one of "FPR", "ensSize", "ensSizeSq" or "clAmb"

selType

Specifies the type of ensemble selection to be applied: "selection" for basic selection, "selectionWeighted" for weighted selection, "weighted" for weighted sum

plotting

TRUE or FALSE: Should a plot be generated showing objective function values throughout the optimization process?

generations

The number of population generations for NSGA-II. Default is 30.

popsize

The population size for NSGA-II. Default is 100.
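The role of generations and popsize can be illustrated with a direct call to an NSGA-II optimizer. The sketch below uses nsga2() from the mco package, which the Examples section loads; that CSMES.ensSel passes these settings to mco in this way is an assumption, and the simulated data and the objective function are made up for illustration.

```r
library(mco)  # provides nsga2(); assumed to be installed

set.seed(42)
n <- 200  # observations (toy data)
m <- 5    # ensemble members (toy data)
y <- rbinom(n, 1, 0.3)
# Noisy member predictions in [0, 1], loosely related to y
preds <- sapply(seq_len(m), function(j) {
  pmin(1, pmax(0, 0.3 + 0.4 * y + rnorm(n, sd = 0.25)))
})

# Two objectives to minimize for a weight vector w: FNR and FPR of the
# weighted-average ensemble, classified with an assumed cutoff of 0.5
objective <- function(w) {
  if (sum(w) == 0) return(c(1, 1))  # an empty ensemble gets the worst score
  p <- as.vector(preds %*% w) / sum(w)
  cls <- as.integer(p >= 0.5)
  c(mean(cls[y == 1] == 0), mean(cls[y == 0] == 1))
}

# popsize individuals are evolved for the given number of generations;
# mco::nsga2 expects a population size that is a multiple of 4
res <- nsga2(objective, idim = m, odim = 2,
             lower.bounds = rep(0, m), upper.bounds = rep(1, m),
             popsize = 12, generations = 10)

# Objective values of the non-dominated individuals in the final population
front <- res$value[res$pareto.optimal, , drop = FALSE]
print(unique(front))
```

Larger values of popsize and generations explore the weight space more thoroughly at the price of a longer run time, which is why the package example uses small values.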

Value

An object of the class CSMES.ensSel which is a list with the following components:

weights

ensemble member weights for all Pareto-optimal ensemble classifiers after multicriteria ensemble selection

obj_values

optimization objective values

pareto

overview of Pareto-optimal ensemble classifiers

popsize

the population size for NSGA-II

generations

the number of population generations for NSGA-II

obj1

Specifies the first objective metric that was minimized

obj2

Specifies the second objective metric that was minimized

selType

the type of ensemble selection that was applied: "selection", "selectionWeighted" or "weighted"

ParetoPredictions_p

probability predictions for Pareto-optimal ensemble classifiers

ParetoPredictions_c

class predictions for Pareto-optimal ensemble classifiers

Author(s)

Koen W. De Bock, kdebock@audencia.com

References

De Bock, K.W., Lessmann, S. and Coussement, K., Cost-sensitive business failure prediction when misclassification costs are uncertain: A heterogeneous ensemble selection approach, European Journal of Operational Research (2020), doi: 10.1016/j.ejor.2020.01.052.

Examples

##load packages and data
library(rpart)
library(zoo)
library(ROCR)
library(mco)
data(BFP)
##shuffle the rows and split the data into training, validation and test sets
BFP_r<-BFP[sample(nrow(BFP),nrow(BFP)),]
size<-nrow(BFP_r)
##size<-300
train<-BFP_r[1:floor(size/3),]
val<-BFP_r[(floor(size/3)+1):floor(2*size/3),]
test<-BFP_r[(floor(2*size/3)+1):size,]
##generate a list containing model specifications for 100 CART decision trees varying in the cp
##and minsplit parameters, and trained on bootstrap samples (bagging)
rpartSpecs<-list()
for (i in 1:100){
  ##draw a bootstrap sample of the training rows
  data<-train[sample(1:nrow(train),size=nrow(train),replace=TRUE),]
  rpartSpecs[[paste0("rpart",i)]]<-rpart(Class~.,data=data,method="class",
  control=rpart.control(minsplit=round(runif(1, min = 1, max = 20)),cp=runif(1,
  min = 0.05, max = 0.4)))
}
##generate validation set predictions (probability of the second class) for these models
hillclimb<-mat.or.vec(nrow(val),100)
for (i in 1:100){
  hillclimb[,i]<-predict(rpartSpecs[[i]],newdata=val)[,2]
}
##apply ensemble selection, using the member predictions on the validation set
ESmodel<-CSMES.ensSel(hillclimb,val$Class,obj1="FNR",obj2="FPR",selType="selection",
generations=10,popsize=12,plotting=TRUE)
##create the ensemble nomination curve
enc<-CSMES.ensNomCurve(ESmodel,hillclimb,val$Class,curveType="costCurve",method="classPreds",
plot=FALSE)

[Package CSMES version 1.0.1 Index]