R: CSMES scoring: generate predictions for the optimal ensemble...

CSMES.predict {CSMES}

R Documentation

CSMES scoring: generate predictions for the optimal ensemble classifier according to CSMES as a function of cost information.

Description

This function generates predictions for a new data set (containing candidate member library predictions) using a CSMES model. Using the Pareto-optimal ensemble definitions generated by CSMES.ensSel and the ensemble nomination front generated by CSMES.ensNomCurve, final ensemble predictions are generated based on the cost information known to the user at the time of model scoring. The function supports three scenarios: (1) the candidate ensemble is nominated for a specific cost ratio, (2) the ensemble is nominated based on the partial AUCC (or a distribution over operating points), and (3) the candidate ensemble that is optimal over the entire cost space, as measured by the area under the cost or Brier curve, is chosen.
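To make scenario (1) concrete, the following base R sketch nominates one of three candidate ensembles by minimum expected misclassification cost for a given cost ratio. It is an illustration only and not the package's internal code: the operating points in `candidates`, the class prior `prior_pos`, and the helpers `emc` and `pick_min_emc` are hypothetical, and the cost ratio is assumed to be the cost of a false negative relative to a false positive.

```r
# Hypothetical operating points of three candidate ensembles (invented numbers).
candidates <- data.frame(
  ensemble = c("A", "B", "C"),
  FNR = c(0.10, 0.25, 0.40),
  FPR = c(0.40, 0.20, 0.10),
  stringsAsFactors = FALSE
)
prior_pos <- 0.2  # assumed share of positive cases

# Expected misclassification cost, with the false positive cost normalised to 1
# and the false negative cost equal to cost_ratio (an assumption of this sketch).
emc <- function(fnr, fpr, cost_ratio, prior_pos) {
  prior_pos * fnr * cost_ratio + (1 - prior_pos) * fpr
}

# Return the label of the candidate with the lowest expected cost.
pick_min_emc <- function(cost_ratio) {
  costs <- emc(candidates$FNR, candidates$FPR, cost_ratio, prior_pos)
  candidates$ensemble[which.min(costs)]
}

pick_min_emc(1)   # "C": false positives dominate, so the low-FPR candidate wins
pick_min_emc(5)   # "B"
pick_min_emc(20)  # "A": false negatives dominate, so the low-FNR candidate wins
```

The nominated candidate changes with the cost ratio, which is why the ratio can be supplied at scoring time rather than fixed during training.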

Usage

CSMES.predict(
  ensSelModel,
  ensNomCurve,
  newdata,
  criterion = c("minEMC", "minAUCC", "minPartAUCC"),
  costRatio = 5,
  partAUCC_mu = 0.5,
  partAUCC_sd = 0.1
)

Arguments

`ensSelModel`	ensemble selection model (output of `CSMES.ensSel`)
`ensNomCurve`	ensemble nomination curve object (output of `CSMES.ensNomCurve`)
`newdata`	matrix containing ensemble library member model predictions for new data set
`criterion`	This argument specifies which criterion determines the selection of the ensemble candidate that delivers predictions. Can be one of three options: "minEMC", "minAUCC" or "minPartAUCC".
`costRatio`	Specifies the cost ratio used to determine expected misclassification cost. Only relevant when `criterion` is "minEMC".
`partAUCC_mu`	Desired mean operating condition when `criterion` is "minPartAUCC" (partial area under the cost/Brier curve).
`partAUCC_sd`	Desired standard deviation of the operating condition when `criterion` is "minPartAUCC" (partial area under the cost/Brier curve).
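The role of `partAUCC_mu` and `partAUCC_sd` can be sketched in base R as follows. This is a hedged illustration, not the package's implementation: it assumes that operating conditions are expressed on a normalised [0, 1] scale, that the mean and standard deviation parametrise a normal weighting over that scale, and that each candidate is summarised by a single cost line. The data in `candidates` and the helper `pick_min_part_aucc` are hypothetical.

```r
# Hypothetical operating points of three candidate ensembles (invented numbers).
candidates <- data.frame(
  ensemble = c("A", "B", "C"),
  FNR = c(0.10, 0.25, 0.40),
  FPR = c(0.40, 0.20, 0.10),
  stringsAsFactors = FALSE
)

# Grid of normalised operating conditions (0 = only false positives matter,
# 1 = only false negatives matter).
pc <- seq(0, 1, by = 0.01)

# Weight the cost line of each candidate with a normal density centred at mu
# and return the candidate with the smallest weighted area.
pick_min_part_aucc <- function(mu, sd) {
  w <- dnorm(pc, mean = mu, sd = sd)
  w <- w / sum(w)  # normalise so the weights sum to 1
  area <- sapply(seq_len(nrow(candidates)), function(j) {
    cost <- candidates$FNR[j] * pc + candidates$FPR[j] * (1 - pc)
    sum(w * cost)
  })
  candidates$ensemble[which.min(area)]
}

pick_min_part_aucc(0.5, 0.1)  # balanced operating conditions
pick_min_part_aucc(0.2, 0.1)  # mass near conditions where false positives weigh more
pick_min_part_aucc(0.8, 0.1)  # mass near conditions where false negatives weigh more
```

A smaller standard deviation concentrates the weight around the mean, so the nomination approaches the single-cost-ratio case; a larger one spreads it out, so the nomination approaches optimising the full area under the curve.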

Value

A list with the following components:

`pred`	A matrix with model predictions. Both class and probability predictions are delivered.
`criterion`	The criterion specified to determine the selection of the ensemble candidate.
`costRatio`	The cost ratio for which the "minEMC" `criterion` selected the optimal candidate ensemble that delivered the predictions.

Author(s)

Koen W. De Bock, kdebock@audencia.com

References

De Bock, K.W., Lessmann, S. and Coussement, K., Cost-sensitive business failure prediction when misclassification costs are uncertain: A heterogeneous ensemble selection approach, European Journal of Operational Research (2020), doi: 10.1016/j.ejor.2020.01.052.

Examples

##load packages and data
library(CSMES)
library(rpart)
library(zoo)
library(ROCR)
library(mco)
data(BFP)
##shuffle the rows, then split into non-overlapping training, validation and test sets
BFP_r<-BFP[sample(nrow(BFP),nrow(BFP)),]
size<-nrow(BFP_r)
##size<-300
train<-BFP_r[1:floor(size/3),]
val<-BFP_r[(floor(size/3)+1):floor(2*size/3),]
test<-BFP_r[(floor(2*size/3)+1):size,]
##generate a list containing 100 CART decision trees varying in the cp
##and minsplit parameters, and trained on bootstrap samples (bagging)
rpartSpecs<-list()
for (i in 1:100){
  ##bootstrap sample of the training rows
  data<-train[sample(1:nrow(train),size=nrow(train),replace=TRUE),]
  rpartSpecs[[paste0("rpart",i)]]<-rpart(Class~.,data=data,method="class",
    control=rpart.control(minsplit=round(runif(1,min=1,max=20)),
    cp=runif(1,min=0.05,max=0.4)))
}
##generate predictions of these models for the validation set
hillclimb<-mat.or.vec(nrow(val),100)
for (i in 1:100){
  hillclimb[,i]<-predict(rpartSpecs[[i]],newdata=val)[,2]
}
##run ensemble selection on the validation set predictions
ESmodel<-CSMES.ensSel(hillclimb,val$Class,obj1="FNR",obj2="FPR",selType="selection",
generations=10,popsize=12,plot=TRUE)
## Create Ensemble nomination curve
enc<-CSMES.ensNomCurve(ESmodel,hillclimb,val$Class,curveType="costCurve",method="classPreds",
plot=FALSE)
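
The example above stops before the scoring step. A minimal continuation, assuming the objects `rpartSpecs`, `ESmodel`, `enc` and `test` created above and the argument names listed under Usage, might look as follows. The names `testPreds` and `scored` are introduced here for illustration only.

```r
## library member predictions for the test set, in the same column order as hillclimb
testPreds <- mat.or.vec(nrow(test), 100)
for (i in 1:100) {
  testPreds[, i] <- predict(rpartSpecs[[i]], newdata = test)[, 2]
}
## nominate the ensemble that minimises expected misclassification cost at costRatio = 5
scored <- CSMES.predict(ESmodel, enc, newdata = testPreds,
                        criterion = "minEMC", costRatio = 5)
## inspect the components documented under Value
head(scored$pred)
scored$criterion
scored$costRatio
```

Switching `criterion` to "minAUCC" or "minPartAUCC" (the latter together with `partAUCC_mu` and `partAUCC_sd`) would exercise the other two scenarios on the same `testPreds` matrix.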

[Package CSMES version 1.0.1 Index]