CSMES.predict {CSMES}R Documentation

CSMES scoring: generate predictions for the optimal ensemble classifier according to CSMES in function of cost information.

Description

This function generates predictions for a new data set (containing candidate member library predictions) using a CSMES model. Using Pareto-optimal ensemble definitions generated through CSMES.ensSel and the ensemble nomination front generated using CSMES.EnsNomCurve, final ensemble predictions are generated in function of cost information known to the user at the time of model scoring. The model allows for three scenarios: (1) the candidate ensemble is nominated in function of a specific cost ratio, (2) the ensemble is nominated in function of partial AUCC (or a distribution over operating points) and (3) the candidate ensemble that is optimal over the entire cost space in function of area under the cost or brier curve is chosen.

Usage

CSMES.predict(
  ensSelModel,
  ensNomCurve,
  newdata,
  criterion = c("minEMC", "minAUCC", "minPartAUCC"),
  costRatio = 5,
  partAUCC_mu = 0.5,
  partAUCC_sd = 0.1
)

Arguments

ensSelModel

ensemble selection model (output of CSMES.ensSel)

ensNomCurve

ensemble nomination curve object (output of CSMES.ensNomCurve)

newdata

matrix containing ensemble library member model predictions for new data set

criterion

This argument specifies which criterion determines the selection of the ensemble candidate that delivers predictions. Can be one of three options: "minEMC", "minAUCC" or "minPartAUCC".

costRatio

Specifies the cost ratio used to determine expected misclassification cost. Only relvant when criterion is "minEMC".

partAUCC_mu

Desired mean operating condition when criterion is "minPartAUCC" (partial area under the cost/brier curve).

partAUCC_sd

Desired standard deviation when criterion is "minPartAUCC" (partial area under the cost/brier curve).

Value

An list with the following components:

pred

A matrix with model predictions. Both class and probability predictions are delivered.

criterion

The criterion specified to determine the selection of the ensemble candidate.

costRatio

The cost ratio in function of which the criterion "minEMC" has selected the optimal candidate ensemble that delivered predictions

Author(s)

Koen W. De Bock, kdebock@audencia.com

References

De Bock, K.W., Lessmann, S. And Coussement, K., Cost-sensitive business failure prediction when misclassification costs are uncertain: A heterogeneous ensemble selection approach, European Journal of Operational Research (2020), doi: 10.1016/j.ejor.2020.01.052.

See Also

CSMES.ensSel, CSMES.predictPareto, CSMES.ensNomCurve

Examples

##load data
library(rpart)
library(zoo)
library(ROCR)
library(mco)
data(BFP)
##generate random order vector
BFP_r<-BFP[sample(nrow(BFP),nrow(BFP)),]
size<-nrow(BFP_r)
##size<-300
train<-BFP_r[1:floor(size/3),]
val<-BFP_r[ceiling(size/3):floor(2*size/3),]
test<-BFP_r[ceiling(2*size/3):size,]
##generate a list containing model specifications for 100 CART decisions trees varying in the cp
##and minsplit parameters, and trained on bootstrap samples (bagging)
rpartSpecs<-list()
for (i in 1:100){
  data<-train[sample(1:ncol(train),size=ncol(train),replace=TRUE),]
  str<-paste("rpartSpecs$rpart",i,"=rpart(as.formula(Class~.),data,method=\"class\",
  control=rpart.control(minsplit=",round(runif(1, min = 1, max = 20)),",cp=",runif(1,
  min = 0.05, max = 0.4),"))",sep="")
  eval(parse(text=str))
}
##generate predictions for these models
hillclimb<-mat.or.vec(nrow(val),100)
for (i in 1:100){
  str<-paste("hillclimb[,",i,"]=predict(rpartSpecs[[i]],newdata=val)[,2]",sep="")
  eval(parse(text=str))
}
##score the validation set used for ensemble selection, to be used for ensemble selection
ESmodel<-CSMES.ensSel(hillclimb,val$Class,obj1="FNR",obj2="FPR",selType="selection",
generations=10,popsize=12,plot=TRUE)
## Create Ensemble nomination curve
enc<-CSMES.ensNomCurve(ESmodel,hillclimb,val$Class,curveType="costCurve",method="classPreds",
plot=FALSE)

[Package CSMES version 1.0.1 Index]