CSMES.predict {CSMES} | R Documentation |
CSMES scoring: generate predictions for the optimal ensemble classifier according to CSMES as a function of cost information.
Description
This function generates predictions for a new data set (containing candidate member library predictions) using a CSMES model. Using the Pareto-optimal ensemble definitions generated by CSMES.ensSel and the ensemble nomination front generated by CSMES.ensNomCurve, final ensemble predictions are generated based on the cost information known to the user at the time of model scoring. The model supports three scenarios: (1) the candidate ensemble is nominated for a specific cost ratio, (2) the ensemble is nominated based on partial AUCC (or a distribution over operating points), and (3) the candidate ensemble that is optimal over the entire cost space, measured by the area under the cost or Brier curve, is chosen.
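The first scenario can be illustrated with a minimal base R sketch. It is not the package's internal code: the error rates, the `emc` and `nominate` helpers, the `priorPos` argument, and the reading of the cost ratio as cost(false negative) / cost(false positive) are all assumptions made for illustration.

```r
## Hypothetical validation-set error rates for three candidate ensembles
candidates <- data.frame(
  ensemble = c("A", "B", "C"),
  FNR = c(0.10, 0.25, 0.40),
  FPR = c(0.45, 0.20, 0.05),
  stringsAsFactors = FALSE
)

## Expected misclassification cost per case, in units of one false positive.
## Assumption: costRatio = cost(false negative) / cost(false positive).
emc <- function(FNR, FPR, costRatio, priorPos) {
  priorPos * FNR * costRatio + (1 - priorPos) * FPR
}

## Nominate the candidate with the lowest expected cost for this cost ratio
nominate <- function(candidates, costRatio, priorPos) {
  cost <- emc(candidates$FNR, candidates$FPR, costRatio, priorPos)
  candidates$ensemble[which.min(cost)]
}

nominate(candidates, costRatio = 1,  priorPos = 0.2)
nominate(candidates, costRatio = 5,  priorPos = 0.2)
nominate(candidates, costRatio = 20, priorPos = 0.2)
```

As the cost of a false negative grows relative to a false positive, the nominated candidate shifts toward the one with the lower false negative rate, which is why the cost ratio has to be known at scoring time for this scenario.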
Usage
CSMES.predict(
ensSelModel,
ensNomCurve,
newdata,
criterion = c("minEMC", "minAUCC", "minPartAUCC"),
costRatio = 5,
partAUCC_mu = 0.5,
partAUCC_sd = 0.1
)
Arguments
ensSelModel |
ensemble selection model (output of |
ensNomCurve |
ensemble nomination curve object (output of |
newdata |
matrix containing ensemble library member model predictions for new data set |
criterion |
This argument specifies which criterion determines the selection of the ensemble candidate that delivers predictions. Can be one of three options: "minEMC", "minAUCC" or "minPartAUCC". |
costRatio |
Specifies the cost ratio used to determine expected misclassification cost. Only relevant when |
partAUCC_mu |
Desired mean operating condition when |
partAUCC_sd |
Desired standard deviation when |
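To give an intuition for partAUCC_mu and partAUCC_sd, the sketch below weights a cost curve by a normal density over operating points. This is an illustrative assumption about how a partial area can be computed, not the package's implementation: the error rates, the `pc` grid on the probability-cost axis, and the `weightedAUCC` and `nominateByArea` helpers are hypothetical.

```r
## Hypothetical validation-set error rates for three candidate ensembles
FNR <- c(A = 0.10, B = 0.25, C = 0.45)
FPR <- c(A = 0.45, B = 0.20, C = 0.05)

## Grid of operating points on the probability-cost axis (assumed to be [0, 1])
pc <- seq(0, 1, by = 0.001)

## Weighted average of the cost line FNR * pc + FPR * (1 - pc).
## Without mu and sd every operating point counts equally (full area);
## with mu and sd the points are weighted by a normal density.
weightedAUCC <- function(fnr, fpr, mu = NULL, sd = NULL) {
  w <- if (is.null(mu)) rep(1, length(pc)) else dnorm(pc, mean = mu, sd = sd)
  sum(w * (fnr * pc + fpr * (1 - pc))) / sum(w)
}

## Name of the candidate with the smallest (weighted) area
nominateByArea <- function(mu = NULL, sd = NULL) {
  area <- mapply(weightedAUCC, FNR, FPR, MoreArgs = list(mu = mu, sd = sd))
  names(which.min(area))
}

nominateByArea()                    # whole cost space
nominateByArea(mu = 0.8, sd = 0.1)  # operating points concentrated near 0.8
nominateByArea(mu = 0.2, sd = 0.1)  # operating points concentrated near 0.2
```

Under these assumptions, a small standard deviation behaves much like a single known operating point, while a large one approaches the full-area criterion.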
Value
A list with the following components:
pred |
A matrix with model predictions. Both class and probability predictions are delivered. |
criterion |
The criterion specified to determine the selection of the ensemble candidate. |
costRatio |
The cost ratio in function of which the |
Author(s)
Koen W. De Bock, kdebock@audencia.com
References
De Bock, K.W., Lessmann, S. and Coussement, K., Cost-sensitive business failure prediction when misclassification costs are uncertain: A heterogeneous ensemble selection approach, European Journal of Operational Research (2020), doi: 10.1016/j.ejor.2020.01.052.
See Also
CSMES.ensSel, CSMES.predictPareto, CSMES.ensNomCurve
Examples
##load packages and data
library(CSMES)
library(rpart)
library(zoo)
library(ROCR)
library(mco)
data(BFP)
##shuffle the rows, then split into non-overlapping training, validation and test sets
BFP_r<-BFP[sample(nrow(BFP),nrow(BFP)),]
size<-nrow(BFP_r)
##size<-300
train<-BFP_r[1:floor(size/3),]
val<-BFP_r[(floor(size/3)+1):floor(2*size/3),]
test<-BFP_r[(floor(2*size/3)+1):size,]
##generate a list containing model specifications for 100 CART decision trees varying in the cp
##and minsplit parameters, and trained on bootstrap samples (bagging)
rpartSpecs<-list()
for (i in 1:100){
  ##bootstrap sample of the training rows (sampled with replacement)
  data<-train[sample(1:nrow(train),size=nrow(train),replace=TRUE),]
  rpartSpecs[[paste0("rpart",i)]]<-rpart(Class~.,data,method="class",
    control=rpart.control(minsplit=round(runif(1,min=1,max=20)),
    cp=runif(1,min=0.05,max=0.4)))
}
##generate validation-set predictions for these models (one column per model)
hillclimb<-mat.or.vec(nrow(val),100)
for (i in 1:100){
  ##probability of the second class level
  hillclimb[,i]<-predict(rpartSpecs[[i]],newdata=val)[,2]
}
##run ensemble selection on the validation-set predictions
ESmodel<-CSMES.ensSel(hillclimb,val$Class,obj1="FNR",obj2="FPR",selType="selection",
generations=10,popsize=12,plot=TRUE)
## Create Ensemble nomination curve
enc<-CSMES.ensNomCurve(ESmodel,hillclimb,val$Class,curveType="costCurve",method="classPreds",
plot=FALSE)
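The example above stops before scoring new data. A hedged sketch of how the test set might then be scored follows, using the argument names from the Usage section. The `libraryPreds` helper is hypothetical and not part of CSMES, and the mapping of the three criteria to the three scenarios in the Description is an assumption based on their names. The scoring step runs only when the objects created above (rpartSpecs, test, ESmodel, enc) exist.

```r
library(rpart)

## Hypothetical helper (not part of CSMES): one column of second-class
## probabilities per library member, one row per observation in `data`
libraryPreds <- function(models, data) {
  probs <- lapply(models, function(m) predict(m, newdata = data, type = "prob")[, 2])
  matrix(unlist(probs), nrow = nrow(data))
}

## Runs only when the objects from the example above are available
needed <- c("rpartSpecs", "test", "ESmodel", "enc", "CSMES.predict")
if (all(sapply(needed, exists))) {
  testPreds <- libraryPreds(rpartSpecs, test)
  ## nominate the ensemble for a known cost ratio
  predsEMC <- CSMES.predict(ESmodel, enc, testPreds, criterion = "minEMC",
                            costRatio = 5)
  ## nominate for a distribution over operating points
  predsPart <- CSMES.predict(ESmodel, enc, testPreds, criterion = "minPartAUCC",
                             partAUCC_mu = 0.5, partAUCC_sd = 0.1)
  ## nominate the ensemble that is best over the whole cost space
  predsAUCC <- CSMES.predict(ESmodel, enc, testPreds, criterion = "minAUCC")
  ## class and probability predictions of the nominated ensemble
  head(predsEMC$pred)
}
```

The columns of the new-data matrix need to follow the same model order as the matrix passed to CSMES.ensSel, since the selected ensembles are defined in terms of library members.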