perff {ordinalForest}

R Documentation

Performance functions based on Youden's J statistic

Description

In ordfor, performance functions measure the performance of the smaller regression forests that are constructed before the optimal score set is approximated. All but one of these performance functions are based on Youden's J statistic; the exception uses the ranked probability score, which enables class probability estimation. The functions documented here can also be used to measure the quality of predictions on new data or of out-of-bag (OOB) predictions. The performance function that uses the ranked probability score is not covered on this help page. The function rps from the package verification (version 1.42) can be used to calculate the ranked probability score.

Usage

perff_equal(ytest, ytestpred, categ, classweights)

perff_proportional(ytest, ytestpred, categ, classweights)

perff_oneclass(ytest, ytestpred, categ, classweights)

perff_custom(ytest, ytestpred, categ, classweights)

Arguments

`ytest`	factor. True values of the target variable.
`ytestpred`	factor. Predicted values of the target variable.
`categ`	character. Needed in the case of `perff_oneclass`: Class to prioritize.
`classweights`	numeric. Needed in the case of `perff_custom`: Vector of length equal to the number of classes. Class weights: classes with higher weights are prioritized over those with smaller weights.

Details

perff_equal should be used if observations from each class should be classified with the same accuracy, independent of the class sizes. Youden's J statistic is calculated with respect to each class ("observation/prediction in class j" vs. "observation/prediction NOT in class j" (j=1,...,J)), and the simple average of the resulting J values (one per class) is taken.
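
The one-vs-rest calculation described above can be sketched in base R. This is only an illustration on made-up toy data, assuming that Youden's J for a class is sensitivity plus specificity minus 1; the helper youden_j is hypothetical and is not part of ordinalForest, whose internal implementation may differ.

```r
# Hypothetical helper (not part of ordinalForest): one-vs-rest Youden's J
# for class `cl`, assuming J = sensitivity + specificity - 1.
youden_j <- function(ytest, ytestpred, cl) {
  is_cl   <- ytest == cl
  pred_cl <- ytestpred == cl
  sens <- mean(pred_cl[is_cl])    # share of class-cl observations predicted as cl
  spec <- mean(!pred_cl[!is_cl])  # share of all other observations not predicted as cl
  sens + spec - 1
}

# Made-up toy data with three ordered classes
ytest     <- factor(c(1, 1, 1, 1, 2, 2, 3, 3), levels = 1:3)
ytestpred <- factor(c(1, 1, 1, 2, 2, 3, 3, 3), levels = 1:3)

# One J value per class, then the simple (unweighted) average
j_per_class <- sapply(levels(ytest), function(cl) youden_j(ytest, ytestpred, cl))
j_equal <- mean(j_per_class)
j_equal
```

Note that a class without any observations in ytest would yield NaN in this sketch, so every class should be represented in the data passed to it.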

perff_proportional should be used if the main goal is to correctly classify as many observations as possible. This goal favors larger classes at the expense of lower classification accuracy for smaller classes. Youden's J statistic is calculated with respect to each class, and a weighted average of these values is then taken, with weights proportional to the number of observations representing the respective classes in the training data.
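
A size-weighted average of this kind can be sketched from a confusion matrix in base R. The toy data are made up, and taking the weights from the class counts in ytest is an assumption made for this illustration; it is not necessarily how ordinalForest computes the weights internally.

```r
# Made-up toy data with three ordered classes
ytest     <- factor(c(1, 1, 1, 1, 2, 2, 3, 3), levels = 1:3)
ytestpred <- factor(c(1, 1, 1, 2, 2, 3, 3, 3), levels = 1:3)

# Confusion matrix: rows are true classes, columns are predicted classes
cm <- table(true = ytest, predicted = ytestpred)
n  <- sum(cm)

tp   <- diag(cm)          # correctly predicted observations per class
sens <- tp / rowSums(cm)  # sensitivity per class
# True negatives per class divided by the number of observations outside the class
spec <- (n - rowSums(cm) - colSums(cm) + tp) / (n - rowSums(cm))
j_per_class <- sens + spec - 1

# Weights proportional to the class sizes (assumption: counts taken from ytest)
j_proportional <- weighted.mean(j_per_class, w = rowSums(cm))
j_proportional
```

Because class 1 holds half of the toy observations, its J value contributes half of the weighted average, which illustrates the preference for larger classes.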

perff_oneclass should be used if the only aim is to distinguish observations in class categ as reliably as possible from observations not in class categ. This class must be passed to perff_oneclass via the argument categ. Youden's J statistic is calculated with respect to class categ only.

perff_custom should be used if the classes can be ranked by their importance. Youden's J statistic is calculated with respect to each class. Subsequently, a weighted average with user-specified weights (provided via the argument classweights) is taken. In this way, classes with higher weights are prioritized by the ordinal forest (OF) algorithm over classes with smaller weights.
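
The effect of user-specified weights can be sketched as follows. The helper weighted_youden is hypothetical and not part of ordinalForest; dividing by the sum of the weights is an assumption, chosen because the Examples indicate that equal weights reproduce perff_equal and one-hot weights reproduce perff_oneclass.

```r
# Hypothetical helper (not part of ordinalForest): weighted average of the
# one-vs-rest Youden's J values, with weights normalized to sum to 1.
weighted_youden <- function(ytest, ytestpred, classweights) {
  j_per_class <- sapply(levels(ytest), function(cl) {
    sens <- mean(ytestpred[ytest == cl] == cl)  # sensitivity for class cl
    spec <- mean(ytestpred[ytest != cl] != cl)  # specificity for class cl
    sens + spec - 1
  })
  sum(classweights * j_per_class) / sum(classweights)
}

# Made-up toy data with three ordered classes
ytest     <- factor(c(1, 1, 1, 1, 2, 2, 3, 3), levels = 1:3)
ytestpred <- factor(c(1, 1, 1, 2, 2, 3, 3, 3), levels = 1:3)

# Class 2 counts twice as much as classes 1 and 3
j_custom <- weighted_youden(ytest, ytestpred, classweights = c(1, 2, 1))

# One-hot weights: only class 2 matters, as with categ = "2"
j_class2 <- weighted_youden(ytest, ytestpred, classweights = c(0, 1, 0))

c(custom = j_custom, class2 = j_class2)
```

In this toy example class 2 is predicted worst, so giving it a higher weight lowers the overall value, which is the pressure that would steer the selection toward score sets that predict this class well.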

References

Hornung R. (2020) Ordinal Forests. Journal of Classification 37, 4–17. doi:10.1007/s00357-018-9302-x.

Examples

## Not run: 
library(ordinalForest)  # provides ordfor, the perff_* functions, and the hearth data
data(hearth)

set.seed(123)
trainind <- sort(sample(1:nrow(hearth), size=floor(nrow(hearth)*(1/2))))
testind <- sort(sample(setdiff(1:nrow(hearth), trainind), size=20))

datatrain <- hearth[trainind,]
datatest <- hearth[testind,]

ordforres <- ordfor(depvar="Class", data=datatrain, nsets=50, nbest=5, ntreeperdiv=100, 
  ntreefinal=1000)
# NOTE: nsets=50 is too small for practical use: the prediction performance of the
# resulting ordinal forest will be suboptimal. In practice, nsets=1000 (default value)
# or a larger number should be used.

preds <- predict(ordforres, newdata=datatest)

table('true'=datatest$Class, 'predicted'=preds$ypred)


perff_equal(ytest=datatest$Class, ytestpred=preds$ypred)
 
perff_proportional(ytest=datatest$Class, ytestpred=preds$ypred)
 
perff_oneclass(ytest=datatest$Class, ytestpred=preds$ypred, categ="1")
perff_oneclass(ytest=datatest$Class, ytestpred=preds$ypred, categ="2")
perff_oneclass(ytest=datatest$Class, ytestpred=preds$ypred, categ="3")
perff_oneclass(ytest=datatest$Class, ytestpred=preds$ypred, categ="4")
perff_oneclass(ytest=datatest$Class, ytestpred=preds$ypred, categ="5")


perff_custom(ytest=datatest$Class, ytestpred=preds$ypred, classweights=c(1,2,1,1,1))


# perff_equal, perff_proportional, and perff_oneclass are special cases of perff_custom:

perff_custom(ytest=datatest$Class, ytestpred=preds$ypred, classweights=c(1,1,1,1,1))
perff_equal(ytest=datatest$Class, ytestpred=preds$ypred)

perff_custom(ytest=datatest$Class, ytestpred=preds$ypred, classweights=table(datatest$Class))
perff_proportional(ytest=datatest$Class, ytestpred=preds$ypred)

perff_custom(ytest=datatest$Class, ytestpred=preds$ypred, classweights=c(0,0,0,1,0))
perff_oneclass(ytest=datatest$Class, ytestpred=preds$ypred, categ="4")

## End(Not run)

[Package ordinalForest version 2.4-3 Index]