PrInDTMulabAll {PrInDT} | R Documentation |
Multiple label classification based on all observations
Description
Multiple label classification based on all observations. We consider two ways of modeling (binary relevance modeling,
dependent binary relevance modeling) and three ways of model evaluation: single
assessment, joint assessment, and true prediction (see the Value section for more information).
Interpretability is checked (see ctestv).
Variables should be arranged in 'datain' according to indices specified in 'indind', 'indaddind', and 'inddep'.
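The two modeling schemes and the two-step true prediction can be illustrated with a minimal base R sketch. It is not the function's implementation: logistic regression (glm) stands in for the tree models, and the toy data, the variable names (x1, x2, y1, y2) and the helpers fit and predict_class are hypothetical.

```r
set.seed(1)
n <- 200
# Hypothetical toy data: two predictors (x1, x2) and two binary labels (y1, y2)
x1 <- rnorm(n)
x2 <- rnorm(n)
y1 <- factor(as.integer(x1 + rnorm(n) > 0))
y2 <- factor(as.integer(x2 + 0.8 * (y1 == "1") + rnorm(n) > 0.4))
toy <- data.frame(x1, x2, y1, y2)

# Hypothetical helpers: fit a logistic model, predict classes as a 0/1 factor
fit <- function(formula, data) glm(formula, data = data, family = binomial)
predict_class <- function(model, newdata) {
  p <- predict(model, newdata = newdata, type = "response")
  factor(as.integer(p > 0.5), levels = c(0, 1))
}

# Binary relevance: each label is modeled from the independent predictors only
br1 <- fit(y1 ~ x1 + x2, toy)
br2 <- fit(y2 ~ x1 + x2, toy)

# Dependent binary relevance (extended model): the other label is an
# additional predictor, using its truly observed values for estimation
ext1 <- fit(y1 ~ x1 + x2 + y2, toy)
ext2 <- fit(y2 ~ x1 + x2 + y1, toy)

# True prediction, step 1: predict all labels from the independent predictors
step1 <- toy
step1$y1 <- predict_class(br1, toy)
step1$y2 <- predict_class(br2, toy)

# True prediction, step 2: feed the predicted labels into the extended models
true1 <- predict_class(ext1, step1)
true2 <- predict_class(ext2, step1)
```

In the extended-model assessment the observed labels would be passed to ext1 and ext2 directly; true prediction replaces them with the step 1 predictions, which is the situation faced when the labels of new observations are unknown.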
Reference
Probst, P., Au, Q., Casalicchio, G., Stachl, C., and Bischl, B. 2017. Multilabel Classification with
R Package mlr. arXiv:1703.08991v2
Usage
PrInDTMulabAll(datain, classnames, ctestv=NA, conf.level=0.95, indind, indaddind,
inddep)
Arguments
datain | Input data frame with class factor variable 'classname' and the |
classnames | names of class variables (character vector) |
ctestv | Vector of character strings of forbidden split results; |
conf.level | (1 - significance level) in function |
indind | indices of independent variables |
indaddind | indices of additional predictors used in the case of dependent binary relevance modeling |
inddep | indices of dependent variables |
Details
Standard output can be produced by means of print(name) or just name, as well as plot(name), where 'name' is the output data frame of the function.
The plot function will produce a series of more than one plot. If you use R, you might want to specify windows(record=TRUE) before plot(name) to save the whole series of plots. In RStudio this functionality is provided automatically.
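Since windows() is only available on Windows, one portable alternative is to send the series of plots to a multi-page PDF file. The sketch below uses only base R graphics; plot_series is a hypothetical stand-in for plot(name), and the output file name is an assumption.

```r
# Hypothetical stand-in for plot(name): draws a series of three plots
plot_series <- function() {
  for (i in 1:3) plot(1:10, (1:10)^i, main = paste("Plot", i))
}

outfile <- file.path(tempdir(), "plot_series.pdf")  # assumed output location
pdf(outfile)    # open a PDF device; each new plot starts a new page
plot_series()   # in practice: plot(name)
dev.off()       # close the device so the file is written
```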
Value
- accabr
model errors for Binary Relevance (single assessment) - only independent predictors are used for modeling one label at a time, the other labels are not used as predictors. The classification rules are trained on all observations. As the performance measure for the resulting classification rules, the balanced accuracy of the models for each individual label is employed.
- errabin
combined error for Binary Relevance (joint assessment) - the best prediction models for the different labels are combined to assess the combined prediction. The 01-accuracy counts a label combination as correct only if all labels are correctly predicted. The Hamming accuracy corresponds to the proportion of labels whose value is correctly predicted.
- accadbr
model errors in Dependent Binary Relevance (Extended Model) (single assessment) - each label is trained by means of an extended model which not only includes the independent predictors but also the other labels. For these labels the truly observed values are used for estimation and prediction. In the extended model, further labels, which are not treated as dependent variables, can be used as additional predictors.
- erraext
combined errors for Dependent Binary Relevance (Extended Model) (joint assessment)
- erratrue
combined errors for Dependent Binary Relevance (True Prediction) - in the prediction phase, the values of all modeled labels are first predicted by the independent predictors only (see Binary Relevance) and then the predicted labels are used in the estimated extended model in a 2nd step to ultimately predict the labels.
- coldata
column names of input data
- inddep
indices of dependent variables (labels to be modeled)
- treeabr
list of trees from Binary Relevance modeling, one tree for each label; refer to an individual tree as treeabr[[i]], i = 1, ..., no. of labels
- treeadbr
list of trees from Dependent Binary Relevance modeling, one for each label; refer to an individual tree as treeadbr[[i]], i = 1, ..., no. of labels
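The performance measures named above (balanced accuracy per label, 01-accuracy and Hamming accuracy for label combinations) can be illustrated on a small example. This is a sketch in base R, not the code used inside the function; the matrices truth and pred contain made-up values, and balanced accuracy is taken in its usual sense as the mean of sensitivity and specificity.

```r
# Made-up example: rows are observations, columns are three binary labels
truth <- matrix(c(1, 0, 1,
                  0, 0, 1,
                  1, 1, 0,
                  0, 1, 0), ncol = 3, byrow = TRUE)
pred  <- matrix(c(1, 0, 1,
                  0, 1, 1,
                  1, 1, 0,
                  1, 1, 0), ncol = 3, byrow = TRUE)

# Balanced accuracy of one label: mean of sensitivity and specificity
balanced_accuracy <- function(y, yhat) {
  sens <- mean(yhat[y == 1] == 1)  # share of true 1s predicted as 1
  spec <- mean(yhat[y == 0] == 0)  # share of true 0s predicted as 0
  (sens + spec) / 2
}

# Single assessment: one balanced accuracy per label
bal_acc <- sapply(seq_len(ncol(truth)),
                  function(j) balanced_accuracy(truth[, j], pred[, j]))

# Joint assessment: 01-accuracy requires all labels of an observation to match
acc01 <- mean(rowSums(truth == pred) == ncol(truth))

# Joint assessment: Hamming accuracy is the share of correctly predicted labels
hamming_acc <- mean(truth == pred)
```

Here the per-label balanced accuracies are 0.75, 0.75 and 1, the 01-accuracy is 0.5 (two of four rows are fully correct), and the Hamming accuracy is 10/12, which shows that the 01-accuracy is the stricter of the two joint measures.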
Examples
data <- PrInDT::data_land # load data
dataclean <- data[,c(1:7,23:24,11:13,22,8:10)] # only relevant features
indind <- c(1:9) # original predictors
indaddind <- c(10:13) # additional predictors
inddep <- c(14:16) # dependent variables
dataclean <- na.omit(dataclean)
ctestv <- NA
##
# Call PrInDTMulabAll: language by language
##
outmultAll <- PrInDTMulabAll(dataclean,colnames(dataclean)[inddep],ctestv,conf.level=0.95,
indind,indaddind,inddep)
outmultAll
plot(outmultAll)