
baseline {utiml}

R Documentation

Baseline reference for multilabel classification

Description

Create a baseline model for multilabel classification.

Usage

baseline(
  mdata,
  metric = c("general", "F1", "hamming-loss", "subset-accuracy", "ranking-loss"),
  ...
)

Arguments

mdata

An mldr dataset whose label statistics are used to build the baseline.

metric

Defines the strategy used to predict the labels.

The possible values are 'general', 'F1', 'hamming-loss', 'subset-accuracy' or 'ranking-loss'. See the Details section for more information. (Default: 'general').

...

Not used.

Details

Baseline is a naive multi-label classifier that maximizes or minimizes a specific measure without inducing a learning model. It uses general information about the labels in the training dataset to estimate the labels of a test dataset.
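The training statistics that such a baseline relies on can be computed directly from a binary label matrix. The base-R sketch below uses a small hypothetical matrix `Y` (not the utiml `toyml` data) and does not reproduce utiml's internals; it only shows label frequencies, label cardinality and labelset counts.

```r
# Hypothetical toy label matrix: rows are instances, columns are labels
# (1 = relevant, 0 = not relevant). Not taken from utiml.
Y <- matrix(c(1, 0, 1, 0,
              1, 1, 0, 0,
              1, 0, 1, 0,
              0, 1, 1, 1,
              1, 0, 0, 0,
              1, 0, 1, 0),
            nrow = 6, byrow = TRUE,
            dimnames = list(NULL, paste0("l", 1:4)))

# Label frequency: fraction of training instances in which each label occurs
label_freq <- colMeans(Y)

# Label cardinality: average number of relevant labels per instance
cardinality <- mean(rowSums(Y))

# Labelset counts: how often each distinct combination of labels occurs,
# encoded as strings such as "1010"
labelsets <- apply(Y, 1, paste, collapse = "")
labelset_counts <- sort(table(labelsets), decreasing = TRUE)
```

Here `label_freq` is `c(5, 2, 4, 1) / 6`, `cardinality` is 2 and the most common labelset is `"1010"` (labels `l1` and `l3`), which occurs three times.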

The following strategies are available:

general: Predict the k most frequent labels, where k is the integer closest to the label cardinality.
F1: Predict the most frequent labels that obtain the best F1 measure on the training data. In the original paper, the authors use the least frequent labels.
hamming-loss: Predict the labels that are associated with more than 50% of the training instances.
subset-accuracy: Predict the most common labelset.
ranking-loss: Predict a ranking of the labels ordered by their frequency.
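The strategies listed above can be sketched in base R as plain functions over a binary training label matrix. This is an illustrative reconstruction from the descriptions, not utiml's source code: the function names, the toy matrix `Y`, the use of example-based F1 for the F1 strategy and the tie-breaking behaviour are all assumptions.

```r
# Hypothetical toy training label matrix (rows = instances, columns = labels).
Y <- matrix(c(1, 0, 1, 0,
              1, 1, 0, 0,
              1, 0, 1, 0,
              0, 1, 1, 1,
              1, 0, 0, 0,
              1, 0, 1, 0),
            nrow = 6, byrow = TRUE,
            dimnames = list(NULL, paste0("l", 1:4)))

# Labels ordered from most to least frequent in the training data
ranked_labels <- function(Y) names(sort(colMeans(Y), decreasing = TRUE))

# general: the k most frequent labels, k = label cardinality rounded
# (note: R's round() sends exact halves to the even integer)
general_labels <- function(Y) {
  k <- round(mean(rowSums(Y)))
  ranked_labels(Y)[seq_len(k)]
}

# F1 (assumed variant): try the top-m most frequent labels for every m and
# keep the m with the best example-based F1 on the training data
f1_labels <- function(Y) {
  ranked <- ranked_labels(Y)
  mean_f1 <- function(Z) {
    z <- as.numeric(colnames(Y) %in% Z)    # predicted labelset as 0/1 vector
    tp <- as.vector(Y %*% z)               # size of the intersection per instance
    mean(2 * tp / (rowSums(Y) + sum(z)))   # average per-instance F1
  }
  scores <- sapply(seq_along(ranked), function(m) mean_f1(ranked[seq_len(m)]))
  ranked[seq_len(which.max(scores))]
}

# hamming-loss: labels relevant for more than 50% of the training instances
hamming_labels <- function(Y) {
  freq <- colMeans(Y)
  names(freq)[freq > 0.5]
}

# subset-accuracy: the most common labelset (ties go to the first labelset
# in the alphabetical order used by table())
subset_labels <- function(Y) {
  sets <- apply(Y, 1, paste, collapse = "")
  best <- names(which.max(table(sets)))
  colnames(Y)[strsplit(best, "")[[1]] == "1"]
}

# ranking-loss: a ranking of all labels by frequency
ranking_labels <- function(Y) ranked_labels(Y)
```

On this toy matrix, `general_labels(Y)`, `f1_labels(Y)`, `hamming_labels(Y)` and `subset_labels(Y)` all return `c("l1", "l3")`, while `ranking_labels(Y)` returns `c("l1", "l3", "l2", "l4")`. On other data the bipartition strategies can disagree, which is why each one targets a different evaluation measure.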

Value

An object of class BASELINEmodel, including:

labels: A vector with the label names.
predict: A list with the labels that will be predicted.
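Because a baseline ignores the features, a bipartition strategy assigns the same labelset to every test instance. The sketch below mimics only the two documented fields with a hypothetical plain list (the real BASELINEmodel object may hold more, and `predict` is assumed here to be a character vector); the helper `expand_prediction` is likewise hypothetical and simply expands the stored labelset into an n x q binary matrix.

```r
# Hypothetical stand-in for a BASELINEmodel-like object: only the documented
# fields 'labels' and 'predict' are mimicked.
model <- list(labels = c("l1", "l2", "l3", "l4"),
              predict = c("l1", "l3"))

# Hypothetical helper: repeat the stored labelset for n test instances
expand_prediction <- function(model, n) {
  row <- as.integer(model$labels %in% model$predict)
  matrix(row, nrow = n, ncol = length(row), byrow = TRUE,
         dimnames = list(NULL, model$labels))
}

P <- expand_prediction(model, n = 3)
```

Every row of `P` is `1 0 1 0`, that is, labels `l1` and `l3` are predicted for all three instances.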

References

Metz, J., Abreu, L. F. de, Cherman, E. A., & Monard, M. C. (2012). On the Estimation of Predictive Evaluation Measure Baselines for Multi-label Learning. In 13th Ibero-American Conference on AI (pp. 189-198). Cartagena de Indias, Colombia.

Examples

library(utiml)

## Default strategy ('general')
model <- baseline(toyml)
pred <- predict(model, toyml)

## Change the metric
model <- baseline(toyml, "F1")
model <- baseline(toyml, "subset-accuracy")

[Package utiml version 0.1.7 Index]