R: Ensemble of Binary Relevance for multi-label Classification

ebr {utiml}

R Documentation

Ensemble of Binary Relevance for multi-label Classification

Description

Create an Ensemble of Binary Relevance model for multi-label classification.

Usage

ebr(
  mdata,
  base.algorithm = getOption("utiml.base.algorithm", "SVM"),
  m = 10,
  subsample = 0.75,
  attr.space = 0.5,
  replacement = TRUE,
  ...,
  cores = getOption("utiml.cores", 1),
  seed = getOption("utiml.seed", NA)
)

Arguments

`mdata`	An mldr dataset used to train the binary models.
`base.algorithm`	A string with the name of the base algorithm. (Default: `getOption("utiml.base.algorithm", "SVM")`)
`m`	The number of Binary Relevance models used in the ensemble. (Default: 10)
`subsample`	A value between 0.1 and 1 to determine the percentage of training instances that must be used for each classifier. (Default: 0.75)
`attr.space`	A value between 0.1 and 1 to determine the percentage of attributes that must be used for each classifier. (Default: 0.50)
`replacement`	Boolean value indicating whether to sample with replacement when creating the training data for each model of the ensemble. (Default: TRUE)
`...`	Other arguments passed to the base algorithm for all subproblems.
`cores`	The number of cores used to parallelize the training. Values higher than 1 require the parallel package. (Default: `getOption("utiml.cores", 1)`)
`seed`	An optional integer used to set the seed. This is useful when the method is run in parallel. (Default: `getOption("utiml.seed", NA)`)
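
As a rough illustration of how `subsample`, `attr.space` and `replacement` interact, the base R sketch below draws the instance and attribute indices for one ensemble member. The helper `sample_member` is hypothetical and not part of utiml; rounding the sizes up with `ceiling` and always drawing attributes without replacement are assumptions, not confirmed details of the package.

```r
# Hypothetical helper (not part of utiml): draw the row and column
# indices that one ensemble member could be trained on.
sample_member <- function(n_rows, n_attrs, subsample = 0.75,
                          attr.space = 0.5, replacement = TRUE) {
  n_inst <- ceiling(n_rows * subsample)   # assumed rounding rule
  n_cols <- ceiling(n_attrs * attr.space) # assumed rounding rule
  list(
    # Instances may repeat when replacement = TRUE
    rows = sample.int(n_rows, n_inst, replace = replacement),
    # Attributes are drawn without replacement in this sketch
    cols = sort(sample.int(n_attrs, n_cols))
  )
}

set.seed(312)
idx <- sample_member(n_rows = 100, n_attrs = 10)
length(idx$rows)  # 75 instances
length(idx$cols)  # 5 attributes
```

Repeating this draw `m` times would give each member a different view of the data, which is what makes the ensemble diverse.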

Details

This model is composed of a set of Binary Relevance models. Binary Relevance is a simple and effective transformation method for multi-label data: it trains one independent binary classifier per label.
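
The base R sketch below illustrates the two ideas described above on toy data: splitting a multi-label dataset into one binary problem per label, and combining the bipartitions of several members by majority vote. All object names are hypothetical, and majority voting is only one possible combination rule; this is not a description of how utiml combines predictions internally.

```r
# Toy data: 6 instances, 2 attributes, 3 labels (all names hypothetical).
X <- data.frame(a1 = c(0.1, 0.9, 0.4, 0.8, 0.2, 0.7),
                a2 = c(1, 0, 1, 0, 1, 0))
Y <- data.frame(y1 = c(0, 1, 0, 1, 0, 1),
                y2 = c(1, 0, 1, 0, 1, 0),
                y3 = c(0, 0, 1, 1, 0, 1))

# Binary Relevance transformation: one binary dataset per label,
# each sharing the same attributes but with a different target column.
binary_problems <- lapply(names(Y), function(label) {
  cbind(X, target = Y[[label]])
})
names(binary_problems) <- names(Y)

# Bipartitions predicted by 3 hypothetical ensemble members for one
# instance (rows = members, columns = labels).
votes <- rbind(c(y1 = 1, y2 = 0, y3 = 1),
               c(y1 = 1, y2 = 0, y3 = 0),
               c(y1 = 0, y2 = 1, y3 = 0))

# Majority vote: a label is predicted when more than half the members agree.
scores <- colMeans(votes)
prediction <- as.integer(scores > 0.5)
names(prediction) <- names(scores)
prediction
```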

Value

An object of class EBRmodel containing the set of fitted BR models, including:

models: A list of BR models.
nrow: The number of instances used in each training dataset.
ncol: The number of attributes used in each training dataset.
rounds: The number of iterations.

Note

To reproduce the same classification and obtain the same result, you must set the flag utiml.mc.set.seed to FALSE.
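
This page does not say how the flag is set. Assuming it is read as a global R option, like the `utiml.cores` and `utiml.seed` options shown in Usage, a minimal sketch would be:

```r
# Assumption: utiml.mc.set.seed is a global option, in the same way as
# utiml.cores and utiml.seed; this page does not confirm it.
options(utiml.mc.set.seed = FALSE)

# Check the value currently stored in the session
getOption("utiml.mc.set.seed")
```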

References

Read, J., Pfahringer, B., Holmes, G., & Frank, E. (2011). Classifier chains for multi-label classification. Machine Learning, 85(3), 333-359.

Read, J., Pfahringer, B., Holmes, G., & Frank, E. (2009). Classifier Chains for Multi-label Classification. Machine Learning and Knowledge Discovery in Databases, Lecture Notes in Computer Science, 5782, 254-269.

Examples

# Train with the "RANDOM" base algorithm and predict on the training data
model <- ebr(toyml, "RANDOM")
pred <- predict(model, toyml)

# Use C5.0 with 90% of instances and only 5 rounds
model <- ebr(toyml, "C5.0", m = 5, subsample = 0.9)

# Use 75% of attributes
model <- ebr(toyml, attr.space = 0.75)

# Run on 2 cores with a specific seed
model1 <- ebr(toyml, cores = 2, seed = 312)

[Package utiml version 0.1.7 Index]