makeMultilabelTask {mlr}    R Documentation

Create a multilabel task.

Description

Create a task for multilabel classification, where each observation can carry several labels at the same time.

Usage

makeMultilabelTask(
  id = deparse(substitute(data)),
  data,
  target,
  weights = NULL,
  blocking = NULL,
  coordinates = NULL,
  fixup.data = "warn",
  check.data = TRUE
)

Arguments

id

(character(1))
Id string for object. Default is the name of the R variable passed to data.

data

(data.frame)
A data frame containing the features and target variable(s).

target

(character(1) | character(2) | character(n.classes))
Name(s) of the target variable(s). For survival analysis these are the names of the survival time and event columns, so it has length 2. For multilabel classification it contains the names of the logical columns that encode whether each label is present, and its length equals the number of labels (classes).

weights

(numeric)
Optional non-negative vector of case weights, one per observation, used during fitting. Cannot be set for cost-sensitive learning. Default is NULL, which means no (= equal) weights.

blocking

(factor)
An optional factor of the same length as the number of observations. Observations with the same blocking level “belong together”: in each resampling iteration they are all placed either in the training set or in the test set. Default is NULL, which means no blocking.

coordinates

(data.frame)
Coordinates of a spatial data set, used to partition the data spatially in a spatial cross-validation resampling setting. Coordinates must be numeric. The data.frame must have the same number of rows as data and at least two columns (dimensions).

fixup.data

(character(1))
Should basic cleanup of the data be performed? Currently this means removing empty factor levels from the columns. Possible choices are: “no” = do not clean up; “warn” = clean up and warn about it; “quiet” = clean up silently. Default is “warn”.

check.data

(logical(1))
Should the sanity of the data be checked at task creation? You should have good reasons to turn this off (speed might be one). Default is TRUE.
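
The optional arguments above can be combined as in the following minimal sketch. Everything in it is illustrative: the data, the column names, the weight vector w and the blocking factor subject are made up, and the mlr call is wrapped in requireNamespace() so the base-R parts run even without mlr installed. droplevels() is shown only as a base-R stand-in for the kind of cleanup fixup.data describes; that mlr uses exactly this function internally is an assumption.

```r
# Toy multilabel data (all names and values are made up for illustration).
dat <- data.frame(
  x = c(0.2, 1.5, -0.7, 0.9, 2.3, -1.1),
  # factor with an empty level "unused": the kind of level fixup.data removes
  grp = factor(c("a", "a", "b", "b", "a", "b"), levels = c("a", "b", "unused")),
  A = c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE),   # label A present?
  B = c(FALSE, TRUE, TRUE, FALSE, TRUE, FALSE)   # label B present?
)

# Non-negative case weights, one per observation (hypothetical values).
w <- c(1, 1, 2, 2, 0.5, 0.5)

# Blocking factor: rows sharing a level (here a hypothetical subject ID)
# go together into either the training set or the test set.
subject <- factor(c("s1", "s1", "s2", "s2", "s3", "s3"))

# Base-R illustration of the cleanup: drop factor levels that never occur.
cleaned <- droplevels(dat)

if (requireNamespace("mlr", quietly = TRUE)) {
  task <- mlr::makeMultilabelTask(
    id = "toy", data = dat, target = c("A", "B"),
    weights = w, blocking = subject,
    fixup.data = "quiet"  # clean up without emitting a warning
  )
}
```

With fixup.data = "warn" (the default), the same call would be expected to report the dropped empty level instead of removing it silently.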

Details

For multilabel classification, the presence of labels is assumed to be encoded via logical columns in data. The name of each column specifies the name of the label. target is then a character vector naming these columns.
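
The encoding above can be sketched as follows, assuming toy data with made-up label names. The helper variables (features, label.sets, label.names) are hypothetical; the sketch turns a list of label sets into one logical column per label and passes the label column names as target. The mlr call runs only if the package is installed.

```r
# Hypothetical toy data: two numeric features plus, for each observation,
# the set of labels it carries (label names are made up for illustration).
features <- data.frame(x1 = c(0.5, 1.2, -0.3, 2.1),
                       x2 = c(3.0, 0.1, 1.7, -1.0))
label.sets <- list("sports", c("politics", "tech"), character(0),
                   c("sports", "tech"))
label.names <- c("sports", "politics", "tech")

# One logical column per label: TRUE if the observation carries that label.
label.cols <- as.data.frame(lapply(setNames(label.names, label.names),
  function(l) vapply(label.sets, function(s) l %in% s, logical(1))))

# Features and label columns side by side; column names are the label names.
dat <- cbind(features, label.cols)

# target is the character vector of label column names.
if (requireNamespace("mlr", quietly = TRUE)) {
  task <- mlr::makeMultilabelTask(id = "toy", data = dat, target = label.names)
  print(task)
}
```

An observation with no labels (the third row here) simply has FALSE in every label column.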

See Also

Task, ClassifTask, ClusterTask, CostSensTask, RegrTask, SurvTask


[Package mlr version 2.19.2 Index]