oml_data {mlr3oml}    R Documentation

Interface to OpenML Data Sets

Description

This is the class for data sets served on OpenML. This object can also be constructed using the sugar function odt().

mlr3 Integration

Name conversion

Column names that don't comply with R's naming scheme are renamed (see base::make.names()). This means that the names can differ from those on OpenML.
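As a rough illustration of the kind of renaming to expect, base::make.names() can be called directly on a few names. The column names below are invented, and passing unique = TRUE is an assumption about how the package calls the function rather than something this page states.

```r
# Hypothetical column names as they might appear on OpenML.
openml_names <- c("sepal length", "class", "1st_measure", "a-b")

# make.names() replaces invalid characters with "." and prepends "X"
# to names that would otherwise not be syntactically valid.
# unique = TRUE is an assumption, not taken from this page.
r_names <- make.names(openml_names, unique = TRUE)

print(r_names)
```

Comparing the result with the original names shows which columns would have to be looked up under a different name than on the OpenML website.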

File Format

Data sets on OpenML are stored as (sparse) ARFF or parquet files. When creating a new OMLData object, the constructor argument parquet switches between the two formats. Not all data files are necessarily available as parquet. The option mlr3oml.parquet can be used to set a default. If parquet is TRUE but no parquet file is available, ARFF is used as a fallback.
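A minimal sketch of setting the default through the option and passing it on to odt(). The data set id 31 is only an example, and the download is guarded because it needs network access and an installed mlr3oml.

```r
# Set the session-wide default: prefer parquet where it is available.
options(mlr3oml.parquet = TRUE)

# Read the option back; FALSE is the documented default when it is unset.
use_parquet <- getOption("mlr3oml.parquet", default = FALSE)

# Constructing the object contacts OpenML, so only run it interactively.
if (interactive() && requireNamespace("mlr3oml", quietly = TRUE)) {
  # 31 is used purely as an example id.
  odata <- mlr3oml::odt(id = 31, parquet = use_parquet)
  # Query the parquet binding of the resulting object.
  print(odata$parquet)
}
```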

ARFF Files

This package ships its own reader for ARFF files, based on data.table::fread(). For sparse ARFF files, the reader automatically falls back to RWeka::read.arff() if the RWeka package is installed.

Parquet Files

For the handling of parquet files, we rely on duckdb and DBI.

Super class

mlr3oml::OMLObject -> OMLData

Active bindings

qualities

(data.table())
Data set qualities (meta-features of the data set, such as the number of instances), downloaded from the JSON API response and converted to a data.table::data.table() with columns "name" and "value".

tags

(character())
Returns all tags of the object.

parquet

(logical(1))
Whether to use parquet.

data

(data.table())
Returns the data (without row identifier columns and ignored columns).

features

(data.table())
Information about data set features (including target), downloaded from the JSON API response and converted to a data.table::data.table() with columns:

  • "index" (integer()): Column position.

  • "name" (character()): Name of the feature.

  • "data_type" (factor()): Type of the feature: "nominal" or "numeric".

  • "nominal_value" (list()): Levels of the feature, or NULL for numeric features.

  • "is_target" (logical()): TRUE for target column, FALSE otherwise.

  • "is_ignore" (logical()): TRUE if this feature should be ignored. Ignored features are removed automatically from the data set.

  • "is_row_identifier" (logical()): TRUE if the column encodes a row identifier. Row identifiers are removed automatically from the data set.

  • "number_of_missing_values" (integer()): Number of missing values in the column.

target_names

(character())
Name of the default target, as extracted from the OpenML data set description.

feature_names

(character())
Names of the features, as extracted from the OpenML data set description.

nrow

(integer())
Number of observations, as extracted from the OpenML data set qualities.

ncol

(integer())
Number of features (including targets), as extracted from the table of data set features. This excludes row identifiers and ignored columns.
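The relationship between the features table and ncol can be sketched with a toy table. The data.frame below is an invented stand-in with a subset of the documented columns; it is not downloaded from OpenML, and the filtering only mirrors the description on this page, not the package's actual code.

```r
# Toy stand-in for the features table (all values are invented).
features <- data.frame(
  name              = c("row_id", "age", "notes", "income", "class"),
  is_target         = c(FALSE, FALSE, FALSE, FALSE, TRUE),
  is_ignore         = c(FALSE, FALSE, TRUE, FALSE, FALSE),
  is_row_identifier = c(TRUE, FALSE, FALSE, FALSE, FALSE),
  stringsAsFactors  = FALSE
)

# Columns that remain in the data: neither ignored nor row identifiers.
kept <- features[!features$is_ignore & !features$is_row_identifier, ]

# Analogous to ncol: features including the target.
n_columns <- nrow(kept)

# Analogous to target_names.
target <- kept$name[kept$is_target]

print(kept$name)
print(n_columns)
```

In this toy table, "row_id" and "notes" are dropped, so three columns remain, one of which is the target.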

license

(character())
Returns the license of the data set.

parquet_path

(character())
Downloads the parquet file (or loads it from the cache) and returns its path. Note that this also normalizes the column names in the parquet file.

Methods

Public methods

Inherited methods

Method new()

Creates a new instance of this R6 class.

Usage
OMLData$new(
  id,
  parquet = parquet_default(),
  test_server = test_server_default()
)
Arguments
id

(integer(1))
OpenML id for the object.

parquet

(logical(1))
Whether to use parquet instead of ARFF. If parquet is not available, ARFF is used as a fallback. Defaults to the value of the option "mlr3oml.parquet", or FALSE if not set.

test_server

(logical(1))
Whether to use the OpenML test server or public server. Defaults to value of option "mlr3oml.test_server", or FALSE if not set.


Method print()

Prints the object. For a more detailed printer, convert to an mlr3::Task via as_task().

Usage
OMLData$print()
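A hedged sketch of printing the object and then converting it to a task for the more detailed printer. It assumes that mlr3 and mlr3oml are installed and that the example id 31 is reachable; the calls are guarded because they need network access.

```r
# Only attempt the download in an interactive session with both packages.
can_run <- interactive() &&
  requireNamespace("mlr3oml", quietly = TRUE) &&
  requireNamespace("mlr3", quietly = TRUE)

if (can_run) {
  # 31 is used purely as an example id.
  odata <- mlr3oml::odt(id = 31)

  # Short summary of the OpenML data set.
  odata$print()

  # Convert to a task and print it for more detail.
  task <- mlr3::as_task(odata)
  print(task)
}
```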

Method download()

Downloads the whole object for offline usage.

Usage
OMLData$download()

Method quality()

Returns the value of a single OpenML data set quality.

Usage
OMLData$quality(name)
Arguments
name

(character(1))
Name of the quality to extract.
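Conceptually, quality() can be thought of as a lookup by name in the qualities table. The sketch below imitates this with an invented table and a hypothetical helper lookup_quality(); it is an assumption about the behavior, not the package's implementation. The guarded part shows the corresponding call on a real object, using the OpenML quality name "NumberOfInstances" and the example id 31.

```r
# Invented stand-in for the qualities table (columns "name" and "value").
qualities <- data.frame(
  name  = c("NumberOfInstances", "NumberOfFeatures", "NumberOfClasses"),
  value = c(1000, 21, 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the value for one quality name, NA if absent.
lookup_quality <- function(qualities, name) {
  qualities$value[match(name, qualities$name)]
}

print(lookup_quality(qualities, "NumberOfInstances"))

# The real method needs network access, so it is guarded.
if (interactive() && requireNamespace("mlr3oml", quietly = TRUE)) {
  odata <- mlr3oml::odt(id = 31)
  print(odata$quality("NumberOfInstances"))
}
```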


Method clone()

The objects of this class are cloneable with this method.

Usage
OMLData$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

References

Vanschoren J, van Rijn JN, Bischl B, Torgo L (2014). “OpenML.” ACM SIGKDD Explorations Newsletter, 15(2), 49–60. doi:10.1145/2641190.2641198.

Examples

# For technical reasons, examples cannot be included in this R package.
# Instead, these are some relevant resources:
#
# Large-Scale Benchmarking chapter in the mlr3book:
# https://mlr3book.mlr-org.com/chapters/chapter11/large-scale_benchmarking.html
#
# Package Article:
# https://mlr3oml.mlr-org.com/articles/tutorial.html

[Package mlr3oml version 0.10.0 Index]