OMLDataSet {OpenML} | R Documentation
OMLDataSet.
Description
An OMLDataSet consists of an OMLDataSetDescription, a data.frame containing the data set, the old and new column names and, finally, the target features.
The OMLDataSetDescription provides information on the data set, such as the ID, name and version. For a full list of its elements, see the documentation of OMLDataSetDescription.
The slot colnames.old contains the original names, i.e., the column names that were uploaded to the server, while colnames.new contains the names that you will see when working with the data in R.
Most of the time, old and new column names are identical. The new names differ only if the original names are not valid.
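As an illustration of how old and new column names can diverge, the sketch below uses base R's make.names(). This is an assumption made for illustration only: the page does not state which rule OpenML applies to invalid names, and the column names shown are hypothetical.

```r
# Hypothetical column names as they might have been uploaded to the server.
colnames.old <- c("sepal length", "2nd.measure", "class", "class")

# make.names() converts arbitrary strings into syntactically valid R names;
# unique = TRUE additionally resolves duplicates.
# Assumption: this stands in for whatever sanitization OpenML really performs.
colnames.new <- make.names(colnames.old, unique = TRUE)

# Flag the names that had to be changed.
changed <- colnames.old != colnames.new

print(data.frame(old = colnames.old, new = colnames.new, changed = changed))
```

Names that are already valid and unique, such as the first "class", pass through unchanged, which mirrors the common case where both slots are identical.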
The slot target.features contains the column name(s) from the data.frame of the OMLDataSet that refer to the target feature(s).
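Because target.features holds column names of the data.frame, it can be used directly for subsetting. The following is a minimal base R sketch of that relationship, using the airquality data and the target "Ozone"; the variable names target and predictors are illustrative and not part of the OpenML API.

```r
data("airquality")

# Column name(s) that refer to the target feature(s).
target.features <- "Ozone"

# Every target feature must be an existing column of the data.
stopifnot(all(target.features %in% colnames(airquality)))

# Split the data into the target column(s) and the remaining columns.
target <- airquality[, target.features, drop = FALSE]
predictors <- airquality[, setdiff(colnames(airquality), target.features), drop = FALSE]

print(colnames(target))
print(colnames(predictors))
```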
Usage
makeOMLDataSet(
  desc,
  data,
  colnames.old = colnames(data),
  colnames.new = colnames(data),
  target.features = NULL
)
Arguments
desc
data
colnames.old
colnames.new
target.features
Value
[OMLDataSet]
See Also
Other data set-related functions: OMLDataSetDescription, convertMlrTaskToOMLDataSet(), convertOMLDataSetToMlr(), deleteOMLObject(), getOMLDataSet(), listOMLDataSets(), tagOMLObject(), uploadOMLDataSet()
Examples
# Load the example data and prepare the free-text metadata.
data("airquality")
dsc = "Daily air quality measurements in New York, May to September 1973.
This data is taken from R."
cit = "Chambers, J. M., Cleveland, W. S., Kleiner, B. and Tukey, P. A. (1983) Graphical
Methods for Data Analysis. Belmont, CA: Wadsworth."

# Describe the data set.
desc_airquality = makeOMLDataSetDescription(name = "airquality",
  description = dsc,
  creator = "New York State Department of Conservation (ozone data) and the National
Weather Service (meteorological data)",
  collection.date = "May 1, 1973 to September 30, 1973",
  language = "English",
  licence = "GPL-2",
  url = "https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html",
  default.target.attribute = "Ozone",
  citation = cit,
  tags = "R")

# Combine the description and the data into an OMLDataSet.
airquality_oml = makeOMLDataSet(desc = desc_airquality,
  data = airquality,
  colnames.old = colnames(airquality),
  colnames.new = colnames(airquality),
  target.features = "Ozone")