input {llama} | R Documentation |
Read data
Description
Reads performance data that can be used to train and evaluate models.
Usage
input(features, performances, algorithmFeatures = NULL, successes = NULL, costs = NULL,
extra = NULL, minimize = T, perfcol = "performance")
Arguments
features |
data frame that contains the feature values for each problem instance and a non-empty set of ID columns. |
algorithmFeatures |
data frame that contains the feature values for each algorithm and a non-empty set of algorithm ID columns. Optional. |
performances |
data frame that contains the performance values for each problem instance and a non-empty set of ID columns. |
successes |
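data frame that contains the success values (whether each algorithm was successful on each problem instance) and a non-empty set of ID columns. Optional. |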
costs |
either a single number, a data frame or a list that specifies
the cost of the features. If a number is specified, it is assumed to denote
the cost for all problem instances (i.e. the cost is always the same). If a
data frame is given, it is assumed to have one column for each feature with
the same name as the feature where each value gives the cost and a non-empty
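set of ID columns. If a list is specified, it is assumed to have a member
groups that specifies which features belong to which group and a member
values that is a data frame with one cost column per group and a non-empty
set of ID columns. Optional. |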
extra |
data frame containing any extra information about the instances and a non-empty set of ID columns. This is not used in modelling, but can be used e.g. for visualisation. Optional. |
minimize |
whether the minimum performance value is best. Default true. |
perfcol |
name of the column that stores performance values when algorithm features are provided. Default "performance". |
Details
input takes a list of data frames and processes them as follows. The
feature and performance data are joined by looking for common column names in
the two data frames (usually an ID of the problem instance). For each problem,
the best algorithm according to the given performance data is computed. If more
than one algorithm has the best performance, all of them are returned.
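The join and best-algorithm steps described above can be sketched in base R. This is only an illustration under stated assumptions, not llama's actual implementation: the data frames feats and perfs and the helper bestAlgos are hypothetical, and the data mimics the files in the Examples section.

```r
# Hypothetical toy data; column names follow the Examples section.
feats <- data.frame(ID = c(0, 1), width = c(1.2, 2.0), height = c(3, 1))
perfs <- data.frame(ID = c(0, 1), alg1 = c(2, 4), alg2 = c(5, 4))

# Join on the columns the two data frames have in common (here: ID).
idCols <- intersect(names(feats), names(perfs))
combined <- merge(feats, perfs, by = idCols)

# Every performance column that is not an ID column is an algorithm.
algoCols <- setdiff(names(perfs), idCols)

# bestAlgos is a hypothetical helper (not part of llama): it returns all
# algorithms that attain the best value on one row, so ties are kept.
bestAlgos <- function(row, minimize = TRUE) {
    vals <- unlist(row[algoCols])
    target <- if (minimize) min(vals) else max(vals)
    names(vals)[vals == target]
}

best <- lapply(seq_len(nrow(combined)), function(i) bestAlgos(combined[i, ]))
```

On instance 1 both algorithms have the same performance, so both names end up in the corresponding element of best.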
The data frame for algorithmic features is optional. When it is provided, the existing data is joined by algorithm names. The final data frame is reshaped into 'long' format.
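As a rough picture of what 'long' format means here, the following base R sketch reshapes a wide performance table into one row per instance and algorithm and then joins algorithm features by algorithm name. It is an assumption-laden illustration rather than llama's code: the algorithm ID column name algorithm, the feature complexity and the data are made up, and performance is simply the default value of perfcol.

```r
# Hypothetical wide performance data and algorithm features.
perfs <- data.frame(ID = c(0, 1), alg1 = c(2, 4), alg2 = c(5, 4))
algoFeats <- data.frame(algorithm = c("alg1", "alg2"),
                        complexity = c(1, 3),
                        stringsAsFactors = FALSE)

algoCols <- setdiff(names(perfs), "ID")

# One row per (instance, algorithm) pair; the performance values of all
# algorithms are stacked into a single column.
long <- data.frame(
    ID = rep(perfs$ID, times = length(algoCols)),
    algorithm = rep(algoCols, each = nrow(perfs)),
    performance = unlist(perfs[algoCols], use.names = FALSE),
    stringsAsFactors = FALSE
)

# Join the algorithm features by algorithm name.
long <- merge(long, algoFeats, by = "algorithm")
```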
The data frame that describes whether an algorithm was successful on a problem
is optional. However, it is required if parscores or successes are to be used to
evaluate the learned models; omitting it will then lead to error messages.
Similarly, feature costs are optional.
If successes is given, it is used to determine the best algorithm on each
problem instance. That is, an algorithm can only be best if it was successful.
If no algorithm was successful, the value will be NA. Special care should
be taken when preparing the performance values for unsuccessful algorithms. For
example, if the performance measure is runtime and success is determined by
whether the algorithm was able to find a solution within a timeout, the
performance value for unsuccessful algorithms should be the timeout value. If
the algorithm failed because of some other reason in a short amount of time,
specifying this small amount of time may confuse some of the algorithm selection
model learners.
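The advice above on preparing performance values could look like the following in base R, done before the data is passed to input. This is a sketch under assumptions: the timeout of 3600 and the data are invented, both data frames are assumed to list instances in the same order, and bestSuccessful is a hypothetical helper that only mirrors the rule that an algorithm can be best only if it was successful.

```r
timeout <- 3600  # hypothetical timeout, in the same unit as the runtimes

# Hypothetical data: alg1 fails quickly on instance 1, nothing solves 2.
perfs <- data.frame(ID = c(0, 1, 2), alg1 = c(2, 0.1, 1), alg2 = c(5, 40, 2))
succ <- data.frame(ID = c(0, 1, 2),
                   alg1 = c(TRUE, FALSE, FALSE),
                   alg2 = c(FALSE, TRUE, FALSE))

# The replacement below relies on matching row order.
stopifnot(identical(perfs$ID, succ$ID))

algoCols <- setdiff(names(perfs), "ID")

# Replace the runtime of every unsuccessful run by the timeout value, so
# that a fast failure does not look like a fast solution.
for (a in algoCols) {
    perfs[[a]][!succ[[a]]] <- timeout
}

# Hypothetical helper: best algorithm(s) among the successful ones on
# instance i, or NA if no algorithm was successful.
bestSuccessful <- function(i) {
    ok <- algoCols[unlist(succ[i, algoCols])]
    if (length(ok) == 0) return(NA)
    vals <- unlist(perfs[i, ok, drop = FALSE])
    names(vals)[vals == min(vals)]
}

best <- lapply(seq_len(nrow(perfs)), bestSuccessful)
```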
Value
data |
the combined data (features, performance, successes). |
best |
a list of the best algorithms. |
ids |
a list of names denoting the instance ID columns. |
features |
a list of names denoting problem features. |
algorithmFeatures |
a list of names denoting algorithm features. 'NULL' if no algorithm features are provided. |
algorithmNames |
a list of algorithm names. 'NULL' if no algorithm features are provided. See 'performance' field in that case. |
algos |
a column that stores names of algorithms. 'NULL' if no algorithm features are provided. |
performance |
a list of names denoting algorithm performances. If algorithm features are provided, a column name that stores algorithm performances. |
success |
a list of names denoting algorithm successes. If algorithm features are provided, a column name that stores algorithm successes. |
minimize |
true if the smaller performance values are better, else false. |
cost |
a list of names denoting feature costs. |
costGroups |
a list of lists of names denoting which features belong to which group. Only returned if cost groups are given as input. |
Author(s)
Lars Kotthoff
Examples
# features.csv looks something like
# ID,width,height
# 0,1.2,3
# ...
# performance.csv:
# ID,alg1,alg2
# 0,2,5
# ...
# success.csv:
# ID,alg1,alg2
# 0,T,F
# ...
#input(read.csv("features.csv"), read.csv("performance.csv"),
#    successes=read.csv("success.csv"), costs=10)
# costs.csv:
# ID,width,height
# 0,3,4.5
# ...
#input(read.csv("features.csv"), read.csv("performance.csv"),
#    successes=read.csv("success.csv"), costs=read.csv("costs.csv"))
# costGroups.csv:
# ID,group1,group2
# 0,3,4.5
# ...
#input(read.csv("features.csv"), read.csv("performance.csv"),
#    successes=read.csv("success.csv"),
#    costs=list(groups=list(group1=c("height"), group2=c("width")),
#               values=read.csv("costGroups.csv")))