R: Parses the data files of an algorithm selection scenario into...

parseASScenario {aslib}

R Documentation

Parses the data files of an algorithm selection scenario into an S3 object.

Description

Object members

Let n be the number of (replicated) instances, m the number of unique instances, p the number of features, s the number of feature steps and k the number of algorithms.

desc [ASScenarioDesc]: Description object, containing further info.
feature.runstatus [data.frame(n, s + 2)]: Runstatus of instance feature computation steps. The first 2 columns are “instance_id” and “repetition”, the remaining are the status factors. The step columns are in the same order as the feature steps in the description object. The factor levels are always: ok, presolved, crash, timeout, memout, other. No entry can be NA. The data.frame is sorted by “instance_id”, then “repetition”.
algorithm.feature.runstatus [data.frame(k, s + 1)]: Runstatus of algorithm feature computation steps. The first column is “algorithm”, the remaining are the status factors. The step columns are in the same order as the feature steps in the description object. The factor levels are always: ok, crash, timeout, memout, other. No entry can be NA. The data.frame is sorted by “algorithm”.
feature.costs [data.frame(n, s + 2)]: Costs of instance feature computation steps. The first 2 columns are “instance_id” and “repetition”, the remaining are numeric costs of the instance feature steps. The step columns are in the same order as the feature steps in the description object. codeNA means the cost is not available, possibly because the feature computation was aborted. The data.frame is sorted by “instance_id”, then “repetition”. If no cost file is available at all, NULL is stored.
algorithm.feature.costs [data.frame(n, s + 1)]: Costs of algorithm feature computation steps. The first column is “algorithm”, the remaining are numeric costs of the algorithmic feature steps. The step columns are in the same order as the feature steps in the description object. codeNA means the cost is not available, possibly because the feature computation was aborted. The data.frame is sorted by “algorithm”. If no cost file is available at all, NULL is stored.
feature.values [data.frame(n, p + 2)]: Measured feature values of instances. The first 2 columns are “instance_id” and “repetition”. The remaining ones are the measured instance features. The feature columns are in the same order as “instance_features_deterministic”, “features_stochastic” in the description object. codeNA means the feature is not available, possibly because the feature computation was aborted. The data.frame is sorted by “instance_id”, then “repetition”.
algorithm.feature.values [data.frame(k, p + 1)]: Measured feature values of algorithms The first column is “algorithm”. The remaining ones are the measured algorithmic features. The feature columns are in the same order as “algorithm_features_deterministic”, “algorithm_features_stochastic” in the description object. codeNA means the feature is not available, possibly because the feature computation was aborted. The data.frame is sorted by “algorithm”.
algo.runs [data.frame]: Runstatus and performance information of the algorithms. Simply the parsed ARFF file. See convertAlgoPerfToWideFormat for a more convenient format.
algo.runstatus [data.frame(n, k + 2)]: Runstatus of algorithm runs. The first 2 columns are “instance_id” and “repetition”, the remaining are the status factors. The step columns are in the same order as the feature steps in the description object. The factor levels are always: ok, presolved, crash, timeout, memout, other. No entry can be NA. The data.frame is sorted by “instance_id”, then “repetition”.
cv.splits[data.frame(m, 3)]: Definition of cross-validation splits for each replication of a repeated CV with folds. Has columns “instance_id”, “repetition” and “fold”. The instances with fold = i for a replication r constitute the i-th test set for the r-th CV. The training set is the “instance_id” column with repetition = r, in the same order, when the test set is removed. The data.frame is sorted by “repetition”, then “fold”, then “instance_id”. If no CV file is available at all, NULL is stored, and a warning is issued, although this should not happen.

Usage

parseASScenario(path)

Arguments

path

[character(1)]
Path to directory of benchmark data set.

Value

[ASScenario]. Description object.

Examples

## Not run: 
  sc = parseASScenario("/path/to/scenario")

## End(Not run)