regression {llama}    R Documentation
Regression model
Description
Build a regression model that predicts the algorithm to use based on the features of the problem and optionally features of the algorithms.
Usage
regression(regressor = NULL, data = NULL,
pre = function(x, y=NULL) { list(features=x) },
combine = NULL, expand = identity, save.models = NA,
use.weights = TRUE)
Arguments
regressor
    the mlr regressor to use. See examples.

data
    the data to use with training and test sets. The structure returned by one of the partitioning functions.

pre
    a function to preprocess the data. Currently only normalize. Optional. The default passes the features through unchanged.

combine
    the function used to combine the predictions of the individual regression models for stacking. Default NULL. See details.

expand
    a function that takes a matrix of performance predictions (columns are algorithms, rows problem instances) and transforms it into a matrix with the same number of rows. Only meaningful if combine is not null. Default is the identity function, which leaves the matrix unchanged.

save.models
    whether to serialize and save the models trained during evaluation of the model. If not NA, the given value is used as a prefix for the file names. See details.

use.weights
    whether to use instance weights if supported. Default TRUE.
Details
regression takes data and processes it using pre (if supplied). If no algorithm features are provided, regressor is called to induce a separate regression model for each algorithm to predict its performance. When algorithm features are present, regressor is called to induce one regression model for all algorithms to predict their performance. The best algorithm is determined from the predicted performances by examining whether performance is to be minimized or not, as specified when creating the data structure through input.
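The selection step described above can be sketched in a few lines of base R. This is an illustrative sketch, not llama's internal code; the matrix and algorithm names are made up.

```r
# Columns are algorithms, rows are problem instances; values are
# predicted performance (e.g. runtime). Names are illustrative only.
preds <- matrix(c(10, 5,
                  3,  8),
                nrow = 2, byrow = TRUE,
                dimnames = list(NULL, c("algoA", "algoB")))

# If performance is to be minimised (e.g. runtime), pick the row-wise minimum.
best.min <- colnames(preds)[apply(preds, 1, which.min)]

# If performance is to be maximised (e.g. solution quality), pick the maximum.
best.max <- colnames(preds)[apply(preds, 1, which.max)]
```

Whether the minimum or the maximum is taken is exactly what the minimize flag given to input controls.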
The evaluation across the training and test sets will be parallelized automatically if a suitable backend for parallel computation is loaded. The parallelMap level is "llama.fold".
If combine is not null, it is assumed to be an mlr classifier and will be used to learn a model that predicts the best algorithm given the original features and the performance predictions for the individual algorithms. The combine option is currently not supported with algorithm features. If this classifier supports weights and use.weights is TRUE, the weights are passed as the difference between the best and the worst algorithm. Optionally, expand can be used to supply a function that modifies the predictions before they are given to the classifier, e.g. to augment the performance predictions with the pairwise differences (see examples).
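The pairwise-difference augmentation used in the examples can be tried on its own with plain base R. A minimal sketch (the function name expand.pairs and the test matrix are illustrative):

```r
# Append the absolute pairwise differences between algorithm columns to a
# matrix of performance predictions; same idea as the expand argument in
# the examples below.
expand.pairs <- function(x) {
  cbind(x, combn(1:ncol(x), 2, function(y) abs(x[, y[1]] - x[, y[2]])))
}

preds <- matrix(c(1, 4,
                  2, 2), nrow = 2, byrow = TRUE)

# With two algorithm columns there is one pair, so one extra column is added,
# holding |col1 - col2| per row.
expanded <- expand.pairs(preds)
```

The number of rows is unchanged, as required of an expand function; only columns are added.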
If all predictions of an underlying machine learning model are NA, the prediction will be NA for the algorithm and the score will be -Inf if the performance value is to be maximised, Inf otherwise.
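This fallback rule can be mimicked with a small base-R sketch. The function name score.for and its return structure are illustrative, not part of llama's API:

```r
# If every prediction is NA, return NA for the algorithm and the worst
# possible score (-Inf when maximising, Inf when minimising); otherwise
# pick the best algorithm and its predicted score.
score.for <- function(preds, maximise) {
  if (all(is.na(preds))) {
    list(algorithm = NA, score = if (maximise) -Inf else Inf)
  } else {
    best <- if (maximise) which.max(preds) else which.min(preds)
    list(algorithm = names(preds)[best], score = preds[[best]])
  }
}
```

The worst-possible score ensures that an algorithm with only NA predictions is never selected ahead of one with real predictions.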
If save.models is not NA, the models trained during evaluation are serialized into files. Each file contains a list with members model (the mlr model), train.data (the mlr task with the training data), and test.data (the data frame with the test data used to make predictions). The file name starts with save.models, followed by the ID of the machine learning model, followed by "combined" if the model combines predictions of other models, followed by the number of the fold. Each model for each fold is saved in a different file.
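A saved file of this shape can be inspected with standard R serialization. The sketch below only mimics the documented list structure; the file name and list contents are placeholders, and llama's actual on-disk format may differ:

```r
# Mimic the documented structure: a list with members model, train.data
# and test.data, written to and read back from a file.
saved <- list(model      = "fitted model goes here",
              train.data = "mlr training task goes here",
              test.data  = "test data frame goes here")

# Illustrative file name following the documented pattern:
# <save.models prefix>.<model ID>.<fold number>.rds
f <- file.path(tempdir(), "prefix.regr.lm.1.rds")
saveRDS(saved, f)

restored <- readRDS(f)
names(restored)  # "model" "train.data" "test.data"
```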
Value
predictions
    a data frame with the predictions for each instance and test set. The columns of the data frame are the instance ID columns (as determined by input), the algorithm, the score of the algorithm, and the iteration (e.g. the number of the fold for cross-validation).

predictor
    a function that encapsulates the regression model learned on the entire data set. Can be called with data for the same features with the same feature names as the training data to obtain predictions in the same format as the predictions member.

models
    the list of models trained on the entire data set. This is meant for debugging/inspection purposes and does not include any models used to combine predictions of individual models.
Author(s)
Lars Kotthoff
References
Kotthoff, L. (2012) Hybrid Regression-Classification Models for Algorithm Selection. 20th European Conference on Artificial Intelligence, 480–485.
See Also
classify, classifyPairs, cluster, regressionPairs
Examples
if(Sys.getenv("RUN_EXPENSIVE") == "true") {
    data(satsolvers)
    folds = cvFolds(satsolvers)
    res = regression(regressor=makeLearner("regr.lm"), data=folds)
    # the total number of successes
    sum(successes(folds, res))
    # predictions on the entire data set
    res$predictor(satsolvers$data[satsolvers$features])

    res = regression(regressor=makeLearner("regr.ksvm"), data=folds)

    # combine performance predictions using a classifier
    ress = regression(regressor=makeLearner("regr.ksvm"),
                      data=folds,
                      combine=makeLearner("classif.J48"))

    # add pairwise differences to performance predictions before running classifier
    ress = regression(regressor=makeLearner("regr.ksvm"),
                      data=folds,
                      combine=makeLearner("classif.J48"),
                      expand=function(x) {
                          cbind(x, combn(c(1:ncol(x)), 2,
                                         function(y) { abs(x[,y[1]] - x[,y[2]]) }))
                      })
}