regression {llama}    R Documentation
Regression model
Description
Build a regression model that predicts the algorithm to use based on the features of the problem and optionally features of the algorithms.
Usage
regression(regressor = NULL, data = NULL,
pre = function(x, y=NULL) { list(features=x) },
combine = NULL, expand = identity, save.models = NA,
use.weights = TRUE)
Arguments
regressor
    the mlr regressor to use. See examples.

data
    the data to use with training and test sets. The structure returned by one of the partitioning functions.

pre
    a function to preprocess the data. Currently only normalize. Optional. The default passes the features through unchanged.

combine
    the function used to combine the predictions of the individual regression models for stacking. Default NULL. See details.

expand
    a function that takes a matrix of performance predictions (columns are algorithms, rows problem instances) and transforms it into a matrix with the same number of rows. Only meaningful if combine is not null. Default is the identity function, which leaves the matrix unchanged.

save.models
    whether to serialize and save the models trained during evaluation of the model. If not NA, the given value is used as a prefix for the file names. See details.

use.weights
    whether to use instance weights if supported. Default TRUE.
Details
regression takes data and processes it using pre (if supplied). If no algorithm features are provided, regressor is called to induce a separate regression model for each algorithm to predict its performance. When algorithm features are present, regressor is called to induce one regression model for all algorithms to predict their performance. The best algorithm is determined from the predicted performances by examining whether performance is to be minimized or not, as specified when creating the data structure through input.
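The selection step described above can be sketched in a few lines of base R. This is an illustrative sketch, not llama's internal code; the matrix and algorithm names are made up.

```r
# Columns are algorithms, rows are problem instances; values are
# predicted performance (e.g. runtime). Names are illustrative only.
preds <- matrix(c(10, 5,
                  3,  8),
                nrow = 2, byrow = TRUE,
                dimnames = list(NULL, c("algoA", "algoB")))

# If performance is to be minimised (e.g. runtime), pick the row-wise minimum.
best.min <- colnames(preds)[apply(preds, 1, which.min)]

# If performance is to be maximised (e.g. solution quality), pick the maximum.
best.max <- colnames(preds)[apply(preds, 1, which.max)]
```

Whether the minimum or the maximum is taken is exactly what the minimize flag given to input controls.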
The evaluation across the training and test sets will be parallelized automatically if a suitable backend for parallel computation is loaded. The parallelMap level is "llama.fold".
If combine is not null, it is assumed to be an mlr classifier and will be used to learn a model that predicts the best algorithm given the original features and the performance predictions for the individual algorithms. The combine option is currently not supported with algorithm features. If this classifier supports weights and use.weights is TRUE, the weights are passed as the difference between the best and the worst algorithm. Optionally, expand can be used to supply a function that modifies the predictions before they are given to the classifier, e.g. to augment the performance predictions with the pairwise differences (see examples).
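The pairwise-difference augmentation used in the examples can be tried on its own with plain base R. A minimal sketch (the function name expand.pairs and the test matrix are illustrative):

```r
# Append the absolute pairwise differences between algorithm columns to a
# matrix of performance predictions; same idea as the expand argument in
# the examples below.
expand.pairs <- function(x) {
  cbind(x, combn(1:ncol(x), 2, function(y) abs(x[, y[1]] - x[, y[2]])))
}

preds <- matrix(c(1, 4,
                  2, 2), nrow = 2, byrow = TRUE)

# With two algorithm columns there is one pair, so one extra column is added,
# holding |col1 - col2| per row.
expanded <- expand.pairs(preds)
```

The number of rows is unchanged, as required of an expand function; only columns are added.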
If all predictions of an underlying machine learning model are NA, the prediction will be NA for the algorithm and the score will be -Inf if the performance value is to be maximised, Inf otherwise.
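This fallback rule can be mimicked with a small base-R sketch. The function name score.for and its return structure are illustrative, not part of llama's API:

```r
# If every prediction is NA, return NA for the algorithm and the worst
# possible score (-Inf when maximising, Inf when minimising); otherwise
# pick the best algorithm and its predicted score.
score.for <- function(preds, maximise) {
  if (all(is.na(preds))) {
    list(algorithm = NA, score = if (maximise) -Inf else Inf)
  } else {
    best <- if (maximise) which.max(preds) else which.min(preds)
    list(algorithm = names(preds)[best], score = preds[[best]])
  }
}
```

The worst-possible score ensures that an algorithm with only NA predictions is never selected ahead of one with real predictions.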
If save.models is not NA, the models trained during evaluation are serialized into files. Each file contains a list with members model (the mlr model), train.data (the mlr task with the training data), and test.data (the data frame with the test data used to make predictions). The file name starts with save.models, followed by the ID of the machine learning model, followed by "combined" if the model combines predictions of other models, followed by the number of the fold. Each model for each fold is saved in a different file.
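A saved file of this shape can be inspected with standard R serialization. The sketch below only mimics the documented list structure; the file name and list contents are placeholders, and llama's actual on-disk format may differ:

```r
# Mimic the documented structure: a list with members model, train.data
# and test.data, written to and read back from a file.
saved <- list(model      = "fitted model goes here",
              train.data = "mlr training task goes here",
              test.data  = "test data frame goes here")

# Illustrative file name following the documented pattern:
# <save.models prefix>.<model ID>.<fold number>.rds
f <- file.path(tempdir(), "prefix.regr.lm.1.rds")
saveRDS(saved, f)

restored <- readRDS(f)
names(restored)  # "model" "train.data" "test.data"
```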
Value
predictions
    a data frame with the predictions for each instance and test set. The columns of the data frame are the instance ID columns (as determined by input), the algorithm, the score of the algorithm, and the iteration (e.g. the number of the fold for cross-validation).

predictor
    a function that encapsulates the regression model learned on the entire data set. Can be called with data for the same features with the same feature names as the training data to obtain predictions in the same format as the predictions member.

models
    the list of models trained on the entire data set. This is meant for debugging/inspection purposes and does not include any models used to combine predictions of individual models.
Author(s)
Lars Kotthoff
References
Kotthoff, L. (2012) Hybrid Regression-Classification Models for Algorithm Selection. 20th European Conference on Artificial Intelligence, 480–485.
See Also
classify, classifyPairs, cluster, regressionPairs
Examples
if(Sys.getenv("RUN_EXPENSIVE") == "true") {
    data(satsolvers)
    folds = cvFolds(satsolvers)
    res = regression(regressor=makeLearner("regr.lm"), data=folds)
    # the total number of successes
    sum(successes(folds, res))
    # predictions on the entire data set
    res$predictor(satsolvers$data[satsolvers$features])

    res = regression(regressor=makeLearner("regr.ksvm"), data=folds)

    # combine performance predictions using a classifier
    ress = regression(regressor=makeLearner("regr.ksvm"),
                      data=folds,
                      combine=makeLearner("classif.J48"))

    # add pairwise differences to performance predictions before running classifier
    ress = regression(regressor=makeLearner("regr.ksvm"),
                      data=folds,
                      combine=makeLearner("classif.J48"),
                      expand=function(x) {
                          cbind(x, combn(c(1:ncol(x)), 2,
                                         function(y) { abs(x[,y[1]] - x[,y[2]]) }))
                      })
}