cluster {llama}    R Documentation
Cluster model
Description
Build a cluster model that predicts the algorithm to use based on the features of the problem.
Usage
cluster(clusterer = NULL, data = NULL,
bestBy = "performance",
pre = function(x, y=NULL) { list(features=x) },
save.models = NA)
Arguments
clusterer: the mlr clustering function to use. See examples. The argument can also be a list of such functions.
data: the data to use, including the training and test sets. This is the structure returned by one of the partitioning functions (e.g. cvFolds).
bestBy: the criterion by which to determine the best algorithm in a cluster. One of "performance", "count", or "successes". Optional; defaults to "performance".
pre: a function to preprocess the data. Currently only normalize. Optional; by default the features are passed through unchanged.
save.models: whether to serialize and save the models trained during evaluation of the model. If not NA, the value is used as the prefix of the file names.
Details
cluster takes data and processes it using pre (if supplied). clusterer is called to cluster the data. For each cluster, the best algorithm is identified according to the criterion given in bestBy.
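The following is a minimal base-R sketch of the cluster-then-select idea, not llama's actual implementation. It assumes stats::kmeans as a stand-in for the mlr clusterer, made-up features and runtimes for two hypothetical algorithms (algA, algB) where lower values are better, and a hypothetical helper assign_cluster() that maps new instances to the nearest centroid.

```r
# Toy training features: two well-separated groups of instances.
train_x <- rbind(c(0, 0), c(0, 1), c(1, 0),
                 c(5, 5), c(5, 6), c(6, 5))
# Hypothetical runtimes of two algorithms per instance (lower is better).
train_perf <- cbind(algA = c(1, 1, 1, 9, 9, 9),
                    algB = c(8, 8, 8, 2, 2, 2))

# stats::kmeans stands in for the mlr clusterer; fixed initial centres
# make the result deterministic.
km <- kmeans(train_x, centers = rbind(c(0, 0), c(5, 5)))

# Best algorithm per cluster: lowest total runtime ("performance" criterion).
best_per_cluster <- sapply(seq_len(nrow(km$centers)), function(k) {
  names(which.min(colSums(train_perf[km$cluster == k, , drop = FALSE])))
})

# Hypothetical helper: index of the nearest centroid for each row of x.
assign_cluster <- function(x, centers) {
  apply(x, 1, function(row) which.min(colSums((t(centers) - row)^2)))
}

# Predict for new instances: the best algorithm of their cluster.
test_x <- rbind(c(0, 0), c(5, 5))
predicted <- best_per_cluster[assign_cluster(test_x, km$centers)]
predicted
```

Here the two clusters prefer different algorithms, so predicted is c("algA", "algB").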
If bestBy is "performance", the best algorithm is the one with the best overall performance across all instances in the cluster. If it is "count", the best algorithm is the one that has the best performance on the largest number of instances in the cluster. If it is "successes", the best algorithm is the one with the highest number of successes across all instances in the cluster. The learned model is then used to assign the test instances to clusters, and for each test instance the best algorithm of its cluster is predicted.
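The three bestBy criteria can be sketched for a single cluster as follows. This is an illustration under stated assumptions, not llama's code: best_in_cluster() is a hypothetical helper, performance values are assumed to be minimised (e.g. runtimes), and successes are given as a logical matrix of the same shape.

```r
# Hypothetical helper: pick the best algorithm within one cluster.
# perf: numeric matrix, rows = instances in the cluster, cols = algorithms;
#       assumed to be minimised (e.g. runtime).
# succ: logical matrix of the same shape, TRUE if the instance was solved.
best_in_cluster <- function(perf, succ,
                            bestBy = c("performance", "count", "successes")) {
  bestBy <- match.arg(bestBy)
  algos <- colnames(perf)
  if (bestBy == "performance") {
    # Best overall performance: lowest total across all instances.
    return(algos[which.min(colSums(perf))])
  }
  if (bestBy == "count") {
    # Algorithm that is best on the largest number of instances.
    winners <- algos[apply(perf, 1, which.min)]
    counts <- table(factor(winners, levels = algos))
    return(algos[which.max(counts)])
  }
  # "successes": highest number of solved instances.
  algos[which.max(colSums(succ))]
}

perf <- matrix(c(1, 10, 2,
                 9,  3, 4,
                 8,  2, 3), nrow = 3, byrow = TRUE,
               dimnames = list(NULL, c("a1", "a2", "a3")))
succ <- perf < 5  # assumption: an instance counts as solved below 5

best_in_cluster(perf, succ, "performance")
best_in_cluster(perf, succ, "count")
best_in_cluster(perf, succ, "successes")
```

With these toy values, "performance" and "successes" both select a3, while "count" selects a2, because a2 is the fastest on two of the three instances even though its total runtime is higher.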
The evaluation across the training and test sets is parallelized automatically if a suitable backend for parallel computation is loaded. The parallelMap level is "llama.fold".
If a list of clusterers is supplied in clusterer, ensemble clustering is performed: the models are trained and used to make predictions independently. For each instance, the final prediction is determined by majority vote of the predictions of the individual models, i.e. the class that occurs most often is chosen. If the list given as clusterer contains a member .combine that is a function, it is assumed to be a classifier with the same properties as classifiers given to classify, and it is used to combine the ensemble predictions instead of majority voting. This classifier is passed the original features and the predictions of the individual models in the ensemble.
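Majority voting over the ensemble members can be sketched as below. majority_vote() is a hypothetical helper; ties are broken by taking the alphabetically first algorithm (table() sorts its names and which.max() returns the first maximum), which is an arbitrary choice of this sketch and not necessarily what llama does. The .combine variant is not shown.

```r
# Hypothetical helper: majority vote across ensemble members.
# preds: character matrix, rows = instances, cols = ensemble members.
majority_vote <- function(preds) {
  apply(preds, 1, function(p) {
    counts <- table(p)               # occurrences of each predicted algorithm
    names(counts)[which.max(counts)] # most frequent; first one on ties
  })
}

preds <- cbind(m1 = c("a1", "a2", "a3"),
               m2 = c("a1", "a3", "a3"),
               m3 = c("a2", "a3", "a1"))
majority_vote(preds)
```

For these predictions the result is c("a1", "a3", "a3").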
If all predictions of an underlying machine learning model are NA, the prediction will be NA for the algorithm and -Inf for the score if the performance value is to be maximised, Inf otherwise.
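The NA fallback amounts to assigning the worst possible score for the optimisation direction. A minimal sketch, using a hypothetical helper fallback_prediction() and a hypothetical minimize flag (TRUE when lower performance values are better):

```r
# Hypothetical helper: fallback prediction when a model returned only NA.
# minimize: TRUE if lower performance values are better.
fallback_prediction <- function(preds, minimize) {
  if (!all(is.na(preds))) {
    return(NULL)  # at least one real prediction; no fallback needed
  }
  data.frame(algorithm = NA_character_,
             # worst possible score: Inf when minimising, -Inf when maximising
             score = if (minimize) Inf else -Inf)
}

fallback_prediction(c(NA, NA), minimize = TRUE)
```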
If save.models is not NA, the models trained during evaluation are serialized into files. Each file contains a list with members model (the mlr model), train.data (the mlr task with the training data), and test.data (the data frame with the test data used to make predictions). The file name starts with save.models, followed by the ID of the machine learning model, followed by "combined" if the model combines predictions of other models, followed by the number of the fold. Each model for each fold is saved in a different file.
Value
predictions: a data frame with the predictions for each instance and test set. The columns of the data frame are the instance ID columns (as determined by
predictor: a function that encapsulates the model learned on the entire data set. Can be called with data for the same features with the same feature names as the training data to obtain predictions in the same format as the
models: the list of models trained on the entire data set. This is meant for debugging/inspection purposes and does not include any models used to combine predictions of individual models.
Author(s)
Lars Kotthoff
See Also
classify, classifyPairs, regression, regressionPairs
Examples
if(Sys.getenv("RUN_EXPENSIVE") == "true") {
data(satsolvers)
folds = cvFolds(satsolvers)
res = cluster(clusterer=makeLearner("cluster.XMeans"), data=folds, pre=normalize)
# the total number of successes
sum(successes(folds, res))
# predictions on the entire data set
res$predictor(satsolvers$data[satsolvers$features])
# determine best by number of successes
res = cluster(clusterer=makeLearner("cluster.XMeans"), data=folds,
bestBy="successes", pre=normalize)
sum(successes(folds, res))
# ensemble clustering
rese = cluster(clusterer=list(makeLearner("cluster.XMeans"),
makeLearner("cluster.SimpleKMeans"), makeLearner("cluster.EM")),
data=folds, pre=normalize)
# ensemble clustering with a classifier to combine predictions
rese = cluster(clusterer=list(makeLearner("cluster.XMeans"),
makeLearner("cluster.SimpleKMeans"), makeLearner("cluster.EM"),
.combine=makeLearner("classif.J48")), data=folds, pre=normalize)
}