tune {llama}    R Documentation
Tune the hyperparameters of the machine learning algorithm underlying a model
Description
Functions to tune the hyperparameters of the machine learning algorithm underlying a model with respect to a performance measure.
Usage
tuneModel(ldf, llama.fun, learner, design, metric = parscores, nfolds = 10L,
quiet = FALSE)
Arguments
ldf: the LLAMA data to use, as returned by input.

llama.fun: the LLAMA model building function, e.g. classify.

learner: the mlr learner to use.

design: the data frame denoting the parameter values to try. Can be produced with e.g. generateRandomDesign from the ParamHelpers package.

metric: the metric used to evaluate the model. Can be one of parscores (the default), misclassificationPenalties, or successes.

nfolds: the number of folds. Defaults to 10. If -1 is given, leave-one-out cross-validation folds are produced.

quiet: whether to suppress output of intermediate values and progress information during tuning.
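For example, a model could be tuned with respect to misclassification penalties instead of the default PAR10 score; a minimal sketch, with data and the remaining arguments as in the Examples below:

# tune with respect to misclassification penalties rather than PAR10
res = tuneModel(satsolvers, classify, learner, design,
                metric = misclassificationPenalties)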
Details
tuneModel finds the hyperparameters from the set denoted by design of the machine learning algorithm learner that give the best performance with respect to the measure metric for the LLAMA model type llama.fun on the data ldf. It uses a nested cross-validation internally; the number of inner folds is given through nfolds, and the number of outer folds is either determined by any existing partitions of ldf or, if none are present, by nfolds as well.
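One way to fix the outer folds is to partition the data beforehand, for example with llama's cvFolds; a minimal sketch, assuming the satsolvers example data and the classify model builder (learner and design as in the Examples below):

library(llama)
data(satsolvers)
# pre-partition the data; tuneModel reuses these partitions as its outer folds
folds = cvFolds(satsolvers, nfolds = 5L)
# res = tuneModel(folds, classify, learner, design)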
During each iteration of the inner cross-validation, all parameter sets specified in design are evaluated and the one with the best performance value is chosen. The mean performance over all instances in the data is logged for all evaluations. The chosen parameter set is then used to build and evaluate a model in the outer cross-validation. The predictions made by this model, along with the parameter values used to train it, are returned.
Finally, a normal (non-nested) cross-validation is performed to find the best parameter values on the entire data set. The predictor of this model, along with the parameter values used to train it, is returned. In this respect the interface corresponds to the normal LLAMA model-building functions: the returned data structure is the same, with a few additional values.
The evaluation across the folds will be parallelized automatically if a
suitable backend for parallel computation is loaded. The parallelMap
level is "llama.tune".
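For instance, a socket backend could be registered as follows (a sketch; the backend type and number of CPUs are arbitrary choices):

library(parallelMap)
# parallelize only the evaluations at the tuning level
parallelStartSocket(2, level = "llama.tune")
# ... call tuneModel(...) here ...
parallelStop()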
Value
predictions: a data frame with the predictions for each instance and test set. The structure is the same as for the underlying model building function, and the predictions are the ones made by the models trained with the best parameter values for the respective fold.

predictor: a function that encapsulates the classifier learned on the entire data set with the best parameter values determined on the entire data set. Can be called with data for the same features, with the same feature names as the training data, to obtain predictions in the same format as the predictions data frame.

models: the list of models trained on the entire data set. This is meant for debugging/inspection purposes.

parvals: the best parameter values on the entire data set, used for training the predictor.

inner.parvals: the best parameter values during each iteration of the outer cross-validation. These parameters were used to train the models that made the predictions in predictions.
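As an illustration, the returned structure might be used as follows (a sketch; newdata stands for a hypothetical data frame with the same feature names as the training data):

res = tuneModel(satsolvers, classify, learner, design)
res$parvals                       # best parameter values on the entire data set
res$inner.parvals                 # best parameter values per outer fold
# preds = res$predictor(newdata)  # same format as res$predictions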
Author(s)
Bernd Bischl, Lars Kotthoff
Examples
if(Sys.getenv("RUN_EXPENSIVE") == "true") {
library(llama)      # provides tuneModel, classify and the satsolvers data
library(mlr)        # provides makeLearner
library(ParamHelpers)
data(satsolvers)
learner = makeLearner("classif.J48")
# parameter set for J48
ps = makeParamSet(makeIntegerParam("M", lower = 1, upper = 100))
# generate 10 random parameter sets
design = generateRandomDesign(10, ps)
# tune with respect to PAR10 score (default) with 10 outer and inner folds
# (default)
res = tuneModel(satsolvers, classify, learner, design)
}