tunePareto {TunePareto}    R Documentation
Generic function for multi-objective parameter tuning of classifiers
Description
This generic function tunes parameters of arbitrary classifiers in a multi-objective setting and returns the Pareto-optimal parameter combinations.
Usage
tunePareto(..., data, labels,
           classifier, parameterCombinations,
           sampleType = c("full", "uniform",
                          "latin", "halton",
                          "niederreiter", "sobol",
                          "evolution"),
           numCombinations,
           mu = 10, lambda = 20, numIterations = 100,
           objectiveFunctions, objectiveBoundaries,
           keepSeed = TRUE, useSnowfall = FALSE, verbose = TRUE)
Arguments
data
The data set to be used for the parameter tuning. This is usually a matrix or data frame with the samples in the rows and the features in the columns.
labels
A vector of class labels for the samples in data.
classifier
A TuneParetoClassifier object describing the classifier whose parameters are tuned. Wrappers for several standard classifiers, such as tunePareto.knn() and tunePareto.svm(), are available (see predefinedClassifiers).
parameterCombinations
If not all combinations of parameter ranges for the classifier are meaningful, you can set this parameter instead of specifying parameter values in the ... argument. It holds an explicit list of possible combinations, where each element of the list is a named sublist with one entry for each parameter (a structural sketch follows this argument list).
sampleType
Determines the way parameter configurations are sampled.
If "full", all possible combinations of the supplied parameter values are tried.
If "uniform", numCombinations combinations are drawn uniformly at random from the possible combinations.
If "latin", numCombinations combinations are chosen by Latin Hypercube sampling.
If "halton", "niederreiter" or "sobol", numCombinations combinations are drawn on the basis of the corresponding quasi-random (low-discrepancy) sequences.
If "evolution", parameter combinations are optimized using Evolution Strategies (see mu, lambda and numIterations).
numCombinations
If this parameter is set, at most numCombinations parameter configurations are drawn (according to sampleType) and tested; otherwise, all possible combinations of the supplied parameter values are tested.
mu
The number of individuals used in the Evolution Strategies if sampleType="evolution".
lambda
The number of offspring per generation in the Evolution Strategies if sampleType="evolution".
numIterations
The number of iterations/generations the evolutionary algorithm is run if sampleType="evolution".
objectiveFunctions
A list of objective functions used to tune the parameters. There are a number of predefined objective functions (see predefinedObjectiveFunctions); custom objective functions can be created using createObjective.
objectiveBoundaries
If this parameter is set, it specifies boundaries of the objective functions for valid solutions. That is, each element of the supplied vector specifies the upper or lower limit of an objective (depending on whether the objective is maximized or minimized). Parameter combinations that do not meet all these restrictions are not included in the result set, even if they are Pareto-optimal. If only some of the objectives should have bounds, supply NA for the remaining objectives.
keepSeed
If this is true, the random seed is reset to the same value for each of the tested parameter configurations. This is an easy way to guarantee comparability in randomized objective functions. E.g., cross-validation runs of the classifiers will all start with the same seed, which results in the same partitions. Attention: If you set this parameter to FALSE, objective values of configurations whose objective functions involve randomization (such as cross-validation) are based on different random draws and may therefore not be directly comparable.
useSnowfall
If this parameter is true, the routine loads the snowfall package and processes the parameter configurations in parallel. Please note that the snowfall cluster has to be initialized properly before running the tuning function and stopped after the run.
verbose
If this parameter is true, status messages are printed. In particular, the algorithm prints the currently tested combination.
...
The parameters of the classifier and predictor functions that should be tuned. The names of the parameters must correspond to the parameters expected by the classifier and predictor functions of the supplied classifier object. Discrete parameter values can be supplied as vectors of possible values; continuous parameter ranges can be specified as intervals using as.interval (see the examples below).
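As a structural sketch of the parameterCombinations argument (the parameter names k and l and their values are purely illustrative), each list element is itself a named list holding one complete configuration:

    comb <- list(list(k = 1, l = 0),
                 list(k = 3, l = 1),
                 list(k = 5, l = 2))

The same structure is produced by concatenating calls to allCombinations, as in the examples below.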
Details
This is a generic function that allows for parameter tuning of a wide variety of classifiers. You can either specify the values or intervals of the tuned parameters in the ... argument, or supply selected combinations of parameter values using parameterCombinations.

In the first case, combinations of the parameter values specified in the ... argument are generated. If sampleType="uniform", sampleType="latin", sampleType="halton", sampleType="niederreiter" or sampleType="sobol", a random subset of the possible combinations is drawn. If sampleType="evolution", random parameter combinations are generated and optimized using Evolution Strategies.
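For instance, a random subset of a discrete parameter grid can be requested as in the following sketch (modeled on the iris-based examples below; the chosen values are purely illustrative):

    res <- tunePareto(data = iris[, -ncol(iris)],
                      labels = iris[, ncol(iris)],
                      classifier = tunePareto.knn(),
                      k = c(1,3,5,7,9),
                      sampleType = "uniform",
                      numCombinations = 3,
                      objectiveFunctions = list(cvError(10, 10),
                                                reclassError()))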
In the latter case, only the parameter combinations specified explicitly in parameterCombinations are tested. This is useful if certain parameter combinations are invalid. You can create parameter combinations by concatenating results of calls to allCombinations. Only sampleType="full" is allowed in this mode.
For each of the combinations, the specified objective functions are calculated. This usually involves training and testing a classifier. From the resulting objective values, the non-dominated parameter configurations are calculated and returned.
The ... argument is the first argument of tunePareto for technical reasons (to prevent partial matching of the supplied parameters with argument names of tunePareto). This requires all arguments to be named.
Value
Returns a list of class TuneParetoResult
with the following components:
bestCombinations
A list of Pareto-optimal parameter configurations. Each element of the list consists of a sub-list with named elements corresponding to the parameter values.
bestObjectiveValues
A matrix containing the objective function values of the Pareto-optimal configurations in bestCombinations. Each row corresponds to one configuration, and each column corresponds to one objective function.
testedCombinations
A list of all tested parameter configurations with the same structure as bestCombinations.
testedObjectiveValues
A matrix containing the objective function values of all tested configurations with the same structure as bestObjectiveValues.
dominationMatrix
A Boolean matrix specifying which parameter configurations dominate each other. If the configuration in row i dominates the configuration in column j, the corresponding entry is TRUE.
minimizeObjectives
A Boolean vector specifying which of the objectives are minimization objectives. This is derived from the objective functions supplied in objectiveFunctions.
additionalData
A list containing additional data that may have been returned by the objective functions. The list has one element for each tested parameter configuration, each comprising one sub-element for each objective function that returned additional data. The structure of these sub-elements depends on the corresponding objective function. For example, the predefined objective functions (see predefinedObjectiveFunctions) can store the trained classifier models here if they are configured to do so.
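As a sketch of how these components might be accessed (the object name res is illustrative only; the call mirrors the first example below):

    res <- tunePareto(data = iris[, -ncol(iris)],
                      labels = iris[, ncol(iris)],
                      classifier = tunePareto.knn(),
                      k = c(1,3,5,7,9),
                      objectiveFunctions = list(cvError(10, 10),
                                                reclassError()))
    res$bestCombinations          # Pareto-optimal parameter configurations
    res$bestObjectiveValues       # their objective values, one row per configuration
    res$bestCombinations[[1]]$k   # value of 'k' in the first optimal configuration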
See Also
predefinedClassifiers, predefinedObjectiveFunctions, createObjective, allCombinations
Examples
# tune 'k' of a k-NN classifier
# on the 'iris' data set --
# see ?knn
print(tunePareto(data = iris[, -ncol(iris)],
                 labels = iris[, ncol(iris)],
                 classifier = tunePareto.knn(),
                 k = c(1,3,5,7,9),
                 objectiveFunctions = list(cvError(10, 10),
                                           reclassError())))
# example using predefined parameter configurations,
# as certain combinations of k and l are invalid:
comb <- c(allCombinations(list(k=1, l=0)),
          allCombinations(list(k=3, l=0:2)),
          allCombinations(list(k=5, l=0:4)),
          allCombinations(list(k=7, l=0:6)))
print(tunePareto(data = iris[, -ncol(iris)],
                 labels = iris[, ncol(iris)],
                 classifier = tunePareto.knn(),
                 parameterCombinations = comb,
                 objectiveFunctions = list(cvError(10, 10),
                                           reclassError())))
# tune 'cost' and 'kernel' of an SVM on
# the 'iris' data set using Latin Hypercube sampling --
# see ?svm and ?predict.svm
print(tunePareto(data = iris[, -ncol(iris)],
                 labels = iris[, ncol(iris)],
                 classifier = tunePareto.svm(),
                 cost = as.interval(0.001, 10),
                 kernel = c("linear", "polynomial",
                            "radial", "sigmoid"),
                 sampleType = "latin",
                 numCombinations = 20,
                 objectiveFunctions = list(cvError(10, 10),
                                           cvSensitivity(10, 10, caseClass="setosa"))))
# tune the same parameters using Evolution Strategies
print(tunePareto(data = iris[, -ncol(iris)],
                 labels = iris[, ncol(iris)],
                 classifier = tunePareto.svm(),
                 cost = as.interval(0.001, 10),
                 kernel = c("linear", "polynomial",
                            "radial", "sigmoid"),
                 sampleType = "evolution",
                 numCombinations = 20,
                 numIterations = 20,
                 objectiveFunctions = list(cvError(10, 10),
                                           cvSensitivity(10, 10, caseClass="setosa"),
                                           cvSpecificity(10, 10, caseClass="setosa"))))
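
# hedged sketch: restrict the result set using objective boundaries --
# here, only configurations with a cross-validation error of at most 0.1
# are kept; NA is assumed to leave the second objective unbounded
print(tunePareto(data = iris[, -ncol(iris)],
                 labels = iris[, ncol(iris)],
                 classifier = tunePareto.knn(),
                 k = c(1,3,5,7,9),
                 objectiveFunctions = list(cvError(10, 10),
                                           reclassError()),
                 objectiveBoundaries = c(0.1, NA)))

## Not run: 
## hedged sketch of parallel evaluation with snowfall -- the cluster must
## be initialized before and stopped after the call
library(snowfall)
sfInit(parallel = TRUE, cpus = 2)
print(tunePareto(data = iris[, -ncol(iris)],
                 labels = iris[, ncol(iris)],
                 classifier = tunePareto.knn(),
                 k = c(1,3,5,7,9),
                 objectiveFunctions = list(cvError(10, 10),
                                           reclassError()),
                 useSnowfall = TRUE))
sfStop()
## End(Not run)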