R: Multiobjective Genetic Algorithm for Variable Selection

mogavs {mogavs}

R Documentation

Multiobjective Genetic Algorithm for Variable Selection

Description

The main function for the mogavs genetic algorithm, returning a list containing the full archive set of regression models tried and the nondominated set.

Usage

## Default S3 method:
mogavs(x, y, maxGenerations = 10*ncol(x), popSize = ncol(x), noOfOffspring = ncol(x),
crossoverProbability = 0.9, mutationProbability = 1/ncol(x), kBest = 1, 
plots = F, additionalPlots = F, ...)
## S3 method for class 'formula'
mogavs(formula, data, maxGenerations= 10*ncol(x), popSize = ncol(x), 
noOfOffspring = ncol(x), crossoverProbability = 0.9, mutationProbability = 1/ncol(x), 
kBest = 1, plots = F, additionalPlots = F, ...)

Arguments

`formula`	A formula such as y~x1+x2 (predict y from x1 and x2) or y~. (predict y from all other variables in data).
`data`	A data frame containing the variables mentioned in the formula.
`x`	An n x p matrix containing the n observations of p values used in the regression.
`y`	An n x 1 vector of values to fit the regression to.
`maxGenerations`	Maximum number of generations to run in the evolutionary algorithm. Default is 10*ncol(x).
`popSize`	Population size, i.e. how many regression models the population holds. Default is ncol(x).
`noOfOffspring`	Indicates how many offspring models are generated for each generation. Default is ncol(x).
`crossoverProbability`	Indicates the probability of crossover for each offspring. Default is 0.9.
`mutationProbability`	Indicates the probability of mutation for each offspring. Default is 1/ncol(x).
`kBest`	Indicates how many of the best models for each number of variables are highlighted in the printout at the end of the run. Default is 1.
`plots`	Logical value that turns plotting for each generation on or off.
`additionalPlots`	Logical value that turns the additional plot at the end of the run on or off. The plot can also be generated after the run with the `createAdditionalPlots` function.
`...`	Any additional arguments.

Details

Runs a genetic algorithm over the space of linear regression models, with predictor variables x and response values y. Alternatively, a data frame and a formula can be supplied. Setting plots=TRUE draws a plot for each generation showing the current efficient boundary of the models. Setting additionalPlots=TRUE draws one more plot at the end of the run, showing the full set of models tried and the kBest best models for each number of variables. All plotting is off by default to speed up processing.
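
As a rough illustration of what "nondominated" means here, the following base R sketch filters a small, made-up table of models on the two quantities the returned object reports, number of variables and MSE. The data and the helper is_dominated are hypothetical and are not part of mogavs; the package's internal procedure may differ.

```r
# Hypothetical candidate models: number of variables and MSE for each.
models <- data.frame(
  numOfVariables = c(1, 1, 2, 2, 3, 3),
  MSE            = c(5.0, 4.2, 3.1, 3.5, 3.1, 2.0)
)

# Model i is dominated if some other model is no worse on both objectives
# and strictly better on at least one of them.
is_dominated <- function(i, df) {
  any(
    df$numOfVariables <= df$numOfVariables[i] &
      df$MSE <= df$MSE[i] &
      (df$numOfVariables < df$numOfVariables[i] | df$MSE < df$MSE[i])
  )
}

dominated <- vapply(seq_len(nrow(models)), is_dominated, logical(1), df = models)

# The surviving rows form the efficient boundary of this toy example.
nondominated <- models[!dominated, ]
print(nondominated)
```

Minimizing both objectives at once is what makes the result a set of trade-offs rather than a single best model: a larger model stays on the boundary only if it lowers the MSE.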

Value

Returns an object of class mogavs with the following items:

`nonDominatedSet`	Matrix of the nondominated models.
`numOfVariables`	Vector of the number of variables for each model in the nonDominatedSet.
`MSE`	Vector of mean square errors for each model in the nonDominatedSet.
`archiveSet`	The full archive set of models tried.
`kBest`	The value of kBest used.
`maxGenerations`	Number of generations used.
`crossoverProbability`	The crossover probability used.
`noOfOffspring`	Number of generated offspring for each generation.
`popSize`	The population size.
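
A minimal sketch of inspecting the returned items, assuming the mogavs package is installed and that numOfVariables and MSE hold one entry per nondominated model, as described above. The names fit and frontier are illustrative only.

```r
library(mogavs)
data(sampleData)

# Short run, as in the package example, to keep it fast.
fit <- mogavs(y ~ ., data = sampleData, maxGenerations = 5)

# Pair each nondominated model's size with its mean square error
# (assumes both vectors have the same length).
frontier <- data.frame(numOfVariables = fit$numOfVariables, MSE = fit$MSE)

# Show the trade-off ordered by model size.
frontier[order(frontier$numOfVariables), ]
```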

Author(s)

Tommi Pajala <tommi.pajala@aalto.fi>

References

Sinha, A., Malo, P. & Kuosmanen, T. (2015) A Multi-objective Exploratory Procedure for Regression Model Selection. Journal of Computational and Graphical Statistics, 24(1). pp. 154-182.

Examples

data(sampleData)
#just a few generations to keep test fast
mogavs(y~.,data=sampleData,maxGenerations=5)

#with a more sensible number of generations, with all plotting on
## Not run: mogavs(y~.,data=sampleData,maxGenerations=100,plots=TRUE,additionalPlots=TRUE)

[Package mogavs version 1.1.0 Index]