GenAlg-tools {GenAlgo} | R Documentation |
Utility functions for selection and mutation in genetic algorithms
Description
These functions implement specific forms of mutation and fitness that can be used in genetic algorithms for feature selection.
Usage
simpleMutate(allele, context)
selectionMutate(allele, context)
selectionFitness(arow, context)
Arguments
allele |
In the |
arow |
A vector of integer indices identifying the rows (features) to be
selected from the |
context |
A list or data frame containing auxiliary information that is needed
to resolve references from the mutation or fitness code. In both
|
Details
These functions represent 'callbacks'. They can be used in the
function GenAlg, which creates genetic algorithm objects. They will then
be called repeatedly (for each individual in the population) each time
the genetic algorithm is updated to the next generation.
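To make the callback idea concrete, here is a minimal base R sketch of how a driver might apply user-supplied fitness and mutation callbacks to every individual in a population. It is not the GenAlg implementation. The names scorePopulation, mutatePopulation, toyFitness, and toyMutate are hypothetical, and the per-allele mutation probability pm is an assumption of this sketch.

```r
# Hypothetical driver helpers for illustration; NOT the GenAlgo implementation.
# Each row of 'population' is one individual.
scorePopulation <- function(population, fitfun, context) {
  # call the fitness callback once per individual (row)
  apply(population, 1, fitfun, context = context)
}

mutatePopulation <- function(population, mutfun, context, pm) {
  # choose alleles to mutate, each independently with probability pm (assumed)
  hit <- which(runif(length(population)) < pm)
  for (k in hit) {
    # call the mutation callback once per selected allele
    population[k] <- mutfun(population[k], context)
  }
  population
}

# Toy callbacks using the (arow, context) and (allele, context) signatures
toyFitness <- function(arow, context) sum(context$weights[arow])
toyMutate  <- function(allele, context) sample(length(context$weights), 1)

set.seed(1)
ctx <- list(weights = seq_len(20))       # auxiliary data the callbacks need
pop <- t(replicate(5, sample(20, 3)))    # 5 individuals, 3 alleles each
scores  <- scorePopulation(pop, toyFitness, ctx)
mutated <- mutatePopulation(pop, toyMutate, ctx, pm = 0.5)
```

The point of the design is that the driver never needs to know what an allele means; everything problem-specific lives in the callbacks and in context.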
The simpleMutate function assumes that chromosomes are binary
vectors, so alleles simply take on the value 0 or 1. A mutation of an
allele, therefore, flips its state between those two possibilities.
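The bit flip described above can be sketched in one line of base R. The function flipAllele is a hypothetical stand-in written for illustration, not the package source. It assumes alleles are coded numerically as 0 or 1 and it ignores context.

```r
# Hypothetical illustration of a bit-flip mutation; not the GenAlgo source.
flipAllele <- function(allele, context = NULL) {
  1 - allele   # maps 0 to 1 and 1 to 0; also works on a whole binary vector
}

flipped <- flipAllele(c(0, 1, 1, 0))   # 1 0 0 1
```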
The selectionMutate and selectionFitness functions, by
contrast, are specialized to perform feature selection assuming a
fixed number K of features, with a goal of learning how to
distinguish between two different groups of samples. We assume that
the underlying data consists of a data frame (or matrix), with the
rows representing features (such as genes) and the columns
representing samples. In addition, there must be a grouping vector
(or factor) that assigns all of the sample columns to one of two
possible groups. These data are collected into a list, context,
containing a dataset matrix and a gps factor. An individual member
of the population of potential solutions is encoded as a length K
vector of indices into the rows of the dataset. An individual allele,
therefore, is a single index identifying a row of the dataset. When
mutating it, we assume that it can be changed into any other possible
allele; i.e., any other row number. To compute the fitness, we use the
Mahalanobis distance between the centers of the two groups defined by
the gps factor.
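The data layout and the two callbacks described above can be sketched in base R without the package. This is an illustration under stated assumptions, not the package's code. The names pickRow and groupDistance are hypothetical. The fitness uses stats::mahalanobis with a pooled within-group covariance, which is an assumption of this sketch, so the package's actual computation may differ in scaling or other details.

```r
set.seed(42)
# rows are features, columns are samples
dataset <- matrix(rnorm(100 * 30), nrow = 100, ncol = 30)
gps <- factor(rep(c("A", "B"), each = 15))    # two groups of samples
context <- list(dataset = dataset, gps = gps)

# Hypothetical mutation: replace the allele with any other row index
pickRow <- function(allele, context) {
  candidates <- setdiff(seq_len(nrow(context$dataset)), allele)
  candidates[sample.int(length(candidates), 1)]
}

# Hypothetical fitness: distance between the two group centers,
# measured on the K selected features only
groupDistance <- function(arow, context) {
  x <- t(context$dataset[arow, , drop = FALSE])   # samples by K features
  lev <- levels(factor(context$gps))
  xa <- x[context$gps == lev[1], , drop = FALSE]
  xb <- x[context$gps == lev[2], , drop = FALSE]
  # pooled within-group covariance (an assumption of this sketch)
  pooled <- ((nrow(xa) - 1) * cov(xa) + (nrow(xb) - 1) * cov(xb)) /
    (nrow(xa) + nrow(xb) - 2)
  # stats::mahalanobis returns the SQUARED Mahalanobis distance
  mahalanobis(colMeans(xa), center = colMeans(xb), cov = pooled)
}

individual <- sample(nrow(dataset), 5)    # K = 5 distinct row indices
fit <- groupDistance(individual, context)
newAllele <- pickRow(individual[1], context)
```

A larger value of fit means the selected features separate the two groups better, which is what the genetic algorithm tries to maximize.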
Value
Both selectionMutate and simpleMutate return an integer
value; for simpleMutate, the value is guaranteed to be 0 or 1.
The selectionFitness function returns a real number.
Author(s)
Kevin R. Coombes krc@silicovore.com, P. Roebuck proebuck@mdanderson.org
See Also
Examples
library(GenAlgo)
# generate some fake data: rows are features, columns are samples
nFeatures <- 1000
nSamples <- 50
fakeData <- matrix(rnorm(nFeatures*nSamples), nrow=nFeatures, ncol=nSamples)
fakeGroups <- sample(c(0,1), nSamples, replace=TRUE)
myContext <- list(dataset=fakeData, gps=fakeGroups)
# initialize population: each row is one individual, encoded as
# n.features distinct row indices into fakeData
n.individuals <- 200
n.features <- 9
y <- matrix(0, n.individuals, n.features)
for (i in seq_len(n.individuals)) {
  y[i,] <- sample(1:nrow(fakeData), n.features)
}
# set up the genetic algorithm
my.ga <- GenAlg(y, selectionFitness, selectionMutate, myContext, 0.001, 0.75)
# advance one generation
my.ga <- newGeneration(my.ga)