kofnGA {kofnGA} | R Documentation |
Search for the best subset of size k from n choices.
Description
kofnGA
implements a genetic algorithm for subset selection. The function searches
for a subset of a fixed size, k, from the integers 1:n, such that user-supplied function
OF
is minimized at that subset. The selection step is done by tournament selection
based on ranks, and elitism may be used to retain the best solutions from one generation
to the next. Population objective function values can be evaluated in parallel.
Usage
kofnGA(n, k, OF, popsize = 200, keepbest = floor(popsize/10),
ngen = 500, tourneysize = max(ceiling(popsize/10), 2),
mutprob = 0.01, mutfrac = NULL, initpop = NULL, verbose = 0,
cluster = NULL, sharedmemory = FALSE, ...)
Arguments
n |
The maximum permissible index (i.e., the size of the set we are doing subset
selection from). The algorithm chooses a subset of integers from 1 to |
k |
The number of indices to choose (i.e., the size of the subset). |
OF |
The objective function. The first argument of OF should be an index vector of
length |
popsize |
The size of the population; equivalently, the number of offspring produced each generation. |
keepbest |
The |
ngen |
The number of generations to run. |
tourneysize |
The number of individuals involved in each tournament at the selection stage. |
mutprob |
The probability of mutation for each of the |
mutfrac |
The average fraction of offspring that will experience at least one mutation.
Equivalent to setting |
initpop |
A |
verbose |
An integer controlling the display of progress during search. If |
cluster |
If non-null, the objective function evaluations for each generation are done in
parallel. |
sharedmemory |
If |
... |
Additional arguments passed to |
Details
Tournament selection involves creating mating "tournaments" where two groups of
tourneysize
solutions are selected at random without regard to fitness. Within each tournament, victors are chosen by weighted sampling based on within-tournament fitness ranks (larger ranks given to more fit individuals). The two victors become parents of an offspring. This process is carried outpopsize
times to produce the new population.Crossover (reproduction) is carried out by combining the unique elements of both parents and keeping
k
of them, chosen at random.Increasing
tourneysize
will put more "selection pressure" on the choice of mating pairs, and will speed up convergence (to a local optimum) accordingly. Smallertourneysize
values will conversely promote better searching of the solution space.Increasing the size of the elite group (
keepbest
) also promotes more homogeneity in the population, thereby speeding up convergence.
Value
A list of S3 class "GAsearch" with the following elements:
bestsol |
A vector of length |
bestobj |
The objective function value for the best solution found. |
pop |
A |
obj |
The objective function values corresponding to each row of pop. |
old |
A list holding information about the search progress. Its elements are: |
old$best |
The sequence of best solutions known over the course of the search
(an |
old$obj |
The sequence of objective function values corresponding to the solutions
in |
old$avg |
The average population objective function value over the course of the
search (a vector of length |
Notes on parallel evaluation
Specifying a cluster
allows OF
to be evaluated over the population in parallel.
The population of solutions will be distributed among the workers in cluster
using
static dispatching. Any cluster produced by makeCluster
should work, though the sharedmemory
option is only appropriate for a cluster of
workers on the same multicore processor.
Solutions must be sent to workers (and results gathered back) once per generation. This
introduces communication overhead. Overhead can be reduced, but not eliminated, by setting
sharedmemory=TRUE
. The impact of parallelization on run time will depend on
how the run time cost of evaluationg OF
compares to the communication overhead. Test runs
are recommended to determine if parallel execution is beneficial in a given situation.
Note that only the objective function evaluations are distributed to the workers. Other parts of
the algorithm (mutation, crossover) are computed serially. As long as OF
is deterministic,
reproducibility of the results from a given random seed should not be affected by the use of
parallel computation.
References
Mark A. Wolters (2015), “A Genetic Algorithm for Selection of Fixed-Size Subsets, with Application to Design Problems,” Journal of Statistical Software, volume 68, Code Snippet 1, available online.
See Also
plot.GAsearch
plot method, print.GAsearch
print
method, and summary.GAsearch
summary method.
Examples
#---Find the four smallest numbers in a random vector of 100 uniforms---
# Generate the numbers and sort them so the best solution is (1,2,3,4).
Numbers <- sort(runif(100))
Numbers[1:6] #-View the smallest numbers.
ObjFun <- function(v, some_numbers) sum(some_numbers[v]) #-The objective function.
ObjFun(1:4, Numbers) #-The global minimum.
out <- kofnGA(n = 100, k = 4, OF = ObjFun, ngen = 50, some_numbers = Numbers) #-Run the GA.
summary(out)
plot(out)
## Not run:
# Note: the following two examples take tens of seconds to run (on a 2018 laptop).
#---Harder: find the 50x50 principal submatrix of a 500x500 matrix s.t. determinant is max---
# Use eigenvalue decomposition and QR decomposition to make a matrix with known eigenvalues.
n <- 500 #-Dimension of the matrix.
k <- 50 #-Size of subset to sample.
eigenvalues <- seq(10, 1, length.out=n) #-Choose the eigenvalues (all positive).
L <- diag(eigenvalues)
RandMat <- matrix(rnorm(n^2), nrow=n)
Q <- qr.Q(qr(RandMat))
M <- Q %*% L %*% t(Q)
M <- (M+t(M))/2 #-Enusre symmetry (fix round-off errors).
ObjFun <- function(v,Mat) -(determinant(Mat[v,v],log=TRUE)$modulus)
out <- kofnGA(n=n, k=k, OF=ObjFun, Mat=M)
print(out)
summary(out)
plot(out)
#---For interest: run GA searches iteratively (use initpop argument to pass results)---
# Alternate running with mutation probability 0.05 and 0.005, 50 generations each time.
# Use the same problem as just above (need to run that first).
mutprob <- 0.05
result <- kofnGA(n=n, k=k, OF=ObjFun, ngen=50, mutprob=mutprob, Mat=M) #-First run (random start)
allavg <- result$old$avg #-For holding population average OF values
allbest <- result$old$obj #-For holding population best OF values
for(i in 2:10) {
if(mutprob==0.05) mutprob <- 0.005 else mutprob <- 0.05
result <- kofnGA(n=n, k=k, OF=ObjFun, ngen=50, mutprob=mutprob, initpop=result$pop, Mat=M)
allavg <- c(allavg, result$old$avg[2:51])
allbest <- c(allbest, result$old$obj[2:51])
}
plot(0:500, allavg, type="l", col="blue", ylim=c(min(allbest), max(allavg)))
lines(0:500, allbest, col="red")
legend("topright", legend=c("Pop average", "Pop best"), col=c("blue", "red"), bty="n",
lty=1, cex=0.8)
summary(result)
## End(Not run)