FUGEPSD {SDEFSR} | R Documentation |
Fuzzy Genetic Programming-based learning for Subgroup Discovery (FuGePSD) Algorithm.
Description
Performs a subgroup discovery task using the FuGePSD algorithm.
Usage
FUGEPSD(
paramFile = NULL,
training = NULL,
test = NULL,
output = c("optionsFile.txt", "rulesFile.txt", "testQM.txt"),
seed = 0,
nLabels = 3,
t_norm = "product_t-norm",
ruleWeight = "Certainty_Factor",
frm = "Normalized_Sum",
numGenerations = 300,
numberOfInitialRules = 100,
crossProb = 0.5,
mutProb = 0.2,
insProb = 0.15,
dropProb = 0.15,
tournamentSize = 2,
globalFitnessWeights = c(0.7, 0.1, 0.05, 0.2),
minCnf = 0.6,
ALL_CLASS = TRUE,
targetVariable = NA
)
Arguments
paramFile |
The path of the parameters file. |
training |
A SDEFSR_Dataset object with the training data. |
test |
A SDEFSR_Dataset object with the test data. |
output |
Character vector with the paths where the information file, the rules file and the test quality measures file are stored, respectively. For the rules and quality measures files, the algorithm generates 4 files, one for each fuzzy confidence filter. |
seed |
An integer used as the seed of the random number generator. |
nLabels |
Number of linguistic labels for numerical variables. By default 3. We recommend an odd number between 3 and 9. |
t_norm |
A string with the t-norm to use when computing the compatibility degree of the rules. Use "Minimum/Maximum" for the minimum t-norm; the default is "product_t-norm". |
ruleWeight |
String with the method to calculate the rule weight. Possible values are:
|
frm |
A string specifying the Fuzzy Reasoning Method to use. Possible Values are:
|
numGenerations |
An integer to set the number of generations to perform before stopping the evolutionary process. |
numberOfInitialRules |
An integer to set the number of individuals (rules) in the initial population. |
crossProb |
Sets the crossover probability. We recommend a number in [0,1]. |
mutProb |
Sets the mutation probability. We recommend a number in [0,1]. |
insProb |
Sets the insertion probability. We recommend a number in [0,1]. |
dropProb |
Sets the dropping probability. We recommend a number in [0,1]. |
tournamentSize |
Sets the number of individuals that will be chosen in the tournament selection procedure. This number must be greater than or equal to 2. |
globalFitnessWeights |
A numeric vector of length 4 specifying the weights used in the computation of the Global Fitness Parameter. |
minCnf |
Minimum confidence, a value in [0,1], that a rule must reach to pass the filter. |
ALL_CLASS |
If TRUE, the algorithm returns at least the best rule for each target class, even if that rule does not pass the filters. If FALSE, it returns the best rule only when no rule passes the filters. |
targetVariable |
The name or index position of the target variable (or class). It must be a categorical one. |
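To make nLabels and t_norm more concrete, the following base R sketch builds uniformly spaced triangular fuzzy labels over the range of a variable and combines two membership degrees with either the minimum or the product t-norm. The helper names (triangular_membership, compatibility), the uniform triangular partition and the variable ranges are assumptions made for illustration; they are not the package's internal code.

```r
# Illustrative sketch only: helper names and the uniform triangular
# partition are assumptions, not SDEFSR internals.

# Membership of value x in each of n_labels triangular fuzzy sets
# spread uniformly over [lo, hi].
triangular_membership <- function(x, lo, hi, n_labels = 3) {
  centers <- seq(lo, hi, length.out = n_labels)
  width <- (hi - lo) / (n_labels - 1)  # distance between adjacent centers
  pmax(0, 1 - abs(x - centers) / width)
}

# Compatibility degree of an example with a rule antecedent:
# the membership degrees of its conditions are combined with a t-norm.
compatibility <- function(degrees, t_norm = "product_t-norm") {
  if (t_norm == "Minimum/Maximum") min(degrees) else prod(degrees)
}

# Hypothetical variables and ranges
age  <- triangular_membership(45, lo = 30, hi = 80, n_labels = 3)
year <- triangular_membership(60, lo = 58, hi = 70, n_labels = 3)

# Rule antecedent: Age is label 1 AND Year is label 1
degrees <- c(age[1], year[1])
min_degree  <- compatibility(degrees, "Minimum/Maximum")
prod_degree <- compatibility(degrees, "product_t-norm")
print(c(minimum = min_degree, product = prod_degree))
```

The product t-norm never yields a larger degree than the minimum t-norm, so it penalizes rules with several partially matched conditions more strongly.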
Details
By default, this function uses the last variable of the SDEFSR_Dataset object as the target variable. To use a different one, set the targetVariable argument.
The target variable MUST be categorical; if it is not, the function throws an error. The default behaviour is to find rules for all possible values of the target variable. targetClass restricts the search to rules about a single value of the target variable.
If paramFile is anything other than NULL, the rest of the arguments are ignored and the algorithm tries to read the specified file. See "Parameters file structure" below if you want to use a parameters file.
Value
The algorithm shows the following results in the console:
- Information about the parameters used in the algorithm.
- Results for each filter:
  - The generated rules that pass the filter.
  - The test quality measures for each rule in that filter.
These results are also saved to files: for each filter, one file with the rules and another with the quality measures.
How does this algorithm work?
This algorithm performs an evolutionary fuzzy system (EFS) based on a genetic programming algorithm. It starts with a randomly generated initial population in which individuals are represented through the "chromosome = individual" approach, including both the antecedent and the consequent of the rule. Representing the consequent has the advantage of obtaining rules for all target classes in a single execution of the algorithm.
The algorithm employs a cooperative-competitive approach where the rules of the population cooperate and compete with each other in order to obtain the optimal solution. It therefore performs two evaluations: one of the individual rules, for competition, and another of the whole population, for cooperation.
The algorithm evolves by generating an offspring population of the same size as the initial one, obtained by applying the genetic operators to the main population. Once they have been applied, both populations are joined and a token competition is performed in order to maintain the diversity of the generated rules. This token competition also reduces the population size by deleting the rules that are not competitive.
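The token competition step can be sketched in base R as follows. This is a simplified illustration under stated assumptions: each example is a token, rules pick tokens in order of decreasing fitness, and a rule that seizes no token is deleted. The function name token_competition and the coverage matrix layout are hypothetical and do not come from the package.

```r
# Illustrative sketch only: `token_competition` and the coverage matrix
# layout are assumptions, not SDEFSR internals.

# covers:  logical matrix, one row per rule, one column per example;
#          covers[i, j] is TRUE when rule i covers example j.
# fitness: numeric vector with one fitness value per rule.
# Returns the indices of the rules that seize at least one token.
token_competition <- function(covers, fitness) {
  taken <- rep(FALSE, ncol(covers))  # tokens already seized
  survivors <- integer(0)
  for (i in order(fitness, decreasing = TRUE)) {  # best rules pick first
    new_tokens <- covers[i, ] & !taken
    if (any(new_tokens)) {
      taken <- taken | new_tokens
      survivors <- c(survivors, i)
    }
  }
  survivors
}

covers <- rbind(
  c(TRUE,  TRUE,  FALSE, FALSE),  # rule 1
  c(TRUE,  TRUE,  FALSE, FALSE),  # rule 2: covers the same examples as rule 1
  c(FALSE, FALSE, TRUE,  FALSE)   # rule 3
)
fitness <- c(0.9, 0.5, 0.7)

survivors <- token_competition(covers, fitness)
print(survivors)
```

In this toy population, rule 2 covers only examples already seized by the fitter rule 1, so it is removed, while rules 1 and 3 survive. This is how redundant rules are discarded and diversity is kept.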
After the evolutionary process, a screening function is applied to the best population. This screening function keeps the rules that reach a minimum level of confidence and sensitivity. The minimum sensitivity is 0.6, and four filters with a minimum fuzzy confidence of 0.6, 0.7, 0.8 and 0.9 are applied.
The user can also force the algorithm to return at least one rule for every target class value, even if it does not pass the screening function. This behaviour is specified by the ALL_CLASS parameter.
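A minimal base R sketch of the screening step, including the ALL_CLASS fallback, is shown below. The function screen_rules, the column names of the rules data frame, the use of fuzzy confidence to decide which rule is "best", and the inclusive (>=) comparisons are all assumptions made for illustration.

```r
# Illustrative sketch only: `screen_rules` and the column names are
# assumptions, not SDEFSR internals.

# rules: data frame with columns class, sensitivity, fuzzy_confidence.
screen_rules <- function(rules, min_conf, min_sens = 0.6, all_class = TRUE) {
  pass <- rules$sensitivity >= min_sens & rules$fuzzy_confidence >= min_conf
  kept <- rules[pass, , drop = FALSE]
  if (all_class) {
    # Add the most confident rule of every class left without rules.
    for (cl in setdiff(unique(rules$class), kept$class)) {
      candidates <- rules[rules$class == cl, , drop = FALSE]
      kept <- rbind(kept, candidates[which.max(candidates$fuzzy_confidence), ])
    }
  } else if (nrow(kept) == 0) {
    # No rule passed: fall back to the single best rule overall.
    kept <- rules[which.max(rules$fuzzy_confidence), , drop = FALSE]
  }
  kept
}

# Hypothetical rule set
rules <- data.frame(
  class = c("positive", "positive", "negative"),
  sensitivity = c(0.80, 0.65, 0.40),
  fuzzy_confidence = c(0.85, 0.72, 0.55),
  stringsAsFactors = FALSE
)

# One result set per fuzzy confidence filter
results <- lapply(c(0.6, 0.7, 0.8, 0.9), function(th) screen_rules(rules, th))
rules_per_filter <- sapply(results, nrow)
print(rules_per_filter)
```

With all_class = TRUE, every filter keeps a rule for the "negative" class even though that rule never passes the sensitivity threshold. With all_class = FALSE, a rule is added only when a filter would otherwise return nothing.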
Parameters file structure
The paramFile argument points to a file which has the necessary parameters to execute FuGePSD. This file must contain, at least, these parameters (separated by a carriage return):
- algorithm: Specifies the algorithm to execute. In this case, "FUGEPSD".
- inputData: Specifies two paths of KEEL files for training and test. If only the name of the file is given, the path is the working directory.
- seed: Sets the seed for the random number generator.
- nLabels: Sets the number of fuzzy labels to create when reading the files.
- nEval: Sets the maximum number of rule evaluations before stopping the genetic process.
- popLength: Sets the number of individuals of the main population.
- eliteLength: Sets the number of individuals of the elite population. Must be less than popLength.
- crossProb: Crossover probability of the genetic algorithm. Value in [0,1].
- mutProb: Mutation probability of the genetic algorithm. Value in [0,1].
- Obj1: Sets objective number 1.
- Obj2: Sets objective number 2.
- Obj3: Sets objective number 3.
- Obj4: Sets objective number 4.
- RulesRep: Representation of each chromosome of the population. "can" for canonical representation, "dnf" for DNF representation.
- targetClass: Value of the target variable to search for subgroups. The target variable is always the last variable. Use null to search for every value of the target variable.
An example of a parameters file could be:
An example of parameter file could be:
algorithm = FUGEPSD
inputData = "banana-5-1tra.dat" "banana-5-1tst.dat"
outputData = "Parameters_INFO.txt" "Rules.txt" "TestMeasures.txt"
seed = 23783
Number of Labels = 3
T-norm/T-conorm for the Computation of the Compatibility Degree = Normalized_Sum
Rule Weight = Certainty_Factor
Fuzzy Reasoning Method = Normalized_Sum
Number of Generations = 300
Initial Number of Fuzzy Rules = 100
Crossover probability = 0.5
Mutation probability = 0.2
Insertion probability = 0.15
Dropping Condition probability = 0.15
Tournament Selection Size = 2
Global Fitness Weight 1 = 0.7
Global Fitness Weight 2 = 0.1
Global Fitness Weight 3 = 0.05
Global Fitness Weight 4 = 0.2
All Class = true
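One possible way to create such a file from R is sketched below. It writes the example parameters above, one per line, to a temporary file and then calls FUGEPSD with paramFile. The temporary location is an arbitrary choice for this sketch. The call is guarded because it needs the SDEFSR package and the two KEEL files in the working directory.

```r
# Illustrative sketch: writes the example parameters to a temporary file,
# one parameter per line.
param_lines <- c(
  "algorithm = FUGEPSD",
  'inputData = "banana-5-1tra.dat" "banana-5-1tst.dat"',
  'outputData = "Parameters_INFO.txt" "Rules.txt" "TestMeasures.txt"',
  "seed = 23783",
  "Number of Labels = 3",
  "T-norm/T-conorm for the Computation of the Compatibility Degree = Normalized_Sum",
  "Rule Weight = Certainty_Factor",
  "Fuzzy Reasoning Method = Normalized_Sum",
  "Number of Generations = 300",
  "Initial Number of Fuzzy Rules = 100",
  "Crossover probability = 0.5",
  "Mutation probability = 0.2",
  "Insertion probability = 0.15",
  "Dropping Condition probability = 0.15",
  "Tournament Selection Size = 2",
  "Global Fitness Weight 1 = 0.7",
  "Global Fitness Weight 2 = 0.1",
  "Global Fitness Weight 3 = 0.05",
  "Global Fitness Weight 4 = 0.2",
  "All Class = true"
)

param_file <- file.path(tempdir(), "ParamFile.txt")
writeLines(param_lines, param_file)

# Run only when the package and both KEEL data files are available.
if (requireNamespace("SDEFSR", quietly = TRUE) &&
    all(file.exists(c("banana-5-1tra.dat", "banana-5-1tst.dat")))) {
  SDEFSR::FUGEPSD(paramFile = param_file)
}
```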
Author(s)
Written in R by Angel M. Garcia <amgv0009@red.ujaen.es>
References
A fuzzy genetic programming-based algorithm for subgroup discovery and the application to one problem of pathogenesis of acute sore throat conditions in humans. Carmona, C.J., Ruiz-Rodado, V., del Jesus, M.J., Weber, A., Grootveld, M., Gonzalez, P., and Elizondo, D. Information Sciences, Volume 298, p. 180-197 (2015).
Examples
FUGEPSD(training = habermanTra,
test = habermanTst,
output = c(NA, NA, NA),
seed = 23783,
nLabels = 3,
t_norm = "Minimum/Maximum",
ruleWeight = "Certainty_Factor",
frm = "Normalized_Sum",
numGenerations = 20,
numberOfInitialRules = 15,
crossProb = 0.5,
mutProb = 0.2,
insProb = 0.15,
dropProb = 0.15,
tournamentSize = 2,
globalFitnessWeights = c(0.7, 0.1, 0.3, 0.2),
ALL_CLASS = TRUE)
## Not run:
# Execution with a parameters file called 'ParamFile.txt' in the working directory:
FUGEPSD("ParamFile.txt")
## End(Not run)