NMEEF_SD {SDEFSR} | R Documentation |
Non-dominated Multi-objective Evolutionary algorithm for Extracting Fuzzy rules in Subgroup Discovery (NMEEF-SD)
Description
Performs a subgroup discovery task by executing the NMEEF-SD algorithm.
Usage
NMEEF_SD(
paramFile = NULL,
training = NULL,
test = NULL,
output = c("optionsFile.txt", "rulesFile.txt", "testQM.txt"),
seed = 0,
nLabels = 3,
nEval = 10000,
popLength = 100,
mutProb = 0.1,
crossProb = 0.6,
Obj1 = "CSUP",
Obj2 = "CCNF",
Obj3 = "null",
minCnf = 0.6,
reInitCoverage = "yes",
porcCob = 0.5,
StrictDominance = "yes",
targetVariable = NA,
targetClass = "null"
)
Arguments
paramFile |
The path of the parameters file. |
training |
A SDEFSR_Dataset object with the training data. |
test |
A SDEFSR_Dataset object with the test data. |
output |
A character vector with the paths where the information file, the rules file and the test quality measures file are stored, respectively. |
seed |
An integer that sets the seed used to generate random numbers. |
nLabels |
Number of linguistic labels for numerical variables. |
nEval |
An integer that sets the maximum number of evaluations in the evolutionary process. |
popLength |
An integer to set the number of individuals in the population. |
mutProb |
Sets the mutation probability. A number in [0,1]. |
crossProb |
Sets the crossover probability. A number in [0,1]. |
Obj1 |
Sets objective number 1. See "Objective values" below. |
Obj2 |
Sets objective number 2. See "Objective values" below. |
Obj3 |
Sets objective number 3. See "Objective values" below. |
minCnf |
Sets the minimum confidence that a rule in the Pareto front must have in order to be returned. A number in [0,1]. |
reInitCoverage |
Sets whether the algorithm performs the reinitialization based on coverage when it is needed. A string with "yes" or "no". |
porcCob |
Sets the maximum percentage of variables that participate in the rules generated by the reinitialization based on coverage. A number in [0,1]. |
StrictDominance |
Sets if the comparison between individuals must be done by strict dominance or not. A string with "yes" or "no". |
targetVariable |
The name or index position of the target variable (or class). It must be a categorical one. |
targetClass |
A string specifying the value of the target variable. |
Details
By default, this function uses the last variable of the SDEFSR_Dataset object as the target variable. To use a different one, set the targetVariable argument.
The target variable MUST be categorical; if it is not, the function throws an error. The default behaviour is to find rules for all possible values of the target variable. Set targetClass to a single value of the target variable to make the algorithm find rules only for that value.
If paramFile is set to something other than NULL, the rest of the arguments are ignored and the algorithm tries to read the specified file. See "Parameters file structure" below if you want to use a parameters file.
Value
The algorithm prints the following results to the console:
The parameters used in the algorithm.
The rules generated.
The quality measures on test data for every rule, and the global results.
The algorithm also saves these results in the files specified in the output argument, or in the outputData parameter of the parameters file.
How does this algorithm work?
NMEEF-SD is a multi-objective genetic algorithm based on the NSGA-II approach. The algorithm first performs a selection based on binary tournament and saves the selected individuals in an offspring population. Then, NMEEF-SD applies the genetic operators to the individuals in the offspring population.
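The selection step can be illustrated with a minimal sketch of a binary tournament. This is not the package's code: the function `binary_tournament` and the `rank` vector (lower means better, for example the index of the dominance front) are hypothetical names chosen for the illustration.

```r
# Binary tournament: draw two distinct individuals at random and keep the better one.
# `rank` is a hypothetical vector where a lower value means a better individual.
binary_tournament <- function(rank, n_offspring) {
  n <- length(rank)
  vapply(seq_len(n_offspring), function(i) {
    pair <- sample.int(n, 2)        # two distinct candidates
    pair[which.min(rank[pair])]     # ties go to the first candidate drawn
  }, integer(1))
}

set.seed(1)
rank <- c(1, 3, 2, 1, 4, 2)
# Indices of the individuals copied into the offspring population
parents <- binary_tournament(rank, n_offspring = 6)
```

Because each tournament compares two distinct individuals, the individual with the single worst rank can never be selected in this sketch.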
To generate the population that takes part in the next iteration of the evolutionary process, NMEEF-SD joins the main population and the offspring, calculates the dominance among all individuals, and then applies the NSGA-II fast sort algorithm to order the population by fronts of dominance. The first front is the non-dominated (or Pareto) front, the second front holds the individuals dominated only by members of the first front, the third holds those dominated only by members of the first two fronts, and so on.
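The ordering by fronts can be sketched as follows. This is a simple illustration that peels off one front at a time; it produces the same fronts as the NSGA-II fast sort but without its bookkeeping, and it is not taken from the package. The names `dominates` and `non_dominated_sort` are hypothetical, and all objectives are assumed to be maximised.

```r
# TRUE if a dominates b: no worse in every objective and strictly better in
# at least one (assumption: all objectives are maximised).
dominates <- function(a, b) all(a >= b) && any(a > b)

# Assign each row of `objs` (individuals x objectives) to a dominance front.
# Front 1 is the Pareto front; front k holds the individuals that become
# non-dominated once fronts 1..k-1 have been removed.
non_dominated_sort <- function(objs) {
  n <- nrow(objs)
  front <- rep(NA_integer_, n)
  remaining <- seq_len(n)
  k <- 1L
  while (length(remaining) > 0) {
    is_dominated <- vapply(remaining, function(i) {
      any(vapply(remaining,
                 function(j) dominates(objs[j, ], objs[i, ]),
                 logical(1)))
    }, logical(1))
    current <- remaining[!is_dominated]
    front[current] <- k
    remaining <- setdiff(remaining, current)
    k <- k + 1L
  }
  front
}

# Five toy individuals evaluated on two objectives
objs <- rbind(c(0.9, 0.2), c(0.5, 0.5), c(0.2, 0.9), c(0.4, 0.4), c(0.1, 0.1))
fronts <- non_dominated_sort(objs)
```

In this toy population the first three individuals are mutually non-dominated, the fourth is dominated by the second, and the fifth is dominated by all the others.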
To promote diversity, NMEEF-SD has a mechanism that reinitializes the population based on coverage if the Pareto front does not evolve during a 5
At the end of the evolutionary process, the algorithm returns only the individuals in the Pareto front whose confidence is greater than a minimum confidence level.
Parameters file structure
The paramFile argument points to a file that contains the parameters NMEEF-SD needs in order to work.
This file must contain at least the following parameters (separated by a carriage return):
- algorithm: Specifies the algorithm to execute. In this case, "NMEEFSD".
- inputData: Specifies two paths of KEEL files, for training and test. If only the name of the file is given, the path will be the working directory.
- seed: Sets the seed for the random number generator.
- nLabels: Sets the number of fuzzy labels to create when reading the files.
- nEval: Sets the maximum number of rule evaluations before the genetic process stops.
- popLength: Sets the number of individuals of the main population.
- ReInitCob: Sets whether NMEEF-SD performs the reinitialization based on coverage. Values: "yes" or "no".
- crossProb: Crossover probability of the genetic algorithm. Value in [0,1].
- mutProb: Mutation probability of the genetic algorithm. Value in [0,1].
- RulesRep: Representation of each chromosome of the population. "can" for canonical representation, "dnf" for DNF representation.
- porcCob: Sets the maximum percentage of variables that participate in a rule when doing the reinitialization based on coverage. Value in [0,1].
- Obj1: Sets objective number 1.
- Obj2: Sets objective number 2.
- Obj3: Sets objective number 3.
- minCnf: Minimum confidence for returning a rule of the Pareto front. Value in [0,1].
- StrictDominance: Sets whether the comparison of individuals when calculating dominance uses strict dominance or not. Values: "yes" or "no".
- targetClass: Value of the target variable to search for subgroups. The target variable is always the last variable. Use null to search for every value of the target variable.
An example of parameter file could be:
algorithm = NMEEFSD
inputData = "irisd-10-1tra.dat" "irisd-10-1tra.dat" "irisD-10-1tst.dat"
outputData = "irisD-10-1-INFO.txt" "irisD-10-1-Rules.txt" "irisD-10-1-TestMeasures.txt"
seed = 1
RulesRep = can
nLabels = 3
nEval = 500
popLength = 51
crossProb = 0.6
mutProb = 0.1
ReInitCob = yes
porcCob = 0.5
Obj1 = comp
Obj2 = unus
Obj3 = null
minCnf = 0.6
StrictDominance = yes
targetClass = Iris-setosa
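A parameters file like the one above can also be written from R before calling the function. The following is a minimal sketch: the values are copied from the example, one parameter per line, while the file name `nmeefsd_params.txt` and the variable names are assumptions made for the illustration. The call to NMEEF_SD is left commented out because it needs the SDEFSR package and the KEEL files to be available.

```r
# Parameter values copied from the example above; adjust the paths to your own KEEL files.
params <- c(
  'algorithm = NMEEFSD',
  'inputData = "irisd-10-1tra.dat" "irisd-10-1tra.dat" "irisD-10-1tst.dat"',
  'outputData = "irisD-10-1-INFO.txt" "irisD-10-1-Rules.txt" "irisD-10-1-TestMeasures.txt"',
  'seed = 1',
  'RulesRep = can',
  'nLabels = 3',
  'nEval = 500',
  'popLength = 51',
  'crossProb = 0.6',
  'mutProb = 0.1',
  'ReInitCob = yes',
  'porcCob = 0.5',
  'Obj1 = comp',
  'Obj2 = unus',
  'Obj3 = null',
  'minCnf = 0.6',
  'StrictDominance = yes',
  'targetClass = Iris-setosa'
)

# Hypothetical file name; a temporary directory keeps the sketch side-effect free.
param_path <- file.path(tempdir(), "nmeefsd_params.txt")
writeLines(params, param_path)

# With SDEFSR loaded and the KEEL files in place, the file would be used as:
# NMEEF_SD(paramFile = param_path)
```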
Objective values
You can use the following quality measures as the ObjX values of the parameter file, using these identifiers:
Unusualness -> unus
Crisp Support -> csup
Crisp Confidence -> ccnf
Fuzzy Support -> fsup
Fuzzy Confidence -> fcnf
Coverage -> cove
Significance -> sign
If you do not want to use an objective, you must specify null.
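To give an intuition for some of these measures, the sketch below computes coverage, crisp support, crisp confidence and unusualness for one rule on toy data. It assumes the definitions commonly used in the subgroup discovery literature (support and coverage relative to the total number of examples, unusualness as weighted relative accuracy); the exact formulas used by the package may differ, and all variable names are hypothetical.

```r
# Toy data: whether each example satisfies the rule antecedent (`covered`)
# and whether it belongs to the target class (`target`).
covered <- c(TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE)
target  <- c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE)

n             <- length(covered)
n_cond        <- sum(covered)            # examples covered by the rule
n_cond_target <- sum(covered & target)   # covered examples of the target class
n_target      <- sum(target)             # examples of the target class

# Assumed standard definitions, not taken from the package source:
cove <- n_cond / n                       # coverage
csup <- n_cond_target / n                # crisp support
ccnf <- n_cond_target / n_cond           # crisp confidence
unus <- cove * (ccnf - n_target / n)     # unusualness (weighted relative accuracy)
```

A rule with high confidence but low coverage gets a low unusualness value under this definition, which is why measures are usually combined as separate objectives.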
References
Carmona, C., Gonzalez, P., del Jesus, M., & Herrera, F. (2010). NMEEF-SD: Non-dominated Multi-objective Evolutionary algorithm for Extracting Fuzzy rules in Subgroup Discovery.
Examples
NMEEF_SD(paramFile = NULL,
training = habermanTra,
test = habermanTst,
output = c(NA, NA, NA),
seed = 0,
nLabels = 3,
nEval = 300,
popLength = 100,
mutProb = 0.1,
crossProb = 0.6,
Obj1 = "CSUP",
Obj2 = "CCNF",
Obj3 = "null",
minCnf = 0.6,
reInitCoverage = "yes",
porcCob = 0.5,
StrictDominance = "yes",
targetClass = "positive"
)
## Not run:
NMEEF_SD(paramFile = NULL,
training = habermanTra,
test = habermanTst,
output = c("optionsFile.txt", "rulesFile.txt", "testQM.txt"),
seed = 0,
nLabels = 3,
nEval = 300,
popLength = 100,
mutProb = 0.1,
crossProb = 0.6,
Obj1 = "CSUP",
Obj2 = "CCNF",
Obj3 = "null",
minCnf = 0.6,
reInitCoverage = "yes",
porcCob = 0.5,
StrictDominance = "yes",
targetClass = "null"
)
## End(Not run)