SDTaskConfig-class {rsubgroup} | R Documentation |
Class “SDTaskConfig” — A Set of Configuration Settings
Description
A Set of Configuration Settings for the Subgroup and Pattern Mining Algorithms
Objects from the Class
Objects are created by calls of the form SDTaskConfig(...).
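As a minimal sketch of such a call, the following builds a configuration using only slot names documented on this page. It assumes the rsubgroup package is installed and loadable; the column names and all chosen values are hypothetical and purely illustrative, not recommendations.

```r
library(rsubgroup)  # assumption: the package is installed and loads cleanly

# Each argument is a slot documented on this page; the values are examples only.
config <- SDTaskConfig(
  attributes = c("checking_status", "employment"),  # hypothetical column names
  method     = "sdmap",            # the documented default, stated explicitly
  qf         = "wracc",            # Weighted Relative Accuracy
  k          = 10,                 # keep the 10 best patterns
  minsize    = 20,                 # require at least 20 covered records
  maxlen     = 3,                  # at most 3 conjunctions per description
  postfilter = "min-improve-set",
  parfilter  = 0.05                # hypothetical minimal-improvement value
)

# Slots of the resulting object can be read back with the @ operator.
config@qf
config@maxlen
```

Slots that are not passed explicitly keep the defaults listed under Slots.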
Slots
attributes
:The list of attributes to consider for mining. Either a vector of attribute names, or NULL (the default), which includes all attributes.
discretize
:Boolean indicating whether to automatically discretize numeric attributes (default discretize = TRUE). The behavior depends on the parameter nbins: if a numeric attribute has at most nbins distinct values in the dataset, those distinct values are used directly; otherwise equal-frequency discretization is applied to that attribute.
method
:A mining method; one of: Beam-Search ("beam"), BSD ("bsd"), SD-Map ("sdmap"), SD-Map enabling internal disjunctions ("sdmap-dis"). The default is method = "sdmap".
nbins
:Specifies the number of bins to be used when discretizing numeric attributes (see discretize above).
qf
:A quality function; one of: Adjusted Residuals ("ares"), Binomial Test ("bin"), Chi-Square Test ("chi2"), Gain ("gain"), Lift ("lift"), Piatetsky-Shapiro ("ps"), Relative Gain ("relgain"), Weighted Relative Accuracy ("wracc"). The default is qf = "ps".
k
:The maximum number (top-k) of patterns to discover, i.e., the best k rules according to the selected quality function. The default is k = 20.
minqual
:The minimal quality (default minqual = 0).
minsize
:The minimal size of a subgroup, as an integer, i.e., the minimal coverage of database records (default minsize = 0).
mintp
:The minimal true positive (tp) threshold, an integer, i.e., the minimal absolute number of true positives in a subgroup. Relevant for binary target concepts only. The default is mintp = 0.
maxlen
:The maximal length of a pattern description, i.e., the maximal number of conjunctions. This impacts both understandability and efficiency: simpler rules are easier to understand, and a small maxlen restricts the search space (default maxlen = 7).
nodefaults
:Ignore default values, i.e., do not include the respective first value (with index 0) of each attribute (default nodefaults = FALSE, i.e., include all values).
relfilter
:Controls whether irrelevant patterns are filtered during pattern mining; enabling this negatively impacts performance (default relfilter = FALSE).
postfilter
:Controls whether a post-processing filter is applied; one (or a vector) of: Minimum Improvement (Global) ("min-improve-global"), which checks the patterns against all possible generalizations; Minimum Improvement (Pattern Set) ("min-improve-set"), which checks the patterns against all their generalizations in the result set; Relevancy Filter ("relevancy"), which removes patterns that are strictly irrelevant; Significant Improvement (Global) ("sig-improve-global"), which removes patterns that do not significantly improve (default 0.01 level) w.r.t. all their possible generalizations; Significant Improvement (Set) ("sig-improve-set"), which removes patterns that do not significantly improve (default 0.01 level) w.r.t. all generalizations in the result set; Weighted Covering ("weighted-covering"), which performs weighted covering on the data in order to select a covering set of subgroups while reducing the overlap on the data. By default no postfilter is set, i.e., postfilter = "".
parfilter
:Provides the minimal improvement value for the postfilter (for min-improve-* filters), or the significance level (P) for sig-improve-* filters.
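The rule described for the discretize and nbins slots can be illustrated in base R. The helper below, discretize_sketch, is a hypothetical name introduced here for illustration; it is not the package's implementation, and the actual cut points and bin labels used by rsubgroup may differ.

```r
# Hypothetical helper illustrating the discretize/nbins rule described above.
# This is NOT the rsubgroup implementation, only a sketch of the same idea.
discretize_sketch <- function(x, nbins) {
  distinct <- sort(unique(x))
  if (length(distinct) <= nbins) {
    # Few distinct values: keep each value as its own category.
    return(factor(x, levels = distinct))
  }
  # Otherwise: equal-frequency binning with quantile-based cut points.
  probs  <- seq(0, 1, length.out = nbins + 1)
  breaks <- unique(quantile(x, probs = probs, names = FALSE))
  cut(x, breaks = breaks, include.lowest = TRUE)
}

# Two distinct values with nbins = 3: the values are kept as categories.
few <- discretize_sketch(c(1, 2, 2, 1, 2), nbins = 3)

# Twelve distinct values with nbins = 3: three bins of equal frequency.
many <- discretize_sketch(1:12, nbins = 3)
table(many)
```

With heavily tied data, quantile-based cut points can coincide, so the sketch may return fewer than nbins bins in that case.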
See Also
DiscoverSubgroups, DiscoverSubgroupsByTask, CreateSDTask.