SDTaskConfig-class {rsubgroup}    R Documentation

Class “SDTaskConfig” — A Set of Configuration Settings

Description

A set of configuration settings for the subgroup discovery and pattern mining algorithms.

Objects from the Class

Objects are created by calls of the form SDTaskConfig(...).
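As an illustration, a configuration that departs from some of the defaults might be built as follows. This is a sketch that assumes SDTaskConfig accepts the slot names described under Slots as named arguments and that rsubgroup, together with its rJava/Java dependencies, is installed; the call is skipped otherwise.

```r
# Sketch only: assumes rsubgroup (and its rJava/Java back end) is installed and
# that SDTaskConfig() takes the slot names documented under "Slots" as arguments.
if (requireNamespace("rsubgroup", quietly = TRUE)) {
  cfg <- rsubgroup::SDTaskConfig(
    method  = "sdmap",  # SD-Map, the documented default
    qf      = "wracc",  # weighted relative accuracy instead of the default "ps"
    k       = 10L,      # return at most the 10 best patterns
    minsize = 25L,      # minimal subgroup size of 25 records
    maxlen  = 3L        # at most three conjuncts per pattern description
  )
  print(cfg@qf)         # slots are read with the S4 accessor @
}
```

The resulting object is intended to be handed to the functions listed under See Also.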

Slots

attributes:

The attributes to consider for mining: either a character vector of attribute names, or NULL (the default), which includes all attributes.

discretize:

Logical, indicating whether to discretize numeric attributes automatically (default discretize = TRUE). Depends on parameter nbins: if the number of distinct values of a numeric attribute in the dataset is <= nbins, each distinct value is kept as a value of its own; otherwise, equal-frequency discretization is applied to that attribute.

method:

A mining method; one of "beam" (Beam-Search), "bsd" (BSD), "sdmap" (SD-Map), or "sdmap-dis" (SD-Map enabling internal disjunctions). The default is method = "sdmap".

nbins:

Specifies the number of bins to be used when discretizing numeric attributes (see discretize above).

qf:

A quality function; one of "ares" (Adjusted Residuals), "bin" (Binomial Test), "chi2" (Chi-Square Test), "gain" (Gain), "lift" (Lift), "ps" (Piatetsky-Shapiro), "relgain" (Relative Gain), or "wracc" (Weighted Relative Accuracy). The default is qf = "ps".

k:

The maximal number of patterns to discover (top-k), i.e., the k best patterns according to the selected quality function (default k = 20).

minqual:

The minimal quality a pattern must reach according to the selected quality function (default minqual = 0).

minsize:

The minimal size of a subgroup, as an integer, i.e., the minimal number of database records it must cover (default minsize = 0).

mintp:

The minimal true positive (tp) threshold, as an integer, i.e., the minimal absolute number of true positives in a subgroup; relevant for binary target concepts only (default mintp = 0).

maxlen:

The maximal length of a pattern description, i.e., the maximal number of conjuncts (selectors) it may contain. This affects both understandability and efficiency: shorter rules are easier to understand, and a small maxlen restricts the search space (default maxlen = 7).

nodefaults:

If TRUE, ignore default values, i.e., do not include the first value (with index 0) of each attribute (default nodefaults = FALSE, i.e., all values are included).

relfilter:

Controls whether irrelevant patterns are filtered during pattern mining; enabling it negatively impacts performance (default relfilter = FALSE).

postfilter:

Controls whether a post-processing filter is applied; one (or a vector) of:

"min-improve-global" (Minimum Improvement, Global): checks the patterns against all their possible generalizations.

"min-improve-set" (Minimum Improvement, Pattern Set): checks the patterns against all their generalizations in the result set.

"relevancy" (Relevancy Filter): removes patterns that are strictly irrelevant.

"sig-improve-global" (Significant Improvement, Global): removes patterns that do not significantly improve (default 0.01 level) w.r.t. all their possible generalizations.

"sig-improve-set" (Significant Improvement, Set): removes patterns that do not significantly improve (default 0.01 level) w.r.t. all their generalizations in the result set.

"weighted-covering" (Weighted Covering): performs weighted covering on the data in order to select a covering set of subgroups while reducing their overlap on the data.

By default, no postfilter is set, i.e., postfilter = "".

parfilter:

The parameter of the postfilter: the minimal improvement value for the min-improve-* filters, or the significance level (P) for the sig-improve-* filters.
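The discretization rule described for discretize and nbins can be sketched in base R. The helper discretize_numeric is hypothetical and not part of rsubgroup; it uses the empirical quantiles from quantile() as equal-frequency cut points, which is an assumption, so the cut points rsubgroup computes internally may differ.

```r
# Hypothetical helper (not part of rsubgroup) mirroring the documented rule:
# keep distinct values if there are at most nbins of them, otherwise bin the
# attribute into nbins equal-frequency intervals.
discretize_numeric <- function(x, nbins) {
  stopifnot(is.numeric(x), nbins >= 1)
  distinct <- sort(unique(x[!is.na(x)]))
  if (length(distinct) <= nbins) {
    # Few distinct values: each one becomes a nominal value of its own
    return(factor(x, levels = distinct))
  }
  # Equal-frequency binning: cut points at the empirical quantiles (type 7)
  breaks <- unique(quantile(x, probs = seq(0, 1, length.out = nbins + 1),
                            na.rm = TRUE, names = FALSE))
  cut(x, breaks = breaks, include.lowest = TRUE)
}

x <- c(2.5, 7.1, 3.3, 9.8, 1.2, 5.5, 6.4, 8.0, 4.7)
table(discretize_numeric(x, nbins = 3))            # 3 values in each of 3 bins
levels(discretize_numeric(c(1, 2, 2, 3), nbins = 3))  # distinct values kept
```

Equal-frequency bins keep every interval similarly populated, so no resulting selector covers only a handful of records, which matters when minsize is set.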
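To show how qf, k, minqual, minsize and mintp jointly shape a result, the following base-R sketch scores a few hand-made candidate subgroups for a binary target. It uses textbook definitions of three of the listed quality functions (Piatetsky-Shapiro, weighted relative accuracy, lift), which is an assumption about rsubgroup's internals; the helpers quality and select_top_k are hypothetical, and treating the thresholds as inclusive lower bounds is likewise assumed.

```r
# Textbook quality functions for a binary target (assumed; rsubgroup's
# internal formulas may differ in detail).
#   n, tp: size and true positives of the subgroup
#   N, TP: size and true positives of the whole population
quality <- function(tp, n, TP, N, qf = c("ps", "wracc", "lift")) {
  qf <- match.arg(qf)
  p  <- tp / n   # target share in the subgroup
  p0 <- TP / N   # target share in the population
  switch(qf,
         ps    = n * (p - p0),        # Piatetsky-Shapiro
         wracc = (n / N) * (p - p0),  # weighted relative accuracy
         lift  = p / p0)              # lift
}

# Hypothetical helper (not part of rsubgroup): applies minsize, mintp and
# minqual as inclusive lower bounds, then keeps the k best candidates.
select_top_k <- function(cands, TP, N, qf = "ps", k = 20,
                         minqual = 0, minsize = 0, mintp = 0) {
  q <- quality(cands$tp, cands$n, TP, N, qf)
  keep <- cands$n >= minsize & cands$tp >= mintp & q >= minqual
  out <- cands[keep, , drop = FALSE]
  out$quality <- q[keep]
  out <- out[order(out$quality, decreasing = TRUE), , drop = FALSE]
  head(out, k)
}

# Four candidate subgroups in a population of N = 100 with TP = 40 positives
cands <- data.frame(name = c("A", "B", "C", "D"),
                    n    = c(20, 50, 5, 30),
                    tp   = c(15, 30, 5, 9),
                    stringsAsFactors = FALSE)

select_top_k(cands, TP = 40, N = 100, qf = "ps", k = 2, minsize = 10)$name
# "B" "A": D fails minqual = 0, C fails minsize = 10
select_top_k(cands, TP = 40, N = 100, qf = "lift", k = 1)$name
# "C": lift favors the small but pure subgroup
```

Within a single dataset ps equals N times wracc, so the two rank patterns identically and differ only in scale; that scale is what a given minqual threshold is compared against.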

See Also

DiscoverSubgroups, DiscoverSubgroupsByTask, CreateSDTask


[Package rsubgroup version 1.1 Index]