prune {arc}    R Documentation
Classifier Builder
Description
An implementation of the CBA-CB M1 algorithm (Liu et al., 1998), adapted for R. Rules are generated with the apriori implementation from the arules package in place of the original CBA rule generator (CBA-RG).
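To make the M1 procedure concrete, here is a minimal base-R sketch of the steps described by Liu et al. (1998): sort the rules by precedence, keep a rule only if it correctly classifies at least one still-uncovered instance, record the default class and the total error after each kept rule, and cut the list at the first minimum of the total error. It is not the arc implementation. The function name cba_m1_sketch and its inputs (a logical instance-by-rule match matrix lhs_match, the rule classes, per-rule confidence, support and antecedent length, and the true labels y) are hypothetical, and using antecedent length as the last tie-breaker is an assumption standing in for CBA's "generated earlier" criterion.

```r
# Hypothetical sketch of CBA-CB M1 in base R; NOT the arc implementation.
# lhs_match: logical matrix (instances x rules), TRUE if the rule antecedent matches
# rule_class: class in each rule's consequent; conf, supp, len: per-rule measures
# y: true class label of each training instance
cba_m1_sketch <- function(lhs_match, rule_class, conf, supp, len, y) {
  # Sort by precedence: confidence desc, support desc, shorter antecedent first
  # (length stands in for CBA's "generated earlier" tie-breaker: an assumption)
  ord <- order(-conf, -supp, len)
  lhs_match <- lhs_match[, ord, drop = FALSE]
  rule_class <- rule_class[ord]

  remaining <- rep(TRUE, length(y))  # instances not covered by any kept rule
  kept <- integer(0)                 # original indices of kept rules
  defaults <- character(0)           # default class after each kept rule
  errors <- integer(0)               # total errors after each kept rule
  rule_errors <- 0L                  # misclassifications made by kept rules
  for (r in seq_along(ord)) {
    covered <- remaining & lhs_match[, r]
    # Data coverage: keep the rule only if it correctly classifies
    # at least one instance that is still uncovered
    if (!any(covered & y == rule_class[r])) next
    rule_errors <- rule_errors + sum(covered & y != rule_class[r])
    remaining <- remaining & !covered
    # Default class: majority class of the uncovered instances
    # (falls back to the overall majority when none are left)
    pool <- if (any(remaining)) y[remaining] else y
    def <- names(which.max(table(pool)))
    kept <- c(kept, ord[r])
    defaults <- c(defaults, def)
    errors <- c(errors, rule_errors + sum(remaining & y != def))
  }
  if (length(kept) == 0L) {
    return(list(rules = integer(0), default = names(which.max(table(y))),
                errors = integer(0)))
  }
  # Default rule pruning: cut after the first rule with the lowest total error
  cut <- which.min(errors)
  list(rules = kept[seq_len(cut)], default = defaults[cut], errors = errors)
}

# Toy usage: 5 instances, 3 rules
y <- c("a", "a", "b", "b", "a")
lhs_match <- cbind(c(TRUE, TRUE, TRUE, FALSE, FALSE),
                   c(FALSE, FALSE, TRUE, TRUE, FALSE),
                   c(FALSE, FALSE, FALSE, FALSE, TRUE))
res <- cba_m1_sketch(lhs_match, c("a", "b", "a"),
                     conf = c(2/3, 1, 1), supp = c(2, 2, 1), len = c(1, 1, 2), y = y)
res$rules    # 2: rule 2 alone, followed by the default class
res$default  # "a"
```

In the toy data all three rules pass data coverage with zero total error, so default rule pruning keeps only the first of them (rule 2, the highest-precedence rule) plus the default class "a".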
Usage
prune(
rules,
txns,
classitems,
default_rule_pruning = TRUE,
rule_window = 50000,
greedy_pruning = FALSE,
input_list_sorted_by_length = TRUE,
debug = FALSE
)
Arguments
rules
an object of class rules from the arules package.
txns
the input transactions (training data) used for pruning, e.g. an object of class transactions.
classitems
a vector of items (the class labels) allowed to appear in the consequent (rhs) of the rules.
default_rule_pruning
logical; whether default rule pruning should be performed. If TRUE, default rule pruning is performed as in the CBA algorithm. If FALSE, it is skipped, i.e. all rules surviving data coverage pruning are kept. In either case, a default rule is appended to the end of the classifier.
rule_window
the number of rules to precompute for CBA data coverage pruning. Changing the default value can reduce runtime.
greedy_pruning
logical; setting it to TRUE activates an early stopping condition: pruning stops at the first rule on which the total error increases.
input_list_sorted_by_length
logical; TRUE (the default) indicates that the input rule list is sorted by antecedent length, as output by arules. If set to FALSE, the list will be re-sorted.
debug
logical; whether to output debug messages.
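The effect of default_rule_pruning and greedy_pruning can be pictured on the total error (rule errors plus default rule errors) recorded after each rule that survives data coverage pruning. The helper choose_cutoff below is hypothetical and only illustrates one reading of the two flags; in particular, applying the greedy stop before default rule pruning is an assumption, not documented behaviour of prune.

```r
# Hypothetical helper illustrating default_rule_pruning / greedy_pruning;
# NOT part of arc. total_errors[k] = total errors after the k-th rule that
# survived data coverage pruning. Returns the number of rules to keep.
choose_cutoff <- function(total_errors, default_rule_pruning = TRUE,
                          greedy_pruning = FALSE) {
  n <- length(total_errors)
  if (n == 0L) return(0L)
  if (greedy_pruning) {
    # Early stopping (assumed reading): drop everything from the first rule
    # on which the total error increases
    up <- which(diff(total_errors) > 0)
    if (length(up) > 0L) n <- up[1]
  }
  if (!default_rule_pruning) return(n)  # keep all remaining rules
  # Default rule pruning: cut after the first rule with minimal total error
  which.min(total_errors[seq_len(n)])
}

errs <- c(10, 8, 9, 7, 7)
choose_cutoff(errs)                                # 4
choose_cutoff(errs, greedy_pruning = TRUE)         # 2: stops when 8 rises to 9
choose_cutoff(errs, default_rule_pruning = FALSE)  # 5
```

In this example greedy pruning stops at the first increase and never sees the later, lower error of 7, which is the risk an early stopping rule accepts.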
Value
Returns an object of class rules. Note that the 'quality' slot (rules@quality) is extended with additional measures: 'orderedConf', 'orderedSupp', and 'cumulativeConf'.
The rules are output in the order in which they are assumed to be applied in classification: only the first applicable rule is used to classify an instance. Therefore, in addition to rule confidence, which is computed over the whole training dataset, it makes sense to define order-sensitive confidence ('orderedConf'), which is computed only from the instances reaching the given rule as a/(a+b), where a is the number of such instances matching both the antecedent and the consequent (available in slot 'orderedSupp'), and b is the number of such instances matching the antecedent but not the consequent of the given rule.
The cumulative confidence ('cumulativeConf') is an experimental measure computed as the accuracy of the rule list comprising the given rule and all higher-priority rules (rules with a lower index), with uncovered instances excluded from the computation.
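The three measures can be reproduced from their definitions. The base-R sketch below assumes a logical instance-by-rule match matrix whose columns are already in classifier order; the function name ordered_measures and its inputs are hypothetical, and orderedSupp is taken to be the raw count a, as described above. A rule that no remaining instance reaches gets NaN (0/0) for orderedConf.

```r
# Hypothetical sketch (not arc's code) of the order-sensitive measures.
# lhs_match: logical matrix (instances x rules), columns in classifier order
# rule_class: consequent class of each rule; y: true class of each instance
ordered_measures <- function(lhs_match, rule_class, y) {
  reaching <- rep(TRUE, length(y))  # instances not covered by higher-priority rules
  a <- b <- integer(ncol(lhs_match))
  for (r in seq_len(ncol(lhs_match))) {
    fired <- reaching & lhs_match[, r]       # instances this rule classifies
    a[r] <- sum(fired & y == rule_class[r])  # antecedent and consequent match
    b[r] <- sum(fired & y != rule_class[r])  # antecedent matches, consequent does not
    reaching <- reaching & !fired
  }
  data.frame(orderedSupp = a,
             orderedConf = a / (a + b),                   # NaN if no instance fires
             cumulativeConf = cumsum(a) / cumsum(a + b))  # accuracy over covered instances
}

y <- c("a", "a", "b", "b", "a")
lhs_match <- cbind(c(TRUE, TRUE, TRUE, FALSE, FALSE),
                   c(FALSE, FALSE, TRUE, TRUE, TRUE))
ordered_measures(lhs_match, c("a", "b"), y)
# Rule 2 ("b") has plain confidence 2/3 on the whole data, but instance 3
# is already taken by rule 1, so its orderedConf is 1/2.
```

Because the rules partition the instances they cover, the cumulative confidence is simply the running sum of a divided by the running sum of a + b.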
References
Liu, Bing, Wynne Hsu, and Yiming Ma. Integrating classification and association rule mining. In Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining (KDD'98), 1998.
Examples
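# Example 1
library(arules)  # apriori(), inspect() and the Adult data
library(arc)     # discrNumeric(), getAppearance(), prune()
txns <- as(discrNumeric(datasets::iris, "Species")$Disc.data, "transactions")
appearance <- getAppearance(datasets::iris, "Species")
rules <- apriori(txns, parameter = list(confidence = 0.5,
  support = 0.01, minlen = 2, maxlen = 4), appearance = appearance)
rulesP <- prune(rules, txns, appearance$rhs)  # keep the pruned classifier
inspect(rulesP)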
#Example 2
utils::data(Adult) # this dataset comes with the arules package
classitems <- c("income=small","income=large")
rules <- apriori(Adult, parameter = list(supp = 0.3, conf = 0.5,
target = "rules"), appearance=list(rhs=classitems, default="lhs"))
# produces 25 rules
rulesP <- prune(rules,Adult,classitems)
rulesP@quality # inspect rule quality measures, including the new additions
# Rules after data coverage pruning: 8
# Performing default rule pruning.
# Final rule list size: 6