R: Support Counting for Itemsets

support {arules}

R Documentation

Support Counting for Itemsets

Description

Provides the generic function support() and methods to count the support of given itemMatrix and associations objects in a given transactions data set.

Usage

support(x, transactions, ...)

## S4 method for signature 'itemMatrix'
support(
  x,
  transactions,
  type = c("relative", "absolute"),
  method = c("ptree", "tidlists"),
  reduce = FALSE,
  weighted = FALSE,
  verbose = FALSE,
  ...
)

## S4 method for signature 'associations'
support(
  x,
  transactions,
  type = c("relative", "absolute"),
  method = c("ptree", "tidlists"),
  reduce = FALSE,
  weighted = FALSE,
  verbose = FALSE,
  ...
)

Arguments

`x`	the set of itemsets for which support should be counted.
`transactions`	the transaction data set used for mining.
`...`	further arguments.
`type`	a character string specifying whether `"relative"` support or `"absolute"` support (counts) is returned for the itemsets in `x`. (default: `"relative"`)
`method`	use `"ptree"` or `"tidlists"`. See the Details section.
`reduce`	should unused items be removed before counting?
`weighted`	should support be weighted by the transaction weights stored as column `"weight"` in `transactionInfo`?
`verbose`	report progress?

Details

Normally, the support of frequent itemsets is counted very efficiently during the mining process using a set minimum support. However, support() can be used if only the support of specific itemsets (possibly itemsets with very low support) is needed, or if the support of a set of itemsets needs to be recalculated on transactions other than those they were mined on.
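As an illustration of recounting on different transactions, the following is a minimal sketch. The split of Income into two halves and the minimum support of 0.5 are arbitrary choices made for this example, not recommendations.

```r
library(arules)
data("Income")

## split the transactions into two halves (arbitrary split, for illustration only)
n <- length(Income)
half <- floor(n / 2)
first_half <- Income[1:half]
second_half <- Income[(half + 1):n]

## mine frequent itemsets on the first half only
itemsets <- eclat(first_half,
  parameter = list(support = 0.5),
  control = list(verbose = FALSE))

## support found during mining vs. support recounted on the second half
s_mined <- quality(itemsets)$support
s_recounted <- support(items(itemsets), second_half)

head(cbind(mined = s_mined, recounted = s_recounted))
```

The recounted values may fall below the minimum support used for mining, since the itemsets were selected on different transactions.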

Several methods for support counting are available:

"ptree" (default method): The counters for the itemsets are organized in a prefix tree. The transactions are processed sequentially and the corresponding counters in the prefix tree are incremented (see Hahsler et al., 2008). This method is used by default since it is typically significantly faster than transaction ID list intersection.
"tidlists": Support is counted using transaction ID list intersection, which is used by several fast mining algorithms (e.g., by Eclat). However, support is determined for each itemset individually, which is slow for a large number of long itemsets in dense data.
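The idea behind transaction ID list intersection can be sketched in base R on toy data. This is an illustration of the principle only and not the implementation used by arules; the data and the helper count_support() are hypothetical.

```r
## toy transactions (hypothetical data)
trans <- list(
  c("a", "b", "c"),
  c("a", "c"),
  c("b", "c"),
  c("a", "b", "c", "d")
)

## transaction ID lists: for each item, the IDs of the transactions containing it
item_names <- sort(unique(unlist(trans)))
tidlists <- lapply(item_names, function(i) {
  which(vapply(trans, function(tr) i %in% tr, logical(1)))
})
names(tidlists) <- item_names

## hypothetical helper: the support count of an itemset is the size of the
## intersection of the ID lists of its items
count_support <- function(itemset) {
  length(Reduce(intersect, tidlists[itemset]))
}

itemsets <- list(c("a", "c"), c("b", "c"), c("a", "b", "d"))

## each itemset is counted individually
abs_support <- vapply(itemsets, count_support, integer(1))
rel_support <- abs_support / length(trans)

abs_support  # 3 3 1
rel_support  # 0.75 0.75 0.25
```

Because every itemset triggers its own chain of intersections, the work grows with both the number and the length of the itemsets, which is the cost the prefix tree avoids by sharing counters across itemsets with a common prefix.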

To speed up counting, reduce = TRUE can be specified. Unused items are then removed from the transactions before counting.
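A short sketch of how these options might be combined, using only the arguments listed under Usage. The choice of the first five itemsets is arbitrary; all three calls are expected to return the same values, so the options should affect speed only.

```r
library(arules)
data("Income")

## a few itemsets to count (arbitrary selection, for illustration only)
itemsets <- eclat(Income, control = list(verbose = FALSE))[1:5]

## the same counting task with different settings
s_ptree <- support(items(itemsets), Income, method = "ptree")
s_tidlists <- support(items(itemsets), Income, method = "tidlists")
s_reduced <- support(items(itemsets), Income, reduce = TRUE)

all.equal(s_ptree, s_tidlists)
all.equal(s_ptree, s_reduced)

## absolute counts instead of relative support
support(items(itemsets), Income, type = "absolute")
```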

Value

A numeric vector of the same length as x containing the support values for the sets in x.

Author(s)

Michael Hahsler and Christian Buchta

References

Michael Hahsler, Christian Buchta, and Kurt Hornik. Selective association rule generation. Computational Statistics, 23(2):303-315, April 2008.

Examples

data("Income")

## find some frequent itemsets
itemsets <- eclat(Income)[1:5]

## inspect the support returned by eclat
inspect(itemsets)

## count support in the database
support(items(itemsets), Income)

[Package arules version 1.7-7 Index]