opus {opusminer}    R Documentation

Filtered Top-k Association Discovery of Self-Sufficient Itemsets

Description

opus finds the top k productive, non-redundant itemsets on the measure of interest (leverage or lift) using the OPUS Miner algorithm.

Usage

opus(transactions, k = 100, format = "data.frame", sep = " ",
  print_closures = FALSE, filter_itemsets = TRUE, search_by_lift = FALSE,
  correct_for_mult_compare = TRUE, redundancy_tests = TRUE)

Arguments

transactions

A filename, list, or object of class transactions (arules).

k

The number of itemsets to return, an integer (default 100).

format

The output format: "data.frame" (the default) or "itemsets".

sep

The separator between items (for files, default " ").

print_closures

Return the closure for each itemset (default FALSE).

filter_itemsets

Filter out itemsets that are not independently productive (default TRUE).

search_by_lift

Make lift (rather than leverage) the measure of interest (default FALSE).

correct_for_mult_compare

Correct the significance level (alpha) for multiple comparisons, based on the size of the search space (default TRUE).

redundancy_tests

Exclude redundant itemsets (default TRUE).

Details

opus provides an interface to the OPUS Miner algorithm (implemented in C++) to find the top k productive, non-redundant itemsets by leverage (default) or lift.
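As a rough illustration of the two measures, the sketch below computes leverage and lift for a pair of items using the standard pairwise definitions (observed support minus, or divided by, the support expected under independence). The toy data and the support helper are hypothetical, and the definition OPUS Miner applies to larger itemsets may differ; see the reference below for the exact formulation.

```r
# Toy transactions (hypothetical data, not shipped with the package).
transactions <- list(
  c("bread", "butter"),
  c("bread", "butter", "milk"),
  c("bread"),
  c("milk"),
  c("butter", "milk")
)

# Support: the fraction of transactions that contain every item in `items`.
# (Hypothetical helper, written here for illustration only.)
support <- function(items, trans) {
  mean(vapply(trans, function(tr) all(items %in% tr), logical(1)))
}

s_a  <- support("bread", transactions)
s_b  <- support("butter", transactions)
s_ab <- support(c("bread", "butter"), transactions)

# Leverage: observed support minus the support expected under independence.
leverage <- s_ab - s_a * s_b

# Lift: observed support divided by the support expected under independence.
lift <- s_ab / (s_a * s_b)
```

Leverage is an absolute difference and so tends to favour frequent itemsets, whereas lift is a ratio and can be large for rare ones.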

transactions should be a filename, list (of transactions, each list element being a vector of character values representing item labels), or an object of class transactions (arules).
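For data held in "long" form (one row per transaction/item pair), a list of the kind described above can be built with base R. This is a minimal sketch; the data frame and its column names (id, item) are assumptions made for illustration.

```r
# Hypothetical long-format data: one row per (transaction id, item) pair.
long <- data.frame(
  id   = c(1, 1, 2, 2, 2, 3),
  item = c("bread", "butter", "bread", "butter", "milk", "milk"),
  stringsAsFactors = FALSE
)

# Group items by transaction id; each list element becomes a character
# vector of item labels. unname() drops the id labels from the list.
baskets <- unname(split(long$item, long$id))

## Not run:
## result <- opus(baskets, k = 10)
```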

Files should contain one transaction per line, each line being a sequence of item labels separated by the character given by the sep argument (default " "). See, for example, the files at http://fimi.ua.ac.be/data/. (Alternatively, files can be read separately using the read_transactions function.)
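The file layout described above can be sketched with base R alone. The example writes a small, hypothetical set of transactions to a temporary file, one transaction per line with items separated by a space, then reads it back to confirm the round trip.

```r
# Hypothetical toy transactions.
baskets <- list(
  c("bread", "butter"),
  c("bread", "butter", "milk"),
  c("milk")
)

# Write one transaction per line, with items separated by sep = " ".
path <- tempfile(fileext = ".dat")
writeLines(vapply(baskets, paste, character(1), collapse = " "), path)

# Read the file back and split each line on the same separator.
lines  <- readLines(path)
parsed <- strsplit(lines, " ", fixed = TRUE)

## Not run:
## result <- opus(path, sep = " ")
```

Item labels must not themselves contain the separator character, since each line is split on it.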

format should be either "data.frame" (the default) or "itemsets"; any other value returns a list.

Value

The top k productive, non-redundant itemsets, with relevant statistics, in the form of a data frame, object of class itemsets (arules), or a list.

References

Webb, G. I., & Vreeken, J. (2014). Efficient Discovery of the Most Interesting Associations. ACM Transactions on Knowledge Discovery from Data, 8(3), 1-15. doi: http://dx.doi.org/10.1145/2601433

Examples

## Not run: 

result <- opus("mushroom.dat")
result <- opus("mushroom.dat", k = 50)

result[result$self_sufficient, ]
result[order(result$count, decreasing = TRUE), ]

trans <- read_transactions("mushroom.dat", format = "transactions")

result <- opus(trans, print_closures = TRUE)
result <- opus(trans, format = "itemsets")

## End(Not run)

[Package opusminer version 0.1-1 Index]