transactions-class {arules}R Documentation

Class transactions — Binary Incidence Matrix for Transactions

Description

The transactions class represents transaction data used for mining itemsets or rules.

Details

Transactions are a direct extension of class itemMatrix to store a binary incidence matrix, item labels, and optionally transaction IDs and user IDs.

Transactions can be created from a list containing transactions or a matrix or data.frames using

itemLabels and transactionInfo are by default created from information in x (e.g., from row and column names). In the constructor function, the user can specify for itemLabels a vector of all possible item labels (character) or another transactions object to copy the item coding (see itemCoding for details).

Note that you will need to prepare your data first (see coercion methods in the Methods Section and the Example Section below for details on the needed format).

Continuous variables: Association rule mining can only use items and does not work with continuous variables. Continuous variables need to be discretized first. An item resulting from discretization might be age>18 and the column contains only TRUE or FALSE. Alternatively it can be a factor with levels age<=18, 50=>age>18 and age>50. These will be automatically converted into 3 items, one for each level. Have a look at the function discretize for automatic discretization.

Logical variables: A logical variable describing a person could be tall indicating if the person is tall using the values TRUE and FALSE. The fact that the person is tall would be encoded in the transaction containing the item tall while not tall persons would not have this item. Therefore, for logical variables, the TRUE value is converted into an item with the name of the variable and for the FALSE values no item is created.

Factors: The function also can convert columns with nominal values (i.e., factors) into a series of binary items (one for each level constructed as `variable name`=`level`). Note that nominal variables need to be encoded as factors (and not characters or numbers). This can be done with

data[,"a_nominal_var"] <- factor(data[,"a_nominal_var"]).

Complete examples for how to prepare data can be found in the man pages for Income and Adult.

Transactions are represented as sparse binary matrices of class itemMatrix. If you work with several transaction sets at the same time, then the encoding (order of the items in the binary matrix) in the different sets is important. See itemCoding to learn how to encode and recode transaction sets.

Objects from the Class

Objects are created by coercion from objects of other classes (see Examples section) or by calls of the form

new("transactions", ...)

or by using the constructor function

transactions(x, itemLabels = NULL, transactionInfo = NULL, format = "wide", cols = NULL).

Format "wide" is a regular data.frame where each row contains an object. Format "long" is a data.frame with one column with transaction IDs and one with an item. cols is a numeric or character vector of length two giving the numbers or names of the columns (fields) with the transaction and item ids, respectively.

See Examples Section for creating transactions from data.

Slots

itemsetInfo:

a data.frame with one row per transaction (each transaction is considered an itemset). The data.frame can hold columns with additional information, e.g., transaction IDs or user IDs for each transaction. Note: this slot is inherited from class itemMatrix, but should be accessed in transactions with the method transactionInfo().

data:

object of class ngCMatrix to store the binary incidence matrix (see itemMatrix class)

itemInfo:

a data.frame to store item labels (see itemMatrix class)

Extends

Class itemMatrix, directly.

Methods

coerce

signature(from = "matrix", to = "transactions"); produces a transactions data set from a binary incidence matrix. The column names are used as item labels and the row names are stores as transaction IDs.

coerce

signature(from = "transactions", to = "matrix"); coerces the transactions data set into a binary incidence matrix.

coerce

signature(from = "list", to = "transactions"); produces a transactions data set from a list. The names of the items in the list are used as item labels.

coerce

signature(from = "transactions", to = "list"); coerces the transactions data set into a list of transactions. Each transaction is a vector of character strings (names of the contained items).

coerce

signature(from = "data.frame", to = "transactions"); recodes the data frame containing only categorical variables (factors) or logicals all into a binary transaction data set. For binary variables only TRUE values are converted into items and the item label is the variable name. For factors, a dummy item for each level is automatically generated. Item labels are generated by concatenating variable names and levels with "=". The original variable names and levels are stored in the itemInfo data frame as the components variables and levels. Note that NAs are ignored (i.e., do not generate an item).

coerce

signature(from = "transactions", to = "data.frame"); represents the set of transactions in a printable form as a data.frame. Note that this does not reverse coercion from data.frame to transactions.

coerce

signature(from = "ngCMatrix", to = "transactions"); Note that the data is stored transposed in the ngCMatrix. Items are stored as rows and transactions are columns!

dimnames, rownames, colnames

signature(x = "transactions"); returns row (transactionID) and column (item) names.

items

signature(x = "transactions"); returns the items in the transactions as an itemMatrix.

labels

signature(x = "transactions"); returns the labels for the itemsets in each transaction (see itemMatrix).

transactionInfo<-

signature(x = "transactions"); replaces the transaction information with a new data.frame.

transactionInfo

signature(x = "transactions"); returns the transaction information as a data.frame.

toLongFormat

signature(object = "transactions"); convert the transactions to long format (a data.frame with two columns, tid and item). Column names can be specified as a character vector of length 2 called cols.

show

signature(object = "transactions")

summary

signature(object = "transactions")

Author(s)

Michael Hahsler

See Also

[-methods, discretize, LIST, write, c, image, inspect, itemCoding, read.transactions, random.transactions, sets, itemMatrix-class

Examples

## Example 1: creating transactions form a list (each element is a transaction)
a_list <- list(
      c("a","b","c"),
      c("a","b"),
      c("a","b","d"),
      c("c","e"),
      c("a","b","d","e")
      )

## Set transaction names
names(a_list) <- paste("Tr",c(1:5), sep = "")
a_list

## Use the constructor to create transactions
## Note: S4 coercion does the same trans1 <- as(a_list, "transactions")
trans1 <- transactions(a_list)
trans1

## Analyze the transactions
summary(trans1)
image(trans1)

## Example 2: creating transactions from a 0-1 matrix with 5 transactions (rows) and 
##            5 items (columns)
a_matrix <- matrix(c(
  1, 1, 1, 0, 0,
	1, 1, 0, 0, 0,
	1, 1, 0, 1, 0,
	0, 0, 1, 0, 1,
	1, 1, 0, 1, 1
  ), ncol = 5)

## Set item names (columns) and transaction labels (rows)
colnames(a_matrix) <- c("a", "b", "c", "d", "e")
rownames(a_matrix) <- paste("Tr", c(1:5), sep = "")

a_matrix

## Create transactions
trans2 <- transactions(a_matrix)
trans2
inspect(trans2)

## Example 3: creating transactions from data.frame (wide format)
a_df <- data.frame(
	age   = as.factor(c(6, 8, NA, 9, 16)), 
	grade = as.factor(c("A", "C", "F", NA, "C")),
  pass  = c(TRUE, TRUE, FALSE, TRUE, TRUE))  
## Note: factors are translated differently than logicals and NAs are ignored
a_df

## Create transactions
trans3 <- transactions(a_df) 
inspect(trans3)

## Note that coercing the transactions back to a data.frame does not recreate the 
## original data.frame, but represents the transactions as sets of items
as(trans3, "data.frame")

## Example 4: creating transactions from a data.frame with 
## transaction IDs and items (long format) 
a_df3 <- data.frame(
  TID =  c( 1,   1,   2,   2,   2,   3 ), 
  item = c("a", "b", "a", "b", "c", "b")
)
a_df3
trans4 <- transactions(a_df3, format = "long", cols = c("TID", "item"))
trans4
inspect(trans4)

## convert transactions back into long format.
toLongFormat(trans4)

## Example 5: create transactions from a dataset with numeric variables
## using discretization.
data(iris)

irisDisc <- discretizeDF(iris)
head(irisDisc)

trans5 <- transactions(irisDisc)
trans5
inspect(head(trans5))

## Note, creating transactions without discretizing numeric variables will apply the 
## default discretization and also create a warning.


## Example 6: create transactions manually (with the same item coding as in trans5)
trans6 <- transactions(
  list(
    c("Sepal.Length=[4.3,5.4)", "Species=setosa"),
    c("Sepal.Length=[4.3,5.4)", "Species=setosa")
  ), itemLabels = trans5)
trans6  

inspect(trans6)

[Package arules version 1.7-1 Index]