transactions-class {arules}

R Documentation

Class transactions — Binary Incidence Matrix for Transactions

Description

The transactions class is a subclass of itemMatrix and represents transaction data used for mining associations.

Usage

transactions(
  x,
  itemLabels = NULL,
  transactionInfo = NULL,
  format = "wide",
  cols = NULL
)

## S4 method for signature 'transactions'
summary(object)

## S4 method for signature 'transactions'
toLongFormat(from, cols = c("TID", "item"), decode = TRUE)

## S4 method for signature 'transactions'
items(x)

transactionInfo(x)

## S4 method for signature 'transactions'
transactionInfo(x)

transactionInfo(x) <- value

## S4 replacement method for signature 'transactions'
transactionInfo(x) <- value

## S4 method for signature 'transactions'
dimnames(x)

## S4 replacement method for signature 'transactions,list'
dimnames(x) <- value

Arguments

`x`, `object`, `from`	the object
`itemLabels`	a vector with labels for the items
`transactionInfo`	a transaction information data.frame with one row per transaction.
`format`	`"wide"` or `"long"` format? Format `"wide"` is a regular data.frame where each row describes one object (transaction). Format `"long"` is a data.frame with one column with transaction IDs and one with an item (see `cols` below).
`cols`	a numeric or character vector of length two giving the index or names of the columns (fields) with the transaction and item ids in the long format.
`decode`	translate item IDs to item labels?
`value`	replacement value

Details

Transactions store the presence of items in each individual transaction as a binary matrix where rows represent the transactions and columns represent the items. transactions directly extends class itemMatrix to store the sparse binary incidence matrix, item labels, and optionally transaction IDs and user IDs. If you work with several transaction sets at the same time, then the encoding (order of the items in the binary matrix) in the different sets is important. See itemCoding to learn how to encode and recode transaction sets.
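
The point about item encoding can be illustrated with a minimal sketch. It assumes that recode() (documented under itemCoding) accepts an itemLabels argument; the two small transaction sets and the object names are made up for illustration.

```r
library(arules)

## Two hypothetical transaction sets built independently;
## each gets its own item coding.
t1 <- transactions(list(c("a", "b"), c("b", "c")))
t2 <- transactions(list(c("c", "d"), c("a")))

itemLabels(t1)
itemLabels(t2)

## Recode both sets to one shared item coding so that the columns
## of the two incidence matrices refer to the same items.
common <- union(itemLabels(t1), itemLabels(t2))
t1r <- recode(t1, itemLabels = common)
t2r <- recode(t2, itemLabels = common)

identical(itemLabels(t1r), itemLabels(t2r))
```

After recoding, both sets use the same item order and can be compared or combined column by column.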

Data Preparation

Data typically starts as a data.frame or a matrix and needs to be prepared before it can be converted into transactions (see coercion methods in the Methods Section and the Example Section below for details on the needed format).

Columns need to represent items. How a column is converted into items depends on its data type:

Continuous variables: Continuous variables cannot directly be represented as items and need to be discretized first. An item resulting from discretization might be age>18 and the column contains only TRUE or FALSE. Alternatively, it can be a factor with levels age<=18, 50>=age>18 and age>50. These will be automatically converted into 3 items, one for each level. Discretization is described in functions discretize() and discretizeDF().
Logical variables: A logical variable describing a person could be tall indicating if the person is tall using the values TRUE and FALSE. The fact that the person is tall would be encoded in the transaction containing the item tall while not tall persons would not have this item. Therefore, for logical variables, the TRUE value is converted into an item with the name of the variable and for the FALSE values no item is created.
Factors: Columns with nominal values (i.e., factor, ordered) are translated into a series of binary items (one for each level constructed as variable name = level). Items cannot represent order and thus ordered factors lose the order information. Note that nominal variables need to be encoded as factors (and not characters or numbers). This can be done with:

data[,"a_nominal_var"] <- factor(data[,"a_nominal_var"])

Complete examples for how to prepare data can be found in the man pages for Income and Adult.
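
The three column types can be prepared together as in the following sketch. The data.frame people, its values, the cut points, and the labels young, adult, and senior are hypothetical and chosen only for illustration.

```r
library(arules)

## Hypothetical raw data with one column of each type.
people <- data.frame(
  age  = c(12, 35, 67, 44),               # continuous
  tall = c(TRUE, FALSE, TRUE, FALSE),     # logical
  city = c("Oslo", "Rome", "Oslo", "Lima"),  # nominal, stored as character
  stringsAsFactors = FALSE
)

## Continuous: discretize with fixed (assumed) cut points.
people$age <- discretize(people$age, method = "fixed",
  breaks = c(-Inf, 18, 50, Inf),
  labels = c("young", "adult", "senior"))

## Nominal: encode the character column as a factor.
people$city <- factor(people$city)

## Logical columns need no preparation.
trans <- transactions(people)
itemLabels(trans)
inspect(trans)
```

Preparing each column explicitly avoids relying on the default discretization and the warning it produces.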

Functions

summary(transactions): produce a summary
toLongFormat(transactions): convert the transactions to long format (a data.frame with two columns, by default named TID and item). Column names can be specified as a character vector of length 2 called cols.
items(transactions): get the transactions as an itemMatrix
transactionInfo(transactions): get the transaction info data.frame
transactionInfo(transactions) <- value: replace the transaction info data.frame
dimnames(transactions): get the dimnames
dimnames(x = transactions) <- value: set the dimnames
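
A short sketch of how these accessors might be used together. The example transactions, the added store column, and the column names basket and product are hypothetical.

```r
library(arules)

## A small hypothetical transaction set with named transactions.
trans <- transactions(list(T1 = c("a", "b"), T2 = c("b", "c")))

summary(trans)

## Get the item sets without the transaction information.
im <- items(trans)

## Read the transaction info, add a (hypothetical) column, write it back.
info <- transactionInfo(trans)
info$store <- c("north", "south")
transactionInfo(trans) <- info
transactionInfo(trans)

## Long format with custom column names.
long <- toLongFormat(trans, cols = c("basket", "product"))
long

## Transaction IDs and item labels.
dimnames(trans)
```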

Slots

Slots are inherited from itemMatrix.

Objects from the Class

Objects are created by:

coercion from objects of other classes. itemLabels and transactionInfo are by default created from information in x (e.g., from row and column names).
calling the constructor function transactions().
calling new("transactions", ...).

See Examples Section for creating transactions from data.

Coercions

as("transactions", "matrix")
as("matrix", "transactions")
as("list", "transactions")
as("transactions", "list")
as("data.frame", "transactions")
as("transactions", "data.frame")
as("ngCMatrix", "transactions")
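
The coercions above are listed as from/to class pairs. In code, they are performed with as(object, "class"), as in this sketch; the example data and object names are made up.

```r
library(arules)

## A small hypothetical transaction set.
trans <- transactions(list(T1 = c("a", "b"), T2 = c("b", "c")))

## transactions -> other representations
m  <- as(trans, "matrix")       # dense matrix, transactions in rows
l  <- as(trans, "list")         # list of character vectors
df <- as(trans, "data.frame")   # transactions as sets of items
sp <- as(trans, "ngCMatrix")    # sparse matrix representation

## other representation -> transactions
trans_again <- transactions(m)  # or as(m, "transactions")
inspect(trans_again)
```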

Author(s)

Michael Hahsler

Examples

## Example 1: creating transactions from a list (each element is a transaction)
a_list <- list(
      c("a","b","c"),
      c("a","b"),
      c("a","b","d"),
      c("c","e"),
      c("a","b","d","e")
      )

## Set transaction names
names(a_list) <- paste("Tr", c(1:5), sep = "")
a_list

## Use the constructor to create transactions
## Note: S4 coercion does the same: trans1 <- as(a_list, "transactions")
trans1 <- transactions(a_list)
trans1

## Analyze the transactions
summary(trans1)
image(trans1)

## Example 2: creating transactions from a 0-1 matrix with 5 transactions (rows) and
##            5 items (columns)
## byrow = TRUE fills the matrix row by row, so each line below is one transaction
a_matrix <- matrix(
  c(1, 1, 1, 0, 0,
    1, 1, 0, 0, 0,
    1, 1, 0, 1, 0,
    0, 0, 1, 0, 1,
    1, 1, 0, 1, 1), ncol = 5, byrow = TRUE)

## Set item names (columns) and transaction labels (rows)
colnames(a_matrix) <- c("a", "b", "c", "d", "e")
rownames(a_matrix) <- paste("Tr", c(1:5), sep = "")

a_matrix

## Create transactions
trans2 <- transactions(a_matrix)
trans2
inspect(trans2)

## Example 3: creating transactions from data.frame (wide format)
a_df <- data.frame(
  age   = as.factor(c( 6,   8,   NA, 9,   16)),
  grade = as.factor(c("A", "C", "F", NA, "C")),
  pass  = c(TRUE, TRUE, FALSE, TRUE, TRUE))
## Note: factors are translated differently than logicals and NAs are ignored
a_df

## Create transactions
trans3 <- transactions(a_df)
inspect(trans3)

## Note that coercing the transactions back to a data.frame does not recreate the
## original data.frame, but represents the transactions as sets of items
as(trans3, "data.frame")

## Example 4: creating transactions from a data.frame with
## transaction IDs and items (long format)
a_df3 <- data.frame(
  TID =  c( 1,   1,   2,   2,   2,   3 ),
  item = c("a", "b", "a", "b", "c", "b")
)
a_df3
trans4 <- transactions(a_df3, format = "long", cols = c("TID", "item"))
trans4
inspect(trans4)

## convert transactions back into long format.
toLongFormat(trans4)

## Example 5: create transactions from a dataset with numeric variables
## using discretization.
data(iris)

irisDisc <- discretizeDF(iris)
head(irisDisc)

trans5 <- transactions(irisDisc)
trans5
inspect(head(trans5))

## Note, creating transactions without discretizing numeric variables will apply the
## default discretization and also create a warning.


## Example 6: create transactions manually (with the same item coding as in trans5)
trans6 <- transactions(
  list(
    c("Sepal.Length=[4.3,5.4)", "Species=setosa"),
    c("Sepal.Length=[4.3,5.4)", "Species=setosa")
  ), itemLabels = trans5)
trans6

inspect(trans6)

[Package arules version 1.7-7 Index]