transactions-class {arules}    R Documentation
Class transactions — Binary Incidence Matrix for Transactions
Description
The transactions class is a subclass of itemMatrix and represents transaction data used for mining associations.
Usage
transactions(
x,
itemLabels = NULL,
transactionInfo = NULL,
format = "wide",
cols = NULL
)
## S4 method for signature 'transactions'
summary(object)
## S4 method for signature 'transactions'
toLongFormat(from, cols = c("TID", "item"), decode = TRUE)
## S4 method for signature 'transactions'
items(x)
transactionInfo(x)
## S4 method for signature 'transactions'
transactionInfo(x)
transactionInfo(x) <- value
## S4 replacement method for signature 'transactions'
transactionInfo(x) <- value
## S4 method for signature 'transactions'
dimnames(x)
## S4 replacement method for signature 'transactions,list'
dimnames(x) <- value
Arguments
x, object, from: the object.
itemLabels: a vector with labels for the items.
transactionInfo: a transaction information data.frame with one row per transaction.
format: "wide" or "long" format for a data.frame x. In wide format, each row is a transaction (see Data Preparation); in long format, x has one column with transaction IDs and one column with items (see cols).
cols: a numeric or character vector of length two giving the index or names of the columns (fields) with the transaction and item IDs in the long format.
decode: translate item IDs to item labels?
value: replacement value.
Details
Transactions store the presence of items in each individual transaction as a binary matrix where rows represent the transactions and columns represent the items. transactions directly extends class itemMatrix to store the sparse binary incidence matrix, item labels, and optionally transaction IDs and user IDs. If you work with several transaction sets at the same time, then the encoding (order of the items in the binary matrix) in the different sets is important. See itemCoding to learn how to encode and recode transaction sets.
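The binary incidence representation, and why a shared item coding matters when several transaction sets are combined, can be sketched in base R. The helper to_incidence() below is hypothetical and written only for illustration; arules stores a sparse matrix rather than a dense logical one and provides its own recoding tools (see itemCoding).

```r
# Hypothetical helper (not part of arules): encode a list of transactions as a
# logical incidence matrix. Rows are transactions, columns are items, and the
# column order is given by item_labels (the "item coding").
to_incidence <- function(trans_list,
                         item_labels = sort(unique(unlist(trans_list)))) {
  m <- matrix(FALSE, nrow = length(trans_list), ncol = length(item_labels),
              dimnames = list(names(trans_list), item_labels))
  for (i in seq_along(trans_list)) {
    # Items that are not in item_labels are silently dropped here.
    m[i, ] <- item_labels %in% trans_list[[i]]
  }
  m
}

set1 <- list(T1 = c("a", "b"), T2 = c("b", "c"))
set2 <- list(T3 = c("c", "d"))

# Encoded separately, each set gets its own coding.
m1 <- to_incidence(set1)
m2 <- to_incidence(set2)

# Encoded with one shared coding, the columns of both matrices line up.
shared <- sort(unique(c(unlist(set1), unlist(set2))))
m1_shared <- to_incidence(set1, shared)
m2_shared <- to_incidence(set2, shared)
```

Encoded separately, column 1 of m1 stands for item a while column 1 of m2 stands for item c; with the shared coding, both matrices have the columns a, b, c, d in the same order.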
Data Preparation
Data typically starts as a data.frame or a matrix and needs to be prepared before it can be converted into transactions (see the coercion methods in the Coercions Section and the Examples Section below for details on the needed format).
Columns need to represent items, and how this is done depends on the data type of the column:
- Continuous variables: Continuous variables cannot directly be represented as items and need to be discretized first. An item resulting from discretization might be age>18, and the column then contains only TRUE or FALSE. Alternatively, it can be a factor with levels age<=18, 50>=age>18 and age>50. These will be automatically converted into 3 items, one for each level. Discretization is described in functions discretize() and discretizeDF().
- Logical variables: A logical variable describing a person could be tall, indicating if the person is tall using the values TRUE and FALSE. The fact that the person is tall would be encoded in the transaction containing the item tall, while persons who are not tall would not have this item. Therefore, for logical variables, the TRUE value is converted into an item with the name of the variable, and for FALSE values no item is created.
- Factors: Columns with nominal values (i.e., factor, ordered) are translated into a series of binary items (one for each level, constructed as variable name = level). Items cannot represent order, and thus ordered factors lose the order information. Note that nominal variables need to be encoded as factors (and not characters or numbers). This can be done with data[,"a_nominal_var"] <- factor(data[,"a_nominal_var"]).

Complete examples for how to prepare data can be found in the man pages for Income and Adult.
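The conversion rules for continuous, logical, and factor columns described above can be sketched in base R. df_to_items() is a hypothetical helper, not the arules implementation: it handles only logical and factor columns, skips NA values, and stops on any other column type. Base R's cut() stands in for discretize() here, and the breakpoints 18 and 50 and the labels young, middle and old are arbitrary choices for the example.

```r
# Hypothetical helper (not part of arules): turn each row of a data.frame into
# a character vector of item labels, following the rules described above.
df_to_items <- function(df) {
  lapply(seq_len(nrow(df)), function(i) {
    items <- character(0)
    for (var in names(df)) {
      value <- df[[var]][i]
      if (is.na(value)) next                 # missing values create no item
      if (is.logical(value)) {
        if (value) items <- c(items, var)    # TRUE -> item named after the variable
      } else if (is.factor(value)) {
        # one item per level, labelled "variable=level" (order is not kept)
        items <- c(items, paste0(var, "=", as.character(value)))
      } else {
        stop("column '", var, "' must be logical or a factor; ",
             "discretize numeric columns first")
      }
    }
    items
  })
}

# Discretize a continuous variable into a factor (arbitrary breakpoints).
age <- c(12, 25, 67, 40)
age_group <- cut(age, breaks = c(-Inf, 18, 50, Inf),
                 labels = c("young", "middle", "old"))

people <- data.frame(age = age_group,
                     tall = c(FALSE, TRUE, NA, TRUE))
items <- df_to_items(people)
```

For these four rows the sketch yields the item sets {age=young}, {age=middle, tall}, {age=old} and {age=middle, tall}: the FALSE and NA values of tall produce no item.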
Functions
- summary(transactions): produce a summary.
- toLongFormat(transactions): convert the transactions to long format (a data.frame with two columns, tid and item). Column names can be specified as a character vector of length 2 called cols.
- items(transactions): get the transactions as an itemMatrix.
- transactionInfo(transactions): get the transaction info data.frame.
- transactionInfo(transactions) <- value: replace the transaction info data.frame.
- dimnames(transactions): get the dimnames.
- dimnames(x = transactions) <- value: set the dimnames.
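The long format used by toLongFormat() and by format = "long" has one row per (transaction ID, item) pair. A round trip between that layout and a list of transactions can be sketched in base R; list_to_long() and long_to_list() are hypothetical helpers for illustration, and the data.frame returned by toLongFormat() may differ in details such as column types.

```r
# Hypothetical helpers (not part of arules) for moving between a list of
# transactions and a long-format data.frame with one row per (TID, item) pair.
list_to_long <- function(trans_list, cols = c("TID", "item")) {
  tids <- if (is.null(names(trans_list))) seq_along(trans_list) else names(trans_list)
  df <- data.frame(rep(tids, lengths(trans_list)),
                   as.character(unlist(trans_list, use.names = FALSE)),
                   stringsAsFactors = FALSE)
  names(df) <- cols
  df
}

long_to_list <- function(df, cols = c("TID", "item")) {
  # split() groups the item column by transaction ID; cols may hold
  # column names or column indices
  split(df[[cols[2]]], df[[cols[1]]])
}

a_df3 <- data.frame(TID = c(1, 1, 2, 2, 2, 3),
                    item = c("a", "b", "a", "b", "c", "b"),
                    stringsAsFactors = FALSE)
trans_list <- long_to_list(a_df3)
long_again <- list_to_long(trans_list)
```

With the data of Example 4, long_to_list() recovers the transactions {a, b}, {a, b, c} and {b}, and list_to_long() restores the six (TID, item) rows.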
Slots
Slots are inherited from itemMatrix.
Objects from the Class
Objects are created by:
- coercion from objects of other classes. itemLabels and transactionInfo are by default created from information in x (e.g., from row and column names).
- the constructor function transactions().
- calling new("transactions", ...).
See the Examples Section for creating transactions from data.
Coercions
- as("transactions", "matrix")
- as("matrix", "transactions")
- as("list", "transactions")
- as("transactions", "list")
- as("data.frame", "transactions")
- as("transactions", "data.frame")
- as("ngCMatrix", "transactions")
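Coercing a 0-1 matrix to transactions reads each row as a transaction, with column names as item labels and row names as transaction IDs. The hypothetical helper matrix_to_list() below sketches that reading in base R; it is not how arules implements the coercion. Note that matrix() fills column by column by default, so byrow = TRUE is needed for each written row of numbers to become one transaction.

```r
# Hypothetical helper (not part of arules): read a 0-1 matrix as a list of
# transactions, using row names as transaction IDs and column names as
# item labels.
matrix_to_list <- function(m) {
  stopifnot(!is.null(colnames(m)))
  out <- lapply(seq_len(nrow(m)), function(i) colnames(m)[m[i, ] != 0])
  names(out) <- rownames(m)
  out
}

# byrow = TRUE: each written row below is one transaction.
a_matrix <- matrix(c(1, 1, 1, 0, 0,
                     1, 1, 0, 0, 0,
                     1, 1, 0, 1, 0,
                     0, 0, 1, 0, 1,
                     1, 1, 0, 1, 1), ncol = 5, byrow = TRUE,
                   dimnames = list(paste0("Tr", 1:5), c("a", "b", "c", "d", "e")))
trans_list <- matrix_to_list(a_matrix)
```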
Author(s)
Michael Hahsler
See Also
Superclass: itemMatrix
Other itemMatrix and transactions functions: abbreviate(), crossTable(), c(), duplicated(), extract, hierarchy, image(), inspect(), is.superset(), itemFrequencyPlot(), itemFrequency(), itemMatrix-class, match(), merge(), random.transactions(), sample(), sets, size(), supportingTransactions(), tidLists-class, unique()
Examples
## Example 1: creating transactions from a list (each element is a transaction)
a_list <- list(
c("a","b","c"),
c("a","b"),
c("a","b","d"),
c("c","e"),
c("a","b","d","e")
)
## Set transaction names
names(a_list) <- paste("Tr", c(1:5), sep = "")
a_list
## Use the constructor to create transactions
## Note: S4 coercion does the same: trans1 <- as(a_list, "transactions")
trans1 <- transactions(a_list)
trans1
## Analyze the transactions
summary(trans1)
image(trans1)
## Example 2: creating transactions from a 0-1 matrix with 5 transactions (rows) and
## 5 items (columns)
a_matrix <- matrix(
c(1, 1, 1, 0, 0,
1, 1, 0, 0, 0,
1, 1, 0, 1, 0,
0, 0, 1, 0, 1,
1, 1, 0, 1, 1), ncol = 5, byrow = TRUE)
## Set item names (columns) and transaction labels (rows)
colnames(a_matrix) <- c("a", "b", "c", "d", "e")
rownames(a_matrix) <- paste("Tr", c(1:5), sep = "")
a_matrix
## Create transactions
trans2 <- transactions(a_matrix)
trans2
inspect(trans2)
## Example 3: creating transactions from a data.frame (wide format)
a_df <- data.frame(
age = as.factor(c( 6, 8, NA, 9, 16)),
grade = as.factor(c("A", "C", "F", NA, "C")),
pass = c(TRUE, TRUE, FALSE, TRUE, TRUE))
## Note: factors are translated differently than logicals and NAs are ignored
a_df
## Create transactions
trans3 <- transactions(a_df)
inspect(trans3)
## Note that coercing the transactions back to a data.frame does not recreate the
## original data.frame, but represents the transactions as sets of items
as(trans3, "data.frame")
## Example 4: creating transactions from a data.frame with
## transaction IDs and items (long format)
a_df3 <- data.frame(
TID = c( 1, 1, 2, 2, 2, 3 ),
item = c("a", "b", "a", "b", "c", "b")
)
a_df3
trans4 <- transactions(a_df3, format = "long", cols = c("TID", "item"))
trans4
inspect(trans4)
## convert transactions back into long format.
toLongFormat(trans4)
## Example 5: create transactions from a dataset with numeric variables
## using discretization.
data(iris)
irisDisc <- discretizeDF(iris)
head(irisDisc)
trans5 <- transactions(irisDisc)
trans5
inspect(head(trans5))
## Note, creating transactions without discretizing numeric variables will apply the
## default discretization and also create a warning.
## Example 6: create transactions manually (with the same item coding as in trans5)
trans6 <- transactions(
list(
c("Sepal.Length=[4.3,5.4)", "Species=setosa"),
c("Sepal.Length=[4.3,5.4)", "Species=setosa")
), itemLabels = trans5)
trans6
inspect(trans6)
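summary(trans1) in Example 1 reports descriptive statistics of the transactions. The base R sketch below computes a few quantities of this kind by hand for the same five transactions: the absolute and relative frequency of each item (the latter is the item's support), the transaction sizes, and the density of the binary incidence matrix. It is an illustration only and does not reproduce the printed output of summary().

```r
# Example 1 transactions, written out so that this sketch is self-contained.
a_list <- list(Tr1 = c("a", "b", "c"),
               Tr2 = c("a", "b"),
               Tr3 = c("a", "b", "d"),
               Tr4 = c("c", "e"),
               Tr5 = c("a", "b", "d", "e"))

n_trans <- length(a_list)
# Transactions are sets, so drop any item listed twice in one transaction.
sets <- lapply(a_list, unique)

# Absolute item frequency: in how many transactions each item occurs.
tab <- table(unlist(sets))
item_count <- setNames(as.integer(tab), names(tab))

# Relative item frequency, i.e., the support of each single item.
item_support <- item_count / n_trans

# Transaction sizes (number of items per transaction).
sizes <- lengths(sets)

# Density: share of cells in the transactions x items matrix that are 1.
density <- sum(sizes) / (n_trans * length(item_count))
```

Items a and b each occur in 4 of the 5 transactions (support 0.8), and the 14 item occurrences fill 14 of the 25 matrix cells, giving a density of 0.56.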