itemCoding {arules}R Documentation

Item Coding — Conversion between Item Labels and Column IDs

Description

The order in which items are stored in an itemMatrix is called the item coding. The following generic functions and methods are used to translate between the representation in the itemMatrix format (used in transactions, rules and itemsets), item labels and numeric item IDs (i.e., the column numbers in the itemMatrix representation).

Usage

decode(x, ...)

## S4 method for signature 'numeric'
decode(x, itemLabels)

## S4 method for signature 'list'
decode(x, itemLabels)

encode(x, ...)

## S4 method for signature 'character'
encode(x, itemLabels, itemMatrix = TRUE)

## S4 method for signature 'numeric'
encode(x, itemLabels, itemMatrix = TRUE)

## S4 method for signature 'list'
encode(x, itemLabels, itemMatrix = TRUE)

recode(x, ...)

## S4 method for signature 'itemMatrix'
recode(x, itemLabels = NULL, match = NULL)

## S4 method for signature 'itemsets'
recode(x, itemLabels = NULL, match = NULL)

## S4 method for signature 'rules'
recode(x, itemLabels = NULL, match = NULL)

compatible(x, y)

## S4 method for signature 'itemMatrix'
compatible(x, y)

## S4 method for signature 'associations'
compatible(x, y)

Arguments

x

a vector or a list of vectors of character strings (for encode() or of numeric (for decode()), or an object of class itemMatrix (for recode()).

...

further arguments.

itemLabels

a vector of character strings used for coding where the position of an item label in the vector gives the item's column ID. Alternatively, a itemMatrix, transactions or associations object can be specified and the item labels or these objects are used.

itemMatrix

return an object of class itemMatrix otherwise an object of the same class as x is returned.

match

deprecated: used itemLabels instead.

y

an object of class itemMatrix, transactions or associations to compare item coding to x.

Details

Item coding compatibility: When working with several datasets or different subsets of the same dataset, combining or compare the found itemsets or rules requires a compatible item coding. That is, the sparse matrices representing the items (the itemMatrix objects) have columns for the same items in exactly the same order. The coercion to transactions with transactions() or as(x, "transactions") will create the item coding by adding items in the order they are encountered in the dataset. This can lead to different item codings (different order, missing items) for even only slightly different datasets or versions of a dataset. Method compatible() can be used to check if two sets have the same item coding.

Defining a common item coding: When working with many sets, then first a common item coding should be defined by creating a vector with all possible item labels and then specify them as itemLabels to create transactions with transactions(). Compatible itemMatrix objects can be created using encode().

Recoding and Decoding: Two incompatible objects can be made compatible using recode(). Recode one object by specifying the other object in itemLabels.

decode() converts from the column IDs used in the itemMatrix representation to item labels. decode() is used by LIST().

Value

recode() always returns an object of the same class as x.

For encode() with itemMatrix = TRUE an object of class itemMatrix is returned. Otherwise the result is of the same type as x, e.g., a list or a vector.

Author(s)

Michael Hahsler

See Also

LIST(), associations, itemMatrix

Other preprocessing: discretize(), hierarchy, merge(), sample()

Examples

data("Adult")

## Example 1: Manual decoding
## Extract the item coding as a vector of item labels.
iLabels <- itemLabels(Adult)
head(iLabels)

## get undecoded list (itemIDs)
list <- LIST(Adult[1:5], decode = FALSE)
list

## decode itemIDs by replacing them with the appropriate item label
decode(list, itemLabels = iLabels)


## Example 2: Manually create an itemMatrix using iLabels as the common item coding
data <- list(
    c("income=small", "age=Young"),
    c("income=large", "age=Middle-aged")
    )

# Option a: encode to match the item coding in Adult
iM <- encode(data, itemLabels = Adult)
iM
inspect(iM)
compatible(iM, Adult)

# Option b: coercion plus recode to make it compatible to Adult
#           (note: the coding has 115 item columns after recode)
iM <- as(data, "itemMatrix")
iM
compatible(iM, Adult)

iM <- recode(iM, itemLabels = Adult)
iM
compatible(iM, Adult)


## Example 3: use recode to make itemMatrices compatible
## select first 100 transactions and all education-related items
sub <- Adult[1:100, itemInfo(Adult)$variables ==  "education"]
itemLabels(sub)
image(sub)

## After choosing only a subset of items (columns), the item coding is now
## no longer compatible with the Adult dataset
compatible(sub, Adult)

## recode to match Adult again
sub.recoded <- recode(sub, itemLabels = Adult)
image(sub.recoded)


## Example 4: manually create 2 new transaction for the Adult data set
##            Note: check itemLabels(Adult) to see the available labels for items
twoTransactions <- as(
    encode(list(
        c("age=Young", "relationship=Unmarried"),
        c("age=Senior")
      ), itemLabels = Adult),
    "transactions")

twoTransactions
inspect(twoTransactions)

## the same using the transactions constructor function instead
twoTransactions <- transactions(
    list(
        c("age=Young", "relationship=Unmarried"),
        c("age=Senior")
    ), itemLabels = Adult)

twoTransactions
inspect(twoTransactions)

## Example 5: Use a common item coding

# Creation of transactions separately will produce different item codings
trans1 <- transactions(
    list(
        c("age=Young", "relationship=Unmarried"),
        c("age=Senior")
    ))
trans1

trans2 <- transactions(
    list(
        c("age=Middle-aged", "relationship=Married"),
        c("relationship=Unmarried", "age=Young")
    ))
trans2

compatible(trans1, trans2)

# produce common item coding (all item labels in the two sets)
commonItemLabels <- union(itemLabels(trans1), itemLabels(trans2))
commonItemLabels

trans1 <- recode(trans1, itemLabels = commonItemLabels)
trans1
trans2 <- recode(trans2, itemLabels = commonItemLabels)
trans2

compatible(trans1, trans2)


## Example 6: manually create a rule using the item coding in Adult
## and calculate interest measures
aRule <- new("rules",
  lhs = encode(list(c("age=Young", "relationship=Unmarried")),
    itemLabels = Adult),
  rhs = encode(list(c("income=small")),
    itemLabels = Adult)
)

## shorter version using the rules constructor
aRule <- rules(
  lhs = list(c("age=Young", "relationship=Unmarried")),
  rhs = list(c("income=small")),
  itemLabels = Adult
)

quality(aRule) <- interestMeasure(aRule,
  measure = c("support", "confidence", "lift"), transactions = Adult)

inspect(aRule)

[Package arules version 1.7-7 Index]