similarity-methods {arulesSequences} | R Documentation |

Provides the generic function `similarity` and the S4 method to compute similarities among a collection of sequences.

`is.subset` and `is.superset` find subsequence or supersequence relationships among a collection of sequences.

    similarity(x, y = NULL, ...)

    ## S4 method for signature 'sequences'
    similarity(x, y = NULL, method = c("jaccard", "dice", "cosine", "subset"), strict = FALSE)

    ## S4 method for signature 'sequences'
    is.subset(x, y = NULL, proper = FALSE)

    ## S4 method for signature 'sequences'
    is.superset(x, y = NULL, proper = FALSE)

| Argument | Description |
|---|---|
| `x, y` | an object. |
| `...` | further (unused) arguments. |
| `method` | a string specifying the similarity measure to use (see details). |
| `strict` | a logical value specifying if strict itemset matching should be used. |
| `proper` | a logical value specifying if only strict relationships (omitting equality) should be indicated. |

Let the number of *common* elements of two sequences refer to
those that occur in a longest common subsequence. The following
similarity measures are implemented:

`jaccard`

: The number of common elements divided by the total number of elements (the sum of the lengths of the sequences minus the length of the longest common subsequence).

`dice`

: Uses two times the number of common elements in the numerator and the sum of the lengths of the sequences in the denominator.

`cosine`

: Uses the number of common elements in the numerator and the square root of the product of the sequence lengths in the denominator.

`subset`

: Zero if the first sequence is not a subsequence of the second. Otherwise the number of common elements divided by the number of elements in the first sequence.
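The four measures can be illustrated with a minimal base R sketch. It is not the package's implementation: it assumes strict matching and that each sequence element is a single item rather than an itemset, and it assumes the usual Dice and cosine forms. The helper names `lcs_length` and `seq_similarity` are hypothetical.

```r
# Length of a longest common subsequence (LCS) of two vectors, computed by
# dynamic programming over an (n + 1) x (m + 1) table.
lcs_length <- function(a, b) {
  n <- length(a)
  m <- length(b)
  tab <- matrix(0, n + 1, m + 1)
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      tab[i + 1, j + 1] <- if (a[i] == b[j]) {
        tab[i, j] + 1
      } else {
        max(tab[i, j + 1], tab[i + 1, j])
      }
    }
  }
  tab[n + 1, m + 1]
}

# Similarity of two sequences from the number of common elements
# (hypothetical helper; formulas follow the descriptions above).
seq_similarity <- function(a, b,
                           method = c("jaccard", "dice", "cosine", "subset")) {
  method <- match.arg(method)
  n <- length(a)
  m <- length(b)
  common <- lcs_length(a, b)
  switch(method,
    jaccard = common / (n + m - common),
    dice    = 2 * common / (n + m),
    cosine  = common / sqrt(n * m),
    # a is a subsequence of b exactly when all of its elements are common
    subset  = if (common == n) common / n else 0
  )
}

a <- c("A", "B", "C", "D")
b <- c("A", "C", "D", "E", "F")
seq_similarity(a, b, "jaccard")
seq_similarity(a, b, "dice")
seq_similarity(a, b, "cosine")
seq_similarity(a, b, "subset")
```

Here the longest common subsequence is `A, C, D`, so three elements are common; `a` is not a subsequence of `b`, so the `subset` measure is zero.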

If `strict = TRUE`, the elements (itemsets) of the sequences must be equal to be matched. Otherwise matches are quantified by the similarity of the itemsets (as specified by `method`), thresholded at 0.5, and the common sequence is quantified by the sum of the similarities.
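The difference between strict and non-strict matching can be sketched as a weighted longest common subsequence in base R. This is an illustration under assumptions, not the package's code: itemsets are compared with Jaccard similarity only, a pair is assumed to match when its similarity is at least 0.5 (the source does not say whether the threshold is inclusive), and the names `itemset_jaccard` and `weighted_common` are hypothetical.

```r
# Jaccard similarity of two non-empty itemsets (character vectors of items).
itemset_jaccard <- function(x, y) {
  length(intersect(x, y)) / length(union(x, y))
}

# Weighted LCS: each matched pair of itemsets contributes its similarity,
# and the result is the largest attainable sum of contributions.
# Assumption: a pair matches when its similarity is >= threshold.
weighted_common <- function(s, t, strict = FALSE, threshold = 0.5) {
  n <- length(s)
  m <- length(t)
  tab <- matrix(0, n + 1, m + 1)
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      w <- if (strict) {
        as.numeric(setequal(s[[i]], t[[j]]))  # 1 only for equal itemsets
      } else {
        itemset_jaccard(s[[i]], t[[j]])
      }
      if (!strict && w < threshold) w <- 0    # below threshold: no match
      tab[i + 1, j + 1] <- max(tab[i, j] + w, tab[i, j + 1], tab[i + 1, j])
    }
  }
  tab[n + 1, m + 1]
}

# Sequences are lists of itemsets.
s <- list(c("A", "B"), "C", c("D", "E"))
t <- list("A", "C", c("D", "E", "F"))
weighted_common(s, t, strict = TRUE)  # only the equal itemsets {C} match
weighted_common(s, t)                 # partial matches contribute as well
```

With strict matching only `{C}` matches, whereas non-strict matching also counts `{A, B}` against `{A}` (similarity 1/2) and `{D, E}` against `{D, E, F}` (similarity 2/3).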

For `similarity`, returns an object of class `dsCMatrix` if the result is symmetric (or `method = "subset"`) and an object of class `dgCMatrix` otherwise.

For `is.subset` and `is.superset`, returns an object of class `lgCMatrix`.

Computation of the longest common subsequence of two sequences of length `n, m` takes `O(n*m)` time.

The supported set of operations for the above matrix classes depends on package Matrix. In case of problems, expand to full storage representation using `as(x, "matrix")` or `as.matrix(x)`.

For efficiency use `as(x, "dist")` to convert a symmetric result matrix for clustering.
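The two conversions mentioned in these notes can be sketched on a small hand-made similarity matrix. The values below are made up for illustration, and the sketch deliberately avoids `as(x, "dist")`: it uses `stats::as.dist` on one minus the similarity, which is an assumption about how a dissimilarity might be derived and may differ from what the package coercion does.

```r
library(Matrix)

# A made-up symmetric similarity matrix in sparse storage
# (illustrative values, not output of similarity()).
sim <- Matrix(c(1.00, 0.50, 0.00,
                0.50, 1.00, 0.25,
                0.00, 0.25, 1.00),
              nrow = 3, ncol = 3, sparse = TRUE)

# Expand to full storage when an operation is unsupported for sparse classes.
dense <- as.matrix(sim)

# Assumption: dissimilarity taken as 1 - similarity, then converted to a
# "dist" object with the base stats function and clustered hierarchically.
d <- as.dist(1 - dense)
cl <- hclust(d, method = "average")
cutree(cl, k = 2)
```

Going through a dense matrix is only practical for small collections; for larger results the direct coercion described above avoids the intermediate full-storage copy.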

Christian Buchta

Class `sequences`, method `dissimilarity`.

    library(arulesSequences)

    ## use example data
    data(zaki)
    z <- as(zaki, "timedsequences")
    similarity(z)

    ## require equality
    similarity(z, strict = TRUE)

    ## emphasize common
    similarity(z, method = "dice")

    ## subsequence relationships
    is.subset(z)
    is.subset(z, proper = TRUE)

[Package *arulesSequences* version 0.2-25 Index]