ssm.compositionality {cultevo} | R Documentation |
Find a segmentation that maximises the overall string coverage across all signals.
Description
This algorithm builds on Spike's measure of compositionality (see
sm.compositionality), except instead of simply determining
which segment(s) have the highest mutual predictability for each
meaning feature separately, it attempts to find a combination of
non-overlapping segments for each feature that maximises the overall string
coverage over all signals. In other words, it tries to find a segmentation
which can account for (or 'explain') as much of the string material in the
signals as possible.
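To make the notion of string coverage concrete, the following base R sketch computes the share of characters in a set of signals that a given set of segments accounts for. The helper string.coverage is hypothetical and not part of cultevo; it only illustrates the quantity being maximised, not how the package searches for a segmentation.

```r
# Hypothetical helper (not part of cultevo): fraction of all characters in
# `signals` that is covered by non-overlapping occurrences of `segments`.
string.coverage <- function(signals, segments) {
  covered <- 0
  for (signal in signals) {
    mask <- rep(FALSE, nchar(signal))
    for (segment in segments) {
      # position of the first literal occurrence, or -1 if absent
      start <- as.integer(regexpr(segment, signal, fixed = TRUE))
      if (start > 0) {
        idx <- seq(start, length.out = nchar(segment))
        # only count the segment if it does not overlap an earlier one
        if (!any(mask[idx])) mask[idx] <- TRUE
      }
    }
    covered <- covered + sum(mask)
  }
  covered / sum(nchar(signals))
}

signals <- c("greendog", "bluedog", "greenfeline", "bluefeline")
# every character is explained by one of the four segments
full <- string.coverage(signals, c("green", "blue", "dog", "feline"))
# the colour segments alone explain only half of the string material
partial <- string.coverage(signals, c("green", "blue"))
```

Here full evaluates to 1 and partial to 0.5, which is the sense in which the first set of segments 'explains' more of the signals than the second.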
Usage
ssm.compositionality(x, y, groups = NULL)
ssm.segmentation(x, y, mergefeatures = FALSE, verbose = FALSE)
Arguments
x | a list or vector of character sequences |
y | a matrix or data frame with as many rows as there are strings (see section Meaning data format) |
groups | a list or vector with as many items as strings, used to split the signals and meanings into data sets for which the compositionality measures are computed separately. |
mergefeatures | logical: if |
verbose | logical: if |
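As an illustration of what the groups argument describes, the base R sketch below splits a set of signals and their meaning rows into one data set per group label. This is an assumption about the general idea only, not the package's internal code; the labels "gen1" and "gen2" are made up for the example.

```r
signals <- c("as", "bas", "basf", "as", "bas", "basf")
meanings <- cbind(a = c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE),
                  b = c(FALSE, TRUE, TRUE, FALSE, TRUE, TRUE))
# hypothetical group labels, one per signal
groups <- c("gen1", "gen1", "gen1", "gen2", "gen2", "gen2")

# one list entry per group, each holding that group's signals and the
# matching rows of the meaning matrix
datasets <- lapply(split(seq_along(signals), groups), function(rows)
  list(x = signals[rows], y = meanings[rows, , drop = FALSE]))
```

With the package loaded, passing the same vector as ssm.compositionality(signals, meanings, groups = groups) should yield separately computed measures for each group, following the signature given under Usage.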
Details
For large data sets and long strings, this computation can become very slow. If the attested signals do not admit a perfect segmentation, the algorithm is not guaranteed to find any segmentation at all, since a segmentation meeting its criteria might not exist.
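One way to see why long strings are costly: assuming that candidate segments are drawn from the substrings of the signals (an assumption, this page does not spell out how candidates are enumerated), a signal of length n has up to n(n+1)/2 substrings, and the search then has to consider combinations of them. The helper all.substrings below is hypothetical and only illustrates that growth.

```r
# Hypothetical helper (not part of cultevo): all distinct substrings of `s`.
all.substrings <- function(s) {
  n <- nchar(s)
  if (n == 0) return(character(0))
  # every (start, stop) pair with start <= stop delimits one substring
  bounds <- expand.grid(start = seq_len(n), stop = seq_len(n))
  bounds <- bounds[bounds$start <= bounds$stop, ]
  unique(substring(s, bounds$start, bounds$stop))
}

short <- all.substrings("as")    # "a", "s", "as" in some order
longer <- all.substrings("basf") # 10 distinct substrings for 4 characters
```

Doubling the string length from 2 to 4 characters already more than triples the number of candidates in this sketch.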
See Also
Examples
library(cultevo)

ssm.segmentation(c("as", "bas", "basf"),
  cbind(a=c(TRUE, FALSE, TRUE), b=c(FALSE, TRUE, TRUE)))

# signaling system where one meaning distinction is not encoded in the signals
print(threebytwoanimals <- enumerate.meaningcombinations(
  list(animal=c("dog", "cat", "tiger"), colour=c("col1", "col2"))))
ssm.segmentation(c("greendog", "bluedog", "greenfeline", "bluefeline", "greenfeline", "bluefeline"),
  threebytwoanimals)

# the same analysis again, but allow merging of features
ssm.segmentation(c("greendog", "bluedog", "greenfeline", "bluefeline", "greenfeline", "bluefeline"),
  threebytwoanimals, mergefeatures=TRUE)