ClusterStrings {TreeSearch} | R Documentation |
Cluster similar strings
Description
Calculate string similarity using the Levenshtein distance and return clusters of similar strings.
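As an illustration of the underlying idea, the sketch below computes Levenshtein distances with base R's `adist()` and groups the strings by hierarchical clustering. This is not the implementation used by ClusterStrings(): the choice of `hclust()` and `cutree()`, and the names `strings`, `d` and `groups`, are assumptions made for the example.

```r
# Illustrative sketch only; ClusterStrings() may cluster differently.
strings <- c("FirstCluster 1", "FirstCluster 2", "SecondCluster.8")

# utils::adist() returns a matrix of generalized Levenshtein distances
d <- adist(strings)
dimnames(d) <- list(strings, strings)
print(d)

# Assumption: complete-linkage hierarchical clustering, cut into two groups
groups <- cutree(hclust(as.dist(d)), k = 2)
print(groups)
```

The first two strings differ by a single substitution, so they fall into the same group, whereas the third string is placed in a group of its own.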
Usage
ClusterStrings(x, maxCluster = 12)
Arguments
x | Character vector. |
maxCluster | Integer specifying maximum number of clusters to consider. |
Value
ClusterStrings() returns an integer vector assigning each element of x to a cluster, with an attribute med specifying the median string in each cluster, and an attribute silhouette reporting the silhouette coefficient of the optimal clustering. Coefficients < 0.5 indicate weak structure; in that case, no clusters are returned. If the number of unique elements of x is less than maxCluster, all occurrences of each entry are assigned to an individual cluster.
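The attributes described above can be read with `attr()`. The following is a minimal sketch, assuming the TreeSearch package is installed; the variable names `x` and `clusters` are chosen for the example, and no particular output is asserted here.

```r
# Input strings, as in the Examples section
x <- c(paste0("FirstCluster ", 1:5),
       paste0("SecondCluster.", 8:12),
       paste0("AnotherCluster_", letters[1:6]))

# Run only if TreeSearch is available
if (requireNamespace("TreeSearch", quietly = TRUE)) {
  clusters <- TreeSearch::ClusterStrings(x, maxCluster = 12)

  # Attributes named in the Value section
  print(attr(clusters, "med"))         # median string of each cluster
  print(attr(clusters, "silhouette"))  # silhouette coefficient

  # List the members of each cluster
  print(split(x, as.integer(clusters)))
}
```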
Author(s)
Martin R. Smith (martin.smith@durham.ac.uk)
See Also
Other utility functions: QuartetResolution(), WhenFirstHit()
Examples
ClusterStrings(c(paste0("FirstCluster ", 1:5),
paste0("SecondCluster.", 8:12),
paste0("AnotherCluster_", letters[1:6])))
[Package TreeSearch version 1.5.1 Index]