cluster_strings {clustringr}    R Documentation
Cluster Strings by Edit-Distance
Description
Groups the strings in a vector into clusters of related strings, where relatedness is measured by edit distance.
Usage
cluster_strings(s_vec, clean = TRUE, method = "osa", max_dist = 3,
  algo = "cc")
Arguments
s_vec     a character vector of strings to cluster
clean     logical; whether to squish whitespace and de-duplicate s_vec before clustering
method    one of "osa" (optimal string alignment), "lv" (Levenshtein), or "dl" (full Damerau-Levenshtein), as in 'stringdist'
max_dist  maximum edit distance (typically Damerau-Levenshtein) between related strings
algo      one of "cc" (connected components) or "eb" (edge betweenness)
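
To illustrate how max_dist and algo = "cc" might interact, here is a minimal base-R sketch. It is not the clustringr implementation: it assumes two strings are "related" when their Levenshtein distance (computed with utils::adist) is at most max_dist, and it takes clusters to be the connected components of the resulting graph. The helper name sketch_cc_clusters and its output columns are hypothetical.

```r
# Illustrative sketch only: NOT the clustringr implementation.
# Assumption: strings are "related" when their Levenshtein distance
# (base R utils::adist) is <= max_dist; clusters are the connected
# components of the graph whose edges join related strings.
sketch_cc_clusters <- function(s_vec, max_dist = 3) {  # hypothetical helper
  s_vec <- unique(s_vec)                # de-duplicate the input
  d <- utils::adist(s_vec)              # pairwise Levenshtein distances
  adj <- d <= max_dist                  # adjacency matrix of related strings
  cluster <- rep(NA_integer_, length(s_vec))
  next_id <- 0L
  for (i in seq_along(s_vec)) {
    if (!is.na(cluster[i])) next        # already assigned to a component
    next_id <- next_id + 1L
    cluster[i] <- next_id
    queue <- i
    # Breadth-first search over strings reachable from string i
    while (length(queue) > 0) {
      v <- queue[1]
      queue <- queue[-1]
      nb <- which(adj[v, ] & is.na(cluster))  # unvisited neighbours of v
      cluster[nb] <- next_id
      queue <- c(queue, nb)
    }
  }
  data.frame(string = s_vec, cluster = cluster, stringsAsFactors = FALSE)
}

# "cacha\u00e7a" is "cachaça", written with an escape for portability
s_vec <- c("alcool", "alcohol", "alcoholic", "brandy", "brandie", "cacha\u00e7a")
res <- sketch_cc_clusters(s_vec, max_dist = 3)
print(res)
```

Under these assumptions, "alcool", "alcohol" and "alcoholic" fall into one cluster, "brandy" and "brandie" into a second, and "cachaça" stays on its own. Lowering max_dist makes the clusters tighter; for instance, "alcool" and "alcoholic" are 3 edits apart, so with max_dist = 2 they stay together only because both are linked to "alcohol".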
Value
an object whose df_clusters element (see Examples) is a data frame containing the cluster membership of each input string
Examples
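library(clustringr)
s_vec <- c("alcool", "alcohol", "alcoholic", "brandy", "brandie", "cachaça")
# Cluster with Levenshtein distance, treating strings within 3 edits as related
s_clust <- cluster_strings(s_vec, method = "lv", max_dist = 3, algo = "cc")
# Data frame of cluster membership
s_clust$df_clusters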
[Package clustringr version 1.0 Index]