cluster_strings {clustringr}    R Documentation

Cluster Strings by Edit-Distance

Description

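Groups the strings in s_vec into clusters of related strings, where two strings count as related when the edit distance between them is at most max_dist.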

Usage

cluster_strings(s_vec, clean = TRUE, method = "osa", max_dist = 3,
  algo = "cc")

Arguments

s_vec

a character vector of strings to cluster

clean

logical; whether to squish whitespace in s_vec and de-duplicate it before clustering

method

one of "osa" (optimal string alignment), "lv" (Levenshtein) or "dl" (full Damerau-Levenshtein), as in 'stringdist'

max_dist

maximum edit distance, as measured by method, between two strings for them to count as related (the default "osa" is a restricted form of Damerau-Levenshtein).

algo

one of "cc" (connected components) or "eb" (edge betweenness)
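
As a rough illustration of how max_dist and algo = "cc" interact, the following base-R sketch links every pair of strings whose Levenshtein distance is at most max_dist and then labels the connected components of the resulting graph. It is not the clustringr implementation: the function name cc_cluster_sketch, the cleaning step, and the output column names are assumptions made for this example, and utils::adist stands in for 'stringdist' (so only a Levenshtein-style distance is covered).

```r
# Base-R sketch of threshold-graph clustering (hypothetical helper,
# not the clustringr source).
cc_cluster_sketch <- function(s_vec, max_dist = 3) {
  # Assumed cleaning step: trim, collapse runs of whitespace, drop duplicates.
  s <- unique(gsub("\\s+", " ", trimws(s_vec)))
  n <- length(s)
  # Pairwise Levenshtein distances (utils::adist ships with base R).
  d <- utils::adist(s)
  # Two strings are linked when their distance is at most max_dist.
  adj <- d <= max_dist
  cluster <- integer(n)  # 0 means "not yet assigned"
  k <- 0L
  for (i in seq_len(n)) {
    if (cluster[i] != 0L) next
    k <- k + 1L
    frontier <- i
    # Breadth-first search over the threshold graph.
    while (length(frontier) > 0) {
      cluster[frontier] <- k
      reachable <- which(colSums(adj[frontier, , drop = FALSE]) > 0)
      frontier <- reachable[cluster[reachable] == 0L]
    }
  }
  # Column names are illustrative only.
  data.frame(string = s, cluster = cluster, stringsAsFactors = FALSE)
}
```

Because components are transitive, two strings can share a cluster even when their direct distance exceeds max_dist, as long as a chain of closer strings connects them.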

Value

an object whose df_clusters element (see Examples) is a data frame containing cluster membership for each input string

Examples

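library(clustringr)
s_vec <- c("alcool", "alcohol", "alcoholic", "brandy", "brandie", "cachaça")
# Relate strings within Levenshtein distance 3, then take connected components
s_clust <- cluster_strings(s_vec, method = "lv", max_dist = 3, algo = "cc")
# Cluster membership for each string
s_clust$df_clusters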

[Package clustringr version 1.0 Index]