group_str {sjmisc} | R Documentation |
Group near elements of string vectors
Description
This function groups elements of a string vector (character or string variable) according to the distance ('similarity') between elements. The more similar two string elements are, the more likely they are to be combined into one group.
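To make the notion of string distance concrete, here is a minimal sketch using base R's adist(), which computes generalized Levenshtein (edit) distances. This is an illustration only: it is an assumption that edit distance is a fair stand-in for the metric group_str applies, and the tolerance of 2 simply mirrors the default of precision.

```r
# Illustration only: edit distances between a few strings, computed with
# base R's adist() (package 'utils'). Not necessarily the metric used
# internally by group_str().
x <- c("Hello", "Helo", "Hole", "Apple")

d <- adist(x)               # pairwise edit-distance matrix
dimnames(d) <- list(x, x)   # label rows and columns for readability
d

# Pairs whose distance is within a tolerance of 2 edits
# (upper triangle only, so each pair is listed once)
close_pairs <- which(d <= 2 & upper.tri(d), arr.ind = TRUE)
data.frame(
  a = x[close_pairs[, "row"]],
  b = x[close_pairs[, "col"]],
  distance = d[close_pairs]
)
```

A smaller tolerance leaves fewer pairs in this table, which corresponds to stricter matching.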
Usage
group_str(
strings,
precision = 2,
strict = FALSE,
trim.whitespace = TRUE,
remove.empty = TRUE,
verbose = FALSE,
maxdist
)
Arguments
strings |
Character vector with string elements. |
precision |
Maximum distance ("precision") between two string elements at which they are still treated as similar or equal. Smaller values mean less tolerance in matching. |
strict |
Logical; if |
trim.whitespace |
Logical; if |
remove.empty |
Logical; if |
verbose |
Logical; if |
maxdist |
Deprecated. Please use |
Value
A character vector where similar string elements (values) are recoded
into a new, single value. The return value has the same length as
strings, i.e. grouped elements appear multiple times, so
the count for each grouped string is still available (see 'Examples').
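The "same length" property can be sketched in base R. The code below is a rough approximation, not the algorithm group_str uses: it swaps in hierarchical clustering (hclust() with complete linkage) on adist() edit distances, and the helper name group_by_distance, the cut height, and the ", " label separator are all hypothetical choices made for illustration.

```r
# Hypothetical helper: recode each string to a label shared by its group.
# Swapped-in technique (complete-linkage clustering on edit distances);
# this is NOT the implementation of group_str().
group_by_distance <- function(x, max_dist = 2) {
  d <- as.dist(adist(x))                    # pairwise edit distances
  grp <- cutree(hclust(d, method = "complete"), h = max_dist)
  # one label per group: the distinct members, pasted together
  labels <- tapply(x, grp, function(s) paste(unique(s), collapse = ", "))
  as.vector(labels[as.character(grp)])      # one label per input element
}

oldstring <- c("Hello", "Helo", "Hole", "Apple",
               "Ape", "New", "Old", "System", "Systemic")
recoded <- group_by_distance(oldstring)

length(recoded) == length(oldstring)  # same length as the input
table(recoded)                        # counts per group are preserved
```

Because every input element keeps its own position in the result, tabulating the recoded vector gives the size of each group directly.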
See Also
Examples
# group_str() and frq() are provided by the sjmisc package
library(sjmisc)

oldstring <- c("Hello", "Helo", "Hole", "Apple",
               "Ape", "New", "Old", "System", "Systemic")
newstring <- group_str(oldstring)
# see result
newstring
# count for each group
table(newstring)
# print table to compare original and grouped string
frq(oldstring)
frq(newstring)
# larger groups
newstring <- group_str(oldstring, precision = 3)
frq(oldstring)
frq(newstring)
# be more strict with matching pairs
newstring <- group_str(oldstring, precision = 3, strict = TRUE)
frq(oldstring)
frq(newstring)