lev_weighted_token_ratio {levitate} | R Documentation
Weighted token similarity measure
Description
Computes string similarity, but allows you to assign weights to specific tokens. This is useful, for example, when the strings contain a frequently occurring token that carries little information (such as a company suffix like "ltd"). See examples.
Usage
lev_weighted_token_ratio(a, b, weights = list(), ...)
Arguments
a, b
    The input strings.

weights
    List of token weights. For example, weights = list(ltd = 0.1) assigns the token "ltd" a weight of 0.1.

...
    Additional arguments passed on to the underlying function.
Value
A float
Details
The algorithm used here is as follows:

1. Tokenise the input strings.
2. Compute the edit distance between each pair of tokens.
3. Compute the maximum edit distance between each pair of tokens.
4. Apply any weights from the weights argument.
5. Return 1 - (sum(weighted_edit_distances) / sum(weighted_max_edit_distances))
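The steps above can be sketched in Python (an illustration only, not the levitate implementation; the positional token pairing and the rule of taking the smaller weight of the two paired tokens are assumptions):

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via a rolling dynamic-programming row."""
    dp = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, len(b) + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                          # deletion
                        dp[j - 1] + 1,                      # insertion
                        prev + (a[i - 1] != b[j - 1]))      # substitution
            prev = cur
    return dp[len(b)]

def weighted_token_ratio(a: str, b: str, weights=None) -> float:
    """Sketch of the weighted token ratio described in Details."""
    weights = weights or {}
    num = den = 0.0
    # Pair tokens positionally (an assumption; the real pairing
    # strategy in the package may differ).
    for ta, tb in zip(a.split(), b.split()):
        dist = edit_distance(ta, tb)
        max_dist = max(len(ta), len(tb))   # maximum possible edit distance
        # Apply the smaller of the two tokens' weights (assumption);
        # unweighted tokens default to 1.
        w = min(weights.get(ta, 1.0), weights.get(tb, 1.0))
        num += w * dist
        den += w * max_dist
    return 1 - num / den if den else 1.0
```

Down-weighting a shared, uninformative token such as "ltd" shrinks its contribution to both sums, so the remaining (informative) tokens dominate the score, which is exactly the behaviour motivated in the Description.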
See Also
Other weighted token functions:
lev_weighted_token_set_ratio(),
lev_weighted_token_sort_ratio()
Examples
lev_weighted_token_ratio("jim ltd", "tim ltd")
lev_weighted_token_ratio("tim ltd", "jim ltd", weights = list(ltd = 0.1))