R: Matching based on common tokens

lev_token_set_ratio {levitate}

R Documentation

Matching based on common tokens

Description

Compare strings based on shared tokens.

Usage

lev_token_set_ratio(a, b, pairwise = TRUE, useNames = TRUE, ...)

Arguments

`a`, `b`	The input strings
`pairwise`	Boolean. If `TRUE`, only the pairwise distances between `a` and `b` will be computed, rather than the combinations of all elements.
`useNames`	Boolean. Use input vectors as row and column names?
`...`	Additional arguments to be passed to `stringdist::stringdistmatrix()` or `stringdist::stringsimmatrix()`.
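
As a rough illustration of the `pairwise` argument described above, the sketch below calls the function on two short vectors in both modes. The input strings are made up for this example, and no output is shown because the exact values depend on the installed package version.

```r
library(levitate)

# Illustrative inputs (not from the package documentation).
a <- c("quick brown fox", "lazy dog")
b <- c("brown fox quick", "dog lazy")

# Element-wise comparison: a[1] against b[1], a[2] against b[2].
paired <- lev_token_set_ratio(a, b, pairwise = TRUE)

# All combinations: every element of a against every element of b.
all_pairs <- lev_token_set_ratio(a, b, pairwise = FALSE)
```

With two inputs of length two, `paired` should hold two scores and `all_pairs` four, one for each combination.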

Value

A numeric scalar, vector or matrix depending on the length of the inputs.

Details

Similar to lev_token_sort_ratio(), this function breaks each input string down into tokens. It then identifies the tokens common to both strings and creates three new strings:

x <- {common_tokens}
y <- {common_tokens}{remaining_unique_tokens_from_string_a}
z <- {common_tokens}{remaining_unique_tokens_from_string_b}

and performs three pairwise lev_ratio() calculations between them (x vs y, y vs z and x vs z). The highest of those three ratios is returned.
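
The steps above can be sketched in base R for a single pair of strings. This is a minimal illustration, not the package's implementation. The names `sim_ratio()` and `token_set_sketch()` are hypothetical, and the tokenisation rule (lower-cased alphanumeric runs), the sorting of tokens, the space separator, and the normalisation by the longer string's length are all assumptions that may differ from what levitate does.

```r
# Hypothetical helper: Levenshtein similarity scaled to [0, 1].
# Assumption: edit distance divided by the length of the longer string.
sim_ratio <- function(s, t) {
  longest <- max(nchar(s), nchar(t))
  if (longest == 0) return(1)
  1 - as.numeric(utils::adist(s, t)) / longest
}

# Hypothetical sketch of the token set comparison for two single strings.
token_set_sketch <- function(a, b) {
  # Assumption: tokens are unique, sorted, lower-cased alphanumeric runs.
  tokenise <- function(s) {
    tokens <- unique(strsplit(tolower(s), "[^[:alnum:]]+")[[1]])
    sort(tokens[nzchar(tokens)])
  }
  tok_a <- tokenise(a)
  tok_b <- tokenise(b)

  # Build the three strings described in the text.
  common <- intersect(tok_a, tok_b)
  x <- paste(common, collapse = " ")
  y <- paste(c(common, setdiff(tok_a, tok_b)), collapse = " ")
  z <- paste(c(common, setdiff(tok_b, tok_a)), collapse = " ")

  # Return the highest of the three pairwise ratios.
  max(sim_ratio(x, y), sim_ratio(y, z), sim_ratio(x, z))
}

# Same tokens in a different order, so x, y and z are identical.
token_set_sketch("the quick brown fox", "brown fox the quick")  # 1
```

Because x contains only the common tokens, a string whose tokens are a subset of the other's also scores 1 in this sketch: y (or z) collapses to x.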

Examples

x <- "the quick brown fox jumps over the lazy dog"
y <- "my lazy dog was jumped over by a quick brown fox"

lev_ratio(x, y)

lev_token_sort_ratio(x, y)

lev_token_set_ratio(x, y)

[Package levitate version 0.2.0 Index]