seq_amatch {stringdist} | R Documentation |
Approximate matching for integer sequences.
Description
For a list of integer vectors x, find the closest matches in a list of integer or numeric vectors in table.
Usage
seq_amatch(
x,
table,
nomatch = NA_integer_,
matchNA = TRUE,
method = c("osa", "lv", "dl", "hamming", "lcs", "qgram", "cosine", "jaccard", "jw"),
weight = c(d = 1, i = 1, s = 1, t = 1),
maxDist = 0.1,
q = 1,
p = 0,
bt = 0,
nthread = getOption("sd_num_thread")
)
seq_ain(x, table, ...)
Arguments
x |
( |
table |
( |
nomatch |
The value to be returned when no match is found. This is coerced to integer. |
matchNA |
Should |
method |
Matching algorithm to use. See |
weight |
For |
maxDist |
Elements in |
q |
q-gram size, only when method is |
p |
Winkler's prefix parameter for Jaro-Winkler distance, with
|
bt |
Winkler's boost threshold. Winkler's prefix factor is
only applied when the Jaro distance is larger than |
nthread |
Number of threads used by the underlying C-code. A sensible
default is chosen, see |
... |
parameters to pass to |
Value
seq_amatch returns the position of the closest match of x in table. When multiple matches with the same minimal distance metric exist, the first one is returned. seq_ain returns a logical vector of length length(x) indicating whether an element of x approximately matches an element in table.
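As a minimal sketch of the return values described above, assuming the stringdist package is installed and using made-up inputs, the following compares the two functions on one sequence that has a close match and one that does not:

```r
library(stringdist)

# Illustrative inputs (not taken from the package documentation)
x     <- list(1:3, c(9L, 9L, 9L))
table <- list(c(1L, 2L, 4L), 1:3)

# Position in 'table' of the closest match within maxDist, NA if there is none
pos <- seq_amatch(x, table, maxDist = 1)

# TRUE where an approximate match exists, FALSE otherwise
found <- seq_ain(x, table, maxDist = 1)
```

With the default method, 1:3 is at distance 0 from the second entry of table, while c(9L, 9L, 9L) is farther than maxDist from both entries, so pos should be c(2, NA) and found should be c(TRUE, FALSE).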
Notes
seq_ain is currently defined as
seq_ain <- function(x,table,...) seq_amatch(x, table, nomatch=0,...) > 0
All input vectors are converted with as.integer. This causes truncation for numeric vectors (e.g. pi will be treated as 3L).
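The truncation mentioned above can be checked in base R alone. The sketch below uses illustrative values; rounding before matching is only a suggestion, not something the package does for you:

```r
# as.integer() truncates toward zero rather than rounding
vals <- c(pi, 2.9, -2.9)
truncated <- as.integer(vals)
truncated  # 3 2 -2

# Rounding explicitly before matching is one way to avoid surprises
rounded <- as.integer(round(vals))
rounded    # 3 3 -3
```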
See Also
Examples
x <- list(1:3,c(3:1),c(1L,3L,4L))
table <- list(
c(5L,3L,1L,2L)
,1:4
)
seq_amatch(x,table,maxDist=2)
# behaviour with missings
seq_amatch(list(c(1L,NA_integer_,3L),NA_integer_), list(1:3),maxDist=1)
## Not run:
# Match sentences based on word order. Note: words must match exactly or they
# are treated as completely different.
#
# For this example you need to have the 'hashr' package installed.
x <- "Mary had a little lamb"
x.words <- strsplit(x,"[[:blank:]]+")
x.int <- hashr::hash(x.words)
table <- c("a little lamb had Mary",
"had Mary a little lamb")
table.int <- hashr::hash(strsplit(table,"[[:blank:]]+"))
seq_amatch(x.int,table.int,maxDist=3)
## End(Not run)
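If hashr is not available, words can be mapped to integers with base R instead. The sketch below is an assumption-laden alternative to the example above, not the package's recommended approach: the helpers split_words and encode are hypothetical names introduced here, and each word is encoded as its position in a shared vocabulary built from all inputs.

```r
library(stringdist)

x     <- "Mary had a little lamb"
table <- c("a little lamb had Mary",
           "had Mary a little lamb")

# Hypothetical helper: split sentences into words on blanks
split_words <- function(s) strsplit(s, "[[:blank:]]+")

x.words     <- split_words(x)
table.words <- split_words(table)

# Shared vocabulary so that equal words receive equal integer codes
vocab <- unique(unlist(c(x.words, table.words)))

# Hypothetical helper: encode each sentence as integer positions in 'vocab'
encode <- function(words) lapply(words, match, table = vocab)

x.int     <- encode(x.words)
table.int <- encode(table.words)

res <- seq_amatch(x.int, table.int, maxDist = 3)
```

Unlike hashing, these codes are only meaningful relative to the vocabulary built here, so x and table must be encoded together.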