fmatch {collapse} | R Documentation |
Fast Matching
Description
Fast matching of elements/rows in x to elements/rows in table.
This is a much faster replacement for match that works with atomic vectors and data frames / lists of equal-length vectors. It is the workhorse function of join.
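For comparison, a common base-R workaround for matching rows is to paste each row into a single character key and call match() on the keys. The sketch below shows this baseline. row_keys(), df_x and df_tb are hypothetical names used only for illustration, and this is not how fmatch() works internally. Building string keys costs extra time and memory, and it can conflate numbers that differ only beyond the precision used by as.character().

```r
# Base-R baseline for row-wise matching (illustration only, not fmatch()'s
# algorithm): collapse each row into one string key, then match the keys.
row_keys <- function(df) {
  # unname() keeps column names from being passed as named args to paste()
  do.call(paste, c(unname(as.list(df)), sep = "\r"))
}

df_x  <- data.frame(id = c(1, 1, 2), g = c("x", "y", "y"))
df_tb <- data.frame(id = c(1, 2, 2), g = c("x", "y", "z"))

# Position of the first row of df_tb equal to each row of df_x
res_base <- match(row_keys(df_x), row_keys(df_tb))
res_base  # 1 NA 2
```

With collapse loaded, fmatch(df_x, df_tb) would be expected to return the same positions without building the intermediate keys.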
Usage
fmatch(x, table, nomatch = NA_integer_,
count = FALSE, overid = 1L)
# Check match: throws an informative error for non-matched elements
# Default message reflects frequent internal use to check data frame columns
ckmatch(x, table, e = "Unknown columns:", ...)
# Infix operators based on fmatch():
x %!in% table # Opposite of %in%
x %iin% table # = which(x %in% table), but more efficient
x %!iin% table # = which(x %!in% table), but more efficient
# Use set_collapse(mask = "%in%") to replace %in% with
# a much faster version based on fmatch()
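The semantics of the infix operators and of ckmatch() can be illustrated with base R's match(). The functions below (%notin_sketch%, %iin_sketch%, %notiin_sketch% and ckmatch_sketch()) are hypothetical stand-ins written for this illustration. The collapse versions are built on fmatch(), and the exact wording of ckmatch()'s error message may differ from the format assumed here.

```r
# Hypothetical base-R equivalents of the collapse infix operators
`%notin_sketch%`  <- function(x, table) is.na(match(x, table))          # like %!in%
`%iin_sketch%`    <- function(x, table) which(!is.na(match(x, table)))  # like %iin%
`%notiin_sketch%` <- function(x, table) which(is.na(match(x, table)))   # like %!iin%

# Hypothetical sketch of a "check match": return the match positions, or stop
# with the message e followed by the non-matched elements (assumed format)
ckmatch_sketch <- function(x, table, e = "Unknown columns:") {
  m <- match(x, table)
  if (anyNA(m)) stop(paste(e, paste(x[is.na(m)], collapse = ", ")), call. = FALSE)
  m
}

cols <- c("mpg", "cyl", "hp")
c("cyl", "wt") %notin_sketch% cols    # FALSE TRUE
c("cyl", "wt") %iin_sketch% cols      # 1
ckmatch_sketch(c("hp", "mpg"), cols)  # 3 1
```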
Arguments
x |
a vector, list or data frame whose elements (for lists and data frames: rows) are matched against table. |
table |
a vector, list or data frame to match against. |
nomatch |
integer. Value to be returned in the case when no match is found. Default is NA_integer_. |
count |
logical. Counts the number of (unique) matches and attaches 4 attributes:
Note that computing these attributes requires an extra pass through the matching vector. Also note that these attributes contain no general information about whether either x or table is unique. |
overid |
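integer. If x and table are lists / data frames, controls what happens when the columns matched so far already uniquely identify the table rows (overidentification, see Details): 0 terminates early without matching further columns, 1 (the default) continues matching and issues a warning, and 2 continues matching without a warning. |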
e |
the error message thrown by ckmatch for non-matched elements. |
... |
further arguments to fmatch. |
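count = TRUE requires an extra pass through the result. As a rough illustration of what such a pass involves, the hypothetical count_matches_sketch() below takes a finished match vector and counts the non-matched elements and the distinct table positions that were matched. The attribute names n_nomatch and n_distinct are invented for this sketch and need not correspond to the attributes fmatch() attaches.

```r
# Hypothetical sketch: derive match counts from a finished match vector.
# Attribute names are illustrative only, not those used by fmatch().
count_matches_sketch <- function(m, table_size, nomatch = NA_integer_) {
  # Elements of m that found a match in table
  hit <- if (is.na(nomatch)) !is.na(m) else m != nomatch
  # Mark each table position that was matched at least once
  seen <- logical(table_size)
  seen[m[hit]] <- TRUE
  attr(m, "n_nomatch") <- sum(!hit)
  attr(m, "n_distinct") <- sum(seen)
  m
}

x <- c("b", "c", "a", "e", "f", "ff", "b")
res <- count_matches_sketch(match(x, letters), length(letters))
attributes(res)  # n_nomatch = 1, n_distinct = 5
```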
Details
With data frames / lists, fmatch() compares the rows but moves through the data column by column (like a vectorized hash join algorithm). With two or more columns, the first two columns are hashed simultaneously for speed; further columns are then added to the match one at a time. Often the first 2, 3, 4, etc. columns of a data frame already uniquely identify its rows. After each column, fmatch() therefore checks internally whether the table rows that are still eligible for matching (after eliminating nomatch rows from earlier columns) are unique. If this is the case and overid = 0, fmatch() terminates early without considering further columns. This is efficient but may give undesirable/wrong results if considering further columns would turn some additional elements of the result vector into nomatch values.
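The column-by-column procedure described above can be sketched in base R. row_match_sketch() is a hypothetical helper written for this illustration, not part of collapse: it keeps an integer key per row, refines it one column at a time using match() as the hash lookup, and applies the overid logic after each column. It deliberately simplifies the real algorithm in two ways: it adds columns one at a time (fmatch() hashes the first two together), and it checks uniqueness over all table rows rather than only the still-eligible ones.

```r
# Hypothetical base-R sketch of incremental row matching; NOT collapse's code.
row_match_sketch <- function(x, table, overid = 1L) {
  stopifnot(is.list(x), is.list(table), length(x) == length(table))
  n_t <- length(table[[1L]])
  # Column 1: table rows map to the index of their first duplicate,
  # x rows to the index of their first match in table (NA if none)
  key_t <- match(table[[1L]], table[[1L]])
  key_x <- match(x[[1L]], table[[1L]])
  warned <- FALSE
  k <- 1L
  while (k < length(table)) {
    # Overidentification: columns 1..k already make all table rows unique
    if (!anyDuplicated(key_t)) {
      if (overid == 0L) break  # early termination
      if (overid == 1L && !warned) {
        warning("columns 1:", k, " already identify the table rows")
        warned <- TRUE
      }
    }
    k <- k + 1L
    code_t <- match(table[[k]], table[[k]])
    code_x <- match(x[[k]], table[[k]])
    # Encode (previous key, new column code) as one number; both parts lie
    # in 1..n_t, so distinct pairs give distinct numbers. NA propagates in x.
    comb_t <- (key_t - 1) * n_t + code_t
    comb_x <- (key_x - 1) * n_t + code_x
    key_t <- match(comb_t, comb_t)
    key_x <- match(comb_x, comb_t)
  }
  key_x
}

a <- data.frame(id = c(1, 1, 2), g = c("x", "y", "y"))
b <- data.frame(id = c(1, 2, 2), g = c("x", "y", "z"))
row_match_sketch(a, b)  # 1 NA 2
```

Applied to df1 and df2 from the Examples, this sketch gives 1 NA NA 3 with overid = 2 and 1 NA 2 3 with overid = 0, mirroring the behaviour described there.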
Value
Integer vector containing the positions of the first matches of x in table. nomatch is returned for elements of x that have no match in table. If count = TRUE, the result has additional attributes and a class "qG".
See Also
join, funique, group, Fast Grouping and Ordering, Collapse Overview
Examples
x <- c("b", "c", "a", "e", "f", "ff")
fmatch(x, letters)
fmatch(x, letters, nomatch = 0)
fmatch(x, letters, count = TRUE)
# Table 1
df1 <- data.frame(
id1 = c(1, 1, 2, 3),
id2 = c("a", "b", "b", "c"),
name = c("John", "Bob", "Jane", "Carl")
)
head(df1)
# Table 2
df2 <- data.frame(
id1 = c(1, 2, 3, 3),
id2 = c("a", "b", "c", "e"),
name = c("John", "Janne", "Carl", "Lynne")
)
head(df2)
# This gives an overidentification warning: columns 1:2 identify the data
if(FALSE) fmatch(df1, df2)
# This just runs through without warning
fmatch(df1, df2, overid = 2)
# This terminates computation after the first 2 columns
fmatch(df1, df2, overid = 0)
fmatch(df1[1:2], df2[1:2]) # Same thing!
# -> note that here we get an additional match based on the unique ids,
# which we didn't get before because "Jane" != "Janne"