match.data.frame {Ecfun}    R Documentation
Identify the row of y best matching each row of x
Description
For each row of x[, by.x], find the best matching row of y[, by.y], with the best match defined by grep. and split. The arguments grep. and split must either be missing or have the same length as by.x and by.y. If grep.[i] and split[i] are NA, do a complete match of x[, by.x[i]] and y[, by.y[i]]. Otherwise, for each row j, look for a match for strsplit(x[j, by.x[i]], split[i])[[1]][1] among strsplit(y[, by.y[i]], split[i]). See Details.
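The per-column rule in the Description can be sketched in base R. columnMatches() is a hypothetical helper, not part of Ecfun. It assumes that grep. = 'agrep' maps to agrepl() with its default settings, that any other non-NA match type falls back to fixed-string grepl(), and that an NA split means the value is not split at all.

```r
# Hypothetical helper, not part of Ecfun: flag the rows of y that match one
# x value under the column-wise rule in the Description.
columnMatches <- function(xval, yvals, grepType = NA, splitPat = NA) {
  if (is.na(grepType) && is.na(splitPat)) {
    # Both NA: require a complete (exact) match
    return(!is.na(yvals) & yvals == xval)
  }
  # Assumption: an NA split pattern means "do not split"
  splitIt <- function(s) {
    if (is.na(splitPat)) as.list(s) else strsplit(s, splitPat)
  }
  # Only the first piece of the x value is looked for
  key <- splitIt(xval)[[1]][1]
  # Assumption: "agrep" means approximate matching via agrepl();
  # any other type uses fixed-string grepl()
  hit <- function(pieces) {
    if (identical(grepType, "agrep")) any(agrepl(key, pieces))
    else any(grepl(key, pieces, fixed = TRUE))
  }
  # A row of y is a candidate if any of its pieces matches the key
  vapply(splitIt(yvals), hit, logical(1))
}

ref <- c("Smith", "Rogers", "Rogers (MI)", "Rogers (AL)", "Smith", "Jones")
columnMatches("Rogers", ref)                      # complete match only
columnMatches("Rogers", ref, grepType = "agrep")  # also "Rogers (MI)", "Rogers (AL)"
```

With a split pattern, only the first piece of the x value matters: columnMatches("Mike R.", c("Mike", "John"), grepType = "agrep", splitPat = " ") looks for "Mike" and flags only the first element.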
Usage
match.data.frame(x, y, by, by.x=by, by.y=by,
grep., split, sep=':')
Arguments
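x, y: data.frames.
by, by.x, by.y: names of columns of x and y to match.
grep.: a character vector of the type of match for each element of by.x. Alternatives include 'agrep' (as in the Examples). NOTE: These alternatives are not examined if a unique match has already been found (through the complete-match columns or an earlier element of by.x; see Details).
split: a character vector of split patterns passed to strsplit, one for each element of by.x (see Description).
sep: a character string used as the separator when pasting the complete-match columns into keys (see Details, step 2).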
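The length rule for grep. and split, together with step 1 of Details, can be sketched as an argument check. normalizeMatchArgs() is hypothetical, not Ecfun's code. It uses NULL to stand in for a missing grep. or split and assumes that "missing" means all NA, i.e. a complete match on every column.

```r
# Hypothetical sketch, not Ecfun's code: normalize the matching arguments.
normalizeMatchArgs <- function(x, by = names(x), by.x = by, by.y = by,
                               grep. = NULL, split = NULL) {
  if (length(by.x) != length(by.y))
    stop("by.x and by.y must have the same length")
  k <- length(by.x)
  # Assumption: a missing grep. or split is treated as all NA
  if (is.null(grep.)) grep. <- rep(NA_character_, k)
  if (is.null(split)) split <- rep(NA_character_, k)
  if (length(grep.) != k || length(split) != k)
    stop("grep. and split must have the same length as by.x and by.y")
  # Columns with both NA get a complete match (Details, step 2)
  list(by.x = by.x, by.y = by.y, grep. = grep., split = split,
       fullMatch = is.na(grep.) & is.na(split))
}

nd <- data.frame(state = "AL", surname = "Rogers", givenName = "Mike R.",
                 stringsAsFactors = FALSE)
chk <- normalizeMatchArgs(nd, grep. = c(NA, "agrep", "agrep"))
chk$fullMatch  # TRUE FALSE FALSE: complete match on state only
```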
Details
1. Check by.x, by.y, grep. and split: if ((missing(by.x) | missing(by.y)) && missing(by)) by <- names(x).
2. fullMatch <- (is.na(grep.) & is.na(split)). Create keyfx and keyfy by pasting the columns of x[, by.x[fullMatch]] and y[, by.y[fullMatch]], respectively. Also create x. and y. by applying strsplit to x[, by.x[!fullMatch]] and y[, by.y[!fullMatch]], respectively.
3. Iterate over the rows of x looking for the best match. This includes an inner loop over the columns of x[, by.x[!fullMatch]], stopping on the first unique match. Return (-1) if no unique match is found.
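Steps 2 and 3 can be sketched in base R as follows. sketchMatch() is hypothetical, not Ecfun's source. Its assumptions: columns hold character data; grep. = 'agrep' means agrepl() with default settings and any other non-NA type means fixed-string grepl(); an NA split means "do not split"; and the inner loop narrows the set of candidate rows column by column. It reports a miss as NA, following the Value section.

```r
# Hypothetical sketch of Details steps 2 and 3 (not Ecfun's source).
sketchMatch <- function(x, y, by.x, by.y, grep., split, sep = ":") {
  fullMatch <- is.na(grep.) & is.na(split)
  # Step 2: keys pasted from the complete-match columns
  pasteKey <- function(d, cols) {
    if (length(cols) == 0) return(rep("", nrow(d)))
    do.call(paste, c(unname(as.list(d[cols])), sep = sep))
  }
  keyfx <- pasteKey(x, by.x[fullMatch])
  keyfy <- pasteKey(y, by.y[fullMatch])
  # Assumption: an NA split pattern means "do not split"
  splitIt <- function(s, pat) {
    if (is.na(pat)) as.list(s) else strsplit(s, pat)
  }
  out <- rep(NA_integer_, nrow(x))
  for (j in seq_len(nrow(x))) {
    cand <- which(keyfy == keyfx[j])
    if (length(cand) == 1) { out[j] <- cand; next }
    # Step 3: inner loop over the remaining columns,
    # stopping on the first unique match
    for (i in which(!fullMatch)) {
      key <- splitIt(as.character(x[j, by.x[i]]), split[i])[[1]][1]
      pieces <- splitIt(as.character(y[cand, by.y[i]]), split[i])
      hit <- vapply(pieces, function(p) {
        if (identical(grep.[i], "agrep")) any(agrepl(key, p))
        else any(grepl(key, p, fixed = TRUE))
      }, logical(1))
      cand <- cand[hit]
      if (length(cand) <= 1) break
    }
    # No unique match leaves NA (Value); Details step 3 mentions -1 instead
    if (length(cand) == 1) out[j] <- cand
  }
  out
}
```

Called as sketchMatch(newdata, reference, names(newdata), names(newdata), c(NA, 'agrep', 'agrep'), rep(NA, 3)) on the Examples data, this sketch returns rows 4, 3 and 5, the same answer the Examples expect from match.data.frame(). Narrowing the candidates column by column is one reading of "stopping on the first unique match"; testing each column independently against the complete-match candidates gives the same result on this data.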
Value
An integer vector of length nrow(x) containing the index of the best matching row of y, or NA if no adequate match was found.
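The returned vector can index y directly. The sketch below uses a made-up y and made-up indices, not output from match.data.frame(). It also guards against the (-1) code mentioned in Details step 3: a negative index would drop rows of y instead of marking a miss, so it is converted to NA first.

```r
# Made-up reference table and index vector, for illustration only
y <- data.frame(id = c("a", "b", "c"), score = c(10, 20, 30),
                stringsAsFactors = FALSE)
idx <- c(3L, -1L, 1L)       # -1: no unique match (Details, step 3)
idx[which(idx < 0)] <- NA   # negative indices would drop rows; use NA
matched <- y[idx, ]         # an NA index yields a row of NAs
found <- !is.na(idx)        # which rows of x found a match
matched$score               # 30 NA 10
```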
Author(s)
Spencer Graves
See Also
strsplit, is.na, grep, agrep, match, row.match, join, match_df, classify
Examples
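library(Ecfun)  # match.data.frame() is in Ecfun
# Rows to look up
newdata <- data.frame(state=c("AL", "MI", "NY"),
                      surname=c("Rogers", "Rogers", "Smith"),
                      givenName=c("Mike R.", "Mike K.", "Al"),
                      stringsAsFactors=FALSE)
# Reference table searched for the best match to each row of newdata
reference <- data.frame(state=c("NY", "NY", "MI", "AL", "NY", "MI"),
                        surname=c("Smith", "Rogers", "Rogers (MI)",
                                  "Rogers (AL)", "Smith", 'Jones'),
                        givenName=c("John", "Mike", "Mike", "Mike",
                                    "T. Albert", 'Al Thomas'),
                        stringsAsFactors=FALSE)
# by defaults to names(newdata): complete match on state,
# 'agrep' (approximate) match on surname and givenName
newInRef <- match.data.frame(newdata, reference,
                             grep.=c(NA, 'agrep', 'agrep'))
# Expected best matches: rows 4, 3 and 5 of reference
all.equal(newInRef, c(4, 3, 5))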