matches {grr} | R Documentation |
Value Matching
Description
Returns a lookup table or list of the positions of ALL matches of its first
argument in its second and vice versa. Similar to match
, though
that function only returns the first match.
Usage
matches(x, y, all.x = TRUE, all.y = TRUE, list = FALSE, indexes = TRUE,
nomatch = NA)
Arguments
x |
vector. The values to be matched. Long vectors are not currently supported. |
y |
vector. The values to be matched. Long vectors are not currently supported. |
all.x |
logical; if |
all.y |
logical; if |
list |
logical. If |
indexes |
logical. Whether to return the indices of the matches or the actual values. |
nomatch |
the value to be returned in the case when no match is found.
If not provided and |
Details
This behavior can be imitated by using joins to create lookup tables, but
matches
is simpler and faster: usually faster than the best joins in
other packages and thousands of times faster than the built in
merge
.
all.x/all.y
correspond to the four types of database joins in the
following way:
- left
all.x=TRUE
,all.y=FALSE
- right
all.x=FALSE
,all.y=TRUE
- inner
all.x=FALSE
,all.y=FALSE
- full
all.x=TRUE
,all.y=TRUE
Note that NA
values will match other NA
values.
Examples
one<-as.integer(1:10000)
two<-as.integer(sample(1:10000,1e3,TRUE))
system.time(a<-lapply(one, function (x) which(two %in% x)))
system.time(b<-matches(one,two,all.y=FALSE,list=TRUE))
#Only retain items from one with a match in two
b<-matches(one,two,all.x=FALSE,all.y=FALSE,list=TRUE)
length(b)==length(unique(two))
one<-round(runif(1e3),3)
two<-round(runif(1e3),3)
system.time(a<-lapply(one, function (x) which(two %in% x)))
system.time(b<-matches(one,two,all.y=FALSE,list=TRUE))
one<-as.character(1:1e5)
two<-as.character(sample(1:1e5,1e5,TRUE))
system.time(b<-matches(one,two,list=FALSE))
system.time(c<-merge(data.frame(key=one),data.frame(key=two),all=TRUE))
## Not run:
one<-as.integer(1:1000000)
two<-as.integer(sample(1:1000000,1e5,TRUE))
system.time(b<-matches(one,two,indexes=FALSE))
if(requireNamespace("dplyr",quietly=TRUE))
system.time(c<-dplyr::full_join(data.frame(key=one),data.frame(key=two)))
if(require(data.table,quietly=TRUE))
system.time(d<-merge(data.table(data.frame(key=one))
,data.table(data.frame(key=two))
,by='key',all=TRUE,allow.cartesian=TRUE))
one<-as.character(1:1000000)
two<-as.character(sample(1:1000000,1e5,TRUE))
system.time(a<-merge(one,two)) #Times out
system.time(b<-matches(one,two,indexes=FALSE))
if(requireNamespace("dplyr",quietly=TRUE))
system.time(c<-dplyr::full_join(data.frame(key=one),data.frame(key=two)))#'
if(require(data.table,quietly=TRUE))
{
system.time(d<-merge(data.table(data.frame(key=one))
,data.table(data.frame(key=two))
,by='key',all=TRUE,allow.cartesian=TRUE))
identical(b[,1],as.character(d$key))
}
## End(Not run)