distance_join {fuzzyjoin}    R Documentation
Join two tables based on a distance metric of one or more columns
Description
This differs from difference_join (which compares each column separately) in that it considers all of the columns together when computing distance. This allows it to use metrics such as Euclidean or Manhattan distance that depend on multiple columns. Note that if you are computing with longitude and latitude, you probably want to use geo_join.
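To see why combining columns matters, the base-R snippet below compares two rows on two numeric columns. It does not call fuzzyjoin; the values and variable names are made up for illustration. Each column on its own is within max_dist = 1, so a per-column check in the style of difference_join would match, but both the combined Euclidean and Manhattan distances exceed 1, so a distance join with the same max_dist would not.

```r
# Two observations measured on the same two numeric columns (illustrative values)
a <- c(Sepal.Length = 5.1, Sepal.Width = 3.5)
b <- c(Sepal.Length = 6.0, Sepal.Width = 2.8)
max_dist <- 1

# difference_join-style: every column must be within max_dist on its own
per_column_match <- all(abs(a - b) <= max_dist)  # TRUE (gaps of 0.9 and 0.7)

# distance_join-style: one distance computed from all columns together
euclidean <- sqrt(sum((a - b)^2))  # sqrt(0.81 + 0.49), about 1.14
manhattan <- sum(abs(a - b))       # 0.9 + 0.7 = 1.6

euclidean <= max_dist  # FALSE
manhattan <= max_dist  # FALSE
```

Manhattan distance is never smaller than Euclidean distance for the same pair, so with a fixed max_dist it is the stricter of the two criteria.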
Usage
distance_join(
x,
y,
by = NULL,
max_dist = 1,
method = c("euclidean", "manhattan"),
mode = "inner",
distance_col = NULL
)
distance_inner_join(
x,
y,
by = NULL,
method = "euclidean",
max_dist = 1,
distance_col = NULL
)
distance_left_join(
x,
y,
by = NULL,
method = "euclidean",
max_dist = 1,
distance_col = NULL
)
distance_right_join(
x,
y,
by = NULL,
method = "euclidean",
max_dist = 1,
distance_col = NULL
)
distance_full_join(
x,
y,
by = NULL,
method = "euclidean",
max_dist = 1,
distance_col = NULL
)
distance_semi_join(
x,
y,
by = NULL,
method = "euclidean",
max_dist = 1,
distance_col = NULL
)
distance_anti_join(
x,
y,
by = NULL,
method = "euclidean",
max_dist = 1,
distance_col = NULL
)
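In distance_join() the default for method is written as the vector c("euclidean", "manhattan"), while the wrapper functions default to "euclidean". Assuming distance_join() resolves this vector with R's standard match.arg() idiom (an assumption; this page only shows the default), leaving method unset selects the first choice, "euclidean". The sketch below uses a hypothetical function, pick_method(), that is not part of fuzzyjoin and only mirrors the shape of that argument.

```r
# Hypothetical function mirroring the shape of distance_join()'s `method` argument
pick_method <- function(method = c("euclidean", "manhattan")) {
  # match.arg() returns the first choice when `method` is left at its default,
  # and otherwise checks the supplied value against the allowed choices
  match.arg(method)
}

default_choice  <- pick_method()             # "euclidean"
explicit_choice <- pick_method("manhattan")  # "manhattan"
partial_choice  <- pick_method("man")        # partial matching also gives "manhattan"
# An unknown method is rejected with an error rather than silently ignored
bad_choice <- tryCatch(pick_method("cosine"), error = function(e) "error")
```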
Arguments
x | A tbl
y | A tbl
by | Columns by which to join the two tables. If NULL, the columns that x and y have in common are used.
max_dist | Maximum distance between two rows for them to be joined
method | Method to use for computing distance: either "euclidean" (default) or "manhattan"
mode | One of "inner", "left", "right", "full", "semi", or "anti"
distance_col | If given, the name of a column to add that contains the distance between the two joined rows
Examples
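library(dplyr)
library(fuzzyjoin)
head(iris)
# Reference points on the two Sepal columns (both also present in iris)
sepal_lengths <- tibble(Sepal.Length = c(5, 6, 7),
                        Sepal.Width = 1:3)
# With by = NULL the join uses the common columns; keep pairs whose
# Euclidean distance is at most 2
iris %>%
  distance_inner_join(sepal_lengths, max_dist = 2)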
[Package fuzzyjoin version 0.1.6]