cluster_pair {reclin2}    R Documentation
Generate all possible pairs using multiple processes
Description
Generates all combinations of records from x and y.
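As a minimal base R illustration of what "all combinations" means (not the reclin2 implementation), the sketch below enumerates every pair of row indices from two small, made-up data frames. Representing pairs as index columns named .x and .y is an assumption made for this sketch.

```r
# Two made-up data sets
x <- data.frame(name = c("ann", "bob", "cat"))
y <- data.frame(name = c("ann", "bobby"))

# Every record of x is combined with every record of y, giving
# nrow(x) * nrow(y) pairs stored as row indices (.x into x, .y into y);
# the column names are an assumption for this sketch
all_pairs <- expand.grid(.x = seq_len(nrow(x)), .y = seq_len(nrow(y)))
nrow(all_pairs)  # 6
```

The number of pairs grows as nrow(x) * nrow(y), which is why spreading the work over several processes can pay off for larger data sets.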
Usage
cluster_pair(cluster, x, y, deduplication = FALSE, name = "default")
Arguments
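cluster: a cluster object as created by makeCluster from the parallel package.
x: the first data.frame.
y: the second data.frame.
deduplication: if TRUE, generate pairs from the records of x only, for deduplicating a single data set.
name: the name of the resulting object to create locally on the different R processes.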
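With deduplication, pairs come from a single data set. The base R sketch below assumes that each unordered pair of distinct records is generated once (kept only when .x < .y); whether reclin2 applies exactly this rule is an assumption here, and the data are made up.

```r
# A single made-up data set that may contain duplicates
x <- data.frame(name = c("ann", "anne", "bob", "ann"))

# Assumption for this sketch: each unordered pair of distinct records is
# generated once, so only pairs with .x < .y are kept
idx <- t(combn(nrow(x), 2))
dedup_pairs <- data.frame(.x = idx[, 1], .y = idx[, 2])
nrow(dedup_pairs)  # 6 = choose(4, 2)
```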
Details
Generating (all) pairs of records from two data sets is usually the first step when linking them.
x is split into length(cluster) parts, which are distributed over the worker nodes, and y is copied to each of the nodes. On each node, pair is then called. The pairs are stored on the nodes in the global object reclin_env, in the variable given by name. The pairs can then be processed further using functions such as compare_pairs and tabulate_patterns. The function cluster_collect collects the pairs from each of the nodes.
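The distribution scheme described above can be sketched with the parallel package alone. This is an illustration under assumptions, not reclin2's code: the worker-side environment sketch_env is a hypothetical stand-in for reclin_env, pairs are plain data frames of row indices (.x, .y), and the final rbind only mimics what cluster_collect is described as doing.

```r
library(parallel)

# Made-up stand-ins for the two data sets
x <- data.frame(id = 1:5)
y <- data.frame(id = 1:3)

cl <- makeCluster(2)

# Split the rows of x into length(cl) parts, one per worker
chunks <- splitIndices(nrow(x), length(cl))
parts <- lapply(chunks, function(i) list(rows = i, x = x[i, , drop = FALSE]))

# Copy y in full to every worker
clusterExport(cl, "y", envir = environment())

# Create a global environment on each worker (hypothetical stand-in for reclin_env)
invisible(clusterEvalQ(cl, sketch_env <- new.env()))

# On each worker: keep its part of x, pair it with all of y and store the
# pairs under `name`; .x keeps the original row numbers of x
invisible(clusterApply(cl, parts, function(part, name) {
  assign("x", part$x, envir = sketch_env)
  p <- expand.grid(.x = part$rows, .y = seq_len(nrow(y)))
  assign(name, p, envir = sketch_env)
  nrow(p)
}, name = "default"))

# Collect the pairs from every worker (mimicking cluster_collect)
collected <- do.call(rbind, clusterEvalQ(cl, get("default", envir = sketch_env)))
stopCluster(cl)

nrow(collected)  # 15 = 5 * 3
```

Only x is split because every record of x must meet every record of y; copying y whole to each worker lets each node produce its share of the pairs independently.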
Value
An object of type cluster_pairs, which is a list containing the cluster and the name of the pairs object on the cluster nodes. For the pairs objects created on the nodes, see the documentation of pair.
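The returned value is described as a list holding the cluster and the name of the pairs object on the nodes. A hypothetical constructor for such an object might look like the sketch below; the helper new_cluster_pairs and the element names cluster and name are assumptions, not reclin2's documented internals.

```r
library(parallel)

# Hypothetical constructor: a plain list with class "cluster_pairs" holding the
# cluster and the name of the pairs object on the workers (element names assumed)
new_cluster_pairs <- function(cluster, name = "default") {
  stopifnot(inherits(cluster, "cluster"), is.character(name), length(name) == 1)
  structure(list(cluster = cluster, name = name), class = "cluster_pairs")
}

cl <- makeCluster(1)
obj <- new_cluster_pairs(cl)
is.list(obj)  # TRUE
obj$name      # "default"
stopCluster(obj$cluster)
```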
See Also
cluster_pair_blocking and cluster_pair_minsim are other methods to generate pairs.
Examples
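library(parallel)
library(reclin2)
# Example data sets shipped with reclin2
data("linkexample1", "linkexample2")
# Start a cluster with two worker processes
cl <- makeCluster(2)
# Generate all pairs: linkexample1 is split over the workers, linkexample2 is copied to each
pairs <- cluster_pair(cl, linkexample1, linkexample2)
# Shut down the workers when done
stopCluster(cl)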