compare_pairs.cluster_pairs {reclin2}    R Documentation

Compare pairs on a set of variables common in both data sets

Description

Compares each pair of records on a set of variables that are present in both data sets.

Usage

## S3 method for class 'cluster_pairs'
compare_pairs(
  pairs,
  on,
  comparators = list(default_comparator),
  default_comparator = cmp_identical(),
  new_name = NULL,
  ...
)

compare_pairs(
  pairs,
  on,
  comparators = list(default_comparator),
  default_comparator = cmp_identical(),
  ...
)

## S3 method for class 'pairs'
compare_pairs(
  pairs,
  on,
  comparators = list(default_comparator),
  default_comparator = cmp_identical(),
  x = attr(pairs, "x"),
  y = attr(pairs, "y"),
  inplace = FALSE,
  ...
)

Arguments

pairs

data.table with pairs. Should contain the columns .x and .y.

on

character vector of variables that should be compared.

comparators

named list of functions with which the variables are compared. Each function should accept two vectors and should return either a vector or a data.table with multiple columns.

default_comparator

variables for which no comparison function is defined in comparators are compared with the function default_comparator.

new_name

name of new object to assign the pairs to on the cluster nodes.

...

Ignored for now

x

data.table with one half of the pairs.

y

data.table with the other half of the pairs.

inplace

logical indicating whether pairs should be modified in place. When pairs is large this can be more efficient.

Details

It is assumed that the variables in on are present in both x and y. For each variable, a column with the same name is added to pairs. When the comparator returns a data.table, multiple columns are added to pairs; each is named by pasting together the variable name and the name of the corresponding column in the data.table returned by the comparator, separated by "_".
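
The naming rule above can be illustrated with a minimal sketch. The data sets x and y, the blocking variable city, and the comparator cmp_name are all made up for this illustration and are not part of the package; pairs are generated with pair_blocking. Following the rule described above, the result would be expected to contain the columns name_exact and name_initial (from the data.table returned by cmp_name) and sex (from the default comparator).

```r
library(reclin2)
library(data.table)

# Two small illustrative data sets (made up for this sketch)
x <- data.table(name = c("anna", "bob", "carl"),
                sex  = c("f", "m", "m"),
                city = c("A", "B", "A"))
y <- data.table(name = c("anne", "bob", "karl"),
                sex  = c("f", "m", "m"),
                city = c("A", "B", "A"))

# Only records that agree on city become candidate pairs
pairs <- pair_blocking(x, y, "city")

# Hypothetical comparator: accepts two vectors and returns a
# data.table with two columns instead of a single vector
cmp_name <- function(x, y) {
  x <- as.character(x)
  y <- as.character(y)
  data.table(
    exact   = x == y,
    initial = substr(x, 1, 1) == substr(y, 1, 1)
  )
}

# name uses cmp_name; sex has no entry in comparators and therefore
# falls back to default_comparator (cmp_identical())
compared <- compare_pairs(pairs, on = c("name", "sex"),
                          comparators = list(name = cmp_name))
print(names(compared))
```

Because inplace is left at FALSE here, the result has to be assigned; with inplace = TRUE the columns would be added to pairs itself.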

Value

For compare_pairs.pairs, returns the data.table pairs with one or more columns added.

For compare_pairs.cluster_pairs, compare_pairs.pairs is called on each cluster node and the resulting pairs are assigned to new_name in the environment reclin_env. When new_name is not given (or is NULL), the original pairs on the nodes are overwritten.
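
A rough sketch of the cluster variant follows. It assumes a local two-node cluster created with the parallel package, the linkexample1 and linkexample2 data sets shipped with reclin2, and the helpers cluster_pair_blocking and cluster_collect to create the pairs on the nodes and to bring them back; none of these are described on this page. Since new_name is not given, the pairs on the nodes are overwritten.

```r
library(reclin2)
library(parallel)

data("linkexample1", "linkexample2")

# Assumption: a small local cluster is sufficient for illustration
cl <- makeCluster(2)

# Generate the pairs on the cluster nodes, blocking on postcode
pairs <- cluster_pair_blocking(cl, linkexample1, linkexample2, "postcode")

# The comparison runs on each node; without new_name the pairs
# stored on the nodes are overwritten with the compared pairs
compare_pairs(pairs, on = c("lastname", "firstname"))

# Collect the compared pairs from the nodes into the local session
local_pairs <- cluster_collect(pairs)

stopCluster(cl)
print(names(local_pairs))
```

Passing new_name = "some_name" (a hypothetical name) would instead keep the original pairs on the nodes and store the compared pairs under that name.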


[Package reclin2 version 0.5.0 Index]