pair_blocking {reclin2}    R Documentation

Generate pairs using simple blocking

Description

Generates all combinations of records from x and y where the blocking variables are equal.

Usage

pair_blocking(x, y, on, deduplication = FALSE, add_xy = TRUE)

Arguments

x

first data.frame

y

second data.frame. Ignored when deduplication = TRUE.

on

the variables defining the blocks or strata for which all pairs of x and y will be generated.

deduplication

generate pairs from x only, ignoring y. This is useful for deduplication of x.

add_xy

add x and y as attributes to the returned pairs. This makes it easier to call subsequent operations that need x and y (such as compare_pairs).

Details

Generating all pairs of records from two data sets is usually the first step when linking them. However, this often results in too many pairs to handle. Therefore, blocking is usually applied: pairs are generated only for records that agree on the blocking variables.
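
The effect of blocking on the number of pairs can be illustrated with a minimal base R sketch. It does not use reclin2 and is not how pair_blocking is implemented; the data frames and the helper columns added to them are made up for illustration only.

```r
# Two tiny hypothetical data sets; `postcode` is the blocking variable.
x <- data.frame(name = c("Ann", "Bob", "Cas"),
                postcode = c("1234", "1234", "5678"),
                stringsAsFactors = FALSE)
y <- data.frame(name = c("Anne", "Rob", "Cass", "Dee"),
                postcode = c("1234", "5678", "5678", "9999"),
                stringsAsFactors = FALSE)

# Without blocking, every record of x is paired with every record of y.
n_all <- nrow(x) * nrow(y)

# With blocking, only records with equal postcode are paired.
# Row numbers are stored in helper columns so each pair can be traced back.
x$.x <- seq_len(nrow(x))
y$.y <- seq_len(nrow(y))
blocked <- merge(x[, c(".x", "postcode")], y[, c(".y", "postcode")],
                 by = "postcode")
n_blocked <- nrow(blocked)

c(all = n_all, blocked = n_blocked)
```

Records whose blocking value has no counterpart in the other data set (postcode "9999" here) produce no pairs at all, so errors in a blocking variable can cause true matches to be missed.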

Value

A data.table with two columns, .x and .y, is returned. Columns .x and .y are row numbers from data.frames x and y respectively.

See Also

pair and pair_minsim are other methods to generate pairs.

Examples

data("linkexample1", "linkexample2")
pairs <- pair_blocking(linkexample1, linkexample2, "postcode")
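
A further sketch, assuming reclin2 is installed, shows the deduplication argument described above. Here y is left out because it is ignored when deduplication = TRUE; the object name dedup_pairs is only illustrative.

```r
library(reclin2)
data("linkexample1")

# Pair the records of linkexample1 with each other, within postcode blocks.
dedup_pairs <- pair_blocking(linkexample1, on = "postcode",
                             deduplication = TRUE)

# .x and .y both refer to row numbers of linkexample1 in this case.
print(dedup_pairs)
```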


[Package reclin2 version 0.5.0 Index]