score_simple.cluster_pairs {reclin2}    R Documentation

Score pairs based on a number of comparison vectors

Description

Score pairs based on a number of comparison vectors

Usage

## S3 method for class 'cluster_pairs'
score_simple(
  pairs,
  variable,
  on,
  w1 = 1,
  w0 = 0,
  wna = 0,
  new_name = NULL,
  ...
)

score_simple(pairs, variable, on, w1 = 1, w0 = 0, wna = 0, ...)

## S3 method for class 'pairs'
score_simple(
  pairs,
  variable,
  on,
  w1 = 1,
  w0 = 0,
  wna = 0,
  inplace = FALSE,
  ...
)

Arguments

pairs

a pairs object, such as generated by pair_blocking

variable

the name of the new variable to create in pairs. This will be a numeric variable containing the score of each pair.

on

character vector of variables on which the score should be based.

w1

a vector or list with weights for agreement for each of the variables. It can be a numeric vector of length 1, in which case the same weight is used for all variables; a numeric vector with the same length as on, in which case the weights correspond one-to-one to the variables in on; a named numeric vector whose names correspond to those in on, in which case omitted elements are assigned a weight of 1; or a named list with numeric values. See details for more information.

w0

a vector or list with weights for non-agreement for each of the variables. See details for more information. For the format see w1, except that omitted elements are assigned a weight of 0.

wna

a vector or list with weights for missing values (NA) for each of the variables. See details for more information. For the format see w1, except that omitted elements are assigned a weight of 0.

new_name

name of new object to assign the pairs to on the cluster nodes.

...

ignored

inplace

logical indicating whether pairs should be modified in place. When pairs is large this can be more efficient.

Details

The individual contribution of a variable x to the total score is given by x * w1 + (1-x) * w0 for non-NA values and by wna for NA values. This assumes that the value 1 corresponds to complete agreement and the value 0 to complete non-agreement. In case of complete agreement a variable contributes w1 to the total score, and in case of complete non-agreement it contributes w0.
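
The formula above can be illustrated with a small base R sketch. This is not the package's implementation: the helper contribution() and the comparison values in x are hypothetical, and the sketch assumes that the total score is the sum of the per-variable contributions and that comparison values between 0 and 1 (partial agreement) are allowed.

```r
# Hypothetical helper (not part of reclin2): contribution of each comparison
# value to the score, following the formula in Details.
contribution <- function(x, w1 = 1, w0 = 0, wna = 0) {
  # NA comparisons contribute wna; all others contribute x*w1 + (1-x)*w0
  ifelse(is.na(x), wna, x * w1 + (1 - x) * w0)
}

# Made-up comparison values for a single pair: complete agreement,
# partial agreement (0.5) and a missing comparison.
x <- c(firstname = 1, lastname = 0.5, sex = NA)

contrib <- contribution(x, w1 = 2, w0 = -1, wna = 0)
# firstname: 1 * 2 + 0 * -1     = 2
# lastname:  0.5 * 2 + 0.5 * -1 = 0.5
# sex:       NA, so wna         = 0

# Assumption: the total score is the sum of the contributions
score <- sum(contrib)
print(contrib)
print(score)
```

With wna = NA instead of 0, the contribution for sex would be NA; how that propagates into the total depends on how the package sums the contributions, which this sketch does not try to reproduce.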

Value

In case of score_simple.pairs, returns the data.table pairs with the column variable added.

In case of score_simple.cluster_pairs, score_simple.pairs is called on each cluster node and the resulting pairs are assigned to new_name in the environment reclin_env. When new_name is not given (or equal to NULL) the original pairs on the nodes are overwritten.

Examples

data("linkexample1", "linkexample2")
pairs <- pair_blocking(linkexample1, linkexample2, "postcode")
compare_pairs(pairs, on = c("firstname", "lastname", "sex"), inplace = TRUE)

score_simple(pairs, "score", on = c("firstname", "lastname", "sex"))

# Change the default weights
score_simple(pairs, "score", on = c("firstname", "lastname", "sex"), 
  w1 = 2, w0 = -1, wna = NA)

# Use a named vector; elements omitted from w1 get a weight of 1; those
# omitted from w0 and wna get a weight of 0.
score_simple(pairs, "score", on = c("firstname", "lastname", "sex"), 
  w1 = c("firstname" = 2, "lastname" = 3), 
  w0 = c("firstname" = -1, "lastname" = -0.5))

# Use a named list; elements omitted from w1 get a weight of 1; those
# omitted from w0 and wna get a weight of 0.
score_simple(pairs, "score", on = c("firstname", "lastname", "sex"), 
  w1 = list("firstname" = 2, "lastname" = 3), 
  w0 = list("firstname" = -1, "lastname" = -0.5))


[Package reclin2 version 0.5.0 Index]