score_simple.cluster_pairs {reclin2} | R Documentation |
Score pairs based on a number of comparison vectors
Description
Score pairs based on a number of comparison vectors
Usage
## S3 method for class 'cluster_pairs'
score_simple(
  pairs,
  variable,
  on,
  w1 = 1,
  w0 = 0,
  wna = 0,
  new_name = NULL,
  ...
)
score_simple(pairs, variable, on, w1 = 1, w0 = 0, wna = 0, ...)
## S3 method for class 'pairs'
score_simple(
  pairs,
  variable,
  on,
  w1 = 1,
  w0 = 0,
  wna = 0,
  inplace = FALSE,
  ...
)
Arguments
pairs |
a |
variable |
the name of the new variable to create in pairs. This will be a
logical variable with a value of |
on |
character vector of variables on which the score should be based. |
w1 |
a vector or list with weights for agreement for each of the
variables. It can either be a numeric vector of length 1, in which case the
same weight is used for all variables; a numeric vector of length equal to
the length of |
w0 |
a vector or list with weights for non-agreement for each of the
variables. See details for more information. For the format see |
wna |
a vector or list with weights for missing values (NA) for each of the
variables. See details for more information. For the format see |
new_name |
name of new object to assign the pairs to on the cluster nodes. |
... |
ignored |
inplace |
logical indicating whether |
Details
The individual contribution of a variable x to the total score is given by
x * w1 + (1-x) * w0 in case of non-NA values and wna in case of NA. This
assumes that the value 1 corresponds to complete agreement and the value 0 to
complete non-agreement. In case of complete agreement a variable contributes
w1 to the total score and in case of complete non-agreement it contributes w0
to the total score.
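The formula above can be sketched in base R without reclin2. The helper name contribution and the toy comparison vectors below are hypothetical and only illustrate the arithmetic; the sketch also assumes that the total score is the sum of the per-variable contributions. It is not the package's actual implementation.

```r
# Hypothetical helper (not part of reclin2): contribution of a single
# comparison vector x to the score, following the formula in Details.
contribution <- function(x, w1 = 1, w0 = 0, wna = 0) {
  x <- as.numeric(x)  # TRUE/FALSE become 1/0; similarity scores stay as-is
  # NA values contribute wna; all other values interpolate between w0 and w1
  ifelse(is.na(x), wna, x * w1 + (1 - x) * w0)
}

# Toy comparison vectors for three pairs (made-up data)
firstname <- c(TRUE, FALSE, NA)  # logical comparison with a missing value
lastname  <- c(1, 0.5, 0)        # similarity scores between 0 and 1

# Assumed here: the total score is the sum of the contributions
score <- contribution(firstname, w1 = 2, w0 = -1, wna = 0) +
  contribution(lastname, w1 = 3, w0 = -0.5, wna = 0)
print(score)
```

A partial agreement such as 0.5 lands halfway between w0 and w1, which is why similarity scores between 0 and 1 can be used alongside logical comparisons.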
Value
Returns the data.table pairs with the column variable added in case of
score_simple.pairs. In case of score_simple.cluster_pairs, score_simple.pairs
is called on each cluster node and the resulting pairs are assigned to
new_name in the environment reclin_env. When new_name is not given (or equal
to NULL) the original pairs on the nodes are overwritten.
Examples
data("linkexample1", "linkexample2")
pairs <- pair_blocking(linkexample1, linkexample2, "postcode")
compare_pairs(pairs, on = c("firstname", "lastname", "sex"), inplace = TRUE)
score_simple(pairs, "score", on = c("firstname", "lastname", "sex"))
# Change the default weights
score_simple(pairs, "score", on = c("firstname", "lastname", "sex"),
  w1 = 2, w0 = -1, wna = NA)
# Use a named vector; elements omitted from w1 get a weight of 1; those
# omitted from w0 and wna get a weight of 0.
score_simple(pairs, "score", on = c("firstname", "lastname", "sex"),
  w1 = c("firstname" = 2, "lastname" = 3),
  w0 = c("firstname" = -1, "lastname" = -0.5))
# Use a named list; elements omitted from w1 get a weight of 1; those
# omitted from w0 and wna get a weight of 0.
score_simple(pairs, "score", on = c("firstname", "lastname", "sex"),
  w1 = list("firstname" = 2, "lastname" = 3),
  w0 = list("firstname" = -1, "lastname" = -0.5))