getPairs {RecordLinkage} | R Documentation |
Extract Record Pairs
Description
Extracts record pairs from data and result objects.
Usage
## S4 method for signature 'RecLinkData'
getPairs(object, max.weight = Inf, min.weight = -Inf,
single.rows = FALSE, show = "all", sort = !is.null(object$Wdata))
## S4 method for signature 'RLBigData'
getPairs(object, max.weight = Inf, min.weight = -Inf,
filter.match = c("match", "unknown", "nonmatch"),
withWeight = hasWeights(object), withMatch = TRUE, single.rows = FALSE,
sort = withWeight)
## S4 method for signature 'RLResult'
getPairs(object, filter.match = c("match", "unknown", "nonmatch"),
filter.link = c("nonlink", "possible", "link"), max.weight = Inf,
min.weight = -Inf, withMatch = TRUE, withClass = TRUE,
withWeight = hasWeights(object@data), single.rows = FALSE, sort = withWeight)
getFalsePos(object, single.rows = FALSE)
getFalseNeg(object, single.rows = FALSE)
getFalse(object, single.rows = FALSE)
Arguments
object |
The data or result object from which to extract record pairs. |
max.weight , min.weight |
Real numbers. Upper and lower weight threshold. |
filter.match |
Character vector, a nonempty subset of c("match", "unknown", "nonmatch"), selecting pairs by matching status. |
filter.link |
Character vector, a nonempty subset of c("nonlink", "possible", "link"), selecting pairs by classification result. |
withWeight |
Logical. Whether to include linkage weights in the output. |
withMatch |
Logical. Whether to include matching status in the output. |
withClass |
Logical. Whether to include classification result in the output. |
single.rows |
Logical. Whether to print record pairs in one row instead of two consecutive rows. |
show |
Character. Selects which records to show; the default is "all". |
sort |
Logical. Whether to sort descending by weight. |
Details
These methods extract record pairs from "RecLinkData", "RecLinkResult", "RLBigData" and "RLResult" objects. Possible applications are retrieving
a linkage result for further processing, conducting a manual review in order
to determine classification thresholds or inspecting misclassified pairs.
The various arguments can be grouped by the following purposes:
Controlling which record pairs are included in the output: min.weight and max.weight, filter.match, filter.link, show.
Controlling which information is shown: withWeight, withMatch, withClass.
Controlling the overall structure of the result: sort, single.rows.
The weight limits are inclusive, i.e. a record pair with weight w is included only if w >= min.weight && w <= max.weight.
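The inclusive bounds can be illustrated with a minimal base R sketch. The data frame `pairs` and its column names are hypothetical stand-ins for a set of weighted record pairs; the sketch does not call the package and is not its implementation.

```r
# Hypothetical weighted record pairs (not produced by RecordLinkage).
pairs <- data.frame(id1 = c(1, 2, 3, 4),
                    id2 = c(5, 6, 7, 8),
                    Weight = c(0.4, 0.5, 0.55, 0.6))
min.weight <- 0.5
max.weight <- 0.6
# Both limits are inclusive, so pairs lying exactly on a limit are kept.
selected <- pairs[pairs$Weight >= min.weight & pairs$Weight <= max.weight, ]
print(selected)
```

Only the pair with weight 0.4 is dropped; the pairs at exactly 0.5 and 0.6 remain.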
If single.rows is not TRUE, pairs are output on two consecutive
lines in a more readable format. All data are converted to character, which
can lead to a loss of precision for numeric values.
Therefore, this format should be used for printing only.
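The precision loss mentioned above is a general property of converting doubles to text. The following base R sketch shows the effect in isolation; it assumes nothing about how the package performs the conversion internally, and the variable names are illustrative only.

```r
# A double that cannot be written exactly with a short decimal string.
x <- 1 / 3
# Conversion to character keeps a limited number of significant digits.
as_text <- as.character(x)
# Converting back does not necessarily recover the original value.
back <- as.numeric(as_text)
print(as_text)
print(identical(back, x))
```

For further numeric processing it is therefore safer to request single.rows = TRUE and work with the resulting data frame.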
getFalsePos, getFalseNeg and getFalse are shortcuts
(currently for objects of class "RLResult" only)
to retrieve false positives (links that are non-matches in fact),
false negatives (non-links that are matches in fact) or all falsely classified
pairs, respectively.
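The definitions of false positives and false negatives can be sketched in base R on a hypothetical table of classified pairs. The data frame `res` and its columns `is_match` and `class` are assumptions made for illustration; pairs classified as "possible" are treated here as neither false positive nor false negative, which is also an assumption of this sketch.

```r
# Hypothetical classified pairs: true matching status and predicted class.
res <- data.frame(
  pair = 1:5,
  is_match = c(TRUE, FALSE, TRUE, FALSE, FALSE),
  class = c("link", "link", "nonlink", "nonlink", "possible"),
  stringsAsFactors = FALSE
)
# False positives: classified as link, but in fact a non-match.
false_pos <- res[res$class == "link" & !res$is_match, ]
# False negatives: classified as non-link, but in fact a match.
false_neg <- res[res$class == "nonlink" & res$is_match, ]
# All falsely classified pairs.
false_all <- rbind(false_pos, false_neg)
print(false_all)
```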
Value
A data frame. If single.rows is TRUE, each row holds (in this
order) id and data fields of the
first record, id and data fields of the second record and possibly matching
status, classification result and/or weight.
If single.rows is not TRUE, the result holds for each resulting
record pair consecutive rows of the following format:
ID and data fields of the first record, followed by as many empty fields as needed to match the length of the following line.
ID and data fields of the second record, possibly followed by matching status, classification result and/or weight.
A blank line to separate record pairs.
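The three-row layout described above can be mimicked with a short base R sketch. The records, field names and extra columns are hypothetical, and the code only reproduces the described shape; it is not the package's own formatting code.

```r
# Two hypothetical records, already converted to character.
rec1 <- c(id = "1", fname = "ANNA", lname = "EXAMPLE")
rec2 <- c(id = "2", fname = "ANNA", lname = "EXAMPEL")
# Hypothetical extra columns attached to the second row of a pair.
extras <- c(is_match = "TRUE", Weight = "0.9")
# Row 1: first record, padded with empty fields to the length of row 2.
row1 <- c(rec1, rep("", length(extras)))
# Row 2: second record followed by the extra columns.
row2 <- c(rec2, extras)
# Row 3: blank separator line.
row3 <- rep("", length(row2))
m <- rbind(unname(row1), unname(row2), unname(row3))
colnames(m) <- names(row2)
layout <- as.data.frame(m, stringsAsFactors = FALSE)
print(layout)
```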
Note
When non-matches are included in the output and blocking is permissive, the result object can be very large, possibly leading to memory problems.
Author(s)
Andreas Borg, Murat Sariyar
Examples
data(RLdata500)
# create record pairs and calculate epilink weights
rpairs <- RLBigDataDedup(RLdata500, identity = identity.RLdata500,
blockfld=list(1,3,5,6,7))
rpairs <- epiWeights(rpairs)
# show all record pairs with weights between 0.5 and 0.6
getPairs(rpairs, min.weight=0.5, max.weight=0.6)
# show only matches with weight <= 0.5
getPairs(rpairs, max.weight=0.5, filter.match="match")
# classify with one threshold
result <- epiClassify(rpairs, 0.5)
# show all links, do not show classification in the output
getPairs(result, filter.link="link", withClass = FALSE)
# see wrongly classified pairs
getFalsePos(result)
getFalseNeg(result)