epiClassify {RecordLinkage} | R Documentation |
Classify record pairs with EpiLink weights
Description
Classifies record pairs as link, non-link or possible link based on
weights computed by epiWeights and the thresholds passed as arguments.
Usage
epiClassify(rpairs, threshold.upper, threshold.lower = threshold.upper,
...)
## S4 method for signature 'RecLinkData'
epiClassify(rpairs, threshold.upper, threshold.lower = threshold.upper)
## S4 method for signature 'RLBigData'
epiClassify(rpairs, threshold.upper, threshold.lower = threshold.upper,
e = 0.01, f = getFrequencies(rpairs), withProgressBar = (sink.number()==0))
Arguments
rpairs |
An object of class "RecLinkData" or "RLBigData". The record pairs to classify. |
threshold.upper |
A numeric value between 0 and 1. |
threshold.lower |
A numeric value between 0 and 1, lower than or equal to threshold.upper. |
e |
Numeric vector. Estimated error rate(s). |
f |
Numeric vector. Average frequency of attribute values. |
withProgressBar |
Logical. Whether to display a progress bar. |
... |
Placeholder for optional arguments |
Details
All record pairs with weights greater than or equal to threshold.upper
are classified as links. Record pairs with weights smaller than
threshold.upper and greater than or equal to threshold.lower are
classified as possible links. All remaining record pairs are classified
as non-links.
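The threshold rule described above can be sketched in base R. This is only an illustration, not the package's implementation: the helper classify_by_threshold, the example weights and the text labels are hypothetical, and in practice the weights come from epiWeights.

```r
# Hypothetical weights; in real use these are computed by epiWeights().
weights <- c(0.95, 0.70, 0.55, 0.30)
threshold.upper <- 0.7
threshold.lower <- 0.5

# Hypothetical helper mirroring the documented rule (not package code).
# Labels are illustrative; the package may encode the classes differently.
classify_by_threshold <- function(w, upper, lower = upper) {
  stopifnot(lower <= upper)
  ifelse(w >= upper, "link",
         ifelse(w >= lower, "possible link", "non-link"))
}

classification <- classify_by_threshold(weights, threshold.upper, threshold.lower)
print(classification)
```

With the default threshold.lower = threshold.upper, the middle band is empty, so no pair is classified as a possible link.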
For the "RecLinkData" method, weights must have been calculated
for rpairs using epiWeights.
The "RLBigData" method displays a progress bar only if weights are
calculated on the fly. By default, the bar is suppressed when output is
diverted by sink (e.g. in a Sweave script).
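The default withProgressBar = (sink.number()==0) shown in the Usage section can be checked directly in base R. The sketch below only demonstrates how that expression behaves when output is diverted; the temporary file and variable names are illustrative.

```r
# State before any diversion set up by this snippet.
before <- (sink.number() == 0)

# Divert output to a temporary file, as a Sweave run might do.
tmp <- tempfile()
sink(tmp)
diverted <- (sink.number() == 0)  # FALSE while a sink is active
sink()                            # remove the diversion again
unlink(tmp)

# After removing the sink, the expression returns to its earlier value.
after <- (sink.number() == 0)
print(c(before = before, diverted = diverted, after = after))
```

Passing withProgressBar explicitly overrides this default in either direction.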
Value
For the "RecLinkData" method, an S3 object of class "RecLinkResult"
that represents a copy of rpairs with the additional element prediction,
which stores the classification result.
For the "RLBigData" method, an S4 object of class "RLResult".
Author(s)
Andreas Borg, Murat Sariyar
See Also
epiWeights
Examples
# load the package
library(RecordLinkage)
# generate record pairs
data(RLdata500)
p <- compare.dedup(RLdata500, strcmp = TRUE, strcmpfun = levenshteinSim,
                   identity = identity.RLdata500,
                   blockfld = list("by", "bm", "bd"))
# calculate weights
p <- epiWeights(p)
# classify and show results
summary(epiClassify(p, 0.6))