multifileRL {bstrl} | R Documentation |
Perform multifile record linkage via Gibbs sampling "from scratch"
Description
Perform multifile record linkage via Gibbs sampling "from scratch"
Usage
multifileRL(
  files,
  flds = NULL,
  types = NULL,
  breaks = c(0, 0.25, 0.5),
  nIter = 1000,
  burn = round(nIter * 0.1),
  a = 1,
  b = 1,
  aBM = 1,
  bBM = 1,
  proposals = c("component", "LB"),
  blocksize = NULL,
  seed = 0,
  refresh = 0.1,
  maxtime = Inf
)
Arguments
files |
A list of files |
flds |
Vector of names of the fields on which to compare the records in each file |
types |
Types of comparisons to use for each field, for example "lv" (normalized edit distance) or "bi" (binary equal/unequal) |
breaks |
Split points used to discretize Levenshtein distance on string fields into levels |
nIter, burn |
MCMC run length parameters. The returned number of samples is nIter - burn. |
a, b |
Prior parameters for m and u, respectively. |
aBM, bBM |
Prior parameters for beta-linkage prior. |
proposals |
Which kind of full conditional proposals to use for the link vectors: "component" or "LB" (locally balanced) |
blocksize |
Block size to use for locally balanced (LB) proposals. By default (NULL), LB proposals are not blocked. |
seed |
Random seed to set at beginning of MCMC run |
refresh |
How often to output an update including the iteration number and percent complete. If refresh >= 1, taken as a number of iterations between messages (rounded). If 0 < refresh < 1, taken as the proportion of nIter. If refresh == 0, no messages are displayed. |
maxtime |
Amount of time, in seconds, after which the sampler terminates with however many samples it has produced up to that point. The sample matrix columns for any unproduced samples are filled with NA. |
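The run-length, refresh, and maxtime rules described above can be sketched in base R. This is only an illustration: refresh_interval() is a hypothetical helper, not a bstrl function, the floor of one iteration for small proportions is an assumption, and the toy matrix merely stands in for a sample matrix cut short by maxtime.

```r
# Hypothetical helper (not part of bstrl): convert a `refresh` value into
# the number of iterations between progress messages.
refresh_interval <- function(refresh, nIter) {
  if (refresh == 0) return(Inf)             # no messages are displayed
  if (refresh >= 1) return(round(refresh))  # iterations between messages
  max(1, round(refresh * nIter))            # proportion of nIter (floor of 1 is an assumption)
}

nIter <- 1000
burn <- round(nIter * 0.1)  # default burn-in
n_returned <- nIter - burn  # number of returned samples

refresh_interval(0.1, nIter)  # proportion of nIter
refresh_interval(250, nIter)  # fixed number of iterations
refresh_interval(0, nIter)    # messages disabled

# Toy stand-in for a sample matrix truncated by maxtime: one column per
# sample, with unproduced samples filled with NA.
samples <- matrix(c(1, 2, 1, 3, NA, NA), nrow = 2)
produced <- samples[, colSums(is.na(samples)) == 0, drop = FALSE]
ncol(produced)  # number of samples actually produced
```

Keeping only the fully populated columns is one way to work with a run that stopped early; where the sample matrix lives inside the returned object is not shown here.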
Value
An object of class "bstrlstate"
Examples
data(geco_small)
# Names of the columns on which to perform linkage
fieldnames <- c("given.name", "surname", "age", "occup",
                "extra1", "extra2", "extra3", "extra4", "extra5", "extra6")
# How to compare each of the fields
# First name and last name use normalized edit distance
# All others binary equal/unequal
types <- c("lv", "lv",
           "bi", "bi", "bi", "bi", "bi", "bi", "bi", "bi")
# Break continuous difference measures into 4 levels using these split points
breaks <- c(0, 0.25, 0.5)
# Three file linkage using first three files in example dataset
multifile.result <- multifileRL(geco_small[1:3],
  flds = fieldnames, types = types, breaks = breaks,
  nIter = 2, burn = 1, # Very small run for example
  proposals = "component",
  seed = 0)
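To make the role of breaks more concrete, the following base R sketch discretizes a normalized edit distance into four levels. lv_level() is a hypothetical helper, not a bstrl function; normalizing by the length of the longer string and using right-closed intervals are assumptions about the convention, not confirmed package behavior.

```r
# Hypothetical helper (not part of bstrl): compare two single strings and
# return the level (1 = closest agreement) implied by `breaks`.
lv_level <- function(x, y, breaks = c(0, 0.25, 0.5)) {
  d <- drop(utils::adist(x, y))              # raw Levenshtein distance
  d_norm <- d / max(nchar(x), nchar(y))      # assumed normalization to [0, 1]
  # Assumed right-closed intervals: (-Inf, 0], (0, 0.25], (0.25, 0.5], (0.5, Inf)
  cut(d_norm, breaks = c(-Inf, breaks, Inf), labels = FALSE)
}

lv_level("smith", "smith")  # exact match
lv_level("smith", "smyth")  # one substitution in five characters
lv_level("smith", "smyte")  # two substitutions in five characters
lv_level("smith", "jones")  # almost nothing in common
```

With three split points the distance falls into one of four levels, which matches the comment on breaks in the example above.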