multifileRL {bstrl} R Documentation

## Perform multifile record linkage via Gibbs sampling "from scratch"

### Description

Perform multifile record linkage via Gibbs sampling "from scratch"

### Usage

```r
multifileRL(
  files,
  flds = NULL,
  types = NULL,
  breaks = c(0, 0.25, 0.5),
  nIter = 1000,
  burn = round(nIter * 0.1),
  a = 1,
  b = 1,
  aBM = 1,
  bBM = 1,
  proposals = c("component", "LB"),
  blocksize = NULL,
  seed = 0,
  refresh = 0.1,
  maxtime = Inf
)
```


### Arguments

- `files`: A list of files whose records are to be linked.
- `flds`: Vector of names of the fields on which to compare the records in each file.
- `types`: Types of comparison to use for each field (for example `"lv"` for normalized Levenshtein edit distance and `"bi"` for binary equal/unequal, as in the Examples).
- `breaks`: Split points used to discretize the Levenshtein distance on string fields into levels.
- `nIter`, `burn`: MCMC run-length parameters. The returned number of samples is `nIter - burn`.
- `a`, `b`: Prior parameters for m and u, respectively.
- `aBM`, `bBM`: Prior parameters for the beta-linkage prior.
- `proposals`: Which kind of full conditional proposals to use for the link vectors: `"component"` or `"LB"` (locally balanced).
- `blocksize`: Block size to use for locally balanced proposals. By default (`NULL`), LB proposals are not blocked.
- `seed`: Random seed to set at the beginning of the MCMC run.
- `refresh`: How often to output an update including the iteration number and percent complete. If `refresh >= 1`, it is taken as the number of iterations between messages (rounded). If `0 < refresh < 1`, it is taken as a proportion of `nIter`. If `refresh == 0`, no messages are displayed.
- `maxtime`: Amount of time, in seconds, after which the sampler terminates, returning however many samples it has produced up to that point. The sample matrix columns for any unproduced samples are filled with `NA`s.
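The `refresh` and `nIter`/`burn` rules above reduce to simple arithmetic. The following R sketch restates them as documented; `messageInterval` and `retainedSamples` are hypothetical helpers written for illustration, not functions exported by bstrl, and the package's own rounding or edge-case handling may differ.

```r
# Hypothetical helper (not part of bstrl): apply the documented `refresh`
# rule to get the number of iterations between progress messages.
# Returns NA when refresh == 0, meaning no messages are printed.
messageInterval <- function(refresh, nIter) {
  if (refresh >= 1) {
    round(refresh)            # taken as an iteration count
  } else if (refresh > 0) {
    round(refresh * nIter)    # taken as a proportion of nIter
  } else {
    NA_real_                  # refresh == 0: silent run
  }
}

# Hypothetical helper: number of samples returned, per the `nIter, burn`
# entry, using the same default burn-in as multifileRL()
retainedSamples <- function(nIter, burn = round(nIter * 0.1)) {
  nIter - burn
}

messageInterval(0.1, 1000)   # 100: default refresh with default nIter
messageInterval(250, 1000)   # 250
retainedSamples(1000)        # 900 with the default burn of 100
```

With the defaults (`refresh = 0.1`, `nIter = 1000`), this rule gives a progress message every 100 iterations and 900 retained samples.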

### Value

An object of class "bstrlstate"

### Examples

```r
library(bstrl)
data(geco_small)

# Names of the columns on which to perform linkage
fieldnames <- c("given.name", "surname", "age", "occup",
                "extra1", "extra2", "extra3", "extra4", "extra5", "extra6")

# How to compare each of the fields:
# first name and last name use normalized edit distance,
# all others are binary equal/unequal
types <- c("lv", "lv",
           "bi", "bi", "bi", "bi", "bi", "bi", "bi", "bi")
# Break continuous difference measures into 4 levels using these split points
breaks <- c(0, 0.25, 0.5)

# Three-file linkage using the first three files in the example dataset
multifile.result <- multifileRL(geco_small[1:3],
                                flds = fieldnames, types = types, breaks = breaks,
                                nIter = 2, burn = 1, # Very small run for example
                                proposals = "comp",
                                seed = 0)
```
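The example compares name fields with normalized edit distance (`"lv"`) and splits that distance into 4 levels using `breaks = c(0, 0.25, 0.5)`. The R sketch below illustrates the idea with base R's `utils::adist`. The helpers `normLev` and `discretize`, normalizing by the longer string's length, and the right-closed interval convention are all assumptions for illustration, not necessarily how bstrl computes comparisons internally.

```r
# Hypothetical helper (not bstrl's code): Levenshtein distance between each
# pair x[i], y[i], divided by the longer string's length, so 0 means the
# strings are identical and 1 means maximally different.
# (Two empty strings would give 0/0 = NaN; not handled in this sketch.)
normLev <- function(x, y) {
  d <- mapply(function(a, b) utils::adist(a, b)[1, 1], x, y,
              USE.NAMES = FALSE)
  d / pmax(nchar(x), nchar(y))
}

# Hypothetical helper: breaks = c(0, 0.25, 0.5) yields length(breaks) + 1 = 4
# levels, assumed right-closed:
# 1: d == 0, 2: (0, 0.25], 3: (0.25, 0.5], 4: d > 0.5
discretize <- function(d, breaks = c(0, 0.25, 0.5)) {
  as.integer(cut(d, breaks = c(-Inf, breaks, Inf), right = TRUE))
}

d <- normLev(c("anna", "smith", "abc"), c("anna", "smyth", "xyz"))
d              # 0.0 0.2 1.0
discretize(d)  # 1 2 4
```

Under these assumptions, an exact match falls in the first level, a one-letter typo in a five-letter surname (`"smith"` vs `"smyth"`) falls in the second, and completely different strings fall in the last.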



[Package bstrl version 1.0.2 Index]