Blocking for Record Linkage


Documentation for package ‘klsh’ version 0.1.0

Help Pages

bag_of_word_ify                 Function to convert a record into a bag of tokens with a fieldwise flag
bag_signatures                  Function that reduces a bag of words into a signature matrix using multiple random projections
block.ids.from.blocking         Returns the block ids associated with a blocking method
calc_idf                        Function to calculate the inverse document frequency given a shingled bag of words
confusion.from.blocking         Perform evaluations (recall) for blocking
klsh                            Function that reduces a bag of words into a signature matrix using multiple random projections
reduction.ratio                 Returns the reduction ratio associated with a blocking method
reduction.ratio.from.blocking   Returns the reduction ratio associated with a blocking method
rproject_bags                   Function that generates unit random vectors and takes (weighted) projections onto the random unit vectors given a bag of words
sacks_of_bags_of_words          Function to convert all records into a bag of tokens
tokenify                        Function to tokenize a string into its k components
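
Two ideas in the index above, splitting a string into its k components and computing the reduction ratio of a blocking, can be illustrated with a small standalone sketch. It is written in Python and does not call the klsh package. The function names shingles and reduction_ratio, their arguments, and the choice of overlapping character substrings are assumptions made for illustration, and the reduction ratio uses the usual record-linkage definition (one minus the share of record pairs that remain within blocks), which may differ in detail from what the package computes.

```python
from math import comb


def shingles(text, k):
    """Return the overlapping length-k substrings of text, in order.

    Hypothetical helper: strings shorter than k yield an empty list.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return [text[i:i + k] for i in range(len(text) - k + 1)]


def reduction_ratio(block_sizes):
    """Return 1 - (within-block pairs) / (all pairs).

    block_sizes holds the number of records in each block; every record
    is assumed to belong to exactly one block.
    """
    n = sum(block_sizes)
    if n < 2:
        raise ValueError("need at least two records to form a pair")
    all_pairs = comb(n, 2)
    within_pairs = sum(comb(size, 2) for size in block_sizes)
    return 1 - within_pairs / all_pairs


if __name__ == "__main__":
    print(shingles("smith", 2))
    print(reduction_ratio([2, 2]))
```

A single block containing every record gives a reduction ratio of 0 (no comparisons saved), and one record per block gives 1 (no comparisons left), so higher values mean more aggressive blocking.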