bag_of_word_ify | Function to convert a record into a bag of tokens with a field-wise flag |
bag_signatures | Function that reduces a bag of words into a signature matrix using multiple random projections |
block.ids.from.blocking | Returns the block ids associated with a blocking method |
calc_idf | Function to calculate the inverse document frequency given a shingled bag of words |
confusion.from.blocking | Perform evaluations (recall) for blocking |
klsh | Function that reduces a bag of words into a signature matrix using multiple random projections |
reduction.ratio | Returns the reduction ratio associated with a blocking method |
reduction.ratio.from.blocking | Returns the reduction ratio associated with a blocking method |
rproject_bags | Function that generates unit random vectors and takes (weighted) projections onto the random unit vectors given a bag of words |
sacks_of_bags_of_words | Function to convert all records into a bag of tokens |
tokenify | Function to tokenize a string into its k components |
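
The entries above name two ideas that a short example may make concrete: splitting a string into length-k tokens (shingles) and computing a reduction ratio for a blocking. The R sketch below is illustrative only. `shingle_sketch` and `reduction_ratio_sketch` are hypothetical helpers, not the package's `tokenify` or `reduction.ratio`, and the reduction ratio is assumed to follow the usual record-linkage definition: one minus the fraction of all record pairs that fall within the same block.

```r
# Hypothetical helper (not the package's tokenify): split a string into
# overlapping substrings of length k (k-shingles).
shingle_sketch <- function(x, k = 2) {
  n <- nchar(x)
  if (n < k) return(x)  # string too short to shingle: return it unchanged
  substring(x, 1:(n - k + 1), k:n)
}

# Hypothetical helper (not the package's reduction.ratio): takes one block
# label per record and returns 1 - (within-block pairs) / (all pairs).
reduction_ratio_sketch <- function(block_ids) {
  n <- length(block_ids)
  sizes <- as.vector(table(block_ids))   # number of records in each block
  within_pairs <- sum(choose(sizes, 2))  # comparisons still needed after blocking
  1 - within_pairs / choose(n, 2)
}

shingles <- shingle_sketch("smith", k = 2)
rr <- reduction_ratio_sketch(c(1, 1, 2, 2, 3))
```

Under these assumptions, `shingles` holds the four 2-shingles of "smith", and `rr` compares the 2 within-block pairs against the 10 possible pairs among 5 records.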