jaccard_probability {zoomerjoin} | R Documentation |
Find Probability of Match Based on Similarity
Description
This is a port of the lsh_probability function from the textreuse package, with arguments changed to reflect the hyperparameters in this package. It gives the probability that two strings whose Jaccard similarity equals the similarity argument will be matched, given the chosen band width and number of bands.
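The calculation can be illustrated with the standard LSH banding argument: two strings agree on a single minhash with probability equal to their Jaccard similarity, so they agree on every hash in one band with probability similarity^band_width, and on at least one of the bands with probability 1 - (1 - similarity^band_width)^n_bands. The sketch below assumes the function follows this standard formula; banding_probability is a hypothetical name used for illustration and is not part of zoomerjoin.

```r
# Hypothetical re-implementation of the standard LSH banding formula.
# This is an assumption for illustration, not the zoomerjoin source.
banding_probability <- function(similarity, n_bands, band_width) {
  # Probability that every hash within a single band agrees
  p_band <- similarity^band_width
  # Probability that at least one of the n_bands bands agrees
  1 - (1 - p_band)^n_bands
}

banding_probability(0.8, n_bands = 50, band_width = 5)
```

Under this formula, a similarity of 0.8 with 50 bands of width 5 gives a probability very close to 1, because each band matches about a third of the time and there are 50 independent chances.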
Usage
jaccard_probability(similarity, n_bands, band_width)
Arguments
similarity | The Jaccard similarity of the two strings you want to compare. |
n_bands | The number of LSH bands used in hashing. |
band_width | The number of hashes in each band. |
Value
A decimal number between 0 and 1 giving the probability that the two items will be returned as a candidate pair by the minhash algorithm.
Examples
# Find the probability that two strings will be matched, given that they
# have a Jaccard similarity of 0.8, a band width of 5, and 50 bands:
jaccard_probability(.8, n_bands = 50, band_width = 5)
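
To see how the hyperparameters shape the match probability, it can help to tabulate it over a range of similarities. The sketch below is self-contained: it assumes the standard banding formula 1 - (1 - similarity^band_width)^n_bands rather than calling the package, and match_prob is a hypothetical helper name.

```r
# Hypothetical helper assuming the standard LSH banding formula;
# it stands in for the package function so the example runs on its own.
match_prob <- function(similarity, n_bands, band_width) {
  1 - (1 - similarity^band_width)^n_bands
}

similarities <- seq(0.1, 0.9, by = 0.1)

# Compare two settings that differ only in the number of bands
curve_table <- data.frame(
  similarity = similarities,
  bands_50 = match_prob(similarities, n_bands = 50, band_width = 5),
  bands_10 = match_prob(similarities, n_bands = 10, band_width = 5)
)
print(round(curve_table, 4))
```

Under this formula, adding bands raises the match probability at every similarity, while widening each band lowers it, so the two hyperparameters can be traded off to place the steep part of the curve near the similarity you care about.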
[Package zoomerjoin version 0.1.5 Index]