zoomerjoin-package {zoomerjoin}    R Documentation

zoomerjoin: Superlatively Fast Fuzzy Joins

Description

Empowers users to fuzzily merge data frames with millions or tens of millions of rows in minutes with low memory usage. The package uses the locality-sensitive hashing algorithms developed by Datar, Immorlica, Indyk and Mirrokni (2004) <doi:10.1145/997817.997857> and Broder (1998) <doi:10.1109/SEQUEN.1997.666900> to avoid comparing every pair of records across the two datasets, so that fuzzy merges finish in linear time.
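
The idea of avoiding all pairwise comparisons can be illustrated with a small base-R sketch of MinHash with banding, the general technique associated with Broder's work. This is not the package's implementation or API: the helper names (shingles, jaccard, minhash_signatures, candidate_pairs), the toy data, and the parameter values are all assumptions made for illustration.

```r
# Character n-gram shingles of a string (hypothetical helper).
shingles <- function(s, n = 2) {
  s <- tolower(s)
  if (nchar(s) < n) return(s)
  starts <- seq_len(nchar(s) - n + 1)
  unique(substring(s, starts, starts + n - 1))
}

# Exact Jaccard similarity between two shingle sets.
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# MinHash signatures: for each random permutation of the shingle vocabulary,
# keep the smallest rank among the shingles present in each set.
minhash_signatures <- function(sets, n_hashes, seed = 1) {
  set.seed(seed)
  vocab <- unique(unlist(sets))
  perms <- matrix(0L, nrow = length(vocab), ncol = n_hashes)
  for (h in seq_len(n_hashes)) perms[, h] <- sample.int(length(vocab))
  do.call(rbind, lapply(sets, function(x) {
    apply(perms[match(x, vocab), , drop = FALSE], 2, min)
  }))
}

# Banding: two records become a candidate pair if they agree on every
# hash within at least one band. Only candidates are compared exactly.
candidate_pairs <- function(sig_a, sig_b, n_bands, band_width) {
  pairs <- NULL
  for (band in seq_len(n_bands)) {
    cols <- ((band - 1) * band_width + 1):(band * band_width)
    key_a <- apply(sig_a[, cols, drop = FALSE], 1, paste, collapse = "-")
    key_b <- apply(sig_b[, cols, drop = FALSE], 1, paste, collapse = "-")
    hits <- merge(
      data.frame(key = key_a, i = seq_along(key_a)),
      data.frame(key = key_b, j = seq_along(key_b)),
      by = "key"
    )
    pairs <- rbind(pairs, hits[, c("i", "j")])
  }
  unique(pairs)
}

# Toy data (invented for this sketch).
a <- c("Beniamino Green", "Yale University", "New Haven")
b <- c("Benjamino Green", "Yale Univercity", "New Haven")

sets_a <- lapply(a, shingles)
sets_b <- lapply(b, shingles)
sig <- minhash_signatures(c(sets_a, sets_b), n_hashes = 40)
sig_a <- sig[seq_along(a), , drop = FALSE]
sig_b <- sig[length(a) + seq_along(b), , drop = FALSE]

# Exact similarity is computed only for the candidate pairs.
cand <- candidate_pairs(sig_a, sig_b, n_bands = 20, band_width = 2)
cand$similarity <- vapply(
  seq_len(nrow(cand)),
  function(k) jaccard(sets_a[[cand$i[k]]], sets_b[[cand$j[k]]]),
  numeric(1)
)
matches <- cand[cand$similarity >= 0.6, ]
```

Because records are grouped by band key rather than compared all-against-all, the exact similarity is evaluated only for pairs that collide in at least one band. More bands or narrower bands make collisions more likely, trading extra candidate pairs for fewer missed matches.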

Author(s)

Maintainer: Beniamino Green <beniamino.green@yale.edu> [copyright holder]

Other contributors:

See Also

Useful links:


[Package zoomerjoin version 0.1.4 Index]