ft_lsh_utils {sparklyr}

R Documentation

Utility functions for LSH models

Description

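Utility functions for querying fitted locality-sensitive hashing (LSH) models: approximate nearest neighbor search and approximate similarity join.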

Usage

ml_approx_nearest_neighbors(
  model,
  dataset,
  key,
  num_nearest_neighbors,
  dist_col = "distCol"
)

ml_approx_similarity_join(
  model,
  dataset_a,
  dataset_b,
  threshold,
  dist_col = "distCol"
)

Arguments

`model`	A fitted LSH model, returned by either `ft_minhash_lsh()` or `ft_bucketed_random_projection_lsh()`.
`dataset`	The dataset in which to search for the nearest neighbors of `key`.
`key`	Feature vector representing the item to search for.
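`num_nearest_neighbors`	The maximum number of nearest neighbors to return.
`dist_col`	Name of the output column that stores the distance: between each result row and the key for `ml_approx_nearest_neighbors()`, or between each pair of rows for `ml_approx_similarity_join()`. Defaults to `"distCol"`.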
`dataset_a`	The first dataset to join.
`dataset_b`	The second dataset to join.
`threshold`	Distance threshold for the join: only row pairs whose distance is smaller than this value are returned.
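
Examples

A minimal sketch of how these helpers might be used, assuming a local Spark installation is available to `spark_connect()`. The table names, toy data, and parameter values (`bucket_length = 2`, `num_hash_tables = 3`, `threshold = 1.5`) are illustrative assumptions rather than recommended settings, and the sketch assumes that `key` can be passed as a plain numeric vector with the same length as the feature vectors.

```r
library(sparklyr)
library(dplyr)

# Assumption: Spark is installed locally (e.g. via spark_install()).
sc <- spark_connect(master = "local")

# Hypothetical toy data: four 2-d points per table, assembled into a
# vector column named "features".
points_a <- data.frame(id = 1:4, x = c(1, 1, -1, -1), y = c(1, -1, -1, 1))
points_b <- data.frame(id = 5:8, x = c(1, -1, 0, 0), y = c(0, 0, 1, -1))

tbl_a <- copy_to(sc, points_a, "points_a", overwrite = TRUE) %>%
  ft_vector_assembler(input_cols = c("x", "y"), output_col = "features")
tbl_b <- copy_to(sc, points_b, "points_b", overwrite = TRUE) %>%
  ft_vector_assembler(input_cols = c("x", "y"), output_col = "features")

# Build the LSH estimator from the connection, then fit it to get a model.
lsh <- ft_bucketed_random_projection_lsh(
  sc,
  input_col = "features",
  output_col = "hashes",
  bucket_length = 2,
  num_hash_tables = 3
)
model <- ml_fit(lsh, tbl_a)

# Approximate search for up to 2 rows of tbl_a closest to the key (1, 0).
neighbors <- ml_approx_nearest_neighbors(
  model,
  dataset = tbl_a,
  key = c(1, 0),
  num_nearest_neighbors = 2,
  dist_col = "distCol"
)
print(neighbors)

# Approximate join: row pairs from tbl_a and tbl_b closer than 1.5.
pairs <- ml_approx_similarity_join(
  model,
  dataset_a = tbl_a,
  dataset_b = tbl_b,
  threshold = 1.5,
  dist_col = "distCol"
)
print(pairs)

spark_disconnect(sc)
```

Because both functions are approximate, a result may omit some true neighbors or pairs; increasing `num_hash_tables` generally trades extra computation for better recall.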

[Package sparklyr version 1.8.6 Index]