KernelKnn_nmslib {nmslibR}    R Documentation

Approximate Kernel k nearest neighbors using the nmslib library

Description

Approximate kernel k-nearest-neighbors regression and classification, with the neighbors retrieved through the nmslib library

Usage

KernelKnn_nmslib(
  data,
  y,
  TEST_data = NULL,
  k = 5,
  h = 1,
  weights_function = NULL,
  Levels = NULL,
  Index_Params = NULL,
  Time_Params = NULL,
  space = "l1",
  space_params = NULL,
  method = "hnsw",
  data_type = "DENSE_VECTOR",
  dtype = "FLOAT",
  index_filepath = NULL,
  print_progress = FALSE,
  num_threads = 1
)

Arguments

data

either a matrix or a scipy sparse matrix

y

a numeric vector specifying the response variable (in classification the labels must be numeric from 1:Inf). The length of y must equal the number of rows of the data parameter

TEST_data

a test dataset (in case of a matrix, TEST_data must have the same number of columns as data). TEST_data is assumed to be an unlabeled dataset

k

an integer. The number of neighbours to return

h

the bandwidth (applicable if the weights_function is not NULL, defaults to 1.0)

weights_function

there are various ways of specifying the kernel function. See the details section.

Levels

a numeric vector. In case of classification the unique levels of the response variable are necessary

Index_Params

a list of (optional) parameters to use in indexing (when creating the index)

Time_Params

a list of parameters to use in querying. Setting Time_Params to NULL resets the query-time parameters to their defaults

space

a character string (optional). The metric space to create for this index. Page 31 of the manual (see references) explains all available inputs

space_params

a list of (optional) parameters for configuring the space. See the references manual for more details.

method

a character string specifying the index method to use

data_type

a character string. One of 'DENSE_UINT8_VECTOR', 'DENSE_VECTOR', 'OBJECT_AS_STRING' or 'SPARSE_VECTOR'

dtype

a character string. Either 'FLOAT' or 'INT'

index_filepath

a character string specifying the path to a file, where an existing index is saved

print_progress

a boolean (either TRUE or FALSE). Whether or not to display a progress bar

num_threads

an integer. The number of threads to use

Details

There are three ways to specify the weights function:

1st option: if weights_function is NULL, a simple k-nearest-neighbor is performed.

2nd option: weights_function is one of 'uniform', 'triangular', 'epanechnikov', 'biweight', 'triweight', 'tricube', 'gaussian', 'cosine', 'logistic', 'gaussianSimple', 'silverman', 'inverse', 'exponential'. This option can be extended by combining existing kernels, either by adding or by multiplying them. For instance, 'tricube_gaussian_MULT' multiplies the tricube kernel with the gaussian kernel, and 'tricube_gaussian_ADD' adds the same two kernels.

3rd option: weights_function is a user-defined kernel function.
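To make the combined and user-defined kernels more concrete, the following base R sketch shows one way kernel weights could be derived from neighbour distances. It is an illustration only and not the package's internal code: the kernel formulas, the scaling of distances by h, the helper kernel_weights() and the sample distances are all assumptions made for the example. The signature of my_kernel() (a matrix of distances in, a row-normalised matrix of weights out) follows the convention of the related KernelKnn package and is likewise an assumption here.

```r
# Textbook kernel shapes (assumed; the package may scale or define them differently)
tricube  <- function(u) (1 - abs(u)^3)^3 * (abs(u) <= 1)
gaussian <- function(u) exp(-0.5 * u^2) / sqrt(2 * pi)

# Hypothetical distances from 2 query points to their k = 3 nearest neighbours
dists <- matrix(c(0.1, 0.4, 0.9,
                  0.2, 0.3, 0.5), nrow = 2, byrow = TRUE)

# Hypothetical responses of those neighbours
y_nbrs <- matrix(c(1, 2, 3,
                   4, 5, 6), nrow = 2, byrow = TRUE)

# Hypothetical helper: combine the two kernels the way the names
# 'tricube_gaussian_MULT' and 'tricube_gaussian_ADD' suggest
kernel_weights <- function(d, h = 1, combine = c("MULT", "ADD")) {
  combine <- match.arg(combine)
  u <- d / h                                  # scale distances by the bandwidth
  w <- if (combine == "MULT") {
    tricube(u) * gaussian(u)
  } else {
    tricube(u) + gaussian(u)
  }
  w / rowSums(w)                              # each row of weights sums to 1
}

w_mult <- kernel_weights(dists, h = 1, combine = "MULT")
w_add  <- kernel_weights(dists, h = 1, combine = "ADD")

# Kernel-weighted prediction: weighted average of the neighbours' responses
preds <- rowSums(w_mult * y_nbrs)

# A user-defined kernel (3rd option), assuming the function receives a
# matrix of distances and must return row-normalised weights
my_kernel <- function(W) {
  W <- dnorm(W, mean = 0, sd = 1)
  W / rowSums(W)
}
w_user <- my_kernel(dists)
```

In this sketch closer neighbours receive larger weights, so each prediction is pulled towards the response of the nearest neighbour rather than being a plain average.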

Examples


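try({
  # Run only when Python and the nmslib module are available
  if (reticulate::py_available(initialize = FALSE)) {
    if (reticulate::py_module_available("nmslib")) {

      library(nmslibR)

      # 100 observations with 10 features each
      x = matrix(runif(1000), nrow = 100, ncol = 10)

      # continuous response (regression)
      y = runif(100)

      # weights_function is NULL by default: simple k-nearest-neighbor with k = 5
      out = KernelKnn_nmslib(data = x, y = y, k = 5)
    }
  }
}, silent=TRUE)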
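
A second, hedged example covers the classification case with a combined kernel and a separate test set. It uses only arguments listed in the Usage section; the simulated data, the object names (x_tr, y_cl, x_te, out_cl) and the choice of three classes are assumptions made for illustration, and the structure of the returned object is not described on this page, so it is not inspected here.

```r
set.seed(1)

# Simulated training data: 100 observations, 10 features (illustrative only)
x_tr = matrix(runif(1000), nrow = 100, ncol = 10)

# Class labels must be numeric and start at 1
y_cl = as.numeric(sample(1:3, 100, replace = TRUE))

# Unlabeled test data with the same number of columns as x_tr
x_te = matrix(runif(200), nrow = 20, ncol = 10)

try({
  # Run only when Python and the nmslib module are available
  if (reticulate::py_available(initialize = FALSE)) {
    if (reticulate::py_module_available("nmslib")) {

      library(nmslibR)

      out_cl = KernelKnn_nmslib(data = x_tr,
                                y = y_cl,
                                TEST_data = x_te,
                                k = 5,
                                h = 1,
                                weights_function = 'tricube_gaussian_MULT',
                                Levels = sort(unique(y_cl)))
    }
  }
}, silent = TRUE)
```

Passing Levels is what marks the call as classification, per the argument description above; h only takes effect because weights_function is not NULL.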

[Package nmslibR version 1.0.7]