Nystrom_kernel {chickn}    R Documentation

Nystrom kernel approximation

Description

An implementation of the Nystrom kernel approximation method.

Usage

Nystrom_kernel(
  Data,
  c,
  l,
  s,
  gamma = NULL,
  max_neighbors = 32,
  DIR_output = tempfile(),
  DIR_save = tempfile(),
  ncores = 2,
  ncores_svd = 1,
  distance_type = "W1",
  kernel_type = "Gaussian",
  verbose = FALSE
)

Arguments

Data

A Filebacked Big Matrix (FBM) of size n x N, where n is the dimension of each data vector and N is the number of data vectors. Data vectors are stored in the matrix columns.

c

Number of kernel matrix columns randomly selected for the approximation.

l

An intermediate rank l < c.

s

A target rank s < l.

gamma

Kernel parameter. If it is NULL (default), the parameter is estimated using gamma_estimation.

max_neighbors

Number of neighbors used for the parameter estimation.

DIR_output

A directory for intermediate computations.

DIR_save

A directory to save the result.

ncores

Number of cores. Default is 2.

ncores_svd

Number of cores used for the SVD computation. It is recommended to use 1 core (default).

distance_type

Distance function type. The available types are Wasserstein-1 ('W1') and Euclidean ('Euclide'). The default value is 'W1'.

kernel_type

Kernel function type. The available types are 'Gaussian' and 'Laplacian'. The default value is 'Gaussian'.

verbose

Logical value indicating whether to display the processing steps.

Details

The Nystrom method approximates the kernel matrix K \in R^{N \times N} by C W^{-1} C^{\top}, where C \in R^{N \times c} is obtained from K by randomly selecting c columns and W \in R^{c \times c} is obtained from C by selecting the corresponding c rows. The kernel function is based on a distance metric and is given by k(x_i,x_j) = e^{- \gamma \cdot d^p(x_i,x_j)}, where \gamma is the kernel parameter gamma, d(x_i,x_j) is the distance between data vectors x_i and x_j, p = 1 for the 'Laplacian' kernel, and p = 2 for the 'Gaussian' kernel.
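
The construction described above can be sketched in base R. This is a minimal illustration only, assuming a small dense in-memory matrix, the Euclidean distance, and a median-based gamma chosen purely for demonstration (it is not the gamma_estimation procedure). It does not reproduce the package's file-backed, parallel implementation, nor the intermediate rank l and target rank s. The helper names kernel_from_dist and pinv_sym and the variable c_cols (standing in for the argument c) are hypothetical.

```r
# Illustrative sketch only: dense, in-memory, Euclidean distance.
set.seed(1)

n <- 20    # dimension of each data vector
N <- 100   # number of data vectors, stored in columns as in Data
X <- matrix(rnorm(n * N), nrow = n, ncol = N)

# Kernel k(x_i, x_j) = exp(-gamma * d(x_i, x_j)^p),
# with p = 2 for 'Gaussian' and p = 1 for 'Laplacian'.
kernel_from_dist <- function(D, gamma,
                             kernel_type = c("Gaussian", "Laplacian")) {
  kernel_type <- match.arg(kernel_type)
  p <- if (kernel_type == "Gaussian") 2 else 1
  exp(-gamma * D^p)
}

# Pairwise Euclidean distances between the columns of X
# (dist() works on rows, hence the transpose).
D <- as.matrix(dist(t(X)))

# Assumption: a simple median heuristic, NOT the package's gamma_estimation.
gamma <- 1 / median(D[upper.tri(D)])^2

# Full N x N kernel matrix, built here only to measure the error.
K <- kernel_from_dist(D, gamma, "Gaussian")

# Randomly select c columns of K, then the corresponding c rows of C.
c_cols <- 10
idx <- sample(N, c_cols)
C <- K[, idx, drop = FALSE]   # N x c
W <- C[idx, , drop = FALSE]   # c x c

# Pseudo-inverse of a symmetric matrix via its eigendecomposition,
# discarding eigenvalues that are numerically zero.
pinv_sym <- function(W, tol = 1e-10) {
  e <- eigen((W + t(W)) / 2, symmetric = TRUE)
  keep <- e$values > tol * max(e$values)
  V <- e$vectors[, keep, drop = FALSE]
  V %*% (t(V) / e$values[keep])
}

# Nystrom approximation K ~ C W^{-1} C^T.
K_approx <- C %*% pinv_sym(W) %*% t(C)

# Relative Frobenius error of the approximation.
rel_err <- norm(K - K_approx, "F") / norm(K, "F")
```

When W is invertible, the approximation reproduces the sampled columns exactly, so K_approx[, idx] matches C up to rounding. The error on the remaining entries depends on the data, on gamma, and on the number of sampled columns.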

Value

A list with the following attributes:

Note

This is an implementation of the Nystrom kernel approximation method proposed in Wang S, Gittens A, Mahoney MW (2019). “Scalable kernel K-means clustering with Nyström approximation: relative-error bounds.” The Journal of Machine Learning Research, 20(1), 431–479.

See Also

W1_parallel, gamma_estimation, big_randomSVD, cumsum_parallel.

Examples


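# Simulate 100 data vectors of dimension 20, stored in the matrix columns
X = matrix(rnorm(2000), ncol = 100, nrow = 20)
X_FBM = bigstatsr::FBM(init = X, ncol = 100, nrow = 20)

# Nystrom approximation with c = 10 selected columns and ranks l = 7, s = 5
output = Nystrom_kernel(Data = X_FBM, c = 10, l = 7, s = 5,
                        max_neighbors = 3, ncores = 2)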

[Package chickn version 1.2.3 Index]