Nystrom_kernel {chickn} R Documentation

## Nystrom kernel approximation

### Description

An implementation of the Nystrom kernel approximation method.

### Usage

Nystrom_kernel(
  Data,
  c,
  l,
  s,
  gamma = NULL,
  max_neighbors = 32,
  DIR_output = tempfile(),
  DIR_save = tempfile(),
  ncores = 2,
  ncores_svd = 1,
  distance_type = "W1",
  kernel_type = "Gaussian",
  verbose = FALSE
)


### Arguments

• Data: A Filebacked Big Matrix n x N. Data vectors are stored in the matrix columns.

• c: Number of columns selected for the approximation.

• l: An intermediate rank, l < c.

• s: A target rank, s < l.

• gamma: Kernel parameter. If it is NULL (default), the parameter is estimated using gamma_estimation.

• max_neighbors: Number of neighbors selected for the parameter estimation. Default is 32.

• DIR_output: A directory for intermediate computations.

• DIR_save: A directory to save the result.

• ncores: Number of cores. Default is 2.

• ncores_svd: Number of cores used for the SVD computation. It is recommended to use 1 core (default).

• distance_type: Distance function type. The available types are Wasserstein-1 ('W1') and Euclidean ('Euclide'). The default value is 'W1'.

• kernel_type: Kernel function type, either 'Gaussian' (default) or 'Laplacian'.

• verbose: Logical that indicates whether to display the processing steps. Default is FALSE.

### Details

The Nystrom method approximates the kernel matrix $K$ by $C W^{-1} C^{\top}$, where $C \in \mathbb{R}^{N \times c}$ is obtained from $K$ by randomly selecting $c$ columns and $W \in \mathbb{R}^{c \times c}$ is obtained from $C$ by selecting the $c$ corresponding rows.

The kernel function is based on the distance metric and is given by $k(x_i, x_j) = e^{-\gamma \cdot d^p(x_i, x_j)}$, where $p = 1$ for the 'Laplacian' kernel and $p = 2$ for the 'Gaussian' kernel, and $d(x_i, x_j)$ is the distance between the data vectors $x_i$ and $x_j$.
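As a rough illustration of the two formulas above, the following base-R sketch builds a dense kernel matrix and forms the plain approximation C W^+ C^T. It is not the chickn implementation. It assumes Euclidean distance instead of 'W1', keeps everything in memory instead of using a Filebacked Big Matrix, replaces W^{-1} with a pseudo-inverse for numerical safety, and leaves out the intermediate rank l and target rank s. The helper names kernel_matrix, pinv_sym and nystrom_approx are hypothetical.

```r
# Kernel k(x_i, x_j) = exp(-gamma * d(x_i, x_j)^p), data vectors stored in columns.
# Assumption: d is the Euclidean distance (the package default is 'W1').
kernel_matrix <- function(X, gamma, kernel_type = c("Gaussian", "Laplacian")) {
  kernel_type <- match.arg(kernel_type)
  p <- if (kernel_type == "Gaussian") 2 else 1
  D <- as.matrix(dist(t(X)))   # pairwise distances between the columns of X
  exp(-gamma * D^p)
}

# Pseudo-inverse of a symmetric positive semi-definite matrix
# via its eigendecomposition, dropping near-zero eigenvalues.
pinv_sym <- function(W, tol = 1e-10) {
  e <- eigen(W, symmetric = TRUE)
  keep <- e$values > tol * max(abs(e$values))
  V <- e$vectors[, keep, drop = FALSE]
  V %*% (t(V) / e$values[keep])   # V diag(1 / lambda) V^T
}

# Plain Nystrom approximation from the sampled column indices idx.
nystrom_approx <- function(K, idx) {
  C <- K[, idx, drop = FALSE]      # N x c: selected columns of K
  W <- K[idx, idx, drop = FALSE]   # c x c: rows of C at the same indices
  C %*% pinv_sym(W) %*% t(C)
}

set.seed(1)
X <- matrix(rnorm(2000), nrow = 20, ncol = 100)  # 100 data vectors of dimension 20
K <- kernel_matrix(X, gamma = 0.05, kernel_type = "Gaussian")
idx <- sample(ncol(X), 10)                       # c = 10 randomly selected columns
K_hat <- nystrom_approx(K, idx)

# Relative Frobenius error of the approximation
rel_err <- norm(K - K_hat, "F") / norm(K, "F")
```

In this sketch the approximation reproduces the sampled block K[idx, idx] up to rounding, and selecting all N columns recovers K itself. The gamma value 0.05 is an arbitrary choice for the illustration, not an output of gamma_estimation.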

### Value

A list with the following elements:

• K_W1 is the Filebacked Big Matrix of the Nystrom kernel approximation.

• gamma is the estimated kernel parameter.

• RandomSample is the vector of indices of the data vectors selected for the Nystrom approximation.

### Note

This is an implementation of the Nystrom kernel approximation method proposed in Wang S, Gittens A, Mahoney MW (2019). “Scalable kernel K-means clustering with Nyström approximation: relative-error bounds.” The Journal of Machine Learning Research, 20(1), 431–479.

### See Also

W1_parallel, gamma_estimation, big_randomSVD, cumsum_parallel.

### Examples


X = matrix(rnorm(2000), ncol = 100, nrow = 20)
X_FBM = bigstatsr::FBM(init = X, ncol = 100, nrow = 20)

output = Nystrom_kernel(Data = X_FBM, c = 10, l = 7, s = 5,
                        max_neighbors = 3, ncores = 2)



