CHICKN_W1 {chickn}    R Documentation

Chromatogram Hierarchical Compressive K-means with Nystrom approximation

Description

An implementation of the complete pipeline of the CHICKN algorithm.

Usage

CHICKN_W1(
  Data,
  K = 2,
  k_total,
  K_W1 = NULL,
  kernel_type = "Gaussian",
  distance_type = "W1",
  Freq = NULL,
  ncores = 2,
  max_neighbors = 32,
  nblocks = 64,
  N0 = 10000,
  max_Nsize = 32,
  DoPreimage = FALSE,
  DIR_output = tempfile(),
  DIR_tmp = tempfile(),
  BIG = FALSE,
  verbose = FALSE,
  ...
)

Arguments

Data

A Filebacked Big Matrix (FBM) of size n x N, whose N columns are the signals (chromatograms), each of length n.

K

Number of clusters produced at each call of the clustering method. Default is 2.

k_total

An upper bound on the total number of clusters.

K_W1

A Filebacked Big Matrix containing the s x N Nystrom kernel matrix, where N is the number of signals in the training collection and s is the Nystrom sample size. Default is NULL, in which case it is computed with the Nystrom_kernel function.

kernel_type

Kernel function type: 'Gaussian' (default) or 'Laplacian'.

distance_type

Distance function type. The available types are Wasserstein-1 ('W1') and Euclidean ('Euclide'). The default value is 'W1'.

Freq

A frequency matrix m x n with the frequency vectors in its rows. Default is NULL, in which case the frequency vectors are generated by the GenerateFrequencies function.

ncores

Number of cores. Default is 2.

max_neighbors

Number of neighbors used to estimate the kernel parameter gamma. Default is 32.

nblocks

Number of blocks over which the regression is performed. Default is 64.

N0

Number of data vectors used for the variance estimation in EstimSigma. Default is 10000.

max_Nsize

Number of neighbors used to compute the consensus chromatograms. Default is 32.

DoPreimage

Logical that controls whether the consensus chromatograms are computed. Default is FALSE.

DIR_output

Directory in which the results are saved. Default is tempfile().

DIR_tmp

Directory for temporary files. Default is tempfile().

BIG

Logical that controls whether the resulting consensus chromatograms are stored as a Filebacked Big Matrix ('Centroid_preimage.bk'). Default is FALSE.

verbose

Logical indicating whether to display the processing steps. Default is FALSE.

...

Additional arguments passed on to COMPR.
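The distance_type, kernel_type and K_W1 arguments describe a kernel built on the Wasserstein-1 (or Euclidean) distance and approximated with the Nystrom method as an s x N matrix. The base-R code below is a minimal sketch of both ideas under stated assumptions, not the package's Nystrom_kernel function. It assumes that each signal is a nonnegative profile on a common uniform grid, normalized to unit mass (so that W1 reduces to the L1 distance between cumulative distributions), that the s landmark signals are drawn uniformly at random, and that the 'Gaussian' and 'Laplacian' kernels take the forms exp(-gamma * d^2) and exp(-gamma * d). The helper names w1_dist, euclid_dist and nystrom_kernel_sketch are hypothetical.

```r
# Hypothetical sketch, not chickn's Nystrom_kernel implementation.

# Wasserstein-1 distance between two nonnegative signals on a common uniform
# grid with spacing dx, assuming each signal is normalized to unit mass.
w1_dist <- function(a, b, dx = 1) {
  p <- a / sum(a)
  q <- b / sum(b)
  sum(abs(cumsum(p) - cumsum(q))) * dx  # L1 distance between the two CDFs
}

euclid_dist <- function(a, b) sqrt(sum((a - b)^2))

# s x N kernel matrix between s randomly sampled landmark signals and all N
# signals (columns of X). The forms exp(-gamma * d^2) ('Gaussian') and
# exp(-gamma * d) ('Laplacian') are assumptions.
nystrom_kernel_sketch <- function(X, s, gamma, dist_fun = w1_dist,
                                  kernel_type = c("Gaussian", "Laplacian")) {
  kernel_type <- match.arg(kernel_type)
  N <- ncol(X)
  landmarks <- sample.int(N, s)  # Nystrom sample of size s
  C <- matrix(0, nrow = s, ncol = N)
  for (a in seq_len(s)) {
    for (b in seq_len(N)) {
      d <- dist_fun(X[, landmarks[a]], X[, b])
      C[a, b] <- if (kernel_type == "Gaussian") exp(-gamma * d^2) else exp(-gamma * d)
    }
  }
  # Nystrom features Phi = W^(-1/2) C, where W is the s x s landmark block, so
  # that t(Phi) %*% Phi = t(C) %*% pinv(W) %*% C approximates the N x N kernel.
  # Eigenvalues at or below a small threshold (including any negative ones) are dropped.
  W <- C[, landmarks, drop = FALSE]
  e <- eigen(W, symmetric = TRUE)
  keep <- e$values > 1e-10 * max(e$values)
  V <- e$vectors[, keep, drop = FALSE]
  W_inv_sqrt <- V %*% diag(1 / sqrt(e$values[keep]), nrow = sum(keep)) %*% t(V)
  list(C = C, landmarks = landmarks, Phi = W_inv_sqrt %*% C)
}

# Shifting a peak further increases W1 but not the Euclidean distance
x1 <- c(0, 1, 0, 0, 0); x2 <- c(0, 0, 1, 0, 0); x3 <- c(0, 0, 0, 1, 0)
w1_dist(x1, x2)      # 1
w1_dist(x1, x3)      # 2
euclid_dist(x1, x3)  # sqrt(2), same as euclid_dist(x1, x2)

set.seed(1)
X <- matrix(runif(30 * 100), nrow = 30)  # n = 30, N = 100
res <- nystrom_kernel_sketch(X, s = 10, gamma = 1)
dim(res$C)  # 10 100
```

Reusing the same s landmark signals for every column is what keeps the stored kernel matrix at s x N rather than N x N.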

Details

CHICKN_W1 compresses the data by computing a Nystrom kernel approximation and then applying the sketching operator of (Keriven et al. 2018); see the Nystrom_kernel and Sketch functions. The clusters are then recovered from this compressed representation of the data. The kernel can be based on either the Wasserstein-1 or the Euclidean distance. The function writes the following files to the DIR_output directory:

Value

A list with the following components:

References

See Also

Nystrom_kernel, GenerateFrequencies, hcc_parallel, Preimage, bigstatsr

Examples


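library(chickn)
data("UPS2")
N <- ncol(UPS2)  # number of signals (chromatograms)
n <- nrow(UPS2)  # length of each signal
# Store the data as a Filebacked Big Matrix ($save() also writes its .rds file)
X_FBM <- bigstatsr::FBM(init = UPS2, ncol = N, nrow = n)$save()
output <- CHICKN_W1(Data = X_FBM, K = 2, k_total = 8, max_neighbors = 10,
                    ncores = 2, N0 = N, DoPreimage = FALSE)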
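The Details section describes compressing the data with the sketching operator of (Keriven et al. 2018). The base-R code below is a minimal sketch of that kind of operator under stated assumptions, not the package's Sketch function. It assumes the sketch is the empirical characteristic function z_j = (1/N) sum_i exp(-i <omega_j, x_i>), evaluated at the rows omega_j of an m x n frequency matrix (the shape of Freq) and applied directly to the columns of an n x N matrix. The frequencies are plain Gaussian draws chosen for illustration, not the output of GenerateFrequencies, and fourier_sketch is a hypothetical name.

```r
# Hypothetical sketch in the spirit of Keriven et al. (2018), not chickn's Sketch.
# X: n x N matrix with signals in columns; Omega: m x n matrix, one frequency per row.
fourier_sketch <- function(X, Omega) {
  stopifnot(ncol(Omega) == nrow(X))
  proj <- Omega %*% X          # m x N matrix of inner products <omega_j, x_i>
  rowMeans(exp(-1i * proj))    # empirical characteristic function at the m frequencies
}

set.seed(2)
n <- 20; N <- 500; m <- 50
X <- matrix(rnorm(n * N), nrow = n)
# Plain Gaussian frequencies, for illustration only (assumption)
Omega <- matrix(rnorm(m * n, sd = 0.5), nrow = m)
z <- fourier_sketch(X, Omega)
length(z)  # 50

# The sketch is an average, so sketches of column blocks merge by a size-weighted mean
z1 <- fourier_sketch(X[, 1:200], Omega)
z2 <- fourier_sketch(X[, 201:500], Omega)
stopifnot(max(Mod(z - (200 * z1 + 300 * z2) / N)) < 1e-12)
```

Because each sketch is a mean over signals, its size (m complex numbers) does not depend on N, and block sketches can be merged exactly; this is one reason such compression suits data held in a Filebacked Big Matrix and read block by block.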
                     

[Package chickn version 1.2.3 Index]