bdm.pakde {bigMap}    R Documentation

Perplexity-adaptive kernel density estimation


Starts the paKDE algorithm (second step of the mapping protocol).


bdm.pakde(bdm, layer = 1, threads = 2, type = "SOCK", ppx = 100,
  itr = 100, tol = 1e-05, g = 200, g.exp = 3)



bdm
A bdm instance as generated by bdm.init().


layer
The number of the t-SNE layer (1 by default).


threads
The number of parallel threads (2 by default). In principle this is limited only by hardware resources, i.e. the number of cores and the available memory.


type
The type of cluster: 'SOCK' (default) for intra-node parallelization, 'MPI' (message passing interface) for inter-node parallelization.


ppx
The perplexity value used to compute similarities in the low-dimensional embedding (100 by default).


itr
The number of iterations for computing input similarities (100 by default).


tol
The tolerance lower bound for computing input similarities (1e-05 by default).


g
The resolution of the density-space grid: the grid has g*g cells (g = 200 by default).


g.exp
A numeric factor to avoid border effects. The grid limits are expanded so as to enclose the kernel density of the most extreme embedded datapoints up to g.exp times σ. By default (g.exp = 3), the grid limits are expanded so as to enclose 0.9986 of the probability mass of the most extreme kernels.


When computing the paKDE, the embedding area is discretized as a grid of g*g cells. To avoid border effects, the limits of the grid are expanded by default so as to enclose at least 0.9986 of the cumulative distribution function (3 σ) of the kernels of the most extreme mapped points in each direction.

The presence of outliers in the embedding can lead to an undesired expansion of the grid limits. This can be mitigated by using lower values of g.exp. Setting g.exp = 0 makes the grid limits equal to the range of the embedding.
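The effect of g.exp on the grid limits can be illustrated with a minimal one-dimensional sketch. This is not bigMap's implementation: the helpers grid_limits() and grid_axis(), the toy embedding y and the per-point bandwidths sigma are all hypothetical and only mirror the description above.

```r
# Hypothetical helper (not part of bigMap): expand the limits of one
# embedding dimension so that every kernel is covered up to g.exp * sigma.
grid_limits <- function(y, sigma, g.exp = 3) {
  c(min(y - g.exp * sigma), max(y + g.exp * sigma))
}

# Hypothetical helper: g equally spaced grid positions between the limits.
grid_axis <- function(y, sigma, g = 200, g.exp = 3) {
  lim <- grid_limits(y, sigma, g.exp)
  seq(lim[1], lim[2], length.out = g)
}

set.seed(1)
y     <- c(rnorm(100), 25)    # toy 1-D embedding with one outlier at 25
sigma <- c(rep(0.5, 100), 4)  # assumed bandwidths; the outlier gets a wide kernel

lim3 <- grid_limits(y, sigma, g.exp = 3)  # limits pushed out by the outlier
lim0 <- grid_limits(y, sigma, g.exp = 0)  # limits equal to range(y)
axis <- grid_axis(y, sigma, g = 200, g.exp = 3)

print(rbind(g.exp_3 = lim3, g.exp_0 = lim0))
```

In this toy setting the single outlier extends the upper limit well beyond the bulk of the data when g.exp = 3, whereas g.exp = 0 stops the grid at the outlier itself.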

The values g.exp = c(1, 2, 3, 4, 5, 6) enclose cdf values of 0.8413, 0.9772, 0.9986, 0.99996, 0.99999 and 1.0, respectively. These are values of the one-sided standard normal cdf, truncated rather than rounded.
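The values listed above can be checked against the standard normal cumulative distribution function. The sketch below assumes, as the 3 σ figure suggests, that the quoted numbers are one-sided values of pnorm(); it uses base R only.

```r
# One-sided standard normal cdf evaluated at g.exp = 1, ..., 6
g.exp <- 1:6
cdf   <- pnorm(g.exp)

# Show each g.exp next to the probability mass enclosed on that side
print(data.frame(g.exp = g.exp, cdf = signif(cdf, 7)))
```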


A copy of the input bdm instance with a new element bdm$pakde (the paKDE output). For layers that have not been computed, bdm$pakde[[layer]]$layer is set to 'NC'.
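A quick way to see which layers hold paKDE output is to test for the 'NC' marker. The sketch below runs on a mock list rather than a real bdm instance; the helper pakde_computed() and the contents of the computed layer are assumptions made for illustration only.

```r
# Mock object reproducing only the field described above. A real bdm
# instance holds much more; the computed layer's content is assumed.
bdm_mock <- list(pakde = list(
  list(layer = 1),     # placeholder for a computed layer (assumed shape)
  list(layer = "NC")   # layer that has not been computed
))

# Hypothetical helper: TRUE for every layer not marked as 'NC'
pakde_computed <- function(bdm) {
  vapply(bdm$pakde, function(l) !identical(l$layer, "NC"), logical(1))
}

computed <- pakde_computed(bdm_mock)
print(computed)
```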


# --- load mapped dataset
# --- run paKDE
## Not run: 
exMap <- bdm.pakde(exMap, threads = 4, ppx = 200, g = 200, g.exp = 3)

## End(Not run)
# --- plot paKDE output

[Package bigMap version 2.3.1 Index]