bdm.ptsne {bigMap} | R Documentation |

Starts the ptSNE algorithm (first step of the mapping protocol).

bdm.ptsne(bdm, threads = 3, type = "SOCK", layers = 2, rounds = 1, boost = 2, whiten = 4, input.dim = NULL, ppx = 100, itr = 100, tol = 1e-05, alpha = 0.5, Y.init = NULL, info = 1)

`bdm` |
A |

`threads` |
The number of parallel threads (in principle only limited by hardware resources, |

`type` |
The type of cluster: 'SOCK' (default) for intra-node parallelization, 'MPI' ( |

`layers` |
The number of layers ( |

`rounds` |
The number of rounds (1 by default). |

`boost` |
A running time accelerator factor. By default ( |

`whiten` |
Preprocessing of raw data. If |

`input.dim` |
If raw data is given as (or is transformed to) principal components, |

`ppx` |
The value of perplexity to compute similarities (100 by default). |

`itr` |
The number of iterations for computing input similarities (100 by default). |

`tol` |
The tolerance lower bound for computing input similarities (1e-05 by default). |

`alpha` |
The momentum factor (0.5 by default). |

`Y.init` |
A |

`info` |
Progress output information: 1 yields inter-round results for progressive analytics, 0 disables intermediate results. Default value is 1. |
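The `ppx`, `itr` and `tol` arguments control how input similarities are computed. As an illustration only, the sketch below assumes the usual t-SNE calibration, in which a Gaussian precision `beta` is found for each data point by bisection so that the entropy of its similarity distribution matches `log(ppx)`. The function `calibrate_similarities` is a hypothetical helper written for this page, not part of bigMap, and the package's internal implementation may differ.

```r
# Hypothetical helper (NOT part of bigMap): calibrate the similarities of one
# data point to a target perplexity, assuming the standard t-SNE bisection.
# d2  : squared distances from the point to all other points
# ppx : target perplexity
# itr : maximum number of bisection steps
# tol : tolerance on |entropy - log(ppx)|
calibrate_similarities <- function(d2, ppx = 100, itr = 100, tol = 1e-05) {
  d2 <- d2 - min(d2)              # shifting leaves the distribution unchanged
  target <- log(ppx)
  # Shannon entropy (in nats) of p_j proportional to exp(-beta * d2_j)
  entropy <- function(beta) {
    p <- exp(-beta * d2)
    log(sum(p)) + beta * sum(d2 * p) / sum(p)
  }
  beta <- 1
  lo <- 0
  hi <- Inf
  for (k in seq_len(itr)) {
    H <- entropy(beta)
    if (abs(H - target) < tol) break
    if (H > target) {
      # distribution too flat: increase beta
      lo <- beta
      beta <- if (is.finite(hi)) (lo + hi) / 2 else beta * 2
    } else {
      # distribution too peaked: decrease beta
      hi <- beta
      beta <- (lo + hi) / 2
    }
  }
  p <- exp(-beta * d2)
  list(beta = beta, p = p / sum(p), perplexity = exp(entropy(beta)))
}

# Synthetic squared distances, for illustration only
set.seed(1)
d2 <- rexp(500)
res <- calibrate_similarities(d2, ppx = 100)
res$perplexity
```

Under this assumption, a larger `itr` or a smaller `tol` gives a closer match to the requested perplexity at the cost of more computation per data point.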

By default the algorithm is structured in $\sqrt{n}$ epochs of $\sqrt{z}$ iterations each, where $n$ is the dataset size and $z$ is the thread-size ($z = n \cdot \text{layers} / \text{threads}$). The running time of the algorithm is then $\text{epochs} \cdot \text{iters} \cdot t_i + \text{epochs} \cdot t_e$, where $t_i$ is the running time of a single iteration and $t_e$ is the inter-epoch running time.

The `boost` factor is meant to reduce the running time. With $\text{boost} > 1$ the algorithm is structured in $\sqrt{n}/\text{boost}$ epochs of $\sqrt{z} \cdot \text{boost}$ iterations each. This structure performs the same total number of iterations arranged into fewer epochs, decreasing the total running time to $\text{epochs} \cdot \text{iters} \cdot t_i + \frac{1}{\text{boost}} \cdot \text{epochs} \cdot t_e$, where epochs and iters are the default (unboosted) counts. When the number of threads is high, the inter-epoch time can be high, in particular with 'MPI' parallelization, so reducing the number of epochs can significantly reduce the total running time. The drawback is that increasing the number of iterations per epoch might result in a lack of convergence, so the `boost` factor must be used with caution. To the best of our knowledge, values up to $\text{boost} = 2.5$ are generally safe.
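The effect of `boost` on the schedule can be checked with a few lines of arithmetic. The sketch below is not part of bigMap: `ptsne_schedule` is a hypothetical helper, the timings `t_i` and `t_e` are made-up placeholders, and it assumes the boosted structure keeps the total number of iterations constant (the default epoch count divided by `boost`, the default iteration count multiplied by `boost`). Counts are left unrounded for clarity.

```r
# Hypothetical helper (NOT part of bigMap): epoch/iteration schedule and
# modelled running time, following the description in the text.
# t_i : assumed running time of one iteration (placeholder value)
# t_e : assumed inter-epoch running time (placeholder value)
ptsne_schedule <- function(n, threads = 3, layers = 2, boost = 1,
                           t_i = 1, t_e = 1) {
  z      <- n * layers / threads   # thread-size
  epochs <- sqrt(n) / boost        # fewer epochs when boost > 1
  iters  <- sqrt(z) * boost        # more iterations per epoch when boost > 1
  list(z = z, epochs = epochs, iters = iters,
       total_iters = epochs * iters,
       time = epochs * iters * t_i + epochs * t_e)
}

# One million data points on 20 threads, with made-up timings in seconds
base    <- ptsne_schedule(n = 1e6, threads = 20, layers = 2, boost = 1,
                          t_i = 0.01, t_e = 5)
boosted <- ptsne_schedule(n = 1e6, threads = 20, layers = 2, boost = 2,
                          t_i = 0.01, t_e = 5)

c(base$epochs, boosted$epochs)            # number of epochs is halved
c(base$total_iters, boosted$total_iters)  # total iterations are unchanged
base$time - boosted$time                  # saving comes from inter-epoch time only
```

In this model the saving is entirely in the inter-epoch term, which is why the gain grows with the inter-epoch cost and is independent of the per-iteration cost.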

In case of extremely large datasets, we strongly recommend initializing the `bdm` instance with already preprocessed data and using `whiten = 0`. Fast principal component approximations can be computed by means of, e.g., the `flashpcaR` or `scater` R packages.
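As a minimal sketch of such preprocessing, the example below reduces a synthetic matrix to its leading principal components with base R's `prcomp`. This is an exact PCA used here as a stand-in for the fast approximations mentioned above, whose own interfaces are not shown. The data, the object names and the choice of 10 components are illustrative assumptions.

```r
# Synthetic raw data, for illustration only: 500 observations, 50 features
set.seed(42)
X <- matrix(rnorm(500 * 50), nrow = 500, ncol = 50)

# Exact PCA with base R (stand-in for a fast approximate PCA)
pca <- prcomp(X, center = TRUE, scale. = FALSE)

# Keep the first k principal component scores as the preprocessed data
k   <- 10   # illustrative choice
pcs <- pca$x[, seq_len(k), drop = FALSE]

dim(pcs)    # one row per observation, one column per retained component
```

A matrix like `pcs` would then be the preprocessed data used to initialize the `bdm` instance, with `whiten = 0` passed to `bdm.ptsne()` so that no further preprocessing is applied.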

A copy of the input `bdm` instance with a new element `bdm$ptsne` (t-SNE output).

```r
# --- load example dataset
bdm.example()
# --- perform ptSNE
## Not run: 
exMap <- bdm.ptsne(exMap, threads = 10, layers = 2, rounds = 2, ppx = 200)
## End(Not run)
# --- plot the Cost function
bdm.cost(exMap)
# --- plot ptSNE output
bdm.ptsne.plot(exMap)
```

[Package *bigMap* version 2.3.1 Index]