bdm.ptsne {bigMap} | R Documentation |

## Parallelized t-SNE

### Description

Starts the ptSNE algorithm (first step of the mapping protocol).

### Usage

```
bdm.ptsne(bdm, threads = 3, type = "SOCK", layers = 2, rounds = 1,
  boost = 2, whiten = 4, input.dim = NULL, ppx = 100, itr = 100,
  tol = 1e-05, alpha = 0.5, Y.init = NULL, info = 1)
```

### Arguments

`bdm` |
A |

`threads` |
The number of parallel threads (in principle only limited by hardware resources, |

`type` |
The type of cluster: 'SOCK' (default) for intra-node parallelization, 'MPI' ( |

`layers` |
The number of layers ( |

`rounds` |
The number of rounds (1 by default). |

`boost` |
A running time accelerator factor. By default ( |

`whiten` |
Preprocessing of raw data. If |

`input.dim` |
If raw data is given as (or is transformed to) principal components, |

`ppx` |
The value of perplexity to compute similarities (100 by default). |

`itr` |
The number of iterations for computing input similarities (100 by default). |

`tol` |
The tolerance lower bound for computing input similarities (1e-05 by default). |

`alpha` |
The momentum factor (0.5 by default). |

`Y.init` |
A |

`info` |
Progress output information: 1 yields inter-round results for progressive analytics, 0 disables intermediate results. Default value is 1. |

### Details

By default the algorithm is structured in `\sqrt{n}` epochs of `\sqrt{z}` iterations each, where `n` is the dataset size and `z` is the thread-size (`z = n*layers/threads`). The running time of the algorithm is then determined by `epochs*iters*t_i + epochs*t_e`, where `t_i` is the running time of a single iteration and `t_e` is the inter-epoch running time.
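As a rough illustration of the default schedule, the sketch below computes the thread-size, the number of epochs, the iterations per epoch, and the resulting running-time estimate. `ptsne_schedule` is a hypothetical helper, not part of bigMap; rounding down to whole numbers and the timing values `t_i` and `t_e` are assumptions chosen only for the example.

```r
# Hypothetical helper (not a bigMap function): default ptSNE schedule
# as described in the Details section.
ptsne_schedule <- function(n, threads = 3, layers = 2) {
  z <- n * layers / threads            # thread-size
  list(
    z      = z,
    epochs = floor(sqrt(n)),           # assumption: rounded down
    iters  = floor(sqrt(z))            # assumption: rounded down
  )
}

s <- ptsne_schedule(n = 10000, threads = 8, layers = 2)

# Illustrative timings in seconds (made-up values, not measurements).
t_i <- 0.5   # running time of a single iteration
t_e <- 2     # inter-epoch running time

run_time <- s$epochs * s$iters * t_i + s$epochs * t_e
print(s)
print(run_time)
```

With these inputs the thread-size is 2500, giving 100 epochs of 50 iterations each.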

The `boost` factor is meant to reduce the running time. With `boost > 1` the algorithm is structured in `\sqrt{n}/boost` epochs with `\sqrt{z}*boost` iterations each. This structure performs the same total number of iterations, arranged into fewer epochs, which decreases the total running time to `epochs*iters*t_i + 1/boost*epochs*t_e` (with `epochs` and `iters` as in the default structure). When the number of threads is high, the inter-epoch time can be high, in particular with 'MPI' parallelization, so reducing the number of epochs can significantly reduce the total running time. The trade-off is that increasing the number of iterations per epoch might prevent convergence, so the `boost` factor must be used with caution. To the best of our knowledge, values up to `boost = 2.5` are generally safe.
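The effect of `boost` can be checked with a small calculation. The sketch below assumes that `boost` divides the default `\sqrt{n}` epochs and multiplies the default `\sqrt{z}` iterations per epoch, which is what keeps the total number of iterations unchanged. The variable names, the rounding, and the timing values are illustrative assumptions, not bigMap internals.

```r
# Illustrative inputs (assumed values, not package defaults).
n <- 10000; layers <- 2; threads <- 8; boost <- 2
z <- n * layers / threads                    # thread-size

# Default structure.
epochs_default <- floor(sqrt(n))
iters_default  <- floor(sqrt(z))

# Boosted structure: fewer epochs, more iterations per epoch.
epochs_boost <- floor(sqrt(n) / boost)
iters_boost  <- floor(sqrt(z) * boost)

# Made-up timings in seconds.
t_i <- 0.5   # single iteration
t_e <- 2     # inter-epoch overhead

time_default <- epochs_default * iters_default * t_i + epochs_default * t_e
time_boost   <- epochs_boost * iters_boost * t_i + epochs_boost * t_e

# Same total number of iterations; only the inter-epoch term shrinks.
total_default <- epochs_default * iters_default
total_boost   <- epochs_boost * iters_boost
print(c(default = time_default, boosted = time_boost))
```

Only the `epochs*t_e` term changes, so the saving grows with the inter-epoch time, which is why the gain is most visible with many threads.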

For extremely large datasets, we strongly recommend initializing the `bdm` instance with already preprocessed data and using `whiten = 0`. Fast principal components approximations can be computed with, e.g., the `flashpcaR` or `scater` R packages.
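As a minimal sketch of this preprocessing step, the example below reduces a toy matrix to its leading principal components. It uses the exact `prcomp()` from base R's `stats` package as a stand-in for the fast approximations mentioned above. The toy data, the number of components `k`, and the variable names are assumptions made for illustration.

```r
set.seed(1)

# Toy stand-in for raw data: 200 observations, 10 features.
X <- matrix(rnorm(200 * 10), nrow = 200, ncol = 10)

# Exact PCA with base R (a fast approximate PCA would replace this
# step for very large datasets).
pca <- prcomp(X, center = TRUE, scale. = FALSE)

# Keep the first k components (k = 4 is an arbitrary choice here).
k <- 4
pcs <- pca$x[, 1:k]

# `pcs` is the preprocessed matrix that would be used to initialize
# the bdm instance, with whiten = 0 then passed to bdm.ptsne().
print(dim(pcs))
```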

### Value

A copy of the input `bdm` instance with new element `bdm$ptsne` (t-SNE output).

### Examples

```
# --- load example dataset
bdm.example()
# --- perform ptSNE
## Not run:
exMap <- bdm.ptsne(exMap, threads = 10, layers = 2, rounds = 2, ppx = 200)
## End(Not run)
# --- plot the Cost function
bdm.cost(exMap)
# --- plot ptSNE output
bdm.ptsne.plot(exMap)
```

[Package *bigMap* version 2.3.1 Index]