dimred_umap {dyndimred}R Documentation

UMAP

Description

UMAP

Usage

dimred_umap(
  x,
  ndim = 2,
  distance_method = c("euclidean", "cosine", "manhattan"),
  pca_components = 50,
  n_neighbors = 15L,
  init = "spectral",
  n_threads = 1
)

Arguments

x

Log transformed expression data, with rows as cells and columns as features

ndim

The number of dimensions

distance_method

The name of the distance metric, see dynutils::calculate_distance

pca_components

The number of pca components to use for UMAP. If NULL, PCA will not be performed first

n_neighbors

The size of local neighborhood (in terms of number of neighboring sample points).

init

Type of initialization for the coordinates. Options are:

  • "spectral" Spectral embedding using the normalized Laplacian of the fuzzy 1-skeleton, with Gaussian noise added.

  • "normlaplacian". Spectral embedding using the normalized Laplacian of the fuzzy 1-skeleton, without noise.

  • "random". Coordinates assigned using a uniform random distribution between -10 and 10.

  • "lvrandom". Coordinates assigned using a Gaussian distribution with standard deviation 1e-4, as used in LargeVis (Tang et al., 2016) and t-SNE.

  • "laplacian". Spectral embedding using the Laplacian Eigenmap (Belkin and Niyogi, 2002).

  • "pca". The first two principal components from PCA of X if X is a data frame, and from a 2-dimensional classical MDS if X is of class "dist".

  • "spca". Like "pca", but each dimension is then scaled so the standard deviation is 1e-4, to give a distribution similar to that used in t-SNE. This is an alias for init = "pca", init_sdev = 1e-4.

  • "agspectral" An "approximate global" modification of "spectral" which all edges in the graph to a value of 1, and then sets a random number of edges (negative_sample_rate edges per vertex) to 0.1, to approximate the effect of non-local affinities.

  • A matrix of initial coordinates.

For spectral initializations, ("spectral", "normlaplacian", "laplacian"), if more than one connected component is identified, each connected component is initialized separately and the results are merged. If verbose = TRUE the number of connected components are logged to the console. The existence of multiple connected components implies that a global view of the data cannot be attained with this initialization. Either a PCA-based initialization or increasing the value of n_neighbors may be more appropriate.

n_threads

Number of threads to use (except during stochastic gradient descent). Default is half the number of concurrent threads supported by the system. For nearest neighbor search, only applies if nn_method = "annoy". If n_threads > 1, then the Annoy index will be temporarily written to disk in the location determined by tempfile.

See Also

uwot::umap()

Examples

library(Matrix)
dataset <- abs(Matrix::rsparsematrix(100, 100, .5))
dimred_umap(dataset, ndim = 2, pca_components = NULL)

[Package dyndimred version 1.0.4 Index]