cuda_ml_umap {cuda.ml}  R Documentation 
Uniform Manifold Approximation and Projection (UMAP) for dimension reduction.
Description
Run the Uniform Manifold Approximation and Projection (UMAP) algorithm to find a low dimensional embedding of the input data that approximates an underlying manifold.
Usage
cuda_ml_umap(
x,
y = NULL,
n_components = 2L,
n_neighbors = 15L,
n_epochs = 500L,
learning_rate = 1,
init = c("spectral", "random"),
min_dist = 0.1,
spread = 1,
set_op_mix_ratio = 1,
local_connectivity = 1L,
repulsion_strength = 1,
negative_sample_rate = 5L,
transform_queue_size = 4,
a = NULL,
b = NULL,
target_n_neighbors = n_neighbors,
target_metric = c("categorical", "euclidean"),
target_weight = 0.5,
transform_input = TRUE,
seed = NULL,
cuML_log_level = c("off", "critical", "error", "warn", "info", "debug", "trace")
)
Arguments
x 
The input matrix or dataframe. Each data point should be a row and should consist of numeric values only. 
y 
An optional numeric vector of target values for supervised dimension reduction. Default: NULL. 
n_components 
The dimension of the space to embed into. Default: 2. 
n_neighbors 
The size of the local neighborhood (in terms of number of neighboring sample points) used for manifold approximation. Default: 15. 
n_epochs 
The number of training epochs to be used in optimizing the low dimensional embedding. Default: 500. 
learning_rate 
The initial learning rate for the embedding optimization. Default: 1.0. 
init 
Initialization mode of the low dimensional embedding. Must be one of "spectral", "random". Default: "spectral". 
min_dist 
The effective minimum distance between embedded points. Default: 0.1. 
spread 
The effective scale of embedded points. In combination with min_dist, this determines how clustered/clumped the embedded points are. Default: 1.0. 
set_op_mix_ratio 
Interpolate between (fuzzy) union and intersection as the set operation used to combine local fuzzy simplicial sets to obtain a global fuzzy simplicial set. Both fuzzy set operations use the product t-norm. The value of this parameter should be between 0.0 and 1.0; a value of 1.0 will use a pure fuzzy union, while 0.0 will use a pure fuzzy intersection. Default: 1.0. 
local_connectivity 
The local connectivity required – i.e. the number of nearest neighbors that should be assumed to be connected at a local level. Default: 1. 
repulsion_strength 
Weighting applied to negative samples in low dimensional embedding optimization. Values higher than one will result in greater weight being given to negative samples. Default: 1.0. 
negative_sample_rate 
The number of negative samples to select per positive sample in the optimization process. Default: 5. 
transform_queue_size 
For transform operations (embedding new points using a trained model), this controls how aggressively to search for nearest neighbors. Default: 4.0. 
a, b 
More specific parameters controlling the embedding. If not set, then these values are set automatically as determined by min_dist and spread. Default: NULL. 
target_n_neighbors 
The number of nearest neighbors to use to construct the target simplicial set. Default: n_neighbors. 
target_metric 
The metric for measuring distance between the target values (y) when using supervised dimension reduction. Must be one of "categorical", "euclidean". Default: "categorical". 
target_weight 
Weighting factor between data topology and target topology. A value of 0.0 weights entirely on data, a value of 1.0 weights entirely on target. The default of 0.5 balances the weighting equally between data and target. 
transform_input 
If TRUE, then compute an approximate representation of the input data. Default: TRUE. 
seed 
Optional seed for pseudo random number generator. Default: NULL. Setting a PRNG seed will enable consistency of trained embeddings, allowing for reproducible results to 3 digits of precision, but at the expense of potentially slower training and increased memory usage. If the PRNG seed is not set, then the trained embeddings will not be deterministic. 
cuML_log_level 
Log level within cuML library functions. Must be one of "off", "critical", "error", "warn", "info", "debug", "trace". Default: "off". 
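When a and b are left as NULL, they are derived from min_dist and spread. The sketch below shows one way such a derivation can work, following the curve fit used by the reference UMAP implementation (umap-learn) and by the uwot R package. That cuML performs exactly this fit is an assumption here, and find_ab_params is a hypothetical helper name, not part of cuda.ml.

```r
# Hypothetical helper (not part of cuda.ml): estimate a and b by fitting
# 1 / (1 + a * d^(2 * b)) to a target membership curve that equals 1 for
# distances below min_dist and then decays exponentially with scale `spread`.
find_ab_params <- function(min_dist = 0.1, spread = 1) {
  d <- seq(0, 3 * spread, length.out = 300)
  target <- ifelse(d < min_dist, 1, exp(-(d - min_dist) / spread))
  # Nonlinear least squares; starting values of 1 for both parameters.
  fit <- stats::nls(
    target ~ 1 / (1 + a * d^(2 * b)),
    start = list(a = 1, b = 1)
  )
  stats::coef(fit)
}

ab <- find_ab_params(min_dist = 0.1, spread = 1)
print(ab)
```

Values obtained this way could be supplied explicitly through the a and b arguments, for example a = ab[["a"]] and b = ab[["b"]], instead of relying on the automatic choice.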
Value
A UMAP model object that can be used as input to the cuda_ml_transform() function. If transform_input is set to TRUE, then the model object will contain a "transformed_data" attribute containing the lower dimensional embedding of the input data.
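As a sketch of how the returned model might be reused, the snippet below embeds a few held-out rows with cuda_ml_transform(). It assumes the cuda_ml_transform(model, x) calling convention and a cuda.ml installation with a working cuML backend; the train/held-out split is purely illustrative.

```r
# Hold out three rows (one per species) and train on the rest.
held_out_idx <- c(1, 51, 101)
train <- iris[-held_out_idx, 1:4]
new_points <- iris[held_out_idx, 1:4]

# Guarded so the snippet does nothing when cuda.ml is not installed.
if (requireNamespace("cuda.ml", quietly = TRUE)) {
  model <- cuda.ml::cuda_ml_umap(
    x = train,
    n_components = 2L,
    n_epochs = 200L,
    transform_input = FALSE  # skip embedding the training data itself
  )
  # Assumed signature: cuda_ml_transform(model, x)
  embedded <- cuda.ml::cuda_ml_transform(model, new_points)
  print(embedded)
}
```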
Examples
library(cuda.ml)
model <- cuda_ml_umap(
x = iris[1:4],
y = iris[[5]],
n_components = 2,
n_epochs = 200,
transform_input = TRUE
)
# kmeans() picks random starting centers; fix the seed for repeatable clusters
set.seed(0L)
print(kmeans(model$transformed_data, iter.max = 100, centers = 3))