cuda_ml_umap {cuda.ml} | R Documentation |
Uniform Manifold Approximation and Projection (UMAP) for dimension reduction.
Description
Run the Uniform Manifold Approximation and Projection (UMAP) algorithm to find a low dimensional embedding of the input data that approximates an underlying manifold.
Usage
cuda_ml_umap(
  x,
  y = NULL,
  n_components = 2L,
  n_neighbors = 15L,
  n_epochs = 500L,
  learning_rate = 1,
  init = c("spectral", "random"),
  min_dist = 0.1,
  spread = 1,
  set_op_mix_ratio = 1,
  local_connectivity = 1L,
  repulsion_strength = 1,
  negative_sample_rate = 5L,
  transform_queue_size = 4,
  a = NULL,
  b = NULL,
  target_n_neighbors = n_neighbors,
  target_metric = c("categorical", "euclidean"),
  target_weight = 0.5,
  transform_input = TRUE,
  seed = NULL,
  cuML_log_level = c("off", "critical", "error", "warn", "info", "debug", "trace")
)
Arguments
x |
The input matrix or data frame. Each row should be one data point and should consist of numeric values only. |
y |
An optional numeric vector of target values for supervised dimension reduction. Default: NULL. |
n_components |
The dimension of the space to embed into. Default: 2. |
n_neighbors |
The size of local neighborhood (in terms of number of neighboring sample points) used for manifold approximation. Default: 15. |
n_epochs |
The number of training epochs to be used in optimizing the low dimensional embedding. Default: 500. |
learning_rate |
The initial learning rate for the embedding optimization. Default: 1.0. |
init |
Initialization mode of the low dimensional embedding. Must be one of "spectral", "random". Default: "spectral". |
min_dist |
The effective minimum distance between embedded points. Default: 0.1. |
spread |
The effective scale of embedded points. In combination with min_dist, this determines how clustered or clumped the embedded points are. Default: 1.0. |
set_op_mix_ratio |
Interpolate between (fuzzy) union and intersection as the set operation used to combine local fuzzy simplicial sets into a global fuzzy simplicial set. Both fuzzy set operations use the product t-norm. The value of this parameter should be between 0.0 and 1.0; a value of 1.0 will use a pure fuzzy union, while 0.0 will use a pure fuzzy intersection. Default: 1.0. |
local_connectivity |
The local connectivity required – i.e. the number of nearest neighbors that should be assumed to be connected at a local level. Default: 1. |
repulsion_strength |
Weighting applied to negative samples in low dimensional embedding optimization. Values higher than one will result in greater weight being given to negative samples. Default: 1.0. |
negative_sample_rate |
The number of negative samples to select per positive sample in the optimization process. Default: 5. |
transform_queue_size |
For transform operations (embedding new points using a trained model), this controls how aggressively to search for nearest neighbors. Default: 4.0. |
a , b |
More specific parameters controlling the embedding. If not set, then these values are set automatically as determined by min_dist and spread. Default: NULL. |
target_n_neighbors |
The number of nearest neighbors to use to construct the target simplicial set. Default: n_neighbors. |
target_metric |
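The metric for measuring distance between the actual and the target values (y) when using supervised dimension reduction. Must be one of "categorical", "euclidean". Default: "categorical". |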
target_weight |
Weighting factor between data topology and target topology. A value of 0.0 weights entirely on data, a value of 1.0 weights entirely on target. The default of 0.5 balances the weighting equally between data and target. |
transform_input |
If TRUE, then compute an approximate representation of the input data. Default: TRUE. |
seed |
Optional seed for pseudo random number generator. Default: NULL. Setting a PRNG seed will enable consistency of trained embeddings, allowing for reproducible results to 3 digits of precision, but at the expense of potentially slower training and increased memory usage. If the PRNG seed is not set, then the trained embeddings will not be deterministic. |
cuML_log_level |
Log level within cuML library functions. Must be one of "off", "critical", "error", "warn", "info", "debug", "trace". Default: "off". |
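The a and b arguments can be made more concrete with a small base R sketch. It assumes that a and b are obtained the way the reference UMAP implementation derives them from min_dist and spread, by fitting the curve 1 / (1 + a * d^(2 * b)) to a target that equals 1 below min_dist and decays exponentially beyond it. The helper name find_ab_params is hypothetical and is not part of cuda.ml; cuML's internal procedure may differ in detail.

```r
# Sketch only: approximate the curve parameters a and b from min_dist and
# spread. Assumption: this mirrors the curve fit of the reference UMAP
# implementation; it is not taken from the cuML source.
find_ab_params <- function(min_dist = 0.1, spread = 1) {
  # Sample distances in the embedding space.
  d <- seq(0, 3 * spread, length.out = 300)
  # Target membership strength: 1 within min_dist, exponential decay beyond.
  target <- ifelse(d < min_dist, 1, exp(-(d - min_dist) / spread))
  # Nonlinear least-squares fit of the smooth approximation.
  fit <- stats::nls(
    target ~ 1 / (1 + a * d^(2 * b)),
    start = list(a = 1, b = 1)
  )
  stats::coef(fit)
}

ab <- find_ab_params(min_dist = 0.1, spread = 1)
print(ab)
```

Values obtained this way could be passed explicitly through the a and b arguments when a specific curve shape is wanted; leaving both as NULL lets the library choose them.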
Value
A UMAP model object that can be used as input to the cuda_ml_transform() function.
If transform_input is set to TRUE, then the model object will contain a "transformed_data" attribute containing the lower dimensional embedding of the input data.
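A minimal sketch of how the returned model might be used, assuming a GPU-enabled installation of cuda.ml, that cuda_ml_transform() takes the model followed by the new data, and that the stored embedding is reachable as model$transformed_data. The split of iris into train_x and new_x is purely illustrative.

```r
# Sketch only: embed held-out rows with a fitted UMAP model.
# Assumes {cuda.ml} is installed with cuML support; skipped otherwise.
train_x <- iris[-c(1, 51, 101), 1:4]
new_x <- iris[c(1, 51, 101), 1:4]

if (requireNamespace("cuda.ml", quietly = TRUE)) {
  model <- cuda.ml::cuda_ml_umap(
    train_x,
    n_components = 2L,
    n_epochs = 200L,
    transform_input = TRUE,
    seed = 0L
  )
  # Embedding of the training rows, kept because transform_input = TRUE.
  train_embedding <- model$transformed_data
  # Embedding of the three held-out rows.
  new_embedding <- cuda.ml::cuda_ml_transform(model, new_x)
  print(dim(train_embedding))
  print(new_embedding)
}
```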
Examples
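library(cuda.ml)

# Supervised UMAP: the species labels guide the 2-dimensional embedding.
model <- cuda_ml_umap(
  x = iris[1:4],
  y = iris[[5]],
  n_components = 2,
  n_epochs = 200,
  transform_input = TRUE
)

# Cluster the embedded points; the seed fixes the k-means initialization.
set.seed(0L)
print(kmeans(model$transformed_data, iter.max = 100, centers = 3))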
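As a further illustration of the set_op_mix_ratio argument, the interpolation between fuzzy union and fuzzy intersection under the product t-norm can be sketched in base R. The function mix_fuzzy_sets and the toy matrix A are hypothetical, and combining a directed membership matrix with its transpose is an assumption borrowed from the reference UMAP algorithm rather than a description of cuML internals.

```r
# Sketch only: blend fuzzy union and fuzzy intersection of a directed
# membership matrix with its transpose, using the product t-norm.
mix_fuzzy_sets <- function(A, set_op_mix_ratio = 1) {
  stopifnot(set_op_mix_ratio >= 0, set_op_mix_ratio <= 1)
  fuzzy_intersection <- A * t(A)                  # product t-norm
  fuzzy_union <- A + t(A) - fuzzy_intersection    # probabilistic sum
  set_op_mix_ratio * fuzzy_union +
    (1 - set_op_mix_ratio) * fuzzy_intersection
}

# Toy directed membership strengths between three points (made-up values).
A <- matrix(
  c(0.0, 0.8, 0.0,
    0.5, 0.0, 0.0,
    0.0, 1.0, 0.0),
  nrow = 3, byrow = TRUE
)

print(mix_fuzzy_sets(A, set_op_mix_ratio = 1))    # pure fuzzy union
print(mix_fuzzy_sets(A, set_op_mix_ratio = 0))    # pure fuzzy intersection
print(mix_fuzzy_sets(A, set_op_mix_ratio = 0.5))  # equal blend
```

With a ratio of 1 an edge survives if either endpoint lists the other as a neighbor; with a ratio of 0 it survives only when both do, which tends to give a sparser graph.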