cuda_ml_knn {cuda.ml} | R Documentation |
Build a KNN model.
Description
Build a k-nearest-neighbors (KNN) model for classification or regression tasks.
Usage
cuda_ml_knn(x, ...)
## Default S3 method:
cuda_ml_knn(x, ...)
## S3 method for class 'data.frame'
cuda_ml_knn(
  x,
  y,
  algo = c("brute", "ivfflat", "ivfpq", "ivfsq"),
  metric = c("euclidean", "l2", "l1", "cityblock", "taxicab", "manhattan",
    "braycurtis", "canberra", "minkowski", "chebyshev", "jensenshannon", "cosine",
    "correlation"),
  p = 2,
  neighbors = 5L,
  ...
)
## S3 method for class 'matrix'
cuda_ml_knn(
  x,
  y,
  algo = c("brute", "ivfflat", "ivfpq", "ivfsq"),
  metric = c("euclidean", "l2", "l1", "cityblock", "taxicab", "manhattan",
    "braycurtis", "canberra", "minkowski", "chebyshev", "jensenshannon", "cosine",
    "correlation"),
  p = 2,
  neighbors = 5L,
  ...
)
## S3 method for class 'formula'
cuda_ml_knn(
  formula,
  data,
  algo = c("brute", "ivfflat", "ivfpq", "ivfsq"),
  metric = c("euclidean", "l2", "l1", "cityblock", "taxicab", "manhattan",
    "braycurtis", "canberra", "minkowski", "chebyshev", "jensenshannon", "cosine",
    "correlation"),
  p = 2,
  neighbors = 5L,
  ...
)
## S3 method for class 'recipe'
cuda_ml_knn(
  x,
  data,
  algo = c("brute", "ivfflat", "ivfpq", "ivfsq"),
  metric = c("euclidean", "l2", "l1", "cityblock", "taxicab", "manhattan",
    "braycurtis", "canberra", "minkowski", "chebyshev", "jensenshannon", "cosine",
    "correlation"),
  p = 2,
  neighbors = 5L,
  ...
)
Arguments
x |
Depending on the context:
* A __data frame__ of predictors.
* A __matrix__ of predictors.
* A __recipe__ specifying a set of preprocessing steps created from [recipes::recipe()].
* A __formula__ specifying the predictors and the outcome. |
... |
Optional arguments; currently unused. |
y |
A numeric vector (for regression) or factor (for classification) of desired responses. |
algo |
The query algorithm to use. Must be one of "brute", "ivfflat", "ivfpq", "ivfsq", or a KNN algorithm specification object.
Supported algorithms:
- "brute": brute-force search; slow, but produces exact results.
- "ivfflat": inverted file; the dataset is divided into partitions and only the relevant partitions are searched.
- "ivfpq": inverted file with product quantization (vectors are divided into sub-vectors, and each sub-vector is encoded using intermediary k-means clusterings to provide partial information).
- "ivfsq": inverted file with scalar quantization (vector components are quantized into a reduced binary representation, allowing faster distance calculations).
Default: "brute". |
metric |
Distance metric to use. Must be one of "euclidean", "l2", "l1", "cityblock", "taxicab", "manhattan", "braycurtis", "canberra", "minkowski", "lp", "chebyshev", "linf", "jensenshannon", "cosine", "correlation". Default: "euclidean". |
p |
Parameter for the Minkowski metric. If p = 1, the metric is equivalent to the Manhattan distance ("l1"). If p = 2, the metric is equivalent to the Euclidean distance ("l2"). Default: 2. |
neighbors |
Number of nearest neighbors to query. Default: 5L. |
formula |
A formula specifying the outcome terms on the left-hand side, and the predictor terms on the right-hand side. |
data |
When a __recipe__ or __formula__ is used, a __data frame__ containing both the predictors and the outcome. |
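The relationship between p and the "l1" and "l2" metrics described under the p argument can be checked with a few lines of base R. This is only an illustrative sketch: the helper minkowski() is a hypothetical name introduced here and is not part of cuda.ml, which computes distances on the GPU.

```r
# Hypothetical helper (not part of cuda.ml): Minkowski distance of order p
# between two numeric vectors of equal length.
minkowski <- function(a, b, p = 2) {
  sum(abs(a - b)^p)^(1 / p)
}

a <- c(1, 2, 3)
b <- c(4, 6, 3)

# p = 1 reduces to the Manhattan (l1) distance: |1-4| + |2-6| + |3-3|
d1 <- minkowski(a, b, p = 1)

# p = 2 reduces to the Euclidean (l2) distance: sqrt(3^2 + 4^2 + 0^2)
d2 <- minkowski(a, b, p = 2)

# Cross-check against the base R implementation in stats::dist()
d1_ref <- as.numeric(dist(rbind(a, b), method = "manhattan"))
d2_ref <- as.numeric(dist(rbind(a, b), method = "euclidean"))
```

Here d1 agrees with d1_ref and d2 agrees with d2_ref, which is the equivalence the argument description states.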
Value
A KNN model that can be used with the 'predict' S3 generic to make predictions on new data points. The model object contains the following:
- "knn_index": a GPU pointer to the KNN index.
- "algo": enum value of the algorithm being used for the KNN query.
- "metric": enum value of the distance metric used in KNN computations.
- "p": parameter for the Minkowski metric.
- "n_samples": number of input data points.
- "n_dims": dimension of each input data point.
Examples
library(cuda.ml)
library(MASS)
library(magrittr)
library(purrr)

set.seed(0L)

# Centers of three 2-D Gaussian clusters
centers <- list(c(3, 3), c(-3, -3), c(-3, 3))

# Draw `cluster_sz` points around each center and stack them into one matrix
gen_pts <- function(cluster_sz) {
  pts <- centers %>%
    map(~ mvrnorm(cluster_sz, mu = .x, Sigma = diag(2)))

  rlang::exec(rbind, !!!pts) %>% as.matrix()
}

# Cluster labels, in the same order as the rows returned by gen_pts()
gen_labels <- function(cluster_sz) {
  seq_along(centers) %>%
    sapply(function(x) rep(x, cluster_sz)) %>%
    factor()
}

# Training data: predictors plus a factor outcome column
sample_cluster_sz <- 1000
sample_pts <- cbind(
  gen_pts(sample_cluster_sz) %>% as.data.frame(),
  label = gen_labels(sample_cluster_sz)
)

model <- cuda_ml_knn(label ~ ., sample_pts, algo = "ivfflat", metric = "euclidean")

# Predict labels for new points drawn from the same clusters
test_cluster_sz <- 10
test_pts <- gen_pts(test_cluster_sz) %>% as.data.frame()

predictions <- predict(model, test_pts)
print(predictions, n = 30)
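
For intuition about what an exact "brute" query computes, the following base R sketch classifies query points by majority vote among their k nearest training points under Euclidean distance. It is a conceptual illustration only and does not reflect how cuda.ml implements the search on the GPU; knn_brute() and the toy data are hypothetical names introduced for this sketch.

```r
# Hypothetical helper (not part of cuda.ml): exact k-nearest-neighbor
# classification by exhaustive search over all training points.
knn_brute <- function(train_x, train_y, query_x, k = 5L) {
  train_x <- as.matrix(train_x)
  query_x <- as.matrix(query_x)

  preds <- apply(query_x, 1, function(q) {
    # Squared Euclidean distance from the query point to every training point
    d <- rowSums(sweep(train_x, 2, q)^2)
    # Indices of the k closest training points
    nearest <- order(d)[seq_len(k)]
    # Majority vote; ties go to the first factor level with the top count
    names(which.max(table(train_y[nearest])))
  })

  factor(preds, levels = levels(train_y))
}

# Two small, well-separated clusters labelled "a" and "b"
train_x <- rbind(
  c(3, 3), c(3.5, 2.5), c(2.5, 3.5),
  c(-3, -3), c(-3.5, -2.5), c(-2.5, -3.5)
)
train_y <- factor(c("a", "a", "a", "b", "b", "b"))

# One query point near each cluster
query_x <- rbind(c(2, 2), c(-2, -2))
pred <- knn_brute(train_x, train_y, query_x, k = 3L)
print(pred)
```

Because every training point is examined for every query, the cost grows with the product of the two set sizes, which is the trade-off the inverted-file algorithms ("ivfflat", "ivfpq", "ivfsq") are meant to reduce.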