h2o4gpu.kmeans {h2o4gpu}    R Documentation
K-means Clustering
Description
K-means Clustering
Usage
h2o4gpu.kmeans(n_clusters = 8L, init = "k-means++", n_init = 1L,
max_iter = 300L, tol = 1e-04, precompute_distances = "auto",
verbose = 0L, random_state = NULL, copy_x = TRUE, n_jobs = 1L,
algorithm = "auto", gpu_id = 0L, n_gpus = -1L, do_checks = 1L,
backend = "h2o4gpu")
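A minimal end-to-end sketch follows. It assumes the `fit()` and `predict()` generics that the h2o4gpu R package uses in its README, a working Python h2o4gpu installation reachable through reticulate, and the built-in iris data as a stand-in dataset. The call is guarded so the snippet does nothing when the package is not installed.

```r
# Numeric feature matrix; iris is only a stand-in dataset for illustration.
x <- as.matrix(iris[, 1:4])

if (requireNamespace("h2o4gpu", quietly = TRUE)) {
  library(h2o4gpu)
  # Configure the estimator; integer arguments use the L suffix, as in Usage.
  model <- h2o4gpu.kmeans(n_clusters = 3L, max_iter = 300L, random_state = 42L)
  # Fit the centroids (assumed generic), then assign each row to a cluster.
  model <- fit(model, x)
  labels <- predict(model, x)
  print(table(labels))
}
```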
Arguments
n_clusters
The number of clusters to form as well as the number of centroids to generate.
init
Method for initialization; the default in Usage is 'k-means++'.
'k-means++': selects initial cluster centers for k-means clustering in a smart way to speed up convergence. Not supported yet; if chosen, scikit-learn's method is used.
'random': chooses k observations (rows) at random from the data as the initial centroids.
If an ndarray is passed, it should be of shape (n_clusters, n_features) and gives the initial centers. Not supported yet; if chosen, scikit-learn's method is used.
n_init
Number of times the k-means algorithm will be run with different centroid seeds. The final result is the best output of n_init consecutive runs in terms of inertia. Not supported yet: the algorithm is always run once.
max_iter
Maximum number of iterations of the algorithm.
tol
Relative tolerance to declare convergence.
precompute_distances
Whether to precompute distances (faster, but uses more memory):
'auto': do not precompute distances if n_samples * n_clusters > 12 million. This corresponds to about 100MB of overhead per job using double precision.
TRUE: always precompute distances.
FALSE: never precompute distances.
Not supported yet: the h2o4gpu backend always uses 'auto'.
verbose
Logger verbosity level.
random_state
Seed for the random number generator (RandomState). Must be convertible to a 32-bit unsigned integer.
copy_x
When precomputing distances, it is more numerically accurate to center the data first. If copy_x is TRUE, the original data is not modified. If FALSE, the original data is modified and restored before the function returns, but small numerical differences may be introduced by subtracting and then adding back the data mean. Not supported yet: the h2o4gpu backend always uses TRUE.
n_jobs
The number of jobs to use for the computation. This works by computing each of the n_init runs in parallel. If -1, all CPUs are used. If 1, no parallel computing code is used at all, which is useful for debugging. For n_jobs below -1, (n_cpus + 1 + n_jobs) CPUs are used; for example, n_jobs = -2 uses all CPUs but one. Not supported yet: the CPU backend is not yet implemented.
algorithm
K-means algorithm to use. The classical EM-style algorithm is "full". The "elkan" variant is more efficient because it uses the triangle inequality, but it currently does not support sparse data. "auto" chooses "elkan" for dense data and "full" for sparse data. Not supported yet: the h2o4gpu backend always uses "full".
gpu_id
ID of the GPU on which the algorithm should run.
n_gpus
Number of GPUs on which the algorithm should run. A negative value uses all GPUs available on the machine; 0 uses no GPUs and runs on the CPU.
do_checks
If set to 0, GPU error checks are not performed.
backend
Which backend to use. Options are 'auto', 'sklearn', and 'h2o4gpu'. The backend actually used is saved as an attribute.
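To make the two seeding strategies listed under init concrete, here is a plain-R sketch of 'random' seeding and of k-means++ seeding with the standard D² weighting. The helper `init_centroids` is hypothetical and not part of h2o4gpu; it also does not reproduce scikit-learn's greedy variant, which evaluates several candidates per step.

```r
# Hypothetical helper (not part of h2o4gpu): choose k initial centroids from the rows of x.
init_centroids <- function(x, k, method = c("k-means++", "random")) {
  method <- match.arg(method)
  n <- nrow(x)
  if (method == "random") {
    # 'random': k distinct rows drawn uniformly at random
    return(x[sample.int(n, k), , drop = FALSE])
  }
  # 'k-means++': the first center is a uniformly random row
  idx <- sample.int(n, 1)
  # d2[i] = squared distance from row i to its nearest chosen center
  d2 <- rowSums((x - matrix(x[idx, ], n, ncol(x), byrow = TRUE))^2)
  for (step in seq_len(k - 1)) {
    # each further center is drawn with probability proportional to d2,
    # so rows far from every existing center are favored
    nxt <- sample.int(n, 1, prob = d2)
    idx <- c(idx, nxt)
    d_new <- rowSums((x - matrix(x[nxt, ], n, ncol(x), byrow = TRUE))^2)
    d2 <- pmin(d2, d_new)
  }
  x[idx, , drop = FALSE]
}

x <- as.matrix(iris[, 1:4])
set.seed(1)
centers_pp <- init_centroids(x, 3, "k-means++")
centers_rand <- init_centroids(x, 3, "random")
```

Already-chosen rows (and exact duplicates of them) have zero weight, so k-means++ never picks the same point twice; the sketch assumes the data contain at least k distinct rows.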
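The n_init behavior described above (keep the best of several runs, judged by inertia) can be sketched in base R. Here `stats::kmeans` stands in as a single-run solver; it is not what h2o4gpu calls internally, and its `tot.withinss` field plays the role of inertia.

```r
x <- as.matrix(iris[, 1:4])
n_init <- 5L
set.seed(42)

# Run the EM-style (Lloyd) algorithm n_init times from different random seeds.
runs <- lapply(seq_len(n_init), function(i) {
  stats::kmeans(x, centers = 3, iter.max = 300, nstart = 1, algorithm = "Lloyd")
})

# Inertia = total within-cluster sum of squares; keep the run that minimizes it.
inertia <- vapply(runs, function(r) r$tot.withinss, numeric(1))
best <- runs[[which.min(inertia)]]
print(inertia)
```

Base R's `stats::kmeans` performs this same selection internally when `nstart` is greater than 1.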
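The way max_iter and tol bound the classical EM-style ("full") algorithm can be sketched as a Lloyd loop. The helper `lloyd` is hypothetical, and how tol is made relative is an assumption: here it is scaled by the mean per-feature variance and compared with the total squared centroid shift, following scikit-learn's convention rather than anything confirmed for h2o4gpu.

```r
# Hypothetical sketch of the EM-style ("full") k-means loop.
lloyd <- function(x, centers, max_iter = 300L, tol = 1e-4) {
  # Assumption: tolerance made relative to the data's mean per-feature variance
  abs_tol <- tol * mean(apply(x, 2, var))
  for (iter in seq_len(max_iter)) {
    # E-step: squared distance from every row to every centroid (n x k)
    d2 <- sapply(seq_len(nrow(centers)), function(j)
      rowSums((x - matrix(centers[j, ], nrow(x), ncol(x), byrow = TRUE))^2))
    labels <- max.col(-d2, ties.method = "first")
    # M-step: move each centroid to the mean of its assigned rows
    new_centers <- centers
    for (j in seq_len(nrow(centers))) {
      members <- labels == j
      if (any(members)) new_centers[j, ] <- colMeans(x[members, , drop = FALSE])
    }
    shift <- sum((new_centers - centers)^2)
    centers <- new_centers
    # Converged once the centroids barely move; otherwise stop at max_iter
    if (shift <= abs_tol) break
  }
  # labels come from the last E-step
  list(centers = centers, labels = labels, n_iter = iter)
}

x <- as.matrix(iris[, 1:4])
res <- lloyd(x, x[c(1, 51, 101), ], max_iter = 300L, tol = 1e-4)
res1 <- lloyd(x, x[c(1, 51, 101), ], max_iter = 1L)
```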
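The 'auto' rule and the memory figure given for precompute_distances can be checked with a little arithmetic: an n_samples x n_clusters matrix of doubles takes 8 bytes per entry, so 12 million entries is 96 MB, roughly the 100MB quoted. Both helpers below are hypothetical and only mirror the documented rule.

```r
# Hypothetical helper mirroring the documented precompute_distances rule.
resolve_precompute <- function(n_samples, n_clusters, setting = "auto") {
  if (isTRUE(setting)) return(TRUE)    # TRUE: always precompute
  if (isFALSE(setting)) return(FALSE)  # FALSE: never precompute
  stopifnot(identical(setting, "auto"))
  # 'auto': skip precomputation once the distance matrix exceeds 12 million entries
  n_samples * n_clusters <= 12e6
}

# Size of an n_samples x n_clusters matrix of doubles (8 bytes each), in MB.
distance_matrix_mb <- function(n_samples, n_clusters) n_samples * n_clusters * 8 / 1e6

distance_matrix_mb(1.5e6, 8)  # 12 million entries
```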
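Why centering helps, as stated under copy_x: fast distance code commonly expands the squared Euclidean distance as ||x||^2 - 2 x·c + ||c||^2, which loses precision when the data sit far from the origin. The sketch assumes this expansion (the usual approach, not confirmed for h2o4gpu's kernels) and compares its error on raw versus centered data; it also shows that subtracting and re-adding the mean restores the data up to rounding.

```r
set.seed(1)
x <- matrix(rnorm(200, mean = 1e6), ncol = 2)  # large offset, small spread
centers <- x[1:3, ]

# Expanded form ||x||^2 - 2 x.c + ||c||^2 (assumed fast-path formula)
expand_d2 <- function(x, c) outer(rowSums(x^2), rowSums(c^2), "+") - 2 * x %*% t(c)
# Direct form sum((x - c)^2), used as the reference
direct_d2 <- function(x, c) sapply(seq_len(nrow(c)), function(j) colSums((t(x) - c[j, ])^2))

mu <- colMeans(x)
xc <- sweep(x, 2, mu)        # centered copy; x itself is untouched
cc <- sweep(centers, 2, mu)  # centers shifted by the same mean

err_raw <- max(abs(expand_d2(x, centers) - direct_d2(x, centers)))
err_centered <- max(abs(expand_d2(xc, cc) - direct_d2(xc, cc)))

# In-place variant: subtract the mean, then add it back
restored <- sweep(xc, 2, mu, "+")
c(err_raw = err_raw, err_centered = err_centered)
```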
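The n_jobs rule above reduces to a one-line formula. The helper `effective_jobs` is hypothetical; treating n_jobs = 0 as an error is an assumption, since the documentation does not define it.

```r
# Hypothetical helper: number of CPUs implied by n_jobs, per the documented rule.
effective_jobs <- function(n_jobs, n_cpus = parallel::detectCores()) {
  if (n_jobs == 0) stop("n_jobs = 0 is not defined")  # assumption
  # Negative values count back from the number of CPUs: -1 = all, -2 = all but one
  if (n_jobs < 0) n_cpus + 1 + n_jobs else n_jobs
}

effective_jobs(-2, n_cpus = 8)
```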
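The speed-up that the algorithm argument attributes to "elkan" rests on a triangle-inequality lemma: if d(c_a, c_b) >= 2 d(x, c_a), then d(x, c_b) >= d(x, c_a), so c_b cannot be closer and its distance to x need not be computed. The hypothetical helper below applies only this lemma to a single assignment step; the full Elkan algorithm additionally keeps per-point upper and lower bounds across iterations.

```r
# Hypothetical sketch: one assignment step with triangle-inequality pruning.
assign_with_pruning <- function(x, centers) {
  k <- nrow(centers)
  cc <- as.matrix(dist(centers))  # centroid-to-centroid distances (k x k)
  labels <- integer(nrow(x))
  skipped <- 0L
  for (i in seq_len(nrow(x))) {
    best <- 1L
    best_d <- sqrt(sum((x[i, ] - centers[1, ])^2))
    for (j in seq_len(k)[-1]) {
      # Lemma: centroid j cannot beat the current best, so skip its distance
      if (cc[best, j] >= 2 * best_d) {
        skipped <- skipped + 1L
        next
      }
      d <- sqrt(sum((x[i, ] - centers[j, ])^2))
      if (d < best_d) { best <- j; best_d <- d }
    }
    labels[i] <- best
  }
  list(labels = labels, skipped = skipped)
}

set.seed(7)
x <- matrix(rnorm(600), ncol = 3)
centers <- x[1:4, ]
res <- assign_with_pruning(x, centers)

# Brute-force reference: compute every distance, take the nearest centroid
d2 <- sapply(1:4, function(j) colSums((t(x) - centers[j, ])^2))
brute <- max.col(-d2, ties.method = "first")
```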