KMppIni {GMKMcharlie} | R Documentation |
Minkowski and spherical, deterministic and stochastic, multithreaded K-means++ initialization over dense representation of data
Description
Find suitable observations as initial centroids.
Usage
KMppIni(
  X,
  K,
  firstSelection = 1L,
  minkP = 2,
  stochastic = FALSE,
  seed = 123,
  maxCore = 7L,
  verbose = TRUE
  )
Arguments
X |
A |
K |
An integer, the number of centroids. |
firstSelection |
An integer, index of the observation selected as the first initial centroid in |
minkP |
A numeric value or a character string. If numeric, |
stochastic |
A boolean value. |
seed |
Random seed if |
maxCore |
An integer. The maximum number of threads to invoke. It should not exceed the total number of logical processors on the machine. Default 7. |
verbose |
A boolean value. |
Details
In each iteration, the distances between the newly selected centroid and all other observations are computed using multiple threads. The thread scheduler is custom-built to minimize the overhead of thread communication.
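The iteration described above can be illustrated with a minimal single-threaded sketch. It is not the package's implementation: `kmppSketch` is a hypothetical name, it supports only squared Euclidean distance, and it assumes (as the example below suggests) that each column of `X` is one observation. It further assumes that the deterministic mode selects the observation farthest from its nearest chosen centroid, and that the stochastic mode samples the next centroid with probability proportional to that squared distance, as in standard K-means++.

```r
# Hypothetical single-threaded sketch of K-means++ initialization.
# Assumptions: columns of X are observations; squared Euclidean distance only.
kmppSketch <- function(X, K, firstSelection = 1L, stochastic = FALSE) {
  N <- ncol(X)
  centers <- integer(K)
  centers[1] <- firstSelection
  # Squared distance from each observation to its nearest chosen centroid.
  minD2 <- rep(Inf, N)
  for (k in seq_len(K - 1L)) {
    # Distances from the newly selected centroid to all observations.
    # The vector is recycled down each column of X.
    newest <- X[, centers[k]]
    d2 <- colSums((X - newest)^2)
    minD2 <- pmin(minD2, d2)
    centers[k + 1L] <- if (stochastic) {
      # Assumed stochastic rule: sample proportionally to squared distance.
      sample.int(N, 1L, prob = minD2)
    } else {
      # Assumed deterministic rule: take the farthest observation.
      which.max(minD2)
    }
  }
  centers
}

# Usage on a tiny 2 x 4 matrix (four observations in two dimensions).
Xsmall <- cbind(c(0, 0), c(1, 0), c(10, 0), c(10, 1))
str(kmppSketch(Xsmall, 3L))
```

Only the distances to the newest centroid are computed in each pass, and the running minimum is updated in place, so the cost per iteration is one sweep over the data. This sweep is the step the package distributes across threads.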
Value
An integer vector of size K. The vector contains the indexes of the observations selected as the initial centroids.
Examples
N = 30000L
d = 300L
K = 30L
X = matrix(rnorm(N * d) + 2, nrow = d)
# CRAN checks allow examples to invoke at most 2 threads. Increase `maxCore`
# for acceleration.
kmppSt = KMppIni(X, K, firstSelection = 1L, minkP = 2,
                 stochastic = TRUE, seed = sample(1e9L, 1), maxCore = 2L)
kmppDt = KMppIni(X, K, firstSelection = 1L, minkP = 2,
                 stochastic = FALSE, maxCore = 2L)
str(kmppSt)
str(kmppDt)