| TGL_kmeans {tglkmeans} | R Documentation | 
kmeans++ with return value similar to R kmeans
Description
kmeans++ with return value similar to R kmeans
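For readers unfamiliar with kmeans++ seeding, the following is a minimal base R sketch of the textbook procedure: the first seed is drawn uniformly, and each later seed is drawn with probability proportional to its squared distance from the nearest seed chosen so far. The helper `kmeanspp_seed_sketch` is hypothetical and is not part of tglkmeans; the package's own C++ implementation may differ in its details.

```r
# Hypothetical helper, not part of tglkmeans: textbook kmeans++ seeding
# with squared Euclidean distance.
kmeanspp_seed_sketch <- function(x, k) {
    x <- as.matrix(x)
    n <- nrow(x)
    seeds <- integer(k)
    # first seed: chosen uniformly at random
    seeds[1] <- sample.int(n, 1)
    # squared distance of every observation to its nearest chosen seed
    d2 <- rowSums(sweep(x, 2, x[seeds[1], ])^2)
    for (i in seq_len(k)[-1]) {
        # next seed: sampled with probability proportional to d2
        seeds[i] <- sample.int(n, 1, prob = d2)
        d2 <- pmin(d2, rowSums(sweep(x, 2, x[seeds[i], ])^2))
    }
    x[seeds, , drop = FALSE]
}

set.seed(1)
x <- rbind(
    matrix(rnorm(40, mean = 0, sd = 0.1), ncol = 2),
    matrix(rnorm(40, mean = 5, sd = 0.1), ncol = 2)
)
seeds <- kmeanspp_seed_sketch(x, k = 2)
seeds
```

Because already chosen observations have a squared distance of zero, they cannot be drawn again, which tends to spread the initial centers across the data.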
Usage
TGL_kmeans(
  df,
  k,
  metric = "euclid",
  max_iter = 40,
  min_delta = 0.0001,
  verbose = FALSE,
  keep_log = FALSE,
  id_column = FALSE,
  reorder_func = "hclust",
  hclust_intra_clusters = FALSE,
  seed = NULL,
  parallel = getOption("tglkmeans.parallel"),
  use_cpp_random = FALSE
)
Arguments
df | 
 a data frame or a matrix. Each row is a single observation and each column is a dimension. The first column can contain the id of each observation (if id_column is TRUE); otherwise the rownames are used.  | 
k | 
 number of clusters. Note that in some cases the algorithm might return fewer clusters than k.  | 
metric | 
 distance metric for kmeans++ seeding. Can be 'euclid', 'pearson' or 'spearman'.  | 
max_iter | 
 maximal number of iterations  | 
min_delta | 
 minimal change in assignments (fraction out of all observations) to continue iterating  | 
verbose | 
 display algorithm messages  | 
keep_log | 
 keep algorithm messages in 'log' field  | 
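id_column | 
 TRUE if the first column of df contains the id of each observation; otherwise the rownames are used.  | 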
reorder_func | 
 function to reorder the clusters. It operates on each center, and the clusters are ordered by the result. e.g.   | 
hclust_intra_clusters | 
 run hierarchical clustering within each cluster and return an ordering of the observations.  | 
seed | 
 seed for the C++ random number generator  | 
parallel | 
 run the hierarchical clustering of each cluster in parallel (if hclust_intra_clusters is TRUE)  | 
use_cpp_random | 
 use the C++ random number generator instead of R's. This should be used only for backwards compatibility, as from version 0.4.0 onwards the default random number generator was changed to R's.  | 
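The interaction between max_iter and min_delta can be illustrated with a minimal base R sketch of plain Lloyd iterations: the loop stops either after max_iter rounds or once the fraction of observations that changed cluster drops below min_delta. The helper `lloyd_sketch` is hypothetical, handles only the Euclidean case, and is an assumption about how the stopping rule described above behaves, not the package's implementation.

```r
# Hypothetical helper, not part of tglkmeans: Lloyd iterations with the
# stopping rule described by max_iter and min_delta.
lloyd_sketch <- function(x, centers, max_iter = 40, min_delta = 0.0001) {
    x <- as.matrix(x)
    assignment <- rep(0L, nrow(x))
    for (iter in seq_len(max_iter)) {
        # squared Euclidean distance from every observation to every center
        d2 <- sapply(seq_len(nrow(centers)), function(j) {
            rowSums(sweep(x, 2, centers[j, ])^2)
        })
        # index of the nearest center for each observation
        new_assignment <- max.col(-d2, ties.method = "first")
        # fraction of observations that switched cluster in this iteration
        delta <- mean(new_assignment != assignment)
        assignment <- new_assignment
        # recompute each non-empty center as the mean of its members
        for (j in seq_len(nrow(centers))) {
            members <- assignment == j
            if (any(members)) {
                centers[j, ] <- colMeans(x[members, , drop = FALSE])
            }
        }
        if (delta < min_delta) break
    }
    list(cluster = assignment, centers = centers, iterations = iter)
}

set.seed(1)
x <- rbind(
    matrix(rnorm(40, mean = 0, sd = 0.1), ncol = 2),
    matrix(rnorm(40, mean = 5, sd = 0.1), ncol = 2)
)
# start from one observation of each group
res <- lloyd_sketch(x, centers = x[c(1, 21), ])
table(res$cluster)
res$iterations
```

A larger min_delta stops earlier at the cost of a less settled assignment; max_iter acts as a hard cap when the assignments keep changing.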
Value
list with the following components:
- cluster:
 A vector of integers (from '1:k') indicating the cluster to which each point is allocated.
- centers:
 A matrix of cluster centers.
- size:
 The number of points in each cluster.
- log:
 messages from the algorithm run (only if keep_log = TRUE).
- order:
 A vector of integers with the new ordering of the observations (only if hclust_intra_clusters = TRUE).
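Since the return value is described as similar to that of R's kmeans, the shared components can be explored with `stats::kmeans` on synthetic data. This is only an illustration of the structure using base R; it assumes, based on the list above, that the same accessors apply to the output of TGL_kmeans.

```r
set.seed(42)
# two well-separated groups of 50 observations in 2 dimensions
x <- rbind(
    matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
    matrix(rnorm(100, mean = 3, sd = 0.3), ncol = 2)
)
km_base <- stats::kmeans(x, centers = 2)

# components that also appear in the list documented above
str(km_base[c("cluster", "centers", "size")])

# 'size' is the tabulation of 'cluster'
sizes_match <- all(tabulate(km_base$cluster, nbins = 2) == km_base$size)
sizes_match

# each row of 'centers' is the column mean of the members of that cluster
center_1 <- colMeans(x[km_base$cluster == 1, , drop = FALSE])
all.equal(unname(km_base$centers[1, ]), center_1)
```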
Examples
# create 5 clusters normally distributed around 1:5
d <- simulate_data(
    n = 100,
    sd = 0.3,
    nclust = 5,
    dims = 2,
    add_true_clust = FALSE,
    id_column = FALSE
)
head(d)
# cluster
km <- TGL_kmeans(d, k = 5, metric = "euclid", verbose = TRUE)
names(km)
km$centers
head(km$cluster)
km$size
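As a further sketch, the call below requests the within-cluster ordering and fixes the seed, using only arguments listed in the Usage section. It assumes the tglkmeans package is installed (the code is skipped otherwise); the seed value and the name `km_ord` are arbitrary choices for illustration.

```r
if (requireNamespace("tglkmeans", quietly = TRUE)) {
    library(tglkmeans)
    d <- simulate_data(
        n = 100,
        sd = 0.3,
        nclust = 5,
        dims = 2,
        add_true_clust = FALSE,
        id_column = FALSE
    )
    # request the ordering of observations within clusters and fix the seed
    km_ord <- TGL_kmeans(
        d,
        k = 5,
        metric = "euclid",
        hclust_intra_clusters = TRUE,
        seed = 60427
    )
    print(names(km_ord))
    print(head(km_ord$order))
}
```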