cDistance {cstab}    R Documentation

Selection of number of clusters via distance-based measures


Selection of the number of clusters via the gap statistic, the jump statistic, and the slope statistic.


cDistance(data, kseq, method = "kmeans", linkage = "complete",
  kmIter = 10, gapIter = 10)



data: an n x p numeric data matrix.


kseq: a vector of the candidate numbers of clusters k; each k must be greater than 1.


method: a character string indicating the clustering algorithm: 'kmeans' for the k-means algorithm, 'hierarchical' for hierarchical clustering.


linkage: a character string specifying the linkage criterion when method = 'hierarchical'. The available options are "single", "complete", "average", "mcquitty", "ward.D", "ward.D2", "centroid", and "median". See hclust.


kmIter: an integer specifying the number of restarts of the k-means algorithm, used to avoid local minima.


gapIter: an integer specifying the number of simulated reference datasets used to compute the gap statistic (see Tibshirani et al., 2001).
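The arguments method, linkage, kmIter, and gapIter all feed into the gap statistic. The R sketch below shows one way they could fit together; it is not cstab's internal code. The helpers within_ss() and gap_sketch() are hypothetical, the reference datasets are assumed to be drawn uniformly from the bounding box of the data (one of the reference distributions proposed by Tibshirani et al., 2001), and k is chosen with their rule: the smallest k such that Gap(k) >= Gap(k+1) - s_{k+1}. cstab may use a different reference distribution or selection rule.

```r
# Hypothetical helper (not part of cstab): total within-cluster sum of
# squares W_k of a k-cluster partition of x, found either by k-means with
# kmIter random starts or by cutting an hclust tree at k clusters
within_ss <- function(x, k, method = "kmeans", linkage = "complete", kmIter = 10) {
  if (method == "kmeans") {
    return(kmeans(x, centers = k, nstart = kmIter, iter.max = 50)$tot.withinss)
  }
  cl <- cutree(hclust(dist(x), method = linkage), k = k)
  # Squared distances of the points to their cluster means, summed over clusters
  sum(sapply(split(seq_len(nrow(x)), cl), function(idx)
    sum(scale(x[idx, , drop = FALSE], scale = FALSE)^2)))
}

# Hypothetical sketch of the gap statistic (Tibshirani et al., 2001), assuming
# a uniform reference distribution over the bounding box of the data
gap_sketch <- function(data, kseq, method = "kmeans", linkage = "complete",
                       kmIter = 10, gapIter = 10) {
  data <- as.matrix(data)
  n <- nrow(data)
  logW <- sapply(kseq, function(k) log(within_ss(data, k, method, linkage, kmIter)))
  # log W_k on gapIter reference datasets (rows: k, columns: datasets)
  logWref <- matrix(replicate(gapIter, {
    ref <- apply(data, 2, function(col) runif(n, min(col), max(col)))
    sapply(kseq, function(k) log(within_ss(ref, k, method, linkage, kmIter)))
  }), nrow = length(kseq))
  gap <- rowMeans(logWref) - logW
  # s_k = sd_k * sqrt(1 + 1/B), where sd_k uses divisor B = gapIter
  s <- sqrt(rowMeans((logWref - rowMeans(logWref))^2)) * sqrt(1 + 1 / gapIter)
  # Smallest k with Gap(k) >= Gap(k+1) - s_{k+1}, checked over consecutive
  # entries of kseq; if no k qualifies, fall back to the largest gap
  m <- length(kseq)
  ok <- if (m > 1) which(gap[-m] >= gap[-1] - s[-1]) else integer(0)
  k_opt <- if (length(ok) > 0) kseq[ok[1]] else kseq[which.max(gap)]
  list(k = k_opt, gap = setNames(gap, kseq), se = setNames(s, kseq))
}
```

For example, gap_sketch(data, kseq = 2:10, method = "hierarchical", linkage = "average") evaluates the same criterion on hierarchical partitions. Larger gapIter values reduce the simulation noise in Gap(k) and s_k, at a proportional cost in running time.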


A list containing the optimal number of clusters as determined by the gap statistic (Tibshirani et al., 2001), the jump statistic (Sugar & James, 2003), and the slope statistic (Fujita et al., 2014). In addition, the function returns the gap, jump, and slope values for each k in kseq.


Dirk U. Wulff, Jonas M. B. Haslbeck


Tibshirani, R., Walther, G., & Hastie, T. (2001). Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63(2), 411-423.

Sugar, C. A., & James, G. M. (2003). Finding the number of clusters in a dataset: An information-theoretic approach. Journal of the American Statistical Association, 98(463), 750-763.

Fujita, A., Takahashi, D. Y., & Patriota, A. G. (2014). A non-parametric method to estimate the number of clusters. Computational Statistics & Data Analysis, 73, 27-39.


## Not run: 
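  library(cstab)
  set.seed(1)  # make the simulated data reproducible

  # Generate data from a Gaussian mixture with four components centered at
  # (0, 0), (1, 1), (0, 1) and (1, 0), each with standard deviation s
  s <- .1
  n <- 50
  data <- rbind(cbind(rnorm(n, 0, s), rnorm(n, 0, s)),
                cbind(rnorm(n, 1, s), rnorm(n, 1, s)),
                cbind(rnorm(n, 0, s), rnorm(n, 1, s)),
                cbind(rnorm(n, 1, s), rnorm(n, 0, s)))

  # Selection of the number of clusters using distance-based measures
  cDistance(data, kseq = 2:10)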
## End(Not run)
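The jump and slope statistics listed under the return value can likewise be sketched in R and applied to data simulated as in the example above. This is a hedged illustration, not cstab's implementation. jump_sketch(), slope_sketch(), and km() are hypothetical names, and both sketches use k-means only. The jump sketch assumes an identity covariance matrix, so the distortion d_k is the average squared Euclidean distance per dimension, transformed with the power Y = p/2 suggested by Sugar & James (2003). The slope sketch uses -(sil(k+1) - sil(k)) * sil(k)^power, where sil(k) is the mean silhouette width from the recommended cluster package, and power = 1 is an assumed default.

```r
library(cluster)  # silhouette(); a recommended package bundled with R

# Hypothetical helper (not from cstab): k-means partition with kmIter restarts
km <- function(x, k, kmIter) kmeans(x, centers = k, nstart = kmIter, iter.max = 50)

# Jump statistic (Sugar & James, 2003), assuming an identity covariance so the
# distortion is the average squared Euclidean distance per dimension
jump_sketch <- function(data, kseq, kmIter = 10) {
  data <- as.matrix(data)
  n <- nrow(data)
  p <- ncol(data)
  distortion <- function(k) {
    W <- if (k == 1) sum(scale(data, scale = FALSE)^2) else km(data, k, kmIter)$tot.withinss
    W / (n * p)
  }
  ks <- sort(unique(c(kseq - 1, kseq)))  # d_{k-1} is needed for every k
  dY <- setNames(sapply(ks, distortion)^(-p / 2), ks)  # transformation power Y = p/2
  jump <- setNames(dY[as.character(kseq)] - dY[as.character(kseq - 1)], kseq)
  list(k = kseq[which.max(jump)], jump = jump)
}

# Slope statistic (Fujita et al., 2014) based on the mean silhouette width
slope_sketch <- function(data, kseq, kmIter = 10, power = 1) {
  data <- as.matrix(data)
  d <- dist(data)
  mean_sil <- function(k) mean(silhouette(km(data, k, kmIter)$cluster, d)[, "sil_width"])
  ks <- sort(unique(c(kseq, kseq + 1)))  # sil(k + 1) is needed for every k
  sil <- setNames(sapply(ks, mean_sil), ks)
  s_k <- sil[as.character(kseq)]
  slope <- setNames(-(sil[as.character(kseq + 1)] - s_k) * s_k^power, kseq)
  list(k = kseq[which.max(slope)], slope = slope)
}
```

Each function returns the selected k together with the per-k values, so the two criteria can be compared directly across kseq.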

[Package cstab version 0.2-2 Index]