clusterabilitytest {clusterability}    R Documentation

Perform a test of clusterability


Description

Performs tests for clusterability of a data set and returns the results in a clusterability object. Optionally reduces the data via PCA or pairwise distances and standardizes the data before performing the test.


Usage

clusterabilitytest(data, test, reduction = "pca",
  distance_metric = "euclidean", distance_standardize = "std",
  pca_center = TRUE, pca_scale = TRUE, is_dist_matrix = FALSE,
  completecase = FALSE, d_simulatepvalue = FALSE, d_reps = 2000,
  s_m = 999, s_adjust = TRUE, s_digits = 6, s_setseed = NULL,
  s_outseed = FALSE)



Arguments

data: the data set to be used in the test. Must contain only numeric data.


test: the test to be performed. Either "dip" or "silverman". See 'Details' section below for how to pick a test.


reduction: any dimension reduction that is to be performed. Default is "pca".

  • "none" performs no dimension reduction.

  • "pca" uses the scores from the first principal component.

  • "distance" computes pairwise distances (using distance_metric as the metric).

For multivariate data, dimension reduction is required.
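
As a rough illustration of what the two reduction options produce, the base R sketch below reduces the numeric columns of iris to a single vector in both ways. It uses only prcomp and dist; whether this mirrors the package's internal steps is an assumption, no distance standardization is applied here, and the variable names are illustrative.

```r
# Numeric columns of the built-in iris data (150 rows, 4 variables)
x <- as.matrix(iris[, -5])

# "pca"-style reduction: scores on the first principal component,
# one value per observation
pc1_scores <- prcomp(x, center = TRUE, scale. = TRUE)$x[, 1]

# "distance"-style reduction: all pairwise Euclidean distances,
# one value per pair of observations
pairwise <- as.vector(dist(x, method = "euclidean"))

length(pc1_scores)  # 150
length(pairwise)    # 150 * 149 / 2 = 11175
```

Either vector is one-dimensional, which is the form a multimodality test such as the Dip Test operates on.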


distance_metric: if applicable, the metric to be used in computing pairwise distances.

The "euclidean" (default), "maximum", "manhattan", "canberra", "binary" choices work the same as in dist. The Minkowski metric is available by providing "minkowski(p)".

Additional choices are:

  • "sqeuc": squared Euclidean distances.

  • "cov": covariance similarity coefficient.

  • "corr": correlation similarity coefficient.

  • "sqcorr": squared correlation similarity coefficient.

CAUTION: Not all of these metrics have been tested; they are provided because they may be useful. If in doubt, use the default "euclidean".
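
To make a few of these metrics concrete, the sketch below computes them with base R's dist on five rows of iris. Treating "minkowski(3)" as equivalent to dist(..., method = "minkowski", p = 3), and "sqeuc" as the square of the Euclidean distances, are assumptions made for illustration rather than a statement of how the package computes them.

```r
# Five observations, four numeric variables
x <- as.matrix(iris[1:5, -5])

# Euclidean distances between all pairs (10 pairs for 5 rows)
d_euc <- as.vector(dist(x, method = "euclidean"))

# Minkowski distances with p = 3 (assumed analogue of "minkowski(3)")
d_mink3 <- as.vector(dist(x, method = "minkowski", p = 3))

# Squared Euclidean distances (assumed analogue of "sqeuc")
d_sqeuc <- d_euc^2

# Sanity check: Minkowski with p = 2 coincides with Euclidean
all.equal(as.vector(dist(x, method = "minkowski", p = 2)), d_euc)
```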


distance_standardize: how the variables should be standardized, if at all.

  • "none": no standardization is performed.

  • "std" (default): each variable is standardized to have mean 0 and standard deviation 1.

  • "mean": each variable is shifted to have mean 0 (standard deviation is unchanged).

  • "median": each variable is shifted to have median 0 (standard deviation is unchanged).
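
The three standardizations listed above can be sketched in base R as follows. This is an illustration of the arithmetic described in the list, not the package's own code, and the object names are hypothetical.

```r
x <- as.matrix(iris[, -5])

# "std": each column has mean 0 and standard deviation 1
x_std <- scale(x, center = TRUE, scale = TRUE)

# "mean": subtract each column's mean; the spread is unchanged
x_mean <- sweep(x, 2, colMeans(x))

# "median": subtract each column's median; the spread is unchanged
x_median <- sweep(x, 2, apply(x, 2, median))

round(colMeans(x_std), 10)        # all 0
apply(x_std, 2, sd)               # all 1
round(apply(x_median, 2, median), 10)  # all 0
```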


pca_center: if applicable, a logical value indicating whether the variables should be shifted to be zero centered (see prcomp for more details). Default is TRUE.


pca_scale: if applicable, a logical value indicating whether the variables should be scaled to have unit variance before the analysis takes place (see prcomp for details). Default is TRUE.


is_dist_matrix: a logical value indicating whether the data argument is a distance matrix. If TRUE, the lower triangular portion of data is extracted and used in the multimodality test.
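
The sketch below shows one way to extract the lower triangular portion of a distance matrix in base R. It assumes the diagonal is excluded, so that each pair of observations contributes exactly one distance; the object names are illustrative.

```r
# Build a full symmetric 6 x 6 distance matrix with zeros on the diagonal
x <- as.matrix(iris[1:6, -5])
dmat <- as.matrix(dist(x))

# Keep only the entries strictly below the diagonal: each pair appears once
lower <- dmat[lower.tri(dmat)]

length(lower)  # 6 * 5 / 2 = 15
```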


completecase: a logical value indicating whether a complete case analysis should be performed. For both tests, missing data must be removed before the test can be performed. This can be done manually by the user or by setting completecase = TRUE.
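
Removing missing data manually might look like the following base R sketch, which keeps only rows with no missing values. The two NA values are introduced artificially for the illustration.

```r
# Numeric iris columns with two artificially introduced missing values
x <- iris[, -5]
x[c(3, 10), 2] <- NA

# Keep only the rows in which every variable is observed
x_complete <- x[complete.cases(x), ]

nrow(x)           # 150
nrow(x_complete)  # 148
```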


d_simulatepvalue: for Dip Test, a logical value indicating whether p-values should be obtained via Monte Carlo simulation (see dip.test for details). Default is FALSE.


d_reps: for Dip Test, a positive integer. The number of replicates used in Monte Carlo simulation. Only used if d_simulatepvalue is TRUE. Default is 2000.


s_m: for Silverman Test, a positive integer. The number of bootstrap replicates used in the test. Default is 999.


s_adjust: for Silverman Test, a logical value indicating whether p-values are adjusted using the calibration of Hall and York (2001). Default is TRUE.


s_digits: for Silverman Test, a positive integer indicating the number of digits to which the p-value is rounded. Default is 6. Only used when s_adjust = TRUE.


s_setseed: for Silverman Test, an integer used to set the seed of the random number generator. If the default value of NULL is used, then no seed will be set.
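
Because the Silverman Test relies on bootstrap replicates, fixing the seed is what makes repeated runs comparable. The base R sketch below shows the general effect of seeding on a bootstrap resample; it does not call the package, and how the package applies the seed internally is not shown here.

```r
x <- iris$Petal.Length

# Two bootstrap resamples drawn after setting the same seed
set.seed(12345)
boot1 <- sample(x, replace = TRUE)
set.seed(12345)
boot2 <- sample(x, replace = TRUE)

# A third resample drawn without resetting the seed
boot3 <- sample(x, replace = TRUE)

identical(boot1, boot2)  # TRUE: same seed, same resample
```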


s_outseed: for Silverman Test, a logical value indicating whether to return the state of the random number generator as part of the output. This is used in limited cases for troubleshooting, so the default is FALSE.


Value

clusterabilitytest returns a clusterability object containing information on the test performed and its results. The object can be printed using the print.clusterability function.


References

Hall, P. and York, M., 2001. On the calibration of Silverman's test for multimodality. Statistica Sinica, pp.515-536.

Silverman, B.W., 1981. Using kernel density estimates to investigate multimodality. Journal of the Royal Statistical Society. Series B (Methodological), pp.97-99.

Martin Maechler (2016). diptest: Hartigan's Dip Test Statistic for Unimodality - Corrected. R package version 0.75-7.

Schwaiger F, Holzmann H. Package which implements the silvermantest; 2013. Available from: packages/.

See Also



Examples

### Quick start ###
# Load data and remove Species
iris_num <- iris[,-5]

# Run test using default options
clust_result <- clusterabilitytest(iris_num, "dip")

# Print results
print(clust_result)

### Longer Example: Specifying Parameters ###
# Load data and plot to visualize
data(normals2)
plot(normals2)

# Using Silverman's test, pairwise distances to reduce dimension,
# 1,000 bootstrap replicates, with an RNG seed of 12345
clust_result2 <- clusterabilitytest(normals2, "silverman", reduction = "distance",
     s_m = 1000, s_setseed = 12345)

# Print result
print(clust_result2)
