clusterStability {Evacluster}    R Documentation

Clustering stability function

Description

This function computes the stability of a clustering, which helps to select the best number of clusters. Feature selection and dimensionality reduction can optionally be applied before the data are clustered.

Usage

clusterStability(
  data = NULL,
  clustermethod = NULL,
  dimenreducmethod = NULL,
  n_components = 3,
  perplexity = 25,
  max_iter = 1000,
  k_neighbor = 3,
  featureselection = NULL,
  outcome = NULL,
  fs.pvalue = 0.05,
  randomTests = 20,
  trainFraction = 0.5,
  pac.thr = 0.1,
  ...
)

Arguments

data

A data set.

clustermethod

The clustering method. This can be one of "Mclust", "pamCluster", "kmeansCluster", "hierarchicalCluster", or "FuzzyCluster".

dimenreducmethod

The dimensionality reduction method. This must be one of "UMAP", "tSNE", or "PCA".

n_components

The dimension of the space into which the data are embedded. It can be set to any integer from 2 to 100.
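
As a rough illustration of what n_components controls, the sketch below reduces a numeric matrix to three principal components with base R's prcomp() and projects held-out rows into the same space. It assumes the "PCA" option behaves like a standard principal component projection, which this page does not confirm. The simulated data and all variable names are hypothetical.

```r
# Simulated data: 100 samples, 10 numeric features (hypothetical).
set.seed(1)
x <- matrix(rnorm(100 * 10), nrow = 100, ncol = 10)
colnames(x) <- paste0("f", 1:10)
train <- x[1:70, ]
test  <- x[71:100, ]

n_components <- 3

# Fit PCA on the training rows only.
pca <- prcomp(train, center = TRUE, scale. = TRUE)

# Keep the first n_components coordinates of the training embedding.
train_embedded <- pca$x[, seq_len(n_components)]

# Project the held-out rows using the rotation learned on the training rows.
test_embedded <- predict(pca, newdata = test)[, seq_len(n_components)]

dim(train_embedded)
dim(test_embedded)
```

Fitting on the training rows and projecting the rest keeps the held-out data from influencing the embedding.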

perplexity

The perplexity parameter, which determines the optimal number of neighbors in the tSNE method. It is used only by the tSNE reduction method.

max_iter

The maximum number of iterations for performing tSNE reduction method.

k_neighbor

The value k used to compute the number of minimum-distance neighbors, #Neighbor = sqrt(#Samples/k), whose means are used to embed new data into an existing embedding in the tSNE method.
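
The formula above can be worked through numerically. The helper below is a hypothetical illustration, not a function exported by the package, and rounding the result to a whole number is an assumption made for this sketch.

```r
# Number of neighbors implied by #Neighbor = sqrt(#Samples / k).
# ASSUMPTION: the result is rounded to the nearest whole number.
neighbor_count <- function(n_samples, k_neighbor = 3) {
  round(sqrt(n_samples / k_neighbor))
}

neighbor_count(208, k_neighbor = 3)  # sqrt(69.33) is about 8.33
neighbor_count(208, k_neighbor = 2)  # sqrt(104) is about 10.2
```

A smaller k_neighbor therefore yields more neighbors for the same number of samples.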

featureselection

Determines whether feature selection is applied before the data are clustered. Set it to "yes" to apply feature selection, otherwise to "no".

outcome

The outcome feature used for feature selection.

fs.pvalue

The p-value threshold used in the feature selection process. The default value is 0.05.
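
To show how a p-value threshold can drive feature selection against an outcome, the sketch below keeps the features whose univariate test p-value is below fs.pvalue. The choice of the Wilcoxon rank-sum test is an assumption for illustration only, since this page does not state which test the package uses. The simulated data are hypothetical.

```r
set.seed(7)

# Binary outcome and three candidate features (simulated, hypothetical).
outcome <- rep(c(0, 1), each = 50)
features <- data.frame(
  f1 = rnorm(100) + 3 * outcome,  # strongly associated with the outcome
  f2 = rnorm(100),
  f3 = rnorm(100)
)

fs.pvalue <- 0.05

# ASSUMPTION: one Wilcoxon rank-sum test per feature, comparing the two classes.
pvalues <- sapply(features, function(x) {
  wilcox.test(x[outcome == 0], x[outcome == 1])$p.value
})

# Keep the features whose p-value falls below the threshold.
selected <- names(pvalues)[pvalues < fs.pvalue]
selected
```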

randomTests

The number of iterations of the clustering process for computing the cluster stability.

trainFraction

The fraction of the data used for training. The default value is 0.5.
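
Taken together, randomTests and trainFraction describe repeated random train/test splits. The sketch below shows one way such splits could be drawn in base R. It is a minimal illustration under that reading, not the package's implementation, and the row count and variable names are hypothetical.

```r
set.seed(42)
n_samples     <- 208   # hypothetical number of rows
randomTests   <- 20
trainFraction <- 0.5

# Draw one random train/test split per iteration.
splits <- lapply(seq_len(randomTests), function(i) {
  train_idx <- sample(n_samples, size = floor(trainFraction * n_samples))
  list(train = train_idx, test = setdiff(seq_len(n_samples), train_idx))
})

length(splits)              # one split per random test
length(splits[[1]]$train)   # 104 training rows
length(splits[[1]]$test)    # 104 held-out rows
```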

pac.thr

The threshold used for computing the proportion of ambiguous clustering (PAC) score. The PAC score is the fraction of sample pairs whose consensus indices fall in the interval defined by this threshold. The default value is 0.1.
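
The PAC definition can be sketched on a toy consensus matrix. The helper below is hypothetical, and it assumes the ambiguous interval is (pac.thr, 1 - pac.thr), which is a common convention but is not stated on this page.

```r
# Proportion of ambiguous clustering (PAC) from a consensus matrix.
# ASSUMPTION: the ambiguous interval is (pac.thr, 1 - pac.thr).
pac_score <- function(consensus, pac.thr = 0.1) {
  # Use each unordered sample pair once (upper triangle, no diagonal).
  idx <- consensus[upper.tri(consensus)]
  mean(idx > pac.thr & idx < 1 - pac.thr)
}

# Toy consensus matrix for four samples (hypothetical values).
consensus <- matrix(c(
  1.00, 0.95, 0.05, 0.50,
  0.95, 1.00, 0.00, 0.40,
  0.05, 0.00, 1.00, 0.92,
  0.50, 0.40, 0.92, 1.00
), nrow = 4, byrow = TRUE)

pac_score(consensus)  # 2 of the 6 pairs fall inside (0.1, 0.9)
```

A lower score means fewer sample pairs are co-clustered inconsistently across runs, which indicates a more stable clustering.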

...

Additional arguments passed on to the clustering method (for example center, k, distmethod, or clusters, as used in the examples).

Value

A list with the following elements:

Examples


library("Evacluster")
library("mlbench")
data(Sonar)

# Recode the two-level Class factor as a numeric 0/1 outcome.
Sonar$Class <- as.numeric(Sonar$Class)
Sonar$Class[Sonar$Class == 1] <- 0
Sonar$Class[Sonar$Class == 2] <- 1

# k-means clustering on a UMAP embedding, with feature selection.
ClustStab <- clusterStability(data = Sonar, clustermethod = kmeansCluster,
                              dimenreducmethod = "UMAP", n_components = 3,
                              featureselection = "yes", outcome = "Class",
                              fs.pvalue = 0.05, randomTests = 100,
                              trainFraction = 0.7, center = 3)


# PAM clustering on a tSNE embedding, with feature selection.
ClustStab <- clusterStability(data = Sonar, clustermethod = pamCluster,
                              dimenreducmethod = "tSNE", n_components = 3,
                              perplexity = 10, max_iter = 100, k_neighbor = 2,
                              featureselection = "yes", outcome = "Class",
                              fs.pvalue = 0.05, randomTests = 100,
                              trainFraction = 0.7, k = 3)


# Hierarchical clustering on a PCA embedding, without feature selection.
ClustStab <- clusterStability(data = Sonar, clustermethod = hierarchicalCluster,
                              dimenreducmethod = "PCA", n_components = 3,
                              featureselection = "no", randomTests = 100,
                              trainFraction = 0.7, distmethod = "euclidean",
                              clusters = 3)



[Package Evacluster version 0.1.0 Index]