CalculateSilhouette {scellpam}    R Documentation

CalculateSilhouette

Description

Calculates the silhouette value of each point classified by a clustering algorithm.
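
For reference, the silhouette of a point is usually defined as below. This page does not state the formula, so it is an assumption that the package follows this standard definition, with d(i, j) being the dissimilarity stored in fdist and C_i the class assigned to point i in cl.

```latex
a(i) = \frac{1}{|C_i| - 1} \sum_{j \in C_i,\, j \neq i} d(i, j), \qquad
b(i) = \min_{C \neq C_i} \frac{1}{|C|} \sum_{j \in C} d(i, j), \qquad
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}
```

Under this definition s(i) lies in [-1, 1]. Values near 1 indicate a point much closer to its own class than to any other, and values below 0 suggest a point that may be misclassified. By the usual convention s(i) = 0 when C_i contains only point i.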

Usage

CalculateSilhouette(cl, fdist, nthreads = 0L)

Arguments

cl

The classification array, holding the number of the class to which each point belongs. Each number must be in 1..number_of_classes.
Typically this is the L$clasif array, the second element of the list returned by ApplyPAM.
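
The range requirement on cl can be checked before calling the function. The following is a minimal sketch in base R; check_classification and its argument k (the number of classes) are hypothetical names used for illustration and are not part of scellpam.

```r
# Hypothetical helper, not part of scellpam: returns TRUE when every entry
# of cl is a whole number in 1..k, where k is the number of classes.
check_classification <- function(cl, k) {
  is.numeric(cl) && length(cl) > 0 && !anyNA(cl) &&
    all(cl == round(cl)) && all(cl >= 1 & cl <= k)
}

ok_zero_based <- check_classification(c(0, 1, 1, 2), 2)  # 0 is outside 1..2
ok_one_based  <- check_classification(c(1, 2, 2, 3), 3)  # all entries in 1..3
```

Class labels produced by other tools are sometimes 0-based; under the requirement above they would need to be shifted by one first.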

fdist

The binary file containing the symmetric matrix of dissimilarities between cells (usually generated by a call to CalcAndWriteDissimilarityMatrix).

nthreads

The number of threads used for the parallel calculation.
-1 means do not use threads (serial implementation).
0 means let the program choose according to the number of cores and of points.
Any other number forces that number of threads. Choosing more threads than the number of available cores is allowed, but discouraged.
Default: 0

Value

sil Numeric vector with the silhouette value of each point, in the same order as the points appear in cl.
If cl is a named vector, sil will also be a named vector with the same names.
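
To illustrate the shape of the returned value, here is a small base-R sketch that computes silhouettes from an in-memory dissimilarity matrix rather than from a binary file. It assumes the usual definition s = (b - a) / max(a, b); silhouette_sketch is a hypothetical name, and this is not the package's implementation, which reads fdist from disk and may run in parallel.

```r
# Hypothetical reference helper, not part of scellpam.
# cl: class numbers (optionally named); D: symmetric n x n dissimilarities.
silhouette_sketch <- function(cl, D) {
  D <- as.matrix(D)
  n <- length(cl)
  classes <- sort(unique(cl))
  stopifnot(nrow(D) == n, ncol(D) == n, length(classes) >= 2)
  sil <- numeric(n)
  for (i in seq_len(n)) {
    own <- cl == cl[i]
    if (sum(own) == 1) next          # singleton class: s(i) stays 0 by convention
    # a: mean dissimilarity to the other members of the own class
    # (D[i, i] is 0, so dividing by sum(own) - 1 excludes the point itself)
    a <- sum(D[i, own]) / (sum(own) - 1)
    # b: smallest mean dissimilarity to the members of any other class
    others <- setdiff(classes, cl[i])
    b <- min(vapply(others, function(k) mean(D[i, cl == k]), numeric(1)))
    if (max(a, b) > 0) sil[i] <- (b - a) / max(a, b)
  }
  names(sil) <- names(cl)            # a named cl gives a named result
  sil
}

# Four points on a line, forming two well-separated pairs
pts <- c(a = 0, b = 1, c = 10, d = 11)
cl  <- c(a = 1L, b = 1L, c = 2L, d = 2L)
sil <- silhouette_sketch(cl, dist(pts))
```

Here sil has one entry per point, in the order of cl and carrying its names; the outer points a and d get 9.5/10.5 and the inner points b and c get 8.5/9.5.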

Examples

# Synthetic problem: 10 random seed points with coordinates in [0..20],
# each replicated 10 times with random noise in [-0.1..0.1] added
library(scellpam)
M <- matrix(0, 100, 500)
rownames(M) <- paste0("rn", 1:100)
for (i in 1:10)
{
 p <- 20*runif(500)                              # seed point i
 Rf <- matrix(0.2*(runif(5000)-0.5), nrow=10)    # one row of noise per copy
 for (k in 1:10)
 {
  M[10*(i-1)+k,] <- p+Rf[k,]
 }
}
# Write the data matrix, then its L2 dissimilarity matrix, as binary files
tmpfile1 <- file.path(tempdir(), "pamtest.bin")
JWriteBin(M, tmpfile1, dtype="float", dmtype="full")
tmpdisfile1 <- file.path(tempdir(), "pamDL2.bin")
CalcAndWriteDissimilarityMatrix(tmpfile1, tmpdisfile1, distype="L2", restype="float", nthreads=0)
# Cluster into 10 classes and compute the silhouette of each point
L <- ApplyPAM(tmpdisfile1, 10, init_method="BUILD")
sil <- CalculateSilhouette(L$clasif, tmpdisfile1)
# Histogram of the silhouette. In this synthetic problem, values are close to 1 for all points
hist(sil)

[Package scellpam version 1.4.5 Index]