wcClusterQuality {WeightedCluster} | R Documentation |
Cluster quality statistics
Description
Compute several quality statistics of a given clustering solution.
Usage
wcClusterQuality(diss, clustering, weights = NULL)
Arguments
diss |
A dissimilarity matrix or a dist object (see |
clustering |
Factor. A vector of cluster memberships, one per observation. |
weights |
Optional numeric vector containing weights. |
Details
Computes the quality statistics listed in the Value section for a given clustering solution.
Value
A list with two elements, stats and ASW:
stats
with the following statistics:
- PBC
Point Biserial Correlation. Correlation between the given distance matrix and a distance that equals zero for individuals in the same cluster and one otherwise.
- HG
Hubert's Gamma. Same as previous but using Kendall's Gamma coefficient.
- HGSD
Hubert's Gamma (Somers' D). Same as previous but using Somers' D coefficient.
- ASW
Average Silhouette width (observation).
- ASWw
Average Silhouette width (weighted).
- CH
Calinski-Harabasz index (pseudo F statistic computed from distances).
- R2
Share of the discrepancy explained by the clustering solution.
- CHsq
Calinski-Harabasz index (pseudo F statistic computed from squared distances).
- R2sq
Share of the discrepancy explained by the clustering solution (computed using squared distances).
- HC
Hubert's C coefficient.
ASW
The Average Silhouette Width of each cluster, with one column for each ASW measure (ASW and ASWw).
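The PBC definition above can be illustrated by hand. The following is a minimal base R sketch of the unweighted case on toy data. It is not the package's implementation and it ignores the weights argument; the names x, part and pbc are made up for the illustration.

```r
# Toy data: two well-separated groups of three points on a line.
x <- c(0, 1, 2, 10, 11, 12)
clustering <- factor(c(1, 1, 1, 2, 2, 2))

# Pairwise dissimilarities, stored as the lower triangle of the matrix.
d <- dist(x)

# Binary partition distance in the same pair order:
# 0 if two cases share a cluster, 1 otherwise.
part <- as.numeric(as.vector(dist(as.integer(clustering))) > 0)

# Unweighted point biserial correlation: the Pearson correlation
# between the observed distances and the binary partition distance.
pbc <- cor(as.vector(d), part)
print(pbc)
```

A value close to one indicates that pairs placed in different clusters are systematically farther apart than pairs placed in the same cluster.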
Examples
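## Load WeightedCluster (assumed to attach TraMineR, which provides mvad, seqdef and seqdist)
library(WeightedCluster)
data(mvad)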
## Aggregating identical state sequences
aggMvad <- wcAggregateCases(mvad[, 17:86], weights=mvad$weight)
## Creating state sequence object
mvad.seq <- seqdef(mvad[aggMvad$aggIndex, 17:86], weights=aggMvad$aggWeights)
## Computing Hamming distances between sequences
diss <- seqdist(mvad.seq, method="HAM")
## KMedoids using PAMonce method (clustering only)
clust5 <- wcKMedoids(diss, k=5, weights=aggMvad$aggWeights, cluster.only=TRUE)
## Compute the cluster quality statistics
qual <- wcClusterQuality(diss, clust5, weights=aggMvad$aggWeights)
print(qual)
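The returned list can also be inspected element by element. The following is a small self-contained sketch on toy data rather than on the mvad sequences. It assumes that a plain dist object and a factor built from cutree are acceptable inputs, and that the entries of stats are named after the labels listed in the Value section; the toy objects x, d and clustering are made up for the illustration.

```r
library(WeightedCluster)

# Toy data: two well-separated groups of ten points in the plane.
set.seed(1)
x <- rbind(matrix(rnorm(20), ncol = 2),
           matrix(rnorm(20, mean = 5), ncol = 2))
d <- dist(x)

# Any partition works here; this one is a two-cluster cut of a Ward hierarchy.
clustering <- factor(cutree(hclust(d, method = "ward.D"), k = 2))

# No weights supplied, so the default weights = NULL is used.
qual <- wcClusterQuality(d, clustering)

# Overall statistics for the whole partition.
print(qual$stats)

# Assumption: entries are named after the labels in the Value section.
print(qual$stats["ASWw"])

# Average silhouette width of each cluster.
print(qual$ASW)
```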