get_sil_widths {EvoPhylo} | R Documentation |
Calculate silhouette widths index for various numbers of partitions
Description
Computes the average silhouette width for several possible numbers of clusters (partitions) k. The silhouette width measures how well an object fits within its own cluster compared to other clusters. The best number of clusters k is the one with the highest silhouette width.
Usage
get_sil_widths(dist_mat, max.k = 10)
## S3 method for class 'sil_width_df'
plot(x, ...)
Arguments
dist_mat |
A Gower distance matrix, the output of a call to |
max.k |
The maximum number of clusters (partitions) to search across. |
x |
A |
... |
Further arguments passed to |
Details
get_sil_widths() calls cluster::pam() on the supplied Gower distance matrix with each number of clusters (partitions) up to max.k and stores the average silhouette widths across the clustered characters. When plot = TRUE, a plot of the silhouette widths against the number of clusters is produced, though this can also be produced separately on the resulting data frame using plot.sil_width_df(). The number of clusters with the greatest silhouette width should be selected for use in the final clustering specification.
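The procedure described above can be approximated with the cluster package alone. The following is a minimal sketch, not the package's actual implementation: the toy data frame, the object names (toy, d, sw_sketch, best_k), and the choice to start the search at k = 2 (the silhouette width is undefined for a single cluster) are all assumptions made for illustration.

```r
library(cluster)  # provides daisy() and pam()

# Hypothetical toy data: three well-separated groups of 10 objects each
set.seed(1)
toy <- data.frame(
  x = c(rnorm(10, 0, 0.2), rnorm(10, 3, 0.2), rnorm(10, 6, 0.2)),
  y = c(rnorm(10, 0, 0.2), rnorm(10, 3, 0.2), rnorm(10, 6, 0.2)),
  g = factor(rep(c("a", "b", "c"), each = 10))
)

# Gower dissimilarities between the objects (rows)
d <- daisy(toy, metric = "gower")

# Run PAM for each candidate k and keep the average silhouette width
max_k <- 6
ks <- 2:max_k  # assumption: start at 2, since k = 1 has no silhouette
sil <- vapply(
  ks,
  function(k) pam(d, k = k, diss = TRUE)$silinfo$avg.width,
  numeric(1)
)

# Same two-column layout as described under Value
sw_sketch <- data.frame(k = ks, sil_width = sil)

# The k with the greatest average silhouette width
best_k <- sw_sketch$k[which.max(sw_sketch$sil_width)]
```

Applying the same which.max() step to the k and sil_width columns of the data frame returned by get_sil_widths() should select the number of clusters in the same way.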
Value
For get_sil_widths(), a data frame inheriting from class "sil_width_df" with two columns: k is the number of clusters, and sil_width is the average silhouette width for each number of clusters. If plot = TRUE, the output is returned invisibly.
For plot() on a get_sil_widths() object, a ggplot object that can be manipulated using ggplot2 syntax (e.g., to change the theme or labels).
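As an illustration of manipulating the returned plot, the sketch below stores the ggplot object and adds a theme and axis labels. It assumes that EvoPhylo and ggplot2 are installed; the object name p and the label text are arbitrary choices, not part of the package.

```r
library(EvoPhylo)
library(ggplot2)

data("characters")
Dmatrix <- get_gower_dist(characters)
sw <- get_sil_widths(Dmatrix, max.k = 7)

# plot() returns a ggplot object, so standard ggplot2 layers can be added
p <- plot(sw)
p + theme_bw() +
  labs(x = "Number of clusters (k)", y = "Average silhouette width")
```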
See Also
vignette("char-part") for the use of this function as part of an analysis pipeline.
Examples
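# See vignette("char-part") for how to use this
# function as part of an analysis pipeline
library(EvoPhylo)

# Load the example character data
data("characters")

# Compute the Gower distance matrix between characters
Dmatrix <- get_gower_dist(characters)

# Get silhouette widths for numbers of clusters up to k = 7
sw <- get_sil_widths(Dmatrix, max.k = 7)
sw

# Plot silhouette widths against the number of clusters
plot(sw, color = "red", size = 2)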