get_clustering_stats {scclust} | R Documentation
Get clustering statistics
Description
get_clustering_stats calculates summary statistics of a clustering, covering cluster sizes and within-cluster distances.
Usage
get_clustering_stats(distances, clustering)
Arguments
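distances | a distances object holding the distances between the data points (as created by distances() in the Examples) |
clustering | a scclust object holding the clustering (as created by scclust() in the Examples) |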
Details
The function reports the following measures:
num_data_points | total number of data points |
num_assigned | number of points assigned to a cluster |
num_clusters | number of clusters |
min_cluster_size | size of the smallest cluster |
max_cluster_size | size of the largest cluster |
avg_cluster_size | average cluster size |
sum_dists | sum of all within-cluster distances |
min_dist | smallest within-cluster distance |
max_dist | largest within-cluster distance |
avg_min_dist | average of the clusters' smallest distances |
avg_max_dist | average of the clusters' largest distances |
avg_dist_weighted | average of the clusters' average distances, weighted by cluster size |
avg_dist_unweighted | average of the clusters' average distances (unweighted) |
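The size-based measures in this table depend only on the cluster labels. The following is a minimal base-R sketch of how they could be reproduced by hand; it is not the package's implementation. The vector `labels` is a hypothetical stand-in for the scclust object, copied from the Examples section, and using NA for unassigned points is an assumption of this sketch.

```r
# Hypothetical stand-in for the clustering: one label per data point,
# copied from the Examples section. In this sketch, NA would mark an
# unassigned point (an assumption, not necessarily scclust's encoding).
labels <- c("A", "A", "B", "C", "B", "C", "C", "A", "B", "B")

# Cluster sizes as a plain integer vector; table() drops NA by default.
sizes <- as.vector(table(labels))

size_stats <- c(
  num_data_points  = length(labels),
  num_assigned     = sum(!is.na(labels)),
  num_clusters     = length(sizes),
  min_cluster_size = min(sizes),
  max_cluster_size = max(sizes),
  avg_cluster_size = mean(sizes)
)
print(size_stats)
```

For these labels the sketch yields 10 points, 10 assigned, 3 clusters, sizes between 3 and 4, and an average size of 10/3.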
Let d(i,j) denote the distance between data points i and j. Let c be a cluster, represented by the indices of the points assigned to it. Let
D(c) = \{d(i,j): i,j \in c \wedge i>j\}
be a function returning all within-cluster distances in c. Let C be the set of all clusters.
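To make the notation concrete, here is a small base-R sketch that builds D(c) for every cluster. It assumes plain Euclidean distances computed with `stats::dist()` rather than a distances object, and the names `pts`, `labels`, `within_dists`, `C` and `D` are illustrative only.

```r
# Data and labels copied from the Examples section.
pts <- data.frame(x = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0),
                  y = c(10, 9, 8, 7, 6, 10, 9, 8, 7, 6))
labels <- c("A", "A", "B", "C", "B", "C", "C", "A", "B", "B")

# d(i, j): full Euclidean distance matrix (assumption of this sketch).
d <- as.matrix(dist(pts))

# D(c): distances over all pairs i > j within the cluster with indices idx.
within_dists <- function(idx) {
  sub <- d[idx, idx, drop = FALSE]
  sub[lower.tri(sub)]
}

# C: the clusters, each given as a vector of point indices.
C <- split(seq_along(labels), labels)
D <- lapply(C, within_dists)

# A cluster with n points contributes n * (n - 1) / 2 distances.
print(lengths(D))
```

With clusters of sizes 3, 4 and 3, D(c) holds 3, 6 and 3 distances respectively.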
sum_dists is defined as:
\sum_{c\in C} sum(D(c))
min_dist is defined as:
\min_{c\in C} \min(D(c))
max_dist is defined as:
\max_{c\in C} \max(D(c))
avg_min_dist is defined as:
\sum_{c\in C} \frac{\min(D(c))}{|C|}
avg_max_dist is defined as:
\sum_{c\in C} \frac{\max(D(c))}{|C|}
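The five definitions above translate almost directly into base R. The sketch below is an illustration rather than the package's code: it assumes Euclidean distances from `stats::dist()`, and the names `pts`, `labels`, `D` and `dist_stats` are hypothetical.

```r
# Data and labels copied from the Examples section.
pts <- data.frame(x = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0),
                  y = c(10, 9, 8, 7, 6, 10, 9, 8, 7, 6))
labels <- c("A", "A", "B", "C", "B", "C", "C", "A", "B", "B")

# Euclidean distance matrix (assumption of this sketch).
d <- as.matrix(dist(pts))

# D(c) for every cluster: distances over all within-cluster pairs i > j.
D <- lapply(split(seq_along(labels), labels), function(idx) {
  sub <- d[idx, idx, drop = FALSE]
  sub[lower.tri(sub)]
})

dist_stats <- c(
  sum_dists    = sum(sapply(D, sum)),   # sum over clusters of sum(D(c))
  min_dist     = min(sapply(D, min)),   # smallest distance in any cluster
  max_dist     = max(sapply(D, max)),   # largest distance in any cluster
  avg_min_dist = mean(sapply(D, min)),  # per-cluster minima, divided by |C|
  avg_max_dist = mean(sapply(D, max))   # per-cluster maxima, divided by |C|
)
print(round(dist_stats, 7))
```

Under these assumptions the values agree with the corresponding rows of the output in the Examples section (for instance, sum_dists is about 18.2013097).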
Let:
AD(c) = \frac{sum(D(c))}{|D(c)|}
be the average within-cluster distance in cluster c.
avg_dist_weighted is defined as:
\sum_{c\in C} \frac{|c| AD(c)}{\mathrm{num\_assigned}}
where num_assigned is the number of assigned data points (see above).
avg_dist_unweighted is defined as:
\sum_{c\in C} \frac{AD(c)}{|C|}
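The difference between the two averages is easiest to see side by side. The following base-R sketch, again assuming Euclidean distances from `stats::dist()` and using illustrative names (`pts`, `labels`, `C`, `AD`), weights each cluster's average distance by its number of points |c|, not by its number of pairs |D(c)|.

```r
# Data and labels copied from the Examples section.
pts <- data.frame(x = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0),
                  y = c(10, 9, 8, 7, 6, 10, 9, 8, 7, 6))
labels <- c("A", "A", "B", "C", "B", "C", "C", "A", "B", "B")

# Euclidean distance matrix (assumption of this sketch).
d <- as.matrix(dist(pts))

# Clusters as index vectors, and AD(c) = sum(D(c)) / |D(c)| for each.
C <- split(seq_along(labels), labels)
AD <- sapply(C, function(idx) {
  sub <- d[idx, idx, drop = FALSE]
  mean(sub[lower.tri(sub)])
})

sizes <- lengths(C)          # |c| for each cluster
num_assigned <- sum(sizes)   # every point is assigned in this example

avg_dist_weighted   <- sum(sizes * AD) / num_assigned
avg_dist_unweighted <- mean(AD)  # each cluster counts equally

print(round(c(avg_dist_weighted   = avg_dist_weighted,
              avg_dist_unweighted = avg_dist_unweighted), 7))
```

Here the four-point cluster B has the smallest average distance, so giving it more weight pulls the weighted average (about 1.5575594) below the unweighted one (about 1.5847484).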
Value
Returns a list of class clustering_stats containing the statistics.
Examples
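# Ten two-dimensional data points
my_data_points <- data.frame(x = c(0.1, 0.2, 0.3, 0.4, 0.5,
                                   0.6, 0.7, 0.8, 0.9, 1.0),
                             y = c(10, 9, 8, 7, 6,
                                   10, 9, 8, 7, 6))

# Distances between the data points
my_distances <- distances(my_data_points)

# Assign each point to one of three clusters (A, B, C)
my_scclust <- scclust(c("A", "A", "B", "C", "B",
                        "C", "C", "A", "B", "B"))

# Compute the statistics
get_clustering_stats(my_distances, my_scclust)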
# > Value
# > num_data_points 10.0000000
# > num_assigned 10.0000000
# > num_clusters 3.0000000
# > min_cluster_size 3.0000000
# > max_cluster_size 4.0000000
# > avg_cluster_size 3.3333333
# > sum_dists 18.2013097
# > min_dist 0.5000000
# > max_dist 3.0066593
# > avg_min_dist 0.8366584
# > avg_max_dist 2.4148611
# > avg_dist_weighted 1.5575594
# > avg_dist_unweighted 1.5847484