cluster_analysis {parameters}    R Documentation
Cluster Analysis
Description
Compute hierarchical or k-means cluster analysis and return the group assignment for each observation as a vector.
Usage
cluster_analysis(
x,
n = NULL,
method = "kmeans",
include_factors = FALSE,
standardize = TRUE,
verbose = TRUE,
distance_method = "euclidean",
hclust_method = "complete",
kmeans_method = "Hartigan-Wong",
dbscan_eps = 15,
iterations = 100,
...
)
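A minimal sketch of a typical call (illustrative only; see the Examples section below for more complete usage):
library(parameters)
rez <- cluster_analysis(iris[1:4], n = 3, method = "kmeans") # 3-cluster k-means on the numeric iris variables
predict(rez) # group assignment for each observation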
Arguments
x
A data frame (with at least two variables), or a matrix (with at least two columns).
n
Number of clusters used for supervised cluster methods. If NULL, the number of clusters to extract is determined by calling n_clusters(). This argument does not apply to unsupervised clustering methods such as dbscan or pamk.
method
Method for computing the cluster analysis. Can be "kmeans" (default), "hkmeans", "pam", "pamk", "hclust", "dbscan", "hdbscan", or "mixture".
include_factors
Logical, if TRUE, factors are converted to numerical values and included in the data used for clustering. By default, factors are removed, because most clustering methods require numeric input only.
standardize
Standardize the data frame before clustering (default: TRUE).
verbose
Toggle warnings and messages.
distance_method
Distance measure to be used for methods based on distances (e.g., when method = "hclust" for hierarchical clustering). For other methods, such as "kmeans", this argument is ignored.
hclust_method
Agglomeration method to be used when method = "hclust" (hierarchical clustering). Can be "complete" (default), "ward.D", "ward.D2", "single", "average", "mcquitty", "median", or "centroid". See hclust() for details. A brief sketch of overriding these settings follows this list.
kmeans_method
Algorithm used for calculating k-means clusters. Only applies if method = "kmeans". May be one of "Hartigan-Wong" (default), "Lloyd", or "MacQueen". See kmeans() for details.
dbscan_eps
The eps argument for the DBSCAN method.
iterations
The number of replications.
...
Arguments passed to or from other methods.
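As noted for distance_method and hclust_method above, here is a brief sketch of overriding the distance measure and agglomeration method for hierarchical clustering (the specific values are illustrative assumptions, not recommendations):
rez <- cluster_analysis(
  iris[1:4],
  n = 3,
  method = "hclust",
  distance_method = "manhattan", # assumption: a distance measure accepted by dist()
  hclust_method = "ward.D2" # agglomeration method passed to hclust()
)
predict(rez) # group assignment for each observation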
Details
The print() and plot() methods show the (standardized) mean value for each variable within each cluster. Thus, a higher absolute value indicates that a certain variable characteristic is more pronounced within that specific cluster (as compared to other cluster groups with lower absolute mean values).
Cluster classifications can be obtained via predict(x, newdata = NULL, ...).
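For illustration, a minimal sketch of classifying observations, assuming the predict() method accepts a newdata data frame containing the same variables used for clustering:
rez <- cluster_analysis(iris[1:4], n = 3, method = "kmeans")
predict(rez) # classification of the original observations
predict(rez, newdata = iris[1:10, 1:4]) # assumption: new observations are assigned to the closest cluster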
Value
The group classification for each observation as a vector. The returned vector includes missing values, so it has the same length as nrow(x).
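A small illustration of this behavior, assuming the group assignments are retrieved via predict() as in the Examples below:
rez <- cluster_analysis(iris[1:4], n = 3, method = "kmeans")
clusters <- predict(rez)
length(clusters) == nrow(iris) # TRUE: one (possibly missing) assignment per row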
Note
There is also a plot()-method implemented in the see-package.
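A minimal sketch of that plot() method, assuming the see package is installed:
if (requireNamespace("see", quietly = TRUE)) {
  rez <- cluster_analysis(iris[1:4], n = 3, method = "kmeans")
  plot(rez) # standardized mean value per variable within each cluster
}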
References
Maechler M, Rousseeuw P, Struyf A, Hubert M, Hornik K (2014) cluster: Cluster Analysis Basics and Extensions. R package.
See Also
- n_clusters() to determine the number of clusters to extract.
- cluster_discrimination() to determine the accuracy of cluster group classification via linear discriminant analysis (LDA).
- performance::check_clusterstructure() to check suitability of data for clustering.
- https://www.datanovia.com/en/lessons/
Examples
set.seed(33)
# K-Means ====================================================
rez <- cluster_analysis(iris[1:4], n = 3, method = "kmeans")
rez # Show results
predict(rez) # Get clusters
summary(rez) # Extract the cluster centers (you can use 'plot()' on that)
if (requireNamespace("MASS", quietly = TRUE)) {
cluster_discrimination(rez) # Perform LDA
}
# Hierarchical k-means (more robust k-means)
if (require("factoextra", quietly = TRUE)) {
rez <- cluster_analysis(iris[1:4], n = 3, method = "hkmeans")
rez # Show results
predict(rez) # Get clusters
}
# Hierarchical Clustering (hclust) ===========================
rez <- cluster_analysis(iris[1:4], n = 3, method = "hclust")
rez # Show results
predict(rez) # Get clusters
# K-Medoids (pam) ============================================
if (require("cluster", quietly = TRUE)) {
rez <- cluster_analysis(iris[1:4], n = 3, method = "pam")
rez # Show results
predict(rez) # Get clusters
}
# PAM with automated number of clusters
if (require("fpc", quietly = TRUE)) {
rez <- cluster_analysis(iris[1:4], method = "pamk")
rez # Show results
predict(rez) # Get clusters
}
# DBSCAN ====================================================
if (require("dbscan", quietly = TRUE)) {
# Note that you can assimilate more outliers (cluster 0) to neighbouring
# clusters by setting borderPoints = TRUE.
rez <- cluster_analysis(iris[1:4], method = "dbscan", dbscan_eps = 1.45)
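# Illustrative variant (assumption: 'borderPoints' is forwarded to dbscan::dbscan() via '...'):
# rez_border <- cluster_analysis(iris[1:4], method = "dbscan", dbscan_eps = 1.45, borderPoints = TRUE)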
rez # Show results
predict(rez) # Get clusters
}
# Mixture ====================================================
if (require("mclust", quietly = TRUE)) {
library(mclust) # Needs the package to be loaded
rez <- cluster_analysis(iris[1:4], method = "mixture")
rez # Show results
predict(rez) # Get clusters
}