densityClust {densityClust} | R Documentation |
Calculate clustering attributes based on the densityClust algorithm
Description
This function takes a distance matrix and optionally a distance cutoff and
calculates the values necessary for clustering based on the algorithm
proposed by Alex Rodriguez and Alessandro Laio (see references). The actual
assignment to clusters is done in a later step, based on user-defined
threshold values. If a distance matrix is passed into distance, the
original algorithm described in the paper is used. If a matrix or data.frame
is passed instead, it is interpreted as point coordinates and rho will be
estimated based on the k-nearest neighbors of each point (rho is estimated as
exp(-mean(x)), where x is the distance to the nearest
neighbors). This can be useful when the data is so large that calculating the
full distance matrix is prohibitive.
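As an illustration of the k-nearest-neighbor estimate described above, the following base R sketch computes exp(-mean(x)) for a handful of points. It is not the package's implementation: the neighbor count k and the use of a full distance matrix to find neighbors are assumptions made only to keep the example small and self-contained.

```r
# Sketch only: rho estimated as exp(-mean(x)), where x holds the distances
# from a point to its k nearest neighbours.
coords <- as.matrix(iris[1:30, 1:4])
k <- 5  # hypothetical neighbour count, chosen for illustration

# A full distance matrix is affordable here because the example is tiny;
# on large data a dedicated kNN search would replace this step.
d <- as.matrix(dist(coords))

rho_knn <- apply(d, 1, function(row) {
  x <- sort(row)[2:(k + 1)]  # drop the zero self-distance, keep the k nearest
  exp(-mean(x))
})

head(rho_knn)
```

Points in dense regions have small neighbor distances and therefore values close to 1, while isolated points approach 0.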
Usage
densityClust(distance, dc, gaussian = FALSE, verbose = FALSE, ...)
Arguments
distance |
A distance matrix or a matrix (or data.frame) for the coordinates of the data. If a matrix or data.frame is used the distances and local density will be estimated using a fast k-nearest neighbor approach. |
dc |
A distance cutoff for calculating the local density. If missing it
will be estimated with estimateDc() (see Details). |
gaussian |
Logical. Should a Gaussian kernel be used to estimate the density? Defaults to FALSE. |
verbose |
Logical. Should the running details be reported? |
... |
Additional parameters passed on to get.knn |
Details
The function calculates rho and delta for the observations in the provided
distance matrix. If a distance cutoff is not provided, it is first estimated
using estimateDc() with default values.
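To make the two quantities concrete, here is a minimal base R sketch of rho and delta following the definitions in the referenced paper, using the plain cutoff kernel. It is an illustration under stated assumptions, not the package's code: the cutoff value dc is hypothetical, and ties in rho are handled naively.

```r
# Sketch only: cutoff-kernel rho and delta on a small distance matrix.
d <- as.matrix(dist(iris[1:30, 1:4]))
dc <- 0.5  # hypothetical cutoff, chosen for illustration

# rho: number of other observations closer than dc
# (subtract 1 to exclude each observation's zero distance to itself)
rho <- rowSums(d < dc) - 1

# delta: distance to the nearest observation with strictly higher rho;
# observations with no denser neighbour get their largest distance instead
delta <- vapply(seq_along(rho), function(i) {
  higher <- rho > rho[i]
  if (any(higher)) min(d[i, higher]) else max(d[i, ])
}, numeric(1))

head(cbind(rho, delta))
```

Observations that combine a high rho with a high delta are the candidates for cluster centers, which is what the thresholds passed to findClusters select on.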
The information kept in the densityCluster object is:
rho
A vector of local density values
delta
A vector of minimum distances to observations of higher density
distance
The initial distance matrix
dc
The distance cutoff used to calculate rho
threshold
A named vector specifying the threshold values for rho and delta used for cluster detection
peaks
A vector of indexes specifying the cluster center for each cluster
clusters
A vector of cluster affiliations for each observation. The clusters are referenced as indexes in the peaks vector
halo
A logical vector specifying for each observation if it is considered part of the halo
knn_graph
kNN graph constructed. It is only applicable to the case where coordinates are used as input. Currently it is set as NA.
nearest_higher_density_neighbor
index for the nearest sample with higher density. It is only applicable to the case where coordinates are used as input.
nn.index
indices for each cell's k-nearest neighbors. It is only applicable for the case where coordinates are used as input.
nn.dist
distance to each cell's k-nearest neighbors. It is only applicable for the case where coordinates are used as input.
Before running findClusters the threshold, peaks, clusters and halo data are NA.
Value
A densityCluster object. See details for a description.
References
Rodriguez, A., & Laio, A. (2014). Clustering by fast search and find of density peaks. Science, 344(6191), 1492-1496. doi:10.1126/science.1242072
See Also
Examples
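irisDist <- dist(iris[,1:4])                         # Pairwise distances on the four measurements
irisClust <- densityClust(irisDist, gaussian=TRUE)   # Compute rho and delta
plot(irisClust) # Inspect clustering attributes to define thresholds
irisClust <- findClusters(irisClust, rho=2, delta=2) # Assign clusters using the chosen thresholds
plotMDS(irisClust)                                   # Visualise the clusters in an MDS projection
split(iris[,5], irisClust$clusters)                  # Compare species labels with cluster membership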