dbscan {clustlearn} | R Documentation |
Density Based Spatial Clustering of Applications with Noise (DBSCAN)
Description
Perform DBSCAN clustering on a data matrix.
Usage
dbscan(data, eps, min_pts = 4, details = FALSE, waiting = TRUE, ...)
Arguments
data |
a set of observations, presented as a matrix-like object where every row is a new observation. |
eps |
how close two observations have to be to be considered neighbors. |
min_pts |
the minimum number of neighbors for a region to be considered dense. |
details |
a Boolean; if TRUE, intermediate logs explaining how the algorithm works are printed. |
waiting |
a Boolean; if TRUE, the intermediate logs are printed in chunks, waiting for user input before printing the next chunk. |
... |
additional arguments passed to |
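To make the roles of eps and min_pts concrete, here is a minimal base R sketch. It rests on assumptions that this page does not confirm: Euclidean distance via stats::dist(), a neighborhood that includes the point itself, and a "less than or equal to eps" comparison. The toy matrix pts is hypothetical.

```r
# Hypothetical toy data: five observations, one per row.
pts <- rbind(c(0, 0), c(0.1, 0), c(0, 0.1), c(0.1, 0.1), c(2, 2))

eps <- 0.3
min_pts <- 4

# Pairwise Euclidean distances (assumption: the package may use another metric).
d <- as.matrix(dist(pts))

# Two observations are neighbors when their distance is at most eps.
# Assumption: every point counts as a member of its own neighborhood.
neighbors <- d <= eps
n_neighbors <- rowSums(neighbors)

# A region is dense when it holds at least min_pts observations.
is_core <- n_neighbors >= min_pts
```

Under these assumptions the four tightly packed points are core points, while the isolated point at (2, 2) is not.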
Details
The data given by data
is clustered by the DBSCAN method,
which aims to partition the points into clusters such that the points in a
cluster are close to each other and the points in different clusters are far
away from each other. The clusters are defined as dense regions of points
separated by regions of low density.
The DBSCAN method follows a two-step process:
1. For each point, the neighborhood of radius eps is computed. If the neighborhood contains at least min_pts points, then the point is considered a core point. Otherwise, the point is provisionally considered an outlier.
2. For each core point, if the core point is not already assigned to a cluster, a new cluster is created and the core point is assigned to it. Then, the neighborhood of the core point is explored. If a point in the neighborhood is not already assigned to a cluster, then it is assigned to the cluster of the core point. If a point in the neighborhood is a core point, then the neighborhood of that point is also explored. This process is repeated until all reachable points have been explored.
Whatever points are not assigned to a cluster are considered outliers.
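The process described above can be sketched in base R as follows. This is an illustration only, not the package's implementation: dbscan_sketch is a hypothetical name, Euclidean distance via stats::dist() is an assumption, and so is counting each point as part of its own neighborhood.

```r
# Illustrative sketch; dbscan_sketch is a hypothetical helper, not clustlearn code.
dbscan_sketch <- function(data, eps, min_pts = 4) {
  d <- as.matrix(dist(data))  # assumption: Euclidean distance
  n <- nrow(d)

  # Step 1: compute each eps-neighborhood and flag the core points.
  neighbors <- lapply(seq_len(n), function(i) which(d[i, ] <= eps))
  is_core <- lengths(neighbors) >= min_pts

  # Step 2: grow one cluster from every core point that is still unassigned.
  cluster <- integer(n)  # 0 means "not assigned", i.e. outlier at the end
  current <- 0L
  for (i in which(is_core)) {
    if (cluster[i] != 0L) next
    current <- current + 1L
    cluster[i] <- current
    queue <- neighbors[[i]]
    while (length(queue) > 0) {
      j <- queue[1]
      queue <- queue[-1]
      if (cluster[j] != 0L) next  # already assigned, nothing to do
      cluster[j] <- current
      # Only core points pass the exploration on to their own neighbors.
      if (is_core[j]) queue <- c(queue, neighbors[[j]])
    }
  }
  cluster
}

# Hypothetical toy data: two tight groups of four points and one isolated point.
toy <- rbind(c(0, 0), c(0.1, 0), c(0, 0.1), c(0.1, 0.1),
             c(1, 1), c(1.1, 1), c(1, 1.1), c(1.1, 1.1),
             c(5, 5))
labels <- dbscan_sketch(toy, eps = 0.3)
```

With this toy data, the first four points end up in cluster 1, the next four in cluster 2, and the isolated point keeps label 0, which marks it as an outlier.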
Value
A dbscan()
object. It is a list with the following
components:
cluster | a vector of integers (from 0 to max(cl$cluster))
indicating the cluster to which each point belongs. Points in cluster number
0 are considered outliers. |
eps | the value of eps used. |
min_pts | the value of min_pts used. |
size | a vector with the number of data points belonging to each cluster (where the first element is the number of outliers). |
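The relationship between the cluster and size components can be illustrated with a short base R sketch. It assumes that size is simply the number of points per label, as the description above suggests; the label vector below is hypothetical and is not output from the package.

```r
# Hypothetical cluster labels: 0 marks outliers, 1 and 2 are clusters.
cluster <- c(1, 1, 0, 2, 2, 2, 0, 1)

# Count the points per label; the first element counts label 0 (outliers).
size <- tabulate(cluster + 1, nbins = max(cluster) + 1)

n_outliers <- size[1]      # number of points labeled 0
cluster_sizes <- size[-1]  # sizes of clusters 1, 2, ...
```

Shifting the labels by one is needed because tabulate() only counts positive integers, so label 0 has to map to bin 1.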
Author(s)
Eduardo Ruiz Sabajanes, eduardo.ruizs@edu.uah.es
Examples
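### Helper function: cluster a dataset, print the result, and plot it
test <- function(db, eps) {
  print(cl <- clustlearn::dbscan(db, eps))
  # Points in cluster 0 are outliers
  out <- cl$cluster == 0
  # Clustered points, colored by cluster
  plot(db[!out, ], col = cl$cluster[!out], pch = 20, asp = 1)
  # Outliers, drawn as crosses in a separate color
  points(db[out, ], col = max(cl$cluster) + 1, pch = 4, lwd = 2)
}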
### Example 1
test(clustlearn::db1, 0.3)
### Example 2
test(clustlearn::db2, 0.3)
### Example 3
test(clustlearn::db3, 0.25)
### Example 4
test(clustlearn::db4, 0.2)
### Example 5
test(clustlearn::db5, 0.3)
### Example 6
test(clustlearn::db6, 0.3)
### Example 7 (with explanations, no plots)
cl <- clustlearn::dbscan(
  clustlearn::db5[1:20, ],
  0.3,
  details = TRUE,
  waiting = FALSE
)