dbscan {clustlearn}    R Documentation

Density Based Spatial Clustering of Applications with Noise (DBSCAN)

Description

Perform DBSCAN clustering on a data matrix.

Usage

dbscan(data, eps, min_pts = 4, details = FALSE, waiting = TRUE, ...)

Arguments

data

a set of observations, presented as a matrix-like object where each row is one observation.

eps

how close two observations have to be to be considered neighbors.

min_pts

the minimum number of neighbors for a region to be considered dense.

details

a Boolean determining whether intermediate logs explaining how the algorithm works should be printed or not.

waiting

a Boolean determining whether the intermediate logs should be printed in chunks, pausing for user input before each new chunk is printed.

...

additional arguments passed to proxy::dist().
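To illustrate how eps and min_pts interact, the following minimal sketch counts, for a tiny one-column data set, how many observations fall within eps of each observation. It uses base R's dist() with Euclidean distance, and it counts each observation as its own neighbor; both are assumptions made for illustration, not a statement of how clustlearn computes neighborhoods. The names x, d and count_within are hypothetical.

```r
# Four observations on a line: three close together and one far away.
x <- matrix(c(0, 0.1, 0.2, 1), ncol = 1)

# Pairwise Euclidean distances (assumption: the package may use another metric).
d <- as.matrix(dist(x))

# Hypothetical helper: number of observations within eps of each observation,
# counting the observation itself (assumption).
count_within <- function(eps) rowSums(d <= eps)

small_eps <- count_within(0.15)  # only immediate neighbors are reached
large_eps <- count_within(0.25)  # the three close observations all see each other
```

With a larger eps more observations reach any given min_pts threshold, so fewer of them end up as outliers.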

Details

The data given by data is clustered by the DBSCAN method, which aims to partition the points into clusters such that the points in a cluster are close to each other and the points in different clusters are far away from each other. The clusters are defined as dense regions of points separated by regions of low density.

The DBSCAN method follows a two-step process:

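  1. For each point, the neighborhood of radius eps is computed. If the neighborhood contains at least min_pts points, then the point is considered a core point. Otherwise, the point is provisionally considered an outlier; it may still be assigned to a cluster in step 2 if it lies in the neighborhood of a core point.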

  2. For each core point, if the core point is not already assigned to a cluster, a new cluster is created and the core point is assigned to it. Then, the neighborhood of the core point is explored. If a point in the neighborhood is a core point, then the neighborhood of that point is also explored. This process is repeated until all points in the neighborhood have been explored. If a point in the neighborhood is not already assigned to a cluster, then it is assigned to the cluster of the core point.

Whatever points are not assigned to a cluster are considered outliers.
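The two steps above can be sketched in base R as follows. This is a simplified illustration, not the clustlearn implementation: the function name dbscan_sketch is hypothetical, distances are Euclidean via dist(), a point counts as its own neighbor, and the neighborhood test uses "<= eps". All of these are assumptions.

```r
# Hypothetical helper illustrating the two-step process described above.
dbscan_sketch <- function(data, eps, min_pts = 4) {
  data <- as.matrix(data)
  n <- nrow(data)
  # Pairwise Euclidean distances (assumption).
  d <- as.matrix(dist(data))

  # Step 1: compute each neighborhood of radius eps and flag core points.
  neighbors <- lapply(seq_len(n), function(i) which(d[i, ] <= eps))
  is_core <- lengths(neighbors) >= min_pts

  # Step 2: grow a new cluster from every core point that is still unassigned.
  cluster <- integer(n)  # 0 means "not assigned", i.e. outlier
  current <- 0L
  for (i in which(is_core)) {
    if (cluster[i] != 0L) next
    current <- current + 1L
    cluster[i] <- current
    queue <- i
    while (length(queue) > 0) {
      p <- queue[1]
      queue <- queue[-1]
      for (q in neighbors[[p]]) {
        if (cluster[q] == 0L) {
          cluster[q] <- current
          # Only core points have their own neighborhood explored.
          if (is_core[q]) queue <- c(queue, q)
        }
      }
    }
  }

  # Points still labelled 0 were never reached from a core point.
  list(cluster = cluster, eps = eps, min_pts = min_pts,
       size = tabulate(cluster + 1L, nbins = current + 1L))
}

# Two tight groups of four points and one isolated point.
pts <- rbind(c(0, 0), c(0, 0.1), c(0.1, 0), c(0.1, 0.1),
             c(5, 5), c(5, 5.1), c(5.1, 5), c(5.1, 5.1),
             c(10, 10))
res <- dbscan_sketch(pts, eps = 0.3, min_pts = 4)
res$cluster  # 1 1 1 1 2 2 2 2 0
```

The queue makes the "explore the neighborhood of each core neighbor" step explicit; a point that is not a core point can join a cluster but never extends it.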

Value

A dbscan() object. It is a list with the following components:

cluster   a vector of integers (from 0 to the highest cluster number) indicating the cluster to which each point belongs. Points in cluster number 0 are considered outliers.
eps       the value of eps used.
min_pts   the value of min_pts used.
size      a vector with the number of data points belonging to each cluster (where the first element is the number of outliers).
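As a rough illustration of how these components relate, the sketch below builds a hand-made list with the documented component names and reads it the way a real result could be read. The values are invented for the example and are not output of dbscan(); the names outliers, n_clusters and recount are hypothetical.

```r
# Hypothetical stand-in with the documented components (not real output).
cl <- list(
  cluster = c(1L, 1L, 0L, 2L, 2L, 2L, 0L),
  eps     = 0.3,
  min_pts = 4,
  size    = c(2L, 2L, 3L)
)

# Indices of the points labelled as outliers (cluster number 0).
outliers <- which(cl$cluster == 0)

# Number of clusters found, not counting the outlier group.
n_clusters <- max(cl$cluster)

# size[1] counts the outliers and size[k + 1] counts cluster k,
# so recounting the labels should reproduce the size component.
recount <- tabulate(cl$cluster + 1L, nbins = n_clusters + 1L)
```

Because the first element of size refers to cluster number 0, the count for cluster k is found at position k + 1.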

Author(s)

Eduardo Ruiz Sabajanes, eduardo.ruizs@edu.uah.es

Examples

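### Helper function
test <- function(db, eps) {
  # Cluster the data and print the resulting object
  print(cl <- clustlearn::dbscan(db, eps))
  # Points in cluster number 0 are outliers
  out <- cl$cluster == 0
  # Plot clustered points, colored by cluster number
  plot(db[!out, ], col = cl$cluster[!out], pch = 20, asp = 1)
  # Overlay the outliers as crosses in a color not used by any cluster
  points(db[out, ], col = max(cl$cluster) + 1, pch = 4, lwd = 2)
}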

### Example 1
test(clustlearn::db1, 0.3)

### Example 2
test(clustlearn::db2, 0.3)

### Example 3
test(clustlearn::db3, 0.25)

### Example 4
test(clustlearn::db4, 0.2)

### Example 5
test(clustlearn::db5, 0.3)

### Example 6
test(clustlearn::db6, 0.3)

### Example 7 (with explanations, no plots)
cl <- clustlearn::dbscan(
  clustlearn::db5[1:20, ],
  0.3,
  details = TRUE,
  waiting = FALSE
)


[Package clustlearn version 1.0.0 Index]