| dbscan {fpc} | R Documentation | 
DBSCAN density reachability and connectivity clustering
Description
Generates a density based clustering of arbitrary shape as introduced in Ester et al. (1996).
Usage
  dbscan(data, eps, MinPts = 5, scale = FALSE, method = c("hybrid", "raw",
    "dist"), seeds = TRUE, showplot = FALSE, countmode = NULL)
  ## S3 method for class 'dbscan'
print(x, ...)
  ## S3 method for class 'dbscan'
plot(x, data, ...)
  ## S3 method for class 'dbscan'
predict(object, data, newdata = NULL,
predict.max=1000, ...)
Arguments
| data | data matrix, data.frame, dissimilarity matrix or
 | 
| eps | Reachability distance, see Ester et al. (1996). | 
| MinPts | Reachability minimum no. of points, see Ester et al. (1996). | 
| scale |  scale the data if  | 
| method | "dist" treats data as distance matrix (relatively fast but memory expensive), "raw" treats data as raw data and avoids calculating a distance matrix (saves memory but may be slow), "hybrid" expects also raw data, but calculates partial distance matrices (very fast with moderate memory requirements). | 
| seeds | FALSE to not include the  | 
| showplot | 0 = no plot, 1 = plot per iteration, 2 = plot per subiteration. | 
| countmode | NULL or vector of point numbers at which to report progress. | 
| x | object of class  | 
| object | object of class  | 
| newdata | matrix or data.frame with raw data to predict. | 
| predict.max | max. batch size for predictions. | 
| ... | Further arguments transferred to plot methods. | 
Details
Clusters require a minimum no of points (MinPts) within a maximum distance (eps) around one of its members (the seed). Any point within eps around any point which satisfies the seed condition is a cluster member (recursively). Some points may not belong to any clusters (noise).
We have clustered a 100.000 x 2 dataset in 40 minutes on a Pentium M 1600 MHz.
print.dbscan shows a statistic of the number of points
belonging to the clusters that are seeds and border points.
plot.dbscan distinguishes between seed and border points by
plot symbol.
Value
predict.dbscan gives out a vector of predicted clusters for the
points in newdata.
dbscan gives out 
an object of class 'dbscan' which is a LIST with components
| cluster | integer vector coding cluster membership with noise observations (singletons) coded as 0 | 
| isseed | logical vector indicating whether a point is a seed (not border, not noise) | 
| eps | parameter eps | 
| MinPts | parameter MinPts | 
Note
 this is a simplified version of the original algorithm (no K-D-trees
used), thus we have o(n^2) instead of o(n*log(n)) 
Author(s)
Jens Oehlschlaegel, based on a draft by Christian Hennig.
References
Martin Ester, Hans-Peter Kriegel, Joerg Sander, Xiaowei Xu (1996). A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. Institute for Computer Science, University of Munich. Proceedings of 2nd International Conference on Knowledge Discovery and Data Mining (KDD-96).
Examples
  set.seed(665544)
  n <- 600
  x <- cbind(runif(10, 0, 10)+rnorm(n, sd=0.2), runif(10, 0, 10)+rnorm(n,
    sd=0.2))
  par(bg="grey40")
  ds <- dbscan(x, 0.2)
# run with showplot=1 to see how dbscan works.
  ds
  plot(ds, x)
  x2 <- matrix(0,nrow=4,ncol=2)
  x2[1,] <- c(5,2)
  x2[2,] <- c(8,3)
  x2[3,] <- c(4,4)
  x2[4,] <- c(9,9)
  predict(ds, x, x2)
  n <- 600
  x <- cbind((1:3)+rnorm(n, sd=0.2), (1:3)+rnorm(n, sd=0.2))
# Not run, but results from my machine are 0.105 - 0.068 - 0.255:
#  system.time(ds <- dbscan(x, 0.3, countmode=NULL, method="raw"))[3] 
#  system.time(dsb <- dbscan(x, 0.3, countmode=NULL, method="hybrid"))[3]
#  system.time(dsc <- dbscan(dist(x), 0.3, countmode=NULL,
#    method="dist"))[3]