R: (Weighted) density-based clustering

DBSCANClustering {sharp}

R Documentation

(Weighted) density-based clustering

Description

Runs Density-Based Spatial Clustering of Applications with Noise (DBSCAN) using the implementation from dbscan. If Lambda is provided, clustering is applied to the weighted distance matrix calculated with the COSA algorithm as implemented in cosa2. Otherwise, distances are calculated using dist. This function does not use stability.
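The unweighted path described above (`Lambda = NULL`) can be sketched roughly as follows. This is a minimal illustration, not the internals of `DBSCANClustering`: the toy data `x` and the grid `eps_grid` are hypothetical, and the loop over radii is an assumption about how several `eps` values might be handled.

```r
# Hypothetical toy data: two well-separated groups of 10 points in 2 dimensions
set.seed(1)
x <- rbind(
  matrix(rnorm(20, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(20, mean = 5, sd = 0.3), ncol = 2)
)

# Unweighted case: pairwise distances computed with stats::dist
d <- dist(x, method = "euclidean")

if (requireNamespace("dbscan", quietly = TRUE)) {
  # Hypothetical grid of radii; one DBSCAN run per value
  eps_grid <- c(0.5, 1, 2)
  labels <- sapply(eps_grid, function(e) {
    # dbscan() accepts a dist object; minPts = 2 mirrors the fixed value
    dbscan::dbscan(d, eps = e, minPts = 2)$cluster
  })
  colnames(labels) <- paste0("eps=", eps_grid)
  # In dbscan output, label 0 marks noise points
  print(head(labels))
}
```

Each column of `labels` holds one partition of the observations, so a larger radius can be compared with a smaller one column by column.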

Usage

DBSCANClustering(
  xdata,
  nc = NULL,
  eps = NULL,
  Lambda = NULL,
  distance = "euclidean",
  ...
)

Arguments

`xdata`	data matrix with observations as rows and variables as columns.
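`nc`	matrix of parameters controlling the number of clusters in the underlying algorithm specified in `implementation`. If `nc` is not provided, it is set to `seq(1, tau*nrow(xdata))`. Neither `implementation` nor `tau` is an argument of `DBSCANClustering`; both presumably refer to arguments of the calling function.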
`eps`	radius in density-based clustering, see `dbscan`.
`Lambda`	vector of penalty parameters (see argument `lambda` in `cosa2`). Unweighted distance matrices are used if `Lambda=NULL`.
`distance`	character string indicating the type of distance to use. If `Lambda=NULL`, possible values include `"euclidean"`, `"maximum"`, `"canberra"`, `"binary"`, and `"minkowski"` (see argument `method` in `dist`). Otherwise, possible values include `"euclidean"` (`pwr=2`) or `"absolute"` (`pwr=1`) (see argument `pwr` in `cosa2`).
`...`	additional parameters passed to `dbscan` (except for `minPts`, which is fixed to `2`), `dist`, or `cosa2`. If `Lambda` is provided, parameters `niter` (defaults to 1) and `noit` (defaults to 100) set the number of iterations used in `cosa2` to calculate weights and may need to be modified.
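To give a feel for what fixing `minPts` to `2` implies, the base R sketch below reproduces DBSCAN-like labels without the dbscan package. It assumes that a point counts itself among its neighbours and that neighbours are points at distance less than or equal to `eps`. Under those assumptions every point with at least one neighbour is a core point, so clusters are the connected components of the `eps`-neighbourhood graph, which is what cutting a single-linkage tree at height `eps` returns. The helper `dbscan_minpts2` is hypothetical and is not part of sharp or dbscan.

```r
# Hypothetical helper: DBSCAN-like labels when minPts = 2
dbscan_minpts2 <- function(d, eps) {
  # Connected components of the graph linking points at distance <= eps
  comp <- cutree(hclust(d, method = "single"), h = eps)
  # Components of size 1 have no neighbour within eps: treat them as noise
  sizes <- table(comp)
  noise <- comp %in% as.integer(names(sizes)[sizes == 1])
  # Renumber the remaining clusters from 1; noise points get label 0
  match(comp, unique(comp[!noise]), nomatch = 0)
}

# Toy one-dimensional data: two tight groups and one isolated point
d <- dist(c(0, 0.1, 0.2, 5, 5.1, 10))

dbscan_minpts2(d, eps = 0.5)   # labels: 1 1 1 2 2 0
dbscan_minpts2(d, eps = 0.05)  # radius too small: every point is noise
```

With a small radius every observation is noise, and with a very large one all observations merge into a single cluster, which is why a grid of `eps` values is usually explored.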

Value

A list with:

`comembership`	an array of binary and symmetric co-membership matrices.
`weights`	a matrix of median weights by feature.
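As an illustration of the `comembership` element, the sketch below builds a binary, symmetric co-membership matrix from a vector of cluster labels and stacks one matrix per partition into an array. The helper `comembership_from_labels` is hypothetical. The treatment of noise points (label 0), the zero diagonal, and the order of the array dimensions are assumptions, not documented behaviour of `DBSCANClustering`.

```r
# Hypothetical helper: binary co-membership matrix from cluster labels
comembership_from_labels <- function(labels) {
  # 1 where two observations share the same label, 0 otherwise
  M <- outer(labels, labels, FUN = "==") * 1
  # Assumption: noise points (label 0) are co-members of nothing
  M[labels == 0, ] <- 0
  M[, labels == 0] <- 0
  # Assumption: no self co-membership on the diagonal
  diag(M) <- 0
  M
}

# Two hypothetical partitions of five observations
partitions <- list(c(1, 1, 2, 2, 0), c(1, 1, 1, 1, 1))

# One matrix per partition, stacked along the third dimension
arr <- sapply(partitions, comembership_from_labels, simplify = "array")
dim(arr)
isSymmetric(arr[, , 1])
```

Entry `arr[i, j, k]` is 1 when observations `i` and `j` fall in the same cluster under the `k`-th partition.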

References

Kampert MM, Meulman JJ, Friedman JH (2017). “rCOSA: A Software Package for Clustering Objects on Subsets of Attributes.” Journal of Classification, 34(3), 514–547. doi:10.1007/s00357-017-9240-z.

Friedman JH, Meulman JJ (2004). “Clustering objects on subsets of attributes (with discussion).” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 66(4), 815–849. doi:10.1111/j.1467-9868.2004.02059.x.

Examples

if (requireNamespace("dbscan", quietly = TRUE)) {
  # Data simulation
  set.seed(1)
  simul <- SimulateClustering(n = c(10, 10), pk = 50)
  plot(simul)

  # DBSCAN clustering
  myclust <- DBSCANClustering(
    xdata = simul$data,
    eps = seq(0, 2 * sqrt(ncol(simul$data) - 1), by = 0.1)
  )

  # Weighted DBSCAN clustering (using COSA)
  if (requireNamespace("rCOSA", quietly = TRUE)) {
    myclust <- DBSCANClustering(
      xdata = simul$data,
      eps = c(0.25, 0.5, 0.75),
      Lambda = c(0.2, 0.5)
    )
  }
}

[Package sharp version 1.4.6 Index]