dbscan {mlpack}    R Documentation

DBSCAN clustering

Description

An implementation of DBSCAN clustering. Given a dataset, this can compute and return a clustering of that dataset.

Usage

dbscan(
  input,
  epsilon = NA,
  min_size = NA,
  naive = FALSE,
  selection_type = NA,
  single_mode = FALSE,
  tree_type = NA,
  verbose = getOption("mlpack.verbose", FALSE)
)

Arguments

input

Input dataset to cluster (numeric matrix).

epsilon

Radius of each range search. Default value "1" (numeric).

min_size

Minimum number of points for a cluster. Default value "5" (integer).

naive

If set, brute-force range search (not tree-based) will be used. Default value "FALSE" (logical).

selection_type

If using point selection policy, the type of selection to use ('ordered', 'random'). Default value "ordered" (character).

single_mode

If set, single-tree range search (not dual-tree) will be used. Default value "FALSE" (logical).

tree_type

If using single-tree or dual-tree search, the type of tree to use ('kd', 'r', 'r-star', 'x', 'hilbert-r', 'r-plus', 'r-plus-plus', 'cover', 'ball'). Default value "kd" (character).

verbose

Display informational messages and the full list of parameters and timers at the end of execution. Default value "getOption("mlpack.verbose", FALSE)" (logical).

Details

This program implements the DBSCAN algorithm for clustering using accelerated tree-based range search. The type of tree used is configurable, and brute-force range search is available as an alternative.

The input dataset to be clustered may be specified with the "input" parameter; the radius of each range search may be specified with the "epsilon" parameter, and the minimum number of points in a cluster may be specified with the "min_size" parameter.
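
To make the roles of "epsilon" and "min_size" concrete, here is a minimal brute-force DBSCAN sketch in base R. It is an illustration of the general algorithm only, not mlpack's implementation: the function name dbscan_sketch, the convention that a point counts as its own neighbor, and the use of 0 as the noise label are all assumptions made for this example.

```r
# Illustrative sketch only (hypothetical helper, not part of mlpack).
# x: numeric matrix with one observation per row.
# Returns an integer vector of cluster ids 1..k, with 0 marking noise.
dbscan_sketch <- function(x, epsilon = 1, min_size = 5) {
  n <- nrow(x)
  d <- as.matrix(dist(x))  # pairwise Euclidean distances (brute force)

  # Range search: every point within `epsilon` of point i, including i itself.
  neighbors <- lapply(seq_len(n), function(i) which(d[i, ] <= epsilon))

  # Assumed convention: a core point has at least `min_size` neighbors.
  is_core <- lengths(neighbors) >= min_size

  labels <- integer(n)  # 0 = not assigned to any cluster (noise at the end)
  cluster <- 0L
  for (i in seq_len(n)) {
    if (labels[i] != 0L || !is_core[i]) next
    # Start a new cluster at an unassigned core point and grow it outward.
    cluster <- cluster + 1L
    labels[i] <- cluster
    queue <- neighbors[[i]]
    while (length(queue) > 0) {
      j <- queue[1]
      queue <- queue[-1]
      if (labels[j] != 0L) next
      labels[j] <- cluster
      # Only core points extend the cluster; border points are just absorbed.
      if (is_core[j]) queue <- c(queue, neighbors[[j]])
    }
  }
  labels
}

# Two tight groups of five points each, plus one isolated point.
pts <- rbind(
  cbind(c(0, 0.1, 0.2, 0.1, 0), c(0, 0.1, 0, -0.1, 0.2)),
  cbind(c(5, 5.1, 5.2, 5.1, 5), c(5, 5.1, 5, 4.9, 5.2)),
  c(20, 20)
)
labels <- dbscan_sketch(pts, epsilon = 0.5, min_size = 5)
print(labels)
```

In this sketch a larger "epsilon" widens every neighborhood, so clusters merge more readily, while a larger "min_size" makes it harder for a point to qualify as a core point, so more points end up as noise.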

The "assignments" and "centroids" outputs hold the result of the clustering and are returned as components of the result list (see Value). "assignments" contains the cluster assignment of each point, and "centroids" contains the centroid of each cluster.

The range search may be controlled with the "tree_type", "single_mode", and "naive" parameters. "tree_type" controls the type of tree used for range search and accepts one of the following values: 'kd', 'r', 'r-star', 'x', 'hilbert-r', 'r-plus', 'r-plus-plus', 'cover', 'ball'. The "single_mode" parameter forces single-tree search (as opposed to the default dual-tree search), and "naive" forces brute-force range search.
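
The three search options described above might be exercised as in the following sketch. The synthetic data, the variable names, and the particular epsilon and min_size values are assumptions chosen for illustration; the calls are guarded so that the snippet does nothing if mlpack is not installed.

```r
# Synthetic data, assumed for illustration: 100 points in two dimensions.
set.seed(1)
x <- matrix(rnorm(200), ncol = 2)

# One entry per search strategy; only the search-related arguments differ.
configs <- list(
  dual_ball = list(tree_type = "ball"),   # dual-tree search on a ball tree
  single_kd = list(single_mode = TRUE),   # single-tree search, default tree type
  brute     = list(naive = TRUE)          # brute-force range search, no tree
)

if (requireNamespace("mlpack", quietly = TRUE)) {
  results <- lapply(configs, function(cfg) {
    args <- c(list(input = x, epsilon = 0.5, min_size = 5), cfg)
    do.call(mlpack::dbscan, args)
  })
  # Each element is the list returned by dbscan() for that configuration.
  str(results, max.level = 2)
}
```

These settings change how neighbors are found, not the definition of a neighborhood, so the choice is mainly a question of speed on a given dataset.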

Value

A list with several components:

assignments

Output matrix for assignments of each point (integer row).

centroids

Matrix to save output centroids to (numeric matrix).

Author(s)

mlpack developers

Examples

# An example usage to run DBSCAN on the dataset in "input" with a radius of
# 0.5 and a minimum cluster size of 5 is given below:

## Not run: 
dbscan(input=input, epsilon=0.5, min_size=5)

## End(Not run)
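
# A more self-contained variant of the example above is sketched below. The
# synthetic dataset and the variable names are assumptions made for
# illustration, and the call is skipped when mlpack is not installed.

```r
# Two well-separated Gaussian blobs of 50 points each (assumed test data).
set.seed(42)
input <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.2), ncol = 2),
  matrix(rnorm(100, mean = 3, sd = 0.2), ncol = 2)
)

if (requireNamespace("mlpack", quietly = TRUE)) {
  output <- mlpack::dbscan(input = input, epsilon = 0.5, min_size = 5)

  # The result is a list; its components are documented under Value.
  assignments <- output$assignments
  centroids <- output$centroids

  # Inspect the outputs rather than assuming their layout: this page does
  # not state how noise points are labelled or how centroids are oriented.
  print(table(as.vector(assignments)))
  print(dim(centroids))
}
```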

[Package mlpack version 4.4.0 Index]