optics {dbscan}  R Documentation 
Ordering Points to Identify the Clustering Structure (OPTICS)
Description
Implementation of the OPTICS (Ordering points to identify the clustering structure) point ordering algorithm using a kdtree.
Usage
optics(x, eps = NULL, minPts = 5, ...)
## S3 method for class 'optics'
print(x, ...)
## S3 method for class 'optics'
plot(x, cluster = TRUE, predecessor = FALSE, ...)
## S3 method for class 'optics'
as.reachability(object, ...)
## S3 method for class 'optics'
as.dendrogram(object, ...)
extractDBSCAN(object, eps_cl)
extractXi(object, xi, minimum = FALSE, correctPredecessors = TRUE)
## S3 method for class 'optics'
predict(object, newdata, data, ...)
Arguments
x 
a data matrix or a dist object. 
eps 
upper limit of the size of the epsilon neighborhood. Limiting the neighborhood size improves performance and has no or very little impact on the ordering as long as it is not set too low. If not specified, the largest minPtsdistance in the data set is used which gives the same result as infinity. 
minPts 
the parameter is used to identify dense neighborhoods and the reachability distance is calculated as the distance to the minPts nearest neighbor. Controls the smoothness of the reachability distribution. Default is 5 points. 
... 
additional arguments are passed on to fixedradius nearest
neighbor search algorithm. See 
cluster , predecessor 
plot clusters and predecessors. 
object 
clustering object. 
eps_cl 
Threshold to identify clusters ( 
xi 
Steepness threshold to identify clusters hierarchically using the Xi method. 
minimum 
logical, representing whether or not to extract the minimal (nonoverlapping) clusters in the Xi clustering algorithm. 
correctPredecessors 
logical, correct a common artifact by pruning the steep up area for points that have predecessors not in the cluster–found by the ELKI framework, see details below. 
newdata 
new data points for which the cluster membership should be predicted. 
data 
the data set used to create the clustering object. 
Details
The algorithm
This implementation of OPTICS implements the original
algorithm as described by Ankerst et al (1999). OPTICS is an ordering
algorithm with methods to extract a clustering from the ordering.
While using similar concepts as DBSCAN, for OPTICS eps
is only an upper limit for the neighborhood size used to reduce
computational complexity. Note that minPts
in OPTICS has a different
effect then in DBSCAN. It is used to define dense neighborhoods, but since
eps
is typically set rather high, this does not effect the ordering
much. However, it is also used to calculate the reachability distance and
larger values will make the reachability distance plot smoother.
OPTICS linearly orders the data points such that points which are spatially
closest become neighbors in the ordering. The closest analog to this
ordering is dendrogram in singlelink hierarchical clustering. The algorithm
also calculates the reachability distance for each point.
plot()
(see reachability_plot)
produces a reachability plot which shows each points reachability distance
between two consecutive points
where the points are sorted by OPTICS. Valleys represent clusters (the
deeper the valley, the more dense the cluster) and high points indicate
points between clusters.
Specifying the data
If x
is specified as a data matrix, then Euclidean distances and fast
nearest neighbor lookup using a kdtree are used. See kNN()
for
details on the parameters for the kdtree.
Extracting a clustering
Several methods to extract a clustering from the order returned by OPTICS are implemented:

extractDBSCAN()
extracts a clustering from an OPTICS ordering that is similar to what DBSCAN would produce with an eps set toeps_cl
(see Ankerst et al, 1999). The only difference to a DBSCAN clustering is that OPTICS is not able to assign some border points and reports them instead as noise. 
extractXi()
extract clusters hierarchically specified in Ankerst et al (1999) based on the steepness of the reachability plot. One interpretation of thexi
parameter is that it classifies clusters by change in relative cluster density. The used algorithm was originally contributed by the ELKI framework and is explained in Schubert et al (2018), but contains a set of fixes.
Predict cluster memberships
predict()
requires an extracted DBSCAN clustering with extractDBSCAN()
and then
uses predict for dbscan()
.
Value
An object of class optics
with components:
eps 
value of 
minPts 
value of 
order 
optics order for the data points in 
reachdist 
reachability distance for each data point in 
coredist 
core distance for each data point in 
For extractDBSCAN()
, in addition the following
components are available:
eps_cl 
the value of the 
cluster 
assigned cluster labels in the order of the data points in 
For extractXi()
, in addition the following components
are available:
xi 
Steepness threshold 
cluster 
assigned cluster labels in the order of the data points in 
clusters_xi 
data.frame containing the start and end of each cluster found in the OPTICS ordering. 
Author(s)
Michael Hahsler and Matthew Piekenbrock
References
Mihael Ankerst, Markus M. Breunig, HansPeter Kriegel, Joerg Sander (1999). OPTICS: Ordering Points To Identify the Clustering Structure. ACM SIGMOD international conference on Management of data. ACM Press. pp. doi:10.1145/304181.304187
Hahsler M, Piekenbrock M, Doran D (2019). dbscan: Fast DensityBased Clustering with R. Journal of Statistical Software, 91(1), 130. doi:10.18637/jss.v091.i01
Erich Schubert, Michael Gertz (2018). Improving the Cluster Structure Extracted from OPTICS Plots. In Lernen, Wissen, Daten, Analysen (LWDA 2018), pp. 318329.
See Also
Density reachability.
Other clustering functions:
dbscan()
,
extractFOSC()
,
hdbscan()
,
jpclust()
,
sNNclust()
Examples
set.seed(2)
n < 400
x < cbind(
x = runif(4, 0, 1) + rnorm(n, sd = 0.1),
y = runif(4, 0, 1) + rnorm(n, sd = 0.1)
)
plot(x, col=rep(1:4, time = 100))
### run OPTICS (Note: we use the default eps calculation)
res < optics(x, minPts = 10)
res
### get order
res$order
### plot produces a reachability plot
plot(res)
### plot the order of points in the reachability plot
plot(x, col = "grey")
polygon(x[res$order, ])
### extract a DBSCAN clustering by cutting the reachability plot at eps_cl
res < extractDBSCAN(res, eps_cl = .065)
res
plot(res) ## black is noise
hullplot(x, res)
### recut at a higher eps threshold
res < extractDBSCAN(res, eps_cl = .07)
res
plot(res)
hullplot(x, res)
### extract hierarchical clustering of varying density using the Xi method
res < extractXi(res, xi = 0.01)
res
plot(res)
hullplot(x, res)
# Xi cluster structure
res$clusters_xi
### use OPTICS on a precomputed distance matrix
d < dist(x)
res < optics(d, minPts = 10)
plot(res)