DSC_DenStream {streamMOA} | R Documentation |
DenStream Data Stream Clusterer
Description
Interface for the DenStream clustering algorithm for data streams implemented in MOA.
Usage
DSC_DenStream(
epsilon,
mu = 1,
beta = 0.2,
lambda = 0.001,
initPoints = 100,
offline = 2,
processingSpeed = 1,
recluster = TRUE,
k = NULL
)
Arguments
epsilon |
defines the epsilon neighborhood which is the maximal radius of micro-clusters (r<=epsilon). Range: 0 to 1. |
mu |
minimum weight w (minPoints) that a core-micro-cluster needs in order to be created (w>=mu). Range: 0 to max(int). |
beta |
multiplier for mu to detect outlier micro-clusters given their weight w (w<beta x mu). Range: 0 to 1 |
lambda |
decay constant. |
initPoints |
number of points to use for initialization via DBSCAN. |
offline |
offline multiplier for epsilon, used for reachability reclustering. Range: 2 to 20.
processingSpeed |
Number of incoming points per time unit (important for decay). Range: between 1 and 1000. |
recluster |
logical; should the offline DBSCAN-based reclustering (i.e., reachability at a distance of epsilon x offline) be performed? |
k |
integer; if specified, tries to automatically choose offline to find k macro-clusters. |
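The weight thresholds controlled by mu and beta can be illustrated with a small base-R sketch. It is not part of streamMOA: the helper classify_weight() and the example weights are hypothetical, and the label "potential" for weights between beta x mu and mu follows the terminology of Cao et al. (2006) rather than this page.

```r
# Hypothetical helper (not part of streamMOA): label a micro-cluster by its
# weight w using the thresholds described for mu and beta.
classify_weight <- function(w, mu = 1, beta = 0.2) {
  ifelse(w < beta * mu, "outlier",
         ifelse(w >= mu, "core", "potential"))
}

# Made-up example weights, evaluated with mu = 10 and beta = 0.2,
# so the outlier threshold is beta * mu = 2
w <- c(0.5, 1.9, 2, 5, 10, 25)
labels <- classify_weight(w, mu = 10, beta = 0.2)
data.frame(weight = w, label = labels)
```

Raising beta moves the outlier threshold closer to mu, so more low-weight micro-clusters are treated as outliers.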
Details
DenStream applies reachability (from DBSCAN) between micro-clusters for reclustering, using epsilon x offline (offline defaults to 2) as the reachability threshold.
If k is specified, it automatically chooses the reachability threshold to find k clusters. This is achieved using single-link hierarchical clustering.
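The relationship between a fixed reachability threshold and the k-based choice can be sketched in base R. This is an illustration under simplifying assumptions, not the streamMOA implementation: the micro-cluster centers are made up, and micro-cluster weights and radii are ignored. Cutting a single-link dendrogram at height h groups exactly those points connected by chains of distances no larger than h, which is why one dendrogram can serve both modes.

```r
# Made-up micro-cluster centers: three well-separated groups in 2-d
set.seed(42)
centers <- rbind(
  matrix(rnorm(10, mean = 0.2, sd = 0.01), ncol = 2),
  matrix(rnorm(10, mean = 0.5, sd = 0.01), ncol = 2),
  matrix(rnorm(10, mean = 0.8, sd = 0.01), ncol = 2)
)

epsilon <- 0.05
offline <- 2

# Single-link hierarchical clustering on the center-to-center distances
hc <- hclust(dist(centers), method = "single")

# Fixed threshold: centers linked by chains of distances <= epsilon * offline
# end up in the same macro-cluster
macro_fixed <- cutree(hc, h = epsilon * offline)

# Given k: cut the same dendrogram so that exactly k groups remain
k <- 3
macro_k <- cutree(hc, k = k)

# Cross-tabulate the two assignments
table(macro_fixed, macro_k)
```

With these well-separated centers, both cuts yield the same three groups. On less clearly separated data, the k-based cut corresponds to whatever threshold happens to leave k groups.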
Value
An object of class DSC_DenStream (subclass of stream::DSC, DSC_MOA, stream::DSC_Micro) or, for recluster = TRUE, an object of class stream::DSC_TwoStage.
Author(s)
Michael Hahsler and John Forrest
References
Cao F, Ester M, Qian W, Zhou A (2006). Density-Based Clustering over an Evolving Data Stream with Noise. In Proceedings of the 2006 SIAM International Conference on Data Mining, pp 326-337. SIAM.
Bifet A, Holmes G, Pfahringer B, Kranen P, Kremer H, Jansen T, Seidl T (2010). MOA: Massive Online Analysis, a Framework for Stream Classification and Clustering. In Journal of Machine Learning Research (JMLR).
See Also
Other DSC_MOA: DSC_BICO_MOA(), DSC_CluStream(), DSC_ClusTree(), DSC_DStream_MOA(), DSC_MCOD(), DSC_MOA(), DSC_StreamKM()
Examples
# load streamMOA (this also attaches the stream package it depends on)
library(streamMOA)
# data with 3 clusters and 5% noise
set.seed(1000)
stream <- DSD_Gaussians(k = 3, d = 2, noise = 0.05)
# use DenStream with reachability reclustering
denstream <- DSC_DenStream(epsilon = .05)
update(denstream, stream, 500)
denstream
# plot macro-clusters
plot(denstream, stream, type = "macro")
# plot micro-clusters
plot(denstream, stream, type = "micro")
# show micro and macro-clusters
plot(denstream, stream, type = "both")
# reclustering: choose the reachability threshold automatically to find 4 clusters
denstream2 <- DSC_DenStream(epsilon = .05, k = 4)
update(denstream2, stream, 500)
plot(denstream2, stream, type = "both")