stamp {matrixprofiler} | R Documentation |
Matrix Profile Computation
Description
STAMP computes the best-so-far Matrix Profile and Profile Index for univariate time series.
STOMP is a faster implementation, with the caveat that it is not an anytime algorithm like STAMP or SCRIMP.
SCRIMP is a faster implementation, like STOMP, but can return anytime results like STAMP.
MPX is by far the fastest implementation, with the caveat that it is not an anytime algorithm like STAMP or SCRIMP.
Usage
stamp(
data,
window_size,
query = NULL,
exclusion_zone = 0.5,
s_size = 1,
n_workers = 1,
progress = TRUE
)
stomp(
data,
window_size,
query = NULL,
exclusion_zone = 0.5,
n_workers = 1,
progress = TRUE,
left_right_profile = FALSE
)
scrimp(
data,
window_size,
query = NULL,
exclusion_zone = 0.5,
s_size = 1,
pre_scrimp = 0.25,
n_workers = 1,
progress = TRUE
)
mpx(
data,
window_size,
query = NULL,
exclusion_zone = 0.5,
s_size = 1,
idxs = TRUE,
distance = c("euclidean", "pearson"),
n_workers = 1,
progress = TRUE
)
Arguments
data |
Required. Any 1-dimension series of numbers ( |
window_size |
Required. An integer defining the rolling window size. |
query |
(not yet on |
exclusion_zone |
A numeric. Defines the size of the area around the rolling window that will be ignored to avoid
trivial matches. Default is |
s_size |
A numeric. Used on anytime algorithms (stamp, scrimp, mpx) if only part of the computation is needed.
Default is |
n_workers |
An integer. The number of threads used for computing. Defaults to |
progress |
A logical. If |
left_right_profile |
( |
pre_scrimp |
A numeric. If not zero, pre_scrimp is computed, using a fraction of the data. Default is |
idxs |
( |
distance |
( |
Details
The Matrix Profile has the potential to revolutionize time series data mining because of its generality, versatility, simplicity and scalability. In particular, it has implications for time series motif discovery, time series joins, shapelet discovery (classification), density estimation, semantic segmentation, visualization, rule discovery, clustering, etc.
About progress: using it is strongly recommended as feedback for long computations. It does add some (negligible) overhead, but the benefit of knowing that your computer is still computing far outweighs the seconds you may lose in the final benchmark.
About n_workers: on Windows, this package uses TBB for multithreading; on Linux and macOS, it uses TinyThread++. This may or may not raise issues in the future, so be aware of possible slower processing due to different mutex implementations, or even unexpected crashes. The Windows version is usually more reliable.
The data and query parameters are internally converted to a single vector using as.numeric(); thus, bear in mind that a multidimensional matrix may not work as you expect, but most 1-dimensional data types will work normally. If query is provided, it receives the same preprocessing as data; in addition, exclusion_zone is ignored and set to 0. The data and query parameters do not need to have the same size, and they can be interchanged if both are provided; the difference will be in the returned object. An AB-Join returns the Matrix Profiles 'A' and 'B', i.e., the distances between a rolling window from query to data and from data to query.
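The effect of the as.numeric() conversion mentioned above can be checked in base R. This is a minimal sketch with a made-up matrix (the names two_series and flat are hypothetical); it only shows what as.numeric() does to a matrix, not how the package handles the result afterwards.

```r
# Two short series stored as the rows of a matrix (illustrative data only).
two_series <- matrix(c(1, 2, 3,
                       4, 5, 6), nrow = 2, byrow = TRUE)

# as.numeric() drops the dimensions and reads the values column by column,
# so the two rows end up interleaved in a single vector.
flat <- as.numeric(two_series)
print(flat)
```

Passing such a matrix as data would therefore be treated as one univariate series of length 6, which is rarely what is intended; extracting a single row or column first avoids the surprise.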
stamp
The anytime STAMP algorithm computes the Matrix Profile and Profile Index in such a manner that it can be stopped before the calculation is complete and return the best-so-far results, allowing ultra-fast approximate solutions.
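To illustrate what "anytime" means here, the following is a naive base R sketch, not the package's implementation. It visits subsequences in random order, keeps an element-wise minimum of the distance profiles, and can stop after a fraction of the work. The function naive_anytime_mp and its helper znorm are hypothetical names; treating exclusion_zone as a fraction of window_size and s_size as a fraction of the total work are assumptions made for this sketch.

```r
# z-normalize a vector using the population standard deviation.
znorm <- function(v) {
  s <- sqrt(mean((v - mean(v))^2))
  if (s == 0) rep(0, length(v)) else (v - mean(v)) / s
}

# Naive anytime self-join (hypothetical helper, O(n^2 * window_size)).
naive_anytime_mp <- function(data, window_size, exclusion_zone = 0.5, s_size = 1) {
  data <- as.numeric(data)
  n_sub <- length(data) - window_size + 1
  ez <- round(window_size * exclusion_zone)  # assumption: fraction of the window
  # One z-normalized subsequence per column.
  subs <- sapply(seq_len(n_sub), function(i) znorm(data[i:(i + window_size - 1)]))
  mp <- rep(Inf, n_sub)
  idx <- rep(NA_integer_, n_sub)
  # Random order: stopping early still samples the whole series.
  ord <- sample.int(n_sub)
  n_steps <- ceiling(n_sub * s_size)
  for (i in ord[seq_len(n_steps)]) {
    dp <- sqrt(colSums((subs - subs[, i])^2))  # distance profile of subsequence i
    dp[max(1, i - ez):min(n_sub, i + ez)] <- Inf  # skip trivial matches
    better <- dp < mp
    mp[better] <- dp[better]   # keep the best-so-far distance
    idx[better] <- i
  }
  list(matrix_profile = mp, profile_index = idx, ez = ez, partial = s_size < 1)
}

set.seed(42)
x <- sin(seq(0, 20 * pi, length.out = 400)) + rnorm(400, sd = 0.1)
full <- naive_anytime_mp(x, 30)
part <- naive_anytime_mp(x, 30, s_size = 0.2)
```

Because every step can only lower the stored distances, the partial profile is an upper bound on the full one at every position, which is why an early stop still gives a usable approximation.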
stomp
STOMP uses a faster implementation to compute the Matrix Profile and Profile Index. It can be stopped early by the user, but the result is then not considered anytime, just incomplete. For an anytime algorithm, use stamp() or scrimp().
scrimp
The SCRIMP algorithm was developed as the anytime alternative to STOMP. It is as fast as STOMP but allows the user to cancel the computation and get an approximation of the final result. This implementation uses the SCRIMP++ code: it first computes the pre-scrimp stage (a very fast and good approximation) and then continues improving the result with scrimp. The exception is multithreading, which skips the pre-scrimp stage.
mpx
This algorithm was developed separately from the main Matrix Profile branch, which relies on the Fast Fourier Transform (FFT) in at least one part of the process. MPX does not use the FFT at all and is several times faster. It also relies on Ogita's work for more precise computation of the mean and standard deviation (part of the process).
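The distance argument of mpx() accepts "euclidean" and "pearson". As background only, and not as a description of the package internals, the sketch below checks in base R the standard identity linking the two for windows of length m: the z-normalized Euclidean distance equals sqrt(2 * m * (1 - r)), where r is the Pearson correlation. The vectors a and b and the helper znorm are made up for illustration.

```r
set.seed(1)
m <- 50
a <- rnorm(m)
b <- rnorm(m)

# z-normalize with the population standard deviation (divide by m, not m - 1).
znorm <- function(v) (v - mean(v)) / sqrt(mean((v - mean(v))^2))

d_euclid <- sqrt(sum((znorm(a) - znorm(b))^2))  # z-normalized Euclidean distance
r <- cor(a, b)                                  # Pearson correlation
d_from_r <- sqrt(2 * m * (1 - r))               # distance recovered from r

print(c(d_euclid = d_euclid, d_from_r = d_from_r))
```

The two values agree up to floating-point error, so a profile expressed in one measure can be converted to the other without recomputing it.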
Value
Returns a list with the matrix_profile, profile_index (if idxs is TRUE in mpx()), and some information about the settings used to build it, like ez and partial when the algorithm is finished early.
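A minimal sketch of inspecting the returned list follows. It assumes the matrixprofiler package is installed (the call is skipped otherwise) and uses a made-up synthetic series x; only element names documented above are accessed, and names() shows whatever else the list contains.

```r
# Synthetic noisy sine wave, used only for illustration.
set.seed(123)
x <- sin(seq(0, 20 * pi, length.out = 500)) + rnorm(500, sd = 0.1)

mp <- NULL
if (requireNamespace("matrixprofiler", quietly = TRUE)) {
  mp <- matrixprofiler::stomp(x, window_size = 50, progress = FALSE)
  print(names(mp))                # which elements were returned
  print(head(mp$matrix_profile))  # distance from each window to its nearest neighbor
  print(head(mp$profile_index))   # position of that nearest neighbor
  print(mp$ez)                    # exclusion zone setting that was used
}
```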
This document
Last updated on 2023-01-25 using R version 4.2.2.
References
Yeh CCM, Zhu Y, Ulanova L, Begum N, Ding Y, Dau HA, et al. Matrix profile I: All pairs similarity joins for time series: A unifying view that includes motifs, discords and shapelets. Proc - IEEE Int Conf Data Mining, ICDM. 2017;1317-22.
Zhu Y, Imamura M, Nikovski D, Keogh E. Matrix Profile VII: Time Series Chains: A New Primitive for Time Series Data Mining. Knowl Inf Syst. 2018 Jun 2;1-27.
Zhu Y, Zimmerman Z, Senobari NS, Yeh CM, Funning G. Matrix Profile II: Exploiting a Novel Algorithm and GPUs to Break the One Hundred Million Barrier for Time Series Motifs and Joins. ICDM. 2016 Jan 22;54(1):739-48.
Website: http://www.cs.ucr.edu/~eamonn/MatrixProfile.html
See Also
mass() for the underlying algorithm that finds the best match of a query.
mpxab() for the forward and reverse join-similarity.
Examples
mp <- stamp(motifs_discords_small, 50)
mp <- stomp(motifs_discords_small, 50)
mp <- scrimp(motifs_discords_small, 50)
mp <- mpx(motifs_discords_small, 50)