twdtw {twdtw} | R Documentation |
Calculate Time-Weighted Dynamic Time Warping (TWDTW) distance
Description
This function calculates the Time-Weighted Dynamic Time Warping (TWDTW) distance between two time series.
Usage
twdtw(x, y, time_weight, cycle_length, time_scale, ...)
## S3 method for class 'data.frame'
twdtw(
x,
y,
time_weight,
cycle_length,
time_scale,
origin = NULL,
index_column = "time",
max_elapsed = Inf,
output = "distance",
version = "f90",
...
)
## S3 method for class 'matrix'
twdtw(
x,
y,
time_weight,
cycle_length,
time_scale = NULL,
index_column = 1,
max_elapsed = Inf,
output = "distance",
version = "f90",
...
)
Arguments
x |
A data.frame or matrix representing time series. |
y |
A data.frame or matrix representing a labeled time series (reference). |
time_weight |
A numeric vector with length two (steepness and midpoint of logistic weight) or a function. See details. |
cycle_length |
The length of the cycle. Can be a numeric value or a string specifying the units ('year', 'month', 'day', 'hour', 'minute', 'second'). When numeric, the cycle length is in the same units as time_scale. When a string, it specifies the time unit of the cycle. |
time_scale |
Specifies the time scale for the conversion. Must be one of 'year', 'month', 'day', 'hour', 'minute', 'second'. When cycle_length is a string, time_scale changes the unit in which the result is expressed. When cycle_length is numeric, time_scale is used to compute the elapsed time in seconds. |
... |
ignore |
origin |
For numeric cycle_length, the origin must be specified. This is the point from which the elapsed time is computed. Must be of the same class as x. |
index_column |
(optional) The column name of the time index for data.frame inputs. Defaults to "time". For matrix input, an integer indicating the column with the time index. Defaults to 1. |
max_elapsed |
Numeric value constraining the TWDTW calculation to the lower band given by a maximum elapsed time. Defaults to Inf. |
output |
A character string defining the output. It must be one of 'distance', 'matches', 'internals'. Defaults to 'distance'.
'distance' will return the lowest TWDTW distance between |
version |
A string identifying the version of TWDTW implementation. Options are 'f90' for Fortran 90, 'f90goto' for Fortran 90 with goto statements, or 'cpp' for C++ version. Defaults to 'f90'. See details. |
Details
TWDTW calculates a time-weighted version of DTW by modifying each element of the
DTW's local cost matrix (see details in Maus et al. (2016) and Maus et al. (2019)).
The default time weight is calculated using a logistic function
that adds a weight to each pair of observations in the time series x
and y
based on the time difference between observations, such that
tw(dist_{i,j}) = dist_{i,j} + \frac{1}{1 + e^{-{\alpha} (el_{i,j} - {\beta})}}
Where:
tw
is the time-weight functiondist_{i,j}
is the Euclidean distance between the i-th element ofx
and the j-th element ofy
in a multi-dimensional spaceel_{i,j}
is the time elapsed between the i-th element ofx
and the j-th element ofy
\alpha
and\beta
are the steepness and midpoint of the logistic function, respectively
The logistic function is implemented as the default option in the C++ and Fortran versions of the code.
To use the native implementation, \alpha
and \beta
must be provided as a numeric vector of
length two using the argument time_weight
. This implementation provides high processing performance.
The time_weight
argument also accepts a function defined in R, allowing the user to define a different
weighting scheme. However, passing a function to time_weight
can degrade the processing performance,
i.e., it can be up to 3x slower than using the default logistic time-weight.
A time-weight function passed to time_weight
must receive two numeric arguments and return a
single numeric value. The first argument received is the Euclidean dist_{i,j}
and the second
is the elapsed time el_{i,j}
. For example,
time_weight = function(dist, el) dist + 0.1*el
defines a linear weighting scheme with a slope of 0.1.
The Fortran 90 versions of twdtw
are usually faster than the C++ version.
The 'f90goto
' version, which uses goto statements, is slightly quicker than the
'f90
' version that uses while and for loops. You can use the max_elapsed
parameter
to limit the TWDTW calculation to a maximum elapsed time. This means it will skip
comparisons between pairs of observations in x
and y
that are far apart in time.
Be careful, though: if max_elapsed
is set too low, it could change the results.
It important to try out different settings for your specific problem.
Value
An S3 object twdtw either: If output = 'distance', a numeric value representing the TWDTW distance between the two time series. If output = 'matches', a numeric matrix of all TWDTW matches. For each match the starting index, ending index, and distance are returned. If output = 'internals', a list of all TWDTW internal data is returned.
References
Maus, V., Camara, G., Cartaxo, R., Sanchez, A., Ramos, F. M., & de Moura, Y. M. (2016). A Time-Weighted Dynamic Time Warping Method for Land-Use and Land-Cover Mapping. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 9(8), 3729-3739. doi:10.1109/JSTARS.2016.2517118
Maus, V., Camara, G., Appel, M., & Pebesma, E. (2019). dtwSat: Time-Weighted Dynamic Time Warping for Satellite Image Time Series Analysis in R. Journal of Statistical Software, 88(5), 1-31. doi:10.18637/jss.v088.i05
Examples
# Create a time series
n <- 23
t <- seq(0, pi, length.out = n)
d <- seq(as.Date('2020-09-01'), length.out = n, by = "15 day")
x <- data.frame(time = d, v1 = sin(t)*2 + runif(n))
# shift time by 30 days
y <- data.frame(time = d + 30, v1 = sin(t)*2 + runif(n))
plot(x, type = "l", xlim = range(c(d, d + 5)))
lines(y, col = "red")
# Calculate TWDTW distance between x and y using logistic weight
twdtw(x, y,
cycle_length = 'year',
time_scale = 'day',
time_weight = c(steepness = 0.1, midpoint = 50))
# Pass a generic time-weight function
twdtw(x, y,
cycle_length = 'year',
time_scale = 'day',
time_weight = function(x,y) x + 1.0 / (1.0 + exp(-0.1 * (y - 50))))
# Test other version
twdtw(x, y,
cycle_length = 'year',
time_scale = 'day',
time_weight = c(steepness = 0.1, midpoint = 50),
version = 'f90goto')
twdtw(x, y,
cycle_length = 'year',
time_scale = 'day',
time_weight = c(steepness = 0.1, midpoint = 50),
version = 'cpp')