R: Calculate Time-Weighted Dynamic Time Warping (TWDTW) distance

twdtw {twdtw}

R Documentation

Calculate Time-Weighted Dynamic Time Warping (TWDTW) distance

Description

This function calculates the Time-Weighted Dynamic Time Warping (TWDTW) distance between two time series.

Usage

twdtw(x, y, time_weight, cycle_length, time_scale, ...)

## S3 method for class 'data.frame'
twdtw(
  x,
  y,
  time_weight,
  cycle_length,
  time_scale,
  origin = NULL,
  index_column = "time",
  max_elapsed = Inf,
  output = "distance",
  version = "f90",
  ...
)

## S3 method for class 'matrix'
twdtw(
  x,
  y,
  time_weight,
  cycle_length,
  time_scale = NULL,
  index_column = 1,
  max_elapsed = Inf,
  output = "distance",
  version = "f90",
  ...
)

Arguments

`x`	A data.frame or matrix representing time series.
`y`	A data.frame or matrix representing a labeled time series (reference).
`time_weight`	A numeric vector with length two (steepness and midpoint of logistic weight) or a function. See details.
`cycle_length`	The length of the cycle. Can be a numeric value or a string specifying the units ('year', 'month', 'day', 'hour', 'minute', 'second'). When numeric, the cycle length is in the same units as time_scale. When a string, it specifies the time unit of the cycle.
`time_scale`	Specifies the time scale for the conversion. Must be one of 'year', 'month', 'day', 'hour', 'minute', 'second'. When cycle_length is a string, time_scale changes the unit in which the result is expressed. When cycle_length is numeric, time_scale is used to compute the elapsed time in seconds.
`...`	ignore
`origin`	For numeric cycle_length, the origin must be specified. This is the point from which the elapsed time is computed. Must be of the same class as x.
`index_column`	(optional) The column name of the time index for data.frame inputs. Defaults to "time". For matrix input, an integer indicating the column with the time index. Defaults to 1.
`max_elapsed`	Numeric value constraining the TWDTW calculation to the lower band given by a maximum elapsed time. Defaults to Inf.
`output`	A character string defining the output. It must be one of 'distance', 'matches', 'internals'. Defaults to 'distance'. 'distance' will return the lowest TWDTW distance between `x` and `y`. 'matches' will return all matches within the TWDTW matrix. 'internals' will return all TWDTW internal data.
`version`	A string identifying the version of TWDTW implementation. Options are 'f90' for Fortran 90, 'f90goto' for Fortran 90 with goto statements, or 'cpp' for C++ version. Defaults to 'f90'. See details.

Details

TWDTW calculates a time-weighted version of DTW by modifying each element of the DTW's local cost matrix (see details in Maus et al. (2016) and Maus et al. (2019)). The default time weight is calculated using a logistic function that adds a weight to each pair of observations in the time series x and y based on the time difference between observations, such that

tw(dist_{i,j}) = dist_{i,j} + \frac{1}{1 + e^{-{\alpha} (el_{i,j} - {\beta})}}

Where:

tw is the time-weight function
dist_{i,j} is the Euclidean distance between the i-th element of x and the j-th element of y in a multi-dimensional space
el_{i,j} is the time elapsed between the i-th element of x and the j-th element of y
\alpha and \beta are the steepness and midpoint of the logistic function, respectively

The logistic function is implemented as the default option in the C++ and Fortran versions of the code. To use the native implementation, \alpha and \beta must be provided as a numeric vector of length two using the argument time_weight. This implementation provides high processing performance.

The time_weight argument also accepts a function defined in R, allowing the user to define a different weighting scheme. However, passing a function to time_weight can degrade the processing performance, i.e., it can be up to 3x slower than using the default logistic time-weight.

A time-weight function passed to time_weight must receive two numeric arguments and return a single numeric value. The first argument received is the Euclidean dist_{i,j} and the second is the elapsed time el_{i,j}. For example, time_weight = function(dist, el) dist + 0.1*el defines a linear weighting scheme with a slope of 0.1.

The Fortran 90 versions of twdtw are usually faster than the C++ version. The 'f90goto' version, which uses goto statements, is slightly quicker than the 'f90' version that uses while and for loops. You can use the max_elapsed parameter to limit the TWDTW calculation to a maximum elapsed time. This means it will skip comparisons between pairs of observations in x and y that are far apart in time. Be careful, though: if max_elapsed is set too low, it could change the results. It important to try out different settings for your specific problem.

Value

An S3 object twdtw either: If output = 'distance', a numeric value representing the TWDTW distance between the two time series. If output = 'matches', a numeric matrix of all TWDTW matches. For each match the starting index, ending index, and distance are returned. If output = 'internals', a list of all TWDTW internal data is returned.

References

Maus, V., Camara, G., Cartaxo, R., Sanchez, A., Ramos, F. M., & de Moura, Y. M. (2016). A Time-Weighted Dynamic Time Warping Method for Land-Use and Land-Cover Mapping. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 9(8), 3729-3739. doi:10.1109/JSTARS.2016.2517118

Maus, V., Camara, G., Appel, M., & Pebesma, E. (2019). dtwSat: Time-Weighted Dynamic Time Warping for Satellite Image Time Series Analysis in R. Journal of Statistical Software, 88(5), 1-31. doi:10.18637/jss.v088.i05

Examples


# Create a time series
n <- 23
t <- seq(0, pi, length.out = n)
d <- seq(as.Date('2020-09-01'), length.out = n, by = "15 day")

x <- data.frame(time = d,      v1 = sin(t)*2 + runif(n))

# shift time by 30 days
y <- data.frame(time = d + 30, v1 = sin(t)*2 + runif(n))

plot(x, type = "l", xlim = range(c(d, d + 5)))
lines(y, col = "red")

# Calculate TWDTW distance between x and y using logistic weight
twdtw(x, y,
      cycle_length = 'year',
      time_scale = 'day',
      time_weight = c(steepness = 0.1, midpoint = 50))

# Pass a generic time-weight function
twdtw(x, y,
      cycle_length = 'year',
      time_scale = 'day',
      time_weight = function(x,y) x + 1.0 / (1.0 + exp(-0.1 * (y - 50))))

# Test other version
twdtw(x, y,
      cycle_length = 'year',
      time_scale = 'day',
      time_weight = c(steepness = 0.1, midpoint = 50),
      version = 'f90goto')

twdtw(x, y,
      cycle_length = 'year',
      time_scale = 'day',
      time_weight = c(steepness = 0.1, midpoint = 50),
      version = 'cpp')

[Package twdtw version 1.0-1 Index]