summarize_ema_half_life {sparklyr.flint} | R Documentation |
EMA half-life summarizer
Description
Calculate the exponential moving average (EMA) of a time series using the specified half-life and store the result in a new column named '<column>_ema'. See https://github.com/twosigma/flint/blob/master/doc/ema.md for details on the different EMA implementations.
Usage
summarize_ema_half_life(
ts_rdd,
column,
half_life_duration,
window = NULL,
time_column = "time",
interpolation = c("previous", "linear", "current"),
convention = c("legacy", "convolution", "core"),
key_columns = list()
)
Arguments
ts_rdd |
Timeseries RDD being summarized |
column |
Column to be summarized |
half_life_duration |
A time duration in string form (e.g., "1d", "1h", "15m") specifying the half-life |
window |
Either an R expression specifying time windows to be summarized (e.g., 'in_past("1h")' to summarize the EMA of 'column' within the time interval of [t - 1h, t] for each timestamp 't', 'in_future("5s")' to summarize EMA of 'column' within the time interval of [t, t + 5s] for each timestamp 't'), or 'NULL' to summarize EMA of 'column' within the time interval of (-inf, t] for each timestamp 't' |
time_column |
Name of the column containing timestamps (default: "time") |
interpolation |
Method used for interpolating values between two consecutive data points; must be one of "previous", "linear", or "current" (default: "previous"). See https://github.com/twosigma/flint/blob/master/doc/ema.md for details on the different interpolation methods. |
convention |
Convolution convention; must be one of "convolution", "core", or "legacy" (default: "legacy"). See https://github.com/twosigma/flint/blob/master/doc/ema.md for details. |
key_columns |
Optional list of columns that form an equivalence relation associating each record with the time series it belongs to (i.e., any 2 records having equal values in those columns will be associated with the same time series, and any 2 records having differing values in those columns are considered to be from 2 separate time series and will therefore be summarized separately). By default, 'key_columns' is empty and all records are considered to be part of a single time series. |
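To make the meaning of 'half_life_duration' concrete, the following plain-R sketch shows how a half-life translates into decay weights. It does not call Flint or Spark and is not the package's implementation: the helper names 'half_life_weight' and 'ema_half_life_sketch' are hypothetical, the half-life is given in plain seconds rather than as a duration string, and the weighted-average update shown is only one common EMA form. The conventions and interpolation methods Flint actually applies are described in the ema.md document linked above and may differ.

```r
# Weight given to an observation that is `age` seconds old.
# Assumption: the weight halves every `half_life` seconds.
half_life_weight <- function(age, half_life) {
  0.5 ^ (age / half_life)
}

# Hypothetical reference EMA over (possibly irregular) sorted timestamps.
# Assumption: weighted-average form; not necessarily any Flint convention.
ema_half_life_sketch <- function(time, value, half_life) {
  stopifnot(
    length(value) >= 1,
    length(time) == length(value),
    !is.unsorted(time)
  )
  ema <- numeric(length(value))
  ema[1] <- value[1]
  for (i in seq_along(value)[-1]) {
    # Decay the running value by the time elapsed since the previous point
    decay <- half_life_weight(time[i] - time[i - 1], half_life)
    ema[i] <- decay * ema[i - 1] + (1 - decay) * value[i]
  }
  ema
}
```

With a half-life of 100 seconds, 'half_life_weight(c(0, 100, 200), 100)' gives weights of 1, 0.5, and 0.25, so an observation loses half of its influence for every 100 seconds of age.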
See Also
Other summarizers: ols_regression(), summarize_avg(), summarize_corr2(), summarize_corr(), summarize_count(), summarize_covar(), summarize_dot_product(), summarize_ewma(), summarize_geometric_mean(), summarize_kurtosis(), summarize_max(), summarize_min(), summarize_nth_central_moment(), summarize_nth_moment(), summarize_product(), summarize_quantile(), summarize_skewness(), summarize_stddev(), summarize_sum(), summarize_var(), summarize_weighted_avg(), summarize_weighted_corr(), summarize_weighted_covar(), summarize_z_score()
Examples
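library(sparklyr)
library(sparklyr.flint)

# Returns NULL if no local Spark connection can be established
sc <- try_spark_connect(master = "local")

if (!is.null(sc)) {
  # Copy 1000 rows of random prices to Spark, with timestamps 1..1000
  price_sdf <- copy_to(
    sc,
    data.frame(time = seq(1000), price = rnorm(1000))
  )
  # Interpret the 'time' column as seconds; rows are already sorted by time
  ts <- fromSDF(price_sdf, is_sorted = TRUE, time_unit = "SECONDS")
  # EMA of 'price' with a 100-second half-life, stored in 'price_ema'
  ts_ema <- summarize_ema_half_life(
    ts,
    column = "price",
    half_life_duration = "100s"
  )
} else {
  message("Unable to establish a Spark connection!")
}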