summarize_count {sparklyr.flint} | R Documentation
Count summarizer
Description
Count the total number of records if no column is specified, or the number of non-null values within the specified column, within each time window or within each group of records with identical timestamps.
Usage
summarize_count(ts_rdd, column = NULL, window = NULL, key_columns = list())
Arguments
ts_rdd: Timeseries RDD being summarized.
column: If not NULL, report the number of values in the specified column that are not NULL or NaN within each time window or group of records with identical timestamps, and store the counts in a new column named '<column>_count'. Otherwise, report the number of records within each time window or group of records with identical timestamps and store it in a column named 'count' (see the sketch after this argument list).
window: Either an R expression specifying the time windows to be summarized (e.g., 'in_past("1h")' to summarize data looking back 1 hour at each time point, or 'in_future("5s")' to summarize data looking forward 5 seconds at each time point), or NULL to compute aggregate statistics on records grouped by timestamp.
key_columns: Optional list of columns that form an equivalence relation associating each record with the time series it belongs to (i.e., any 2 records having equal values in those columns are associated with the same time series, while any 2 records having differing values in those columns are considered to be from 2 separate time series and are therefore summarized separately). By default, 'key_columns' is empty and all records are considered to be part of a single time series.
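As a rough illustration of the column-naming behavior described above (a minimal sketch, not part of the package documentation, assuming a local Spark instance is available and using a toy dataframe with hypothetical columns 't' and 'v'):

library(sparklyr)
library(sparklyr.flint)

sc <- try_spark_connect(master = "local")

if (!is.null(sc)) {
  sdf <- copy_to(sc, tibble::tibble(t = seq(10), v = seq(10)))
  ts <- fromSDF(sdf, is_sorted = TRUE, time_unit = "SECONDS", time_column = "t")

  # with a column specified: counts of non-NULL/NaN values of 'v' within the past
  # 3 seconds at each time point, stored in a new column named 'v_count'
  counts_v <- summarize_count(ts, column = "v", window = in_past("3s"))

  # without a column: number of records sharing each timestamp, stored in 'count'
  counts_all <- summarize_count(ts)
}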
Value
A TimeSeriesRDD containing the summarized result.
See Also
Other summarizers: ols_regression(), summarize_avg(), summarize_corr2(), summarize_corr(), summarize_covar(), summarize_dot_product(), summarize_ema_half_life(), summarize_ewma(), summarize_geometric_mean(), summarize_kurtosis(), summarize_max(), summarize_min(), summarize_nth_central_moment(), summarize_nth_moment(), summarize_product(), summarize_quantile(), summarize_skewness(), summarize_stddev(), summarize_sum(), summarize_var(), summarize_weighted_avg(), summarize_weighted_corr(), summarize_weighted_covar(), summarize_z_score()
Examples
library(sparklyr)
library(sparklyr.flint)

sc <- try_spark_connect(master = "local")

if (!is.null(sc)) {
  sdf <- copy_to(sc, tibble::tibble(t = seq(10), v = seq(10)))
  ts <- fromSDF(sdf, is_sorted = TRUE, time_unit = "SECONDS", time_column = "t")
  # count non-null values of 'v' within the past 3 seconds at each time point
  ts_count <- summarize_count(ts, column = "v", window = in_past("3s"))
} else {
  message("Unable to establish a Spark connection!")
}