runner {runner} | R Documentation |
Apply running function
Description
Applies a custom function to running windows.
Usage
runner(
  x,
  f = function(x) x,
  k = integer(0),
  lag = integer(1),
  idx = integer(0),
  at = integer(0),
  na_pad = FALSE,
  simplify = TRUE,
  cl = NULL,
  ...
)
## Default S3 method:
runner(
  x,
  f = function(x) x,
  k = integer(0),
  lag = integer(1),
  idx = integer(0),
  at = integer(0),
  na_pad = FALSE,
  simplify = TRUE,
  cl = NULL,
  ...
)
## S3 method for class 'data.frame'
runner(
  x,
  f = function(x) x,
  k = attr(x, "k"),
  lag = if (!is.null(attr(x, "lag"))) attr(x, "lag") else integer(1),
  idx = attr(x, "idx"),
  at = attr(x, "at"),
  na_pad = if (!is.null(attr(x, "na_pad"))) attr(x, "na_pad") else FALSE,
  simplify = TRUE,
  cl = NULL,
  ...
)
## S3 method for class 'grouped_df'
runner(
  x,
  f = function(x) x,
  k = attr(x, "k"),
  lag = if (!is.null(attr(x, "lag"))) attr(x, "lag") else integer(1),
  idx = attr(x, "idx"),
  at = attr(x, "at"),
  na_pad = if (!is.null(attr(x, "na_pad"))) attr(x, "na_pad") else FALSE,
  simplify = TRUE,
  cl = NULL,
  ...
)
## S3 method for class 'matrix'
runner(
  x,
  f = function(x) x,
  k = integer(0),
  lag = integer(1),
  idx = integer(0),
  at = integer(0),
  na_pad = FALSE,
  simplify = TRUE,
  cl = NULL,
  ...
)
## S3 method for class 'xts'
runner(
  x,
  f = function(x) x,
  k = integer(0),
  lag = integer(1),
  idx = integer(0),
  at = integer(0),
  na_pad = FALSE,
  simplify = TRUE,
  cl = NULL,
  ...
)
Arguments
x |
( |
f |
( |
k |
( |
lag |
( |
idx |
( |
at |
( |
na_pad |
( |
simplify |
( |
cl |
( |
... |
(optional) |
Details
runner can apply any R function to running windows defined by x, k, lag, idx and at. Running windows can be calculated in several ways:
-
Cumulative windows
Applied when the user doesn't specify the k argument or specifies k = length(x), which means that k is equal to the number of available elements.
-
Constant sliding windows
Applied when the user specifies k as a constant value, keeping idx and at unspecified. The lag argument shifts windows left (lag > 0) or right (lag < 0).
-
Windows depending on date
If one specifies idx, the window size might vary because of unequally spaced indexes. For example, a 5-period window is different from a 5-element window, because a 5-period window might contain any number of observations (a 7-day mean is not the same as a 7-element mean).
-
Window at specific indices
By default, runner returns a vector of the same size as x unless one specifies the at argument. Each element of at is an index at which runner calculates the function, which means that the output of runner is then of length equal to length(at). Note that one can change the index of x by specifying idx. The illustration below shows the output of runner for at = c(18, 27, 45, 31), which gives windows in the ranges enclosed in square brackets. The range for at = 27 is [22, 26], which is not available in the current indices.
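The window types above can be sketched in base R. This is only an illustration of the window semantics, not the package's implementation: window_at is a hypothetical helper, and the range [at - lag - k + 1, at - lag] is an assumption inferred from the [22, 26] range quoted above, taking k = 5 and lag = 1 from the Examples section. Keeping partial windows at the start of x is assumed to correspond to na_pad = FALSE.

```r
# Hypothetical helper (not part of runner): returns the elements of x whose
# index falls in the closed range [at - lag - k + 1, at - lag].
window_at <- function(x, idx, at, k, lag = 0) {
  upper <- at - lag
  lower <- upper - k + 1
  x[idx >= lower & idx <= upper]
}

# Unequally spaced indices, as in the Examples section
idx <- c(4, 6, 7, 13, 17, 18, 18, 21, 27, 31, 37, 42, 44, 47, 48)
x <- seq_along(idx)

# at = 27, k = 5, lag = 1 covers indices 22..26, none of which exist in idx
w27 <- window_at(x, idx, at = 27, k = 5, lag = 1)

# at = 18, k = 5, lag = 1 covers indices 13..17
w18 <- window_at(x, idx, at = 18, k = 5, lag = 1)

# Constant sliding window of k = 3 elements: the index is the element position
x10 <- 1:10
sliding_means <- sapply(
  seq_along(x10),
  function(i) mean(window_at(x10, seq_along(x10), at = i, k = 3))
)
```

In this sketch w27 is empty, which is the situation the [22, 26] range describes: the window exists, but no observation falls inside it.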
Specifying time-intervals
at can also be specified as an interval of the output, defined by at = "<increment>", which results in the sequence of indices seq.POSIXt(min(idx), max(idx), by = "<increment>"). The increment of the sequence is the same as in the base::seq.POSIXt() function.
Note that the increment interval can't be more frequent than the interval of idx: for Date the most frequent time unit is "day", for POSIXt it is "sec".
k and lag can also be specified using a time sequence increment. Available time units are "sec", "min", "hour", "day", "DSTday", "week", "month", "quarter" or "year".
To increment by a number of units, one can also specify "<number> <unit>s", for example lag = "-2 days", k = "5 weeks".
k and lag given as sequence increments can also be vectors, which allows each window to be stretched and lagged or led freely in time (on indices).
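To see which indices at = "<increment>" would produce, the sequence described above can be built directly in base R. The sketch below reuses the monthly idx from the Examples section. For Date input, seq() dispatches to seq.Date() rather than seq.POSIXt(); treating the two as equivalent here is an assumption.

```r
# Monthly index, as in the data.frame example
idx <- seq(as.Date("2022-02-22"), as.Date("2023-02-22"), by = "1 month")

# Indices assumed to be implied by at = "6 months"
at_seq <- seq(min(idx), max(idx), by = "6 months")

# One output element would be expected per element of at_seq
length(at_seq)
```

The same "<number> <unit>s" strings are the ones the text above describes for k and lag, for example k = "5 weeks".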
Parallel computing
Beware that executing R calls in parallel doesn't always have the edge over single-threaded execution, even if cl <- makeCluster(detectCores()) was specified beforehand.
Parallel windows are executed in an independent environment, which means that objects other than function arguments need to be copied to the parallel environment using parallel::clusterExport(). For example, using f = function(x) x + y + z will result in an error unless clusterExport(cl, varlist = c("y", "z")) is called beforehand.
Value
A vector with aggregated values for each window. The length of the output is the same as length(x), or length(at) if at is specified. The type of the output depends on the output of the function f.
Examples
# runner returns windows as is by default
runner(1:10)
# mean on k = 3 elements windows
runner(1:10, f = mean, k = 3)
# mean on k = 3 elements windows with different specification
runner(1:10, k = 3, f = function(x) mean(x, na.rm = TRUE))
# concatenate two columns
runner(
  data.frame(
    a = letters[1:10],
    b = 1:10
  ),
  f = function(x) paste(paste0(x$a, x$b), collapse = "+")
)
# concatenate two columns with additional argument
runner(
  data.frame(
    a = letters[1:10],
    b = 1:10
  ),
  f = function(x, xxx) {
    paste(paste0(x$a, xxx, x$b), collapse = " + ")
  },
  xxx = "..."
)
# number of unique values in each window (varying window size)
runner(
  letters[1:10],
  k = c(1, 2, 2, 4, 5, 5, 5, 5, 5, 5),
  f = function(x) length(unique(x))
)
# concatenate only at selected window indices
runner(
  letters[1:10],
  f = function(x) paste(x, collapse = "-"),
  at = c(1, 5, 8)
)
# 5 days mean
idx <- c(4, 6, 7, 13, 17, 18, 18, 21, 27, 31, 37, 42, 44, 47, 48)
runner::runner(
  x = idx,
  k = "5 days",
  lag = 1,
  idx = Sys.Date() + idx,
  f = function(x) mean(x)
)
# mean over windows of 5 index units, evaluated at 4 selected indices
runner::runner(
  x = 1:15,
  k = 5,
  lag = 1,
  idx = idx,
  at = c(18, 27, 48, 31),
  f = mean
)
# runner with data.frame
df <- data.frame(
  a = 1:13,
  b = 1:13 + rnorm(13, sd = 5),
  idx = seq(as.Date("2022-02-22"), as.Date("2023-02-22"), by = "1 month")
)
runner(
  x = df,
  idx = "idx",
  at = "6 months",
  f = function(x) {
    cor(x$a, x$b)
  }
)
# parallel computing
library(parallel)
data <- data.frame(
  a = runif(100),
  b = runif(100),
  idx = cumsum(sample(rpois(100, 5)))
)
const <- 0
cl <- makeCluster(1)
# export objects used inside f to the workers
clusterExport(cl, "const", envir = environment())
runner(
  x = data,
  k = 10,
  f = function(x) {
    cor(x$a, x$b) + const
  },
  idx = "idx",
  cl = cl
)
# release the workers when done
stopCluster(cl)
# runner with matrix
data <- matrix(data = runif(100, 0, 1), nrow = 20, ncol = 5)
runner(
  x = data,
  f = function(x) {
    tryCatch(
      cor(x),
      error = function(e) NA
    )
  }
)