colapply.DTSg {DTSg}    R Documentation
Apply function column-wise
Description
Applies an arbitrary function to selected columns of a DTSg object.
Usage
## S3 method for class 'DTSg'
colapply(
  x,
  fun,
  ...,
  cols = self$cols(class = "numeric")[1L],
  resultCols = NULL,
  suffix = NULL,
  helpers = TRUE,
  funby = NULL,
  ignoreDST = FALSE,
  multiplier = 1L,
  funbyHelpers = NULL,
  funbyApproach = self$funbyApproach,
  clone = getOption("DTSgClone")
)
Arguments

x
    A DTSg object.

fun
    A function to apply to the selected columns.

...
    Further arguments passed on to fun.

cols
    A character vector specifying the columns to apply fun to.

resultCols
    An optional character vector of the same length as cols specifying the columns in which the return values of fun are stored.

suffix
    An optional character string. The return values of fun are stored in columns whose names consist of the names of the processed columns and this suffix.

helpers
    A logical specifying if helper data shall be handed over to fun (see the Helper data section for details).

funby
    One of the temporal aggregation level functions (TALFs) of the package or a user defined one (see the corresponding sections for details).

ignoreDST
    A logical specifying if daylight saving time shall be ignored by funby (see the corresponding section for details).

multiplier
    A positive integerish value “multiplying” the temporal aggregation level of certain funby functions (see the corresponding section for details).

funbyHelpers
    An optional list with additional helper data handed over to funby (see the corresponding section for details).

funbyApproach
    A character string specifying the flavour of the applied temporal aggregation level function.

clone
    A logical specifying if the object shall be modified in place or if a deep clone (copy) shall be made beforehand.
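
A minimal sketch of the suffix argument, reusing the flow data set from the Examples below; the exact name given to the new column ("flow" plus the suffix) is an assumption:

x <- DTSg$new(values = flow)

# linear interpolation whose result is stored in a new column instead of
# overwriting the "flow" column
x$colapply(fun = interpolateLinear, suffix = "_filled")$print()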
Value
Returns a DTSg object.
Helper data
In addition to the ... argument, this method optionally hands over a
list argument with helper data called .helpers to fun. This list
contains the following elements:
- .dateTime: A POSIXct vector containing the .dateTime column.
- periodicity: Same as the periodicity field.
- minLag: A difftime object containing the minimum time difference between two subsequent timestamps.
- maxLag: A difftime object containing the maximum time difference between two subsequent timestamps.
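
A minimal sketch of a function consuming this helper data, reusing the daily flow data set from the Examples below; the function name and the growing season rule are purely illustrative:

# sets values outside April to September to NA, using the .dateTime element
# of the .helpers list handed over by colapply()
maskOffSeason <- function(v, ..., .helpers) {
  m <- as.integer(format(.helpers[[".dateTime"]], "%m"))
  replace(v, m < 4L | m > 9L, NA_real_)
}

x <- DTSg$new(values = flow)
x$colapply(fun = maskOffSeason)$print()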
User defined TALFs, TALFs helper data and multiplier
User defined temporal aggregation level functions have to return a
POSIXct vector of the same length as the time series and accept two
arguments: a POSIXct vector as its first and a list with helper data
as its second. The default elements of this list are as follows:
- timezone: Same as the timezone field.
- ignoreDST: Same as the ignoreDST argument.
- periodicity: Same as the periodicity field.
- na.status: Same as the na.status field.
- multiplier: Same as the multiplier argument.
- funbyApproach: Same as the funbyApproach argument.
Any additional element specified in the funbyHelpers argument is appended
to the end of the helper data list. In case funbyHelpers contains an
ignoreDST, multiplier or funbyApproach element, it takes precedence over
the respective method argument. timezone, periodicity and na.status
elements are rejected, as they are always taken directly from the object.
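
A minimal sketch of such a user defined TALF, which truncates all timestamps to the start of their respective year (a simplified stand-in for byY_____); the argument names are arbitrary, only their order matters:

# maps every timestamp to the first instant of its year; the returned
# POSIXct vector has the same length as the input, as required
toYearStart <- function(.dateTime, .helpers) {
  as.POSIXct(format(.dateTime, "%Y-01-01"), tz = .helpers[["timezone"]])
}

x <- DTSg$new(values = flow)
x$colapply(fun = cumsum, helpers = FALSE, funby = toYearStart)$print()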
The temporal aggregation level of certain TALFs can be adjusted with the help of the multiplier argument. A multiplier of 10, for example, makes byY_____ aggregate to decades instead of years (see the sketch after this list). Another example is a multiplier of 6 provided to by_m____: the function then aggregates the months of all first half-years and the months of all second half-years, instead of each month across all years separately. This feature is supported by the following TALFs of the package:
- byYmdH__ (UTC and equivalent as well as all Etc/GMT time zones only)
- by___H__ (UTC and equivalent as well as all Etc/GMT time zones only)
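
The decade example might look like the following sketch, which reuses the flow data set from the Examples below (that the series spans more than one decade is an assumption):

x <- DTSg$new(values = flow)

# cumulative sums per decade: byY_____ combined with a multiplier of 10
x$colapply(
  fun = cumsum,
  helpers = FALSE,
  funby = byY_____,
  multiplier = 10L
)$print()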
Ignore daylight saving time
ignoreDST tells a temporal aggregation level function if it is supposed to ignore daylight saving time while transforming the timestamps. This can be a desired feature for time series strictly following the position of the sun, such as hydrological time series. Doing so ensures that diurnal variations are preserved by all means and that all intervals are of the “correct” length; a possible limitation, however, is that the daylight saving time shift is invariably assumed to be one hour long. This feature requires that the periodicity of the time series was recognised and is supported by the following TALFs of the package:
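
A hedged sketch on an hourly toy series crossing a daylight saving time shift; that byYmd___ is among the supported TALFs is an assumption:

# hourly values in a DST-observing time zone, spanning the March 2021 clock
# change in Europe/Vienna
values <- data.table::data.table(
  date = seq(
    from = as.POSIXct("2021-03-27 00:00:00", tz = "Europe/Vienna"),
    by = "1 hour",
    length.out = 96L
  ),
  value = rnorm(96L)
)
hourly <- DTSg$new(values = values)

# daily cumulative sums with the daylight saving time shift ignored while
# forming the days
hourly$colapply(
  fun = cumsum,
  helpers = FALSE,
  funby = byYmd___,
  ignoreDST = TRUE
)$print()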
See Also
Examples
# new DTSg object
x <- DTSg$new(values = flow)
# linear interpolation of missing values
## R6 method
x$colapply(fun = interpolateLinear)$print()
## S3 method
print(colapply(x = x, fun = interpolateLinear))
# daily cumulative sums per month
## R6 method
x$colapply(
  fun = cumsum,
  helpers = FALSE,
  funby = byYm____
)$print()

## S3 method
print(colapply(
  x = x,
  fun = cumsum,
  helpers = FALSE,
  funby = byYm____
))
# calculate moving averages with the help of 'runner' (all four given
# approaches provide the same result with explicitly missing timestamps)
if (requireNamespace("runner", quietly = TRUE) &&
    packageVersion("runner") >= package_version("0.3.5")) {
  wrapper <- function(..., .helpers) {
    runner::runner(..., idx = .helpers[[".dateTime"]])
  }

  ## R6 method
  x$colapply(
    fun = runner::runner,
    f = mean,
    k = 5,
    lag = -2
  )$print()
  x$colapply(
    fun = wrapper,
    f = mean,
    k = "5 days",
    lag = "-2 days"
  )$print()
  x$colapply(
    fun = runner::runner,
    f = mean,
    k = "5 days",
    lag = "-2 days",
    idx = x$getCol(col = ".dateTime")
  )$print()
  x$colapply(
    fun = runner::runner,
    f = mean,
    k = "5 days",
    lag = "-2 days",
    idx = x[".dateTime"]
  )$print()

  ## S3 method
  print(colapply(
    x = x,
    fun = runner::runner,
    f = mean,
    k = 5,
    lag = -2
  ))
  print(colapply(
    x = x,
    fun = wrapper,
    f = mean,
    k = "5 days",
    lag = "-2 days"
  ))
  print(colapply(
    x = x,
    fun = runner::runner,
    f = mean,
    k = "5 days",
    lag = "-2 days",
    idx = getCol(x = x, col = ".dateTime")
  ))
  print(colapply(
    x = x,
    fun = runner::runner,
    f = mean,
    k = "5 days",
    lag = "-2 days",
    idx = x[".dateTime"]
  ))
}
# calculate rolling correlations somewhat inefficiently with the help of
# 'runner'
if (requireNamespace("runner", quietly = TRUE) &&
    packageVersion("runner") >= package_version("0.3.8")) {
  wrapper <- function(x, y, f, k, lag, ...) {
    runner::runner(
      cbind(x, y),
      f = function(x) f(x[, 1], x[, 2]),
      k = k,
      lag = lag
    )
  }

  ## R6 method
  x$colapply(
    fun = wrapper,
    y = x["flow"] + rnorm(length(x["flow"])),
    f = cor,
    k = 5,
    lag = -2
  )$print()

  ## S3 method
  print(colapply(
    x = x,
    fun = wrapper,
    y = x["flow"] + rnorm(length(x["flow"])),
    f = cor,
    k = 5,
    lag = -2
  ))
}