R: Add a variable of a higher interval to a data frame

thicken {padr}

R Documentation

Add a variable of a higher interval to a data frame

Description

Take the datetime variable in a data frame and map it to a variable of a higher interval. The mapping is added to the data frame as a new variable.

Usage

thicken(
  x,
  interval,
  colname = NULL,
  rounding = c("down", "up"),
  by = NULL,
  start_val = NULL,
  drop = FALSE,
  ties_to_earlier = FALSE
)

Arguments

`x`	A data frame containing at least one datetime variable of class `Date`, `POSIXct` or `POSIXlt`.
`interval`	The interval of the added datetime variable. Any character string that would be accepted by `seq.Date` or `seq.POSIXt`. It must be higher than the interval and step size of the input data.
`colname`	The column name of the added variable. If `NULL` it will be the name of the original datetime variable with the interval name added to it (including the unit), separated by underscores.
`rounding`	Whether a value in the input datetime variable is mapped to the closest value that is lower (`down`) or higher (`up`) than itself.
`by`	Only needs to be specified when `x` contains multiple variables of class `Date`, `POSIXct` or `POSIXlt`. Indicates which to use for thickening.
`start_val`	The start of the range of the added variable. By default this is the first instance of `interval` that is lower than the lowest value of the input datetime variable, with all time units on their default value. Specify `start_val` as an offset if you want the range to be nonstandard.
`drop`	Should the original datetime variable be dropped from the returned data frame? Defaults to `FALSE`.
`ties_to_earlier`	By default, when an original datetime observation is tied with a value in the added datetime variable, it is assigned to the current value when `rounding` is `down` or to the next value when `rounding` is `up`. When `TRUE`, ties are assigned to the previous value of the new variable instead.
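
The `by`, `colname` and `drop` arguments are not covered in the Examples section, so here is a minimal sketch of how they might be combined. The `tickets` data frame and its column names are hypothetical and exist only for illustration; the default column name `closed_day` follows the naming rule described under `colname`.

```r
library(padr)

# Hypothetical data with two datetime columns, so `by` must be specified.
tickets <- data.frame(
  opened = as.POSIXct(c("2016-03-02 01:15:00",
                        "2016-03-02 18:40:00",
                        "2016-03-03 07:05:00"), tz = "UTC"),
  closed = as.POSIXct(c("2016-03-02 09:00:00",
                        "2016-03-04 11:30:00",
                        "2016-03-05 16:45:00"), tz = "UTC")
)

# Thicken `closed` to the day level; the added column gets the default
# name, built from the variable name and the interval unit.
by_closed <- thicken(tickets, interval = "day", by = "closed")
names(by_closed)

# Pick the name of the added column and drop the original `closed` column.
renamed <- thicken(tickets, interval = "day", by = "closed",
                   colname = "closed_on", drop = TRUE)
names(renamed)
```

Setting `drop = TRUE` is convenient when the thickened variable is only needed for grouping and the original timestamps are no longer used downstream.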

Details

When the datetime variable contains missing values, they are left in place in the data frame. The added column with the new datetime variable will have missing values for these rows as well.
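
As a small illustration of the missing-value behaviour described above, the sketch below thickens a hypothetical data frame whose datetime column contains one `NA`. The data and names are made up for the example, and padr may emit a warning about the missing values.

```r
library(padr)

# Hypothetical data: the second timestamp is missing.
readings <- data.frame(
  ts = as.POSIXct(c("2016-03-02 01:15:00",
                    NA,
                    "2016-03-03 07:05:00"), tz = "UTC"),
  value = c(10, 20, 30)
)

# Thicken to the day level; the row with the missing timestamp is kept.
out <- thicken(readings, interval = "day")

# The added column is missing exactly where the original timestamp is missing.
is.na(out$ts)
is.na(out$ts_day)
```

If rows with missing timestamps should not take part in a later aggregation, they can be filtered out before or after thickening.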

See `vignette("padr")` for more information on `thicken`. See `vignette("padr_implementation")` for detailed information on daylight saving time, different time zones, and the implementation of `thicken`.

Value

The data frame `x` with the new variable added to it.

Examples

library(padr)

x_hour <- seq(lubridate::ymd_hms('20160302 000000'), by = 'hour',
              length.out = 200)
some_df <- data.frame(x_hour = x_hour)
thicken(some_df, 'week')
thicken(some_df, 'month')
thicken(some_df, 'day', start_val = lubridate::ymd_hms('20160301 120000'))

library(dplyr)
x_df <- data.frame(
  x = seq(lubridate::ymd(20130101), by = 'day', length.out = 1000) %>%
    sample(500),
  y = runif(500, 10, 50) %>% round) %>%
  arrange(x)

# get the max per month
x_df %>% thicken('month') %>% group_by(x_month) %>%
  summarise(y_max = max(y))

# get the average per week, but you want your week to start on Mondays
# instead of Sundays (closest_weekday counts weekdays from 0 = Sunday,
# so Monday is 1)
x_df %>% thicken('week',
                 start_val = closest_weekday(x_df$x, 1)) %>%
  group_by(x_week) %>% summarise(y_avg = mean(y))

# rounding up instead of down
x <- data.frame(dt = lubridate::ymd_hms('20171021 160000',
                                        '20171021 163100'))
thicken(x, interval = "hour", rounding = "up")
thicken(x, interval = "hour", rounding = "up", ties_to_earlier = TRUE)

[Package padr version 0.6.2 Index]