intervalaverage {intervalaverage}    R Documentation

time-weighted average of values measured over intervals

Description

intervalaverage takes values recorded over non-overlapping intervals and averages them over defined intervals, possibly within groups (individuals, monitors, locations, etc.). This function can be used to average values measured over short intervals up to longer intervals and/or to take short "averages" of values measured over longer intervals (i.e., downsample without smoothing). Measurement intervals and averaging intervals need not align. When an averaging interval overlaps more than one measurement interval, a weighted average is calculated (i.e., each measurement is weighted by the duration of its interval's overlap with the averaging interval).
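The weighting described above can be worked through by hand. The following is a minimal base R sketch, not the package's implementation; it assumes integer-valued closed intervals whose duration is counted as end - start + 1, and the variable names are illustrative only.

```r
# Two measurement intervals (closed, integer-valued) and their values
x_start <- c(1L, 8L)
x_end   <- c(7L, 14L)
pm25    <- c(10, 12)

# One averaging interval that straddles both measurement intervals
y_start <- 3L
y_end   <- 9L

# Overlap of each measurement interval with the averaging interval.
# Assumption: a closed integer interval covers end - start + 1 time points.
overlap <- pmax(0L, pmin(x_end, y_end) - pmax(x_start, y_start) + 1L)

# Weight each measurement by the duration of its overlap
weighted_avg <- sum(pm25 * overlap) / sum(overlap)
```

Here the first measurement overlaps the averaging interval for 5 time points and the second for 2, so the result is (10 * 5 + 12 * 2) / 7.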

Usage

intervalaverage(
  x,
  y,
  interval_vars,
  value_vars,
  group_vars = NULL,
  required_percentage = 100,
  skip_overlap_check = FALSE,
  verbose = FALSE
)

Arguments

x

A data.table containing values measured over intervals. See the interval_vars parameter for how to specify interval columns and value_vars for how to specify value columns. Intervals in x must be completely non-overlapping within groups defined by group_vars. If group_vars is specified (non-NULL), x must also contain the columns named in group_vars.

y

A data.table containing the intervals over which averages of x values should be computed. Averaging intervals in y, unlike measurement intervals in x, may overlap within groups. If group_vars is specified (non-NULL), y must contain those group_vars columns (which allows different averaging periods for each group).

interval_vars

A length-2 character vector of column names present in both x and y. These columns define the start and end points of closed (inclusive) intervals. The name of the lower-bound column must be given first. These columns in x and y must all be of the same class, either integer or IDate. The interval_vars character vector cannot be named; names are reserved for future use allowing different interval_vars column names in x and y.

value_vars

a character vector of column names in x. This specifies the columns to be averaged.

group_vars

A character vector of column names present in both x and y. The interaction of these variables defines the groups (e.g., subjects, monitors, or locations) within which averages of x values are taken. By default this is NULL, in which case averages are taken over the entire x dataset for each y period. The group_vars character vector cannot be named; names are reserved for future use allowing different group_vars column names in x and y.

required_percentage

The percentage of the duration of each (possibly group-specific) y interval that must be observed and nonmissing for a specific value_var in x in order for the return table to contain a nonmissing average of that value_var for that y interval. If the percentage of nonmissing value_var observations is less than required_percentage, an NA will be returned for that average. The default is 100, meaning that if any portion of a y interval is either not recorded or missing in x, then the corresponding return row will contain an NA for the average of that value_var.
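The coverage rule can be illustrated with a small base R sketch. coverage_average is a hypothetical helper written for this illustration, not a function in the package, and it assumes the average is taken over the observed, nonmissing portion of the y interval only.

```r
# Hypothetical helper (not part of intervalaverage): applies the coverage rule
# to one averaging interval, given each measurement's value and overlap duration
coverage_average <- function(values, overlap, y_duration,
                             required_percentage = 100) {
  # A measurement counts as observed if it overlaps and is nonmissing
  observed <- !is.na(values) & overlap > 0
  pct_observed <- 100 * sum(overlap[observed]) / y_duration
  if (pct_observed < required_percentage) {
    return(NA_real_)
  }
  # Assumption: the average uses only the observed portion of the interval
  sum(values[observed] * overlap[observed]) / sum(overlap[observed])
}

# A 7-point averaging interval: 5 points observed with value 10, 2 points missing
strict  <- coverage_average(c(10, NA), c(5, 2), y_duration = 7)
relaxed <- coverage_average(c(10, NA), c(5, 2), y_duration = 7,
                            required_percentage = 70)
```

With the default of 100 the result is NA because only about 71% of the interval is observed; lowering the threshold to 70 returns 10.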

skip_overlap_check

By default, FALSE. Setting this to TRUE skips the internal check that x intervals are non-overlapping within groups defined by group_vars. Intervals in x must be non-overlapping regardless, but because the check is computationally intensive for large datasets, you may want to skip it if you have already verified this.
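If you plan to skip the check, you may want to verify non-overlap yourself first. The following base R sketch shows one way to do that for a single group; has_overlap is a hypothetical helper, not the package's internal check, and with group_vars it would need to be applied separately within each group.

```r
# Hypothetical helper (not part of intervalaverage): TRUE if any two closed
# intervals overlap, where sharing an endpoint counts as overlapping
has_overlap <- function(start, end) {
  n <- length(start)
  if (n < 2L) {
    return(FALSE)
  }
  ord <- order(start, end)
  start <- start[ord]
  end <- end[ord]
  # After sorting by start, an overlap exists if and only if some interval
  # starts at or before the end of the interval immediately preceding it
  any(start[-1L] <= end[-n])
}

has_overlap(c(1L, 8L), c(7L, 14L))  # FALSE: [1,7] and [8,14] are disjoint
has_overlap(c(1L, 7L), c(7L, 14L))  # TRUE: both intervals include 7
```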

verbose

Whether to print timing information. By default, FALSE.

Details

All intervals are treated as closed (i.e., inclusive of the start and end values in interval_vars).

x and y are not copied but rather passed by reference to function internals; however, the row order of these data.tables is restored on function completion or error.

When required_percentage is less than 100, xminstart and xmaxend may be useful for determining whether an average meets specified coverage requirements, in terms of not just the percentage of missingness but also whether values are represented throughout the range of the y interval.

Value

Returns a data.table. Rows of the returned data.table correspond to intervals from y, i.e., the number of rows returned equals the number of rows of y. Columns of the returned data.table are as follows:

Examples

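library(data.table)
library(intervalaverage)

# weekly measurement intervals (closed, integer-valued) with one value column
x <- data.table(start=seq(1L,by=7L,length.out=6),
               end=seq(7L,by=7L,length.out=6),
               pm25=c(10,12,8,14,22,18))

# averaging intervals offset by two days from the measurement intervals
y <- data.table(start=seq(3L,by=7L,length.out=6),
               end=seq(9L,by=7L,length.out=6))

# time-weighted average of pm25 over each y interval
z <- intervalaverage(x,y,interval_vars=c("start","end"),
                    value_vars=c("pm25"))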

#also see vignette for more extensive examples

[Package intervalaverage version 0.8.0 Index]