R: Calculate standardised indices

std_index {SEI}

R Documentation

Calculate standardised indices

Description

Inputs a time series of a chosen variable (e.g. precipitation, energy demand, residual load etc.) and returns a time series of standardised indices. Indices can be calculated on any timescale.

Usage

std_index(
  x_new,
  x_ref = x_new,
  timescale = NULL,
  dist = "empirical",
  return_fit = FALSE,
  moving_window = NULL,
  window_scale = NULL,
  agg_period = NULL,
  agg_scale = NULL,
  agg_fun = "sum",
  rescale = NULL,
  rescale_fun = "sum",
  index_type = "normal",
  ignore_na = FALSE
)

Arguments

`x_new`	numeric; vector or time series to be converted to standardised indices.
`x_ref`	numeric; vector or time series containing reference data to use when calculating the standardised indices.
`timescale`	string; timescale of the data. Required if the time series is to be aggregated or rescaled.
`dist`	string; distribution used to calculate the indices.
`return_fit`	logical; return parameters and goodness-of-fit statistics for the distribution fit.
`moving_window`	numeric; length of moving window on which to calculate the indices.
`window_scale`	string; timescale of `moving_window`, default is the timescale of the data.
`agg_period`	numeric; the number of values to aggregate over.
`agg_scale`	string; timescale of `agg_period`, default is the timescale of the data.
`agg_fun`	string; function used to aggregate the data over the aggregation period, default is "sum".
`rescale`	string; the timescale that the time series should be rescaled to.
`rescale_fun`	string; function used to rescale the data, default is "sum".
`index_type`	string; the type of index: "normal" (default), "probability", or "bounded".
`ignore_na`	logical; ignore NAs when rescaling the time series.

Details

Standardised indices are calculated by estimating the cumulative distribution function (CDF) of the variable of interest, and using this to transform the measurements to a standardised scale.

std_index() estimates the CDF using a time series of reference data x_ref, and applies the resulting transformation to the time series x_new. The result is a time series of standardised x_new values. These standardised indices quantify how extreme the x_new values are in reference to x_ref. x_new and x_ref should therefore contain values of the same variable. If x_ref is not specified, then x_new is also used to estimate the CDF.

x_new and x_ref can either be provided as vectors or xts time series. In the latter case, the time series can be aggregated across timescales or rescaled. This is useful, for example, if x_new contains hourly data, but interest is on daily accumulations or averages of the hourly data.

The argument rescale converts the data to a different timescale. The original timescale of the data can be manually specified using the argument timescale. Otherwise, the function will try to automatically determine the timescale of the data. Manually specifying the timescale of the data is generally more robust. The rescaling is performed using the function rescale_fun. By default, this is assumed to be rescale_fun = "sum", so that values are added across the timescale of interest. This can be changed to any user-specified function.

The argument agg_period aggregates the data across the timescale of interest. This differs from rescale in that the resolution of the data remains the same. agg_period is a number specifying how long the data should be aggregated across. By default, it is assumed that agg_period is on the same timescale as x_new and x_ref. For example, if the data is hourly and agg_period = 24, then this assumes the data is to be aggregated over the past 24 hours. The scale of the aggregation period can also be specified manually using agg_scale. For example, one could also specify agg_period = 1 and agg_scale = "days", and this would also aggregate the data over the past day. agg_fun specifies how the data is to be aggregated, the default is agg_fun = "sum".

timescale, agg_scale, and rescale must all be one of: "days", "weeks", "months", "quarters", and "years".

dist is the distribution used to estimate the CDF from x_ref. Currently, functionality is available to fit one of the following distributions to the data: Normal ('norm'), Log-normal ('lnorm'), Logistic ('logis'), Log-logistic ('llogis'), Exponential ('exp'), Gamma ('gamma'), and Weibull ('weibull'). Alternatively, the CDF can be estimated empirically (dist = "empirical") based on the values in x_ref, or using kernel density estimation (dist = "kde").

If dist is a parametric family of distributions, then parameters of the distribution are estimated using maximum likelihood estimation from x_ref. The resulting parameters and corresponding goodness-of-fit statistics can be returned by specifying return_fit = TRUE.

By default, the distribution is estimated over all values in x_ref. Alternatively, if x_new is an xts object, parameters can be estimated sequentially using a moving window of values. moving_window determines the length of the moving window. This is a single value, assumed to be on the same timescale as x_new. This can also be specified manually using window_scale. window_scale must also be one of "days", "weeks", "months", "quarters", and "years".

The function returns a vector of time series (depending on the format of x_new) containing the standardised indices corresponding to x_new. Three different types of indices are available, which are explained in detail in the vignette. The index type can be chosen using index_type, which must be one of "normal" (default), "probability", and "bounded".

Value

Time series of standardised indices.

Author(s)

Sam Allen, Noelia Otero

References

Allen, S. and N. Otero (2023): ‘Standardised indices to monitor energy droughts’, Renewable Energy 217, 119206 doi:10.1016/j.renene.2023.119206

McKee, T. B., Doesken, N. J., & Kleist, J. (1993): ‘The relationship of drought frequency and duration to time scales’, In Proceedings of the 8th Conference on Applied Climatology 17, 179-183.

Examples

data(data_supply)
# consider hourly German energy supply data in 2019
supply_de <- subset(data_supply, country == "Germany", select = c("date", "PWS"))
supply_de <- xts::xts(supply_de$PWS, order.by = supply_de$date)
#options(xts_check_TZ = FALSE)

# convert to hourly standardised indices
supply_de_std <- std_index(supply_de, timescale = "hours")
hist(supply_de, main = "Raw values")
hist(supply_de_std, main = "Standardised values")

# convert to daily or weekly standardised indices
supply_de_std <- std_index(supply_de, timescale = "hours", rescale = "days")

# convert to weekly standardised indices calculated on each day
supply_de_std <- std_index(supply_de, timescale = "hours", rescale = "days",
                           agg_period = 1, agg_scale = "weeks")

# calculate standardised indices corresponding to December, based on the previous year
dec <- zoo::index(supply_de) > "2019-12-01 UTC"
supply_de_std_dec <- std_index(x_new = supply_de[dec], x_ref = supply_de[!dec],
                               timescale = "hours")

# calculate standardised indices using a 100 day moving window
supply_de_std_dec <- std_index(supply_de[dec], supply_de, timescale = "hours",
                               rescale = "days", moving_window = 100)

# suppose we are interested in the daily maximum rather than the daily total
supply_de_std <- std_index(supply_de, timescale = "hours", rescale = "days",
                           rescale_fun = "max")
supply_de_std <- std_index(supply_de, timescale = "hours", rescale = "days",
                           rescale_fun = "mean") # or average

# the default uses the empirical distribution, but this requires more data than
# parametric distributions, meaning it is not ideal when data is short, e.g. in weekly case
supply_de_std <- std_index(supply_de, timescale = "hours", rescale = "weeks") # warning
# instead, we can use a parametric distribution, e.g. a gamma distribution
supply_de_std <- std_index(supply_de, timescale = "hours", rescale = "weeks", dist = "gamma")
# we can check the fit by checking whether the indices resemble a standard normal distribution
hist(supply_de)
hist(supply_de_std)
# we can also look at the properties of the fit
supply_de_std <- std_index(supply_de, timescale = "hours", rescale = "weeks",
                           dist = "gamma", return_fit = TRUE)

# we could also use kernel density estimation, which is a flexible compromise between the two
supply_de_std <- std_index(supply_de, timescale = "hours", rescale = "weeks", dist = "kde")

[Package SEI version 0.1.1 Index]