std_index {SEI} | R Documentation |
Calculate standardised indices
Description
Inputs a time series of a chosen variable (e.g. precipitation, energy demand, residual load etc.) and returns a time series of standardised indices. Indices can be calculated on any timescale.
Usage
std_index(
x_new,
x_ref = x_new,
timescale = NULL,
dist = "empirical",
return_fit = FALSE,
moving_window = NULL,
window_scale = NULL,
agg_period = NULL,
agg_scale = NULL,
agg_fun = "sum",
rescale = NULL,
rescale_fun = "sum",
index_type = "normal",
ignore_na = FALSE
)
Arguments
x_new |
numeric; vector or time series to be converted to standardised indices. |
x_ref |
numeric; vector or time series containing reference data to use when calculating the standardised indices. |
timescale |
string; timescale of the data. Required if the time series is to be aggregated or rescaled. |
dist |
string; distribution used to calculate the indices. |
return_fit |
logical; return parameters and goodness-of-fit statistics for the distribution fit. |
moving_window |
numeric; length of moving window on which to calculate the indices. |
window_scale |
string; timescale of |
agg_period |
numeric; the number of values to aggregate over. |
agg_scale |
string; timescale of |
agg_fun |
string; function used to aggregate the data over the aggregation period, default is "sum". |
rescale |
string; the timescale that the time series should be rescaled to. |
rescale_fun |
string; function used to rescale the data, default is "sum". |
index_type |
string; the type of index: "normal" (default), "probability", or "bounded". |
ignore_na |
logical; ignore NAs when rescaling the time series. |
Details
Standardised indices are calculated by estimating the cumulative distribution function (CDF) of the variable of interest, and using this to transform the measurements to a standardised scale.
std_index()
estimates the CDF using a time series of reference data x_ref
,
and applies the resulting transformation to the time series x_new
. The result is
a time series of standardised x_new
values. These standardised indices quantify
how extreme the x_new
values are in reference to x_ref
.
x_new
and x_ref
should therefore contain values of the same variable.
If x_ref
is not specified, then x_new
is also used to estimate the CDF.
x_new
and x_ref
can either be provided as vectors or xts time series.
In the latter case, the time series can be aggregated across timescales or rescaled.
This is useful, for example, if x_new
contains hourly data, but interest is
on daily accumulations or averages of the hourly data.
The argument rescale
converts the data to a different timescale. The original
timescale of the data can be manually specified using the argument timescale
.
Otherwise, the function will try to automatically determine the timescale of the data.
Manually specifying the timescale of the data is generally more robust. The rescaling
is performed using the function rescale_fun
. By default, this is assumed to be
rescale_fun = "sum"
, so that values are added across the timescale of interest.
This can be changed to any user-specified function.
The argument agg_period
aggregates the data across the timescale of interest.
This differs from rescale
in that the resolution of the data remains the same.
agg_period
is a number specifying how long the data should be aggregated across.
By default, it is assumed that agg_period
is on the same timescale as x_new
and x_ref
. For example, if the data is hourly and agg_period = 24
, then
this assumes the data is to be aggregated over the past 24 hours. The scale of the
aggregation period can also be specified manually using agg_scale
. For example,
one could also specify agg_period = 1
and agg_scale = "days"
, and this
would also aggregate the data over the past day. agg_fun
specifies how the
data is to be aggregated, the default is agg_fun = "sum"
.
timescale
, agg_scale
, and rescale
must all be one of: "days",
"weeks", "months", "quarters", and "years".
dist
is the distribution used to estimate the CDF from x_ref
.
Currently, functionality is available to fit one of the following distributions to the data:
Normal ('norm'), Log-normal ('lnorm'), Logistic ('logis'), Log-logistic ('llogis'),
Exponential ('exp'), Gamma ('gamma'), and Weibull ('weibull').
Alternatively, the CDF can be estimated empirically (dist = "empirical"
)
based on the values in x_ref
, or using kernel density estimation (dist = "kde"
).
If dist
is a parametric family of distributions, then parameters of the
distribution are estimated using maximum likelihood estimation from x_ref
.
The resulting parameters and corresponding goodness-of-fit statistics can be
returned by specifying return_fit = TRUE
.
By default, the distribution is estimated over all values in x_ref
. Alternatively,
if x_new
is an xts object, parameters can be estimated sequentially using a
moving window of values. moving_window
determines the length of the moving window.
This is a single value, assumed to be on the same timescale as x_new
.
This can also be specified manually using window_scale
. window_scale
must also be one of "days", "weeks", "months", "quarters", and "years".
The function returns a vector of time series (depending on the format of x_new
)
containing the standardised indices corresponding to x_new
. Three different
types of indices are available, which are explained in detail in the vignette.
The index type can be chosen using index_type
, which must be one of
"normal" (default), "probability", and "bounded".
Value
Time series of standardised indices.
Author(s)
Sam Allen, Noelia Otero
References
Allen, S. and N. Otero (2023): ‘Standardised indices to monitor energy droughts’, Renewable Energy 217, 119206 doi:10.1016/j.renene.2023.119206
McKee, T. B., Doesken, N. J., & Kleist, J. (1993): ‘The relationship of drought frequency and duration to time scales’, In Proceedings of the 8th Conference on Applied Climatology 17, 179-183.
Examples
data(data_supply)
# consider hourly German energy supply data in 2019
supply_de <- subset(data_supply, country == "Germany", select = c("date", "PWS"))
supply_de <- xts::xts(supply_de$PWS, order.by = supply_de$date)
#options(xts_check_TZ = FALSE)
# convert to hourly standardised indices
supply_de_std <- std_index(supply_de, timescale = "hours")
hist(supply_de, main = "Raw values")
hist(supply_de_std, main = "Standardised values")
# convert to daily or weekly standardised indices
supply_de_std <- std_index(supply_de, timescale = "hours", rescale = "days")
# convert to weekly standardised indices calculated on each day
supply_de_std <- std_index(supply_de, timescale = "hours", rescale = "days",
agg_period = 1, agg_scale = "weeks")
# calculate standardised indices corresponding to December, based on the previous year
dec <- zoo::index(supply_de) > "2019-12-01 UTC"
supply_de_std_dec <- std_index(x_new = supply_de[dec], x_ref = supply_de[!dec],
timescale = "hours")
# calculate standardised indices using a 100 day moving window
supply_de_std_dec <- std_index(supply_de[dec], supply_de, timescale = "hours",
rescale = "days", moving_window = 100)
# suppose we are interested in the daily maximum rather than the daily total
supply_de_std <- std_index(supply_de, timescale = "hours", rescale = "days",
rescale_fun = "max")
supply_de_std <- std_index(supply_de, timescale = "hours", rescale = "days",
rescale_fun = "mean") # or average
# the default uses the empirical distribution, but this requires more data than
# parametric distributions, meaning it is not ideal when data is short, e.g. in weekly case
supply_de_std <- std_index(supply_de, timescale = "hours", rescale = "weeks") # warning
# instead, we can use a parametric distribution, e.g. a gamma distribution
supply_de_std <- std_index(supply_de, timescale = "hours", rescale = "weeks", dist = "gamma")
# we can check the fit by checking whether the indices resemble a standard normal distribution
hist(supply_de)
hist(supply_de_std)
# we can also look at the properties of the fit
supply_de_std <- std_index(supply_de, timescale = "hours", rescale = "weeks",
dist = "gamma", return_fit = TRUE)
# we could also use kernel density estimation, which is a flexible compromise between the two
supply_de_std <- std_index(supply_de, timescale = "hours", rescale = "weeks", dist = "kde")