R: Sufficient statistics for the K-gaps model

kgaps_stat {exdex}

R Documentation

Sufficient statistics for the `K`-gaps model

Description

Calculates sufficient statistics for the K-gaps model for the extremal index \theta. Called by kgaps.

Usage

kgaps_stat(data, u, q_u, k = 1, inc_cens = TRUE)

Arguments

`data`	A numeric vector of raw data.
`u`	A numeric scalar. Extreme value threshold applied to data.
`q_u`	A numeric scalar. An estimate of the probability with which the threshold `u` is exceeded. If `q_u` is missing then it is calculated using `mean(data > u, na.rm = TRUE)`.
`k`	A numeric scalar. Run parameter `K`, as defined in Suveges and Davison (2010). Threshold inter-exceedance times that are not larger than `k` units are assigned to the same cluster, resulting in a `K`-gap equal to zero. Specifically, the `K`-gap `S` corresponding to an inter-exceedance time of `T` is given by `S = \max(T - K, 0)`.
`inc_cens`	A logical scalar indicating whether or not to include contributions from the right-censored inter-exceedance times relating to the first and last observations. For these, the true inter-exceedance time is known only to be greater than or equal to the time observed. See Attalides (2015) for details.
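
To illustrate the definition of a `K`-gap given under `k`, the following is a minimal base R sketch. The function `k_gaps_sketch` and the toy vector `x` are hypothetical names introduced here and are not part of exdex. The sketch covers only the uncensored gaps between successive exceedances and ignores the censored contributions controlled by `inc_cens`.

```r
# Hypothetical helper for illustration; not a function from exdex.
k_gaps_sketch <- function(data, u, k = 1) {
  # Positions of threshold exceedances (which() treats NA as FALSE)
  exc_times <- which(data > u)
  # Inter-exceedance times T between successive exceedances
  T_gaps <- diff(exc_times)
  # K-gaps S = max(T - K, 0), applied elementwise
  pmax(T_gaps - k, 0)
}

# Toy series with exceedances of u = 4 at positions 2, 3, 6 and 10
x <- c(1, 5, 6, 1, 1, 7, 1, 1, 1, 8)
k_gaps_sketch(x, u = 4, k = 1)  # inter-exceedance times 1, 3, 4 give K-gaps 0, 2, 3
```

With `k = 1` the adjacent exceedances at positions 2 and 3 fall in the same cluster, so their `K`-gap is zero.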

Details

The sample K-gaps are S_0, S_1, ..., S_{N-1}, S_N, where S_1, ..., S_{N-1} are uncensored and S_0 and S_N are right-censored. Under the assumption that the K-gaps are independent, the log-likelihood of the K-gaps model is given by

l(\theta; S_0, \ldots, S_N) = N_0 \log(1 - \theta) + 2 N_1 \log \theta - \theta q (S_0 + \cdots + S_N),

where

q is the threshold exceedance probability, estimated by the proportion of threshold exceedances,
N_0 is the number of uncensored sample K-gaps that are equal to zero,
N_1 is the number of positive sample K-gaps, apart from an adjustment for the contributions of S_0 and S_N. Specifically, if inc_cens = TRUE then N_1 is equal to the number of S_1, ..., S_{N-1} that are positive plus (I_0 + I_N) / 2, where I_0 = 1 if S_0 is greater than zero and I_0 = 0 otherwise, and similarly for I_N.

The differing treatment of uncensored and right-censored K-gaps reflects differing contributions to the likelihood. Right-censored K-gaps that are equal to zero add no information to the likelihood. For full details see Suveges and Davison (2010) and Attalides (2015).
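
The log-likelihood above depends on the data only through N_0, N_1 and q (S_0 + ... + S_N), so it can be evaluated directly from these three quantities. The following is a minimal sketch; `kgaps_loglik_sketch` is a hypothetical name, not a function from exdex, and the argument `sum_qs` is assumed to hold q (S_0 + ... + S_N). It is intended for \theta strictly between 0 and 1.

```r
# Hypothetical helper for illustration; not a function from exdex.
# theta  : extremal index, assumed to lie strictly inside (0, 1)
# N0, N1 : counts as defined in Details
# sum_qs : q * (S_0 + ... + S_N)
kgaps_loglik_sketch <- function(theta, N0, N1, sum_qs) {
  N0 * log(1 - theta) + 2 * N1 * log(theta) - theta * sum_qs
}

# Made-up values of the sufficient statistics, for illustration only
ll_half <- kgaps_loglik_sketch(theta = 0.5, N0 = 2, N1 = 3, sum_qs = 4)
ll_half
```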

If N_1 = 0 then we are in the degenerate case where there is one cluster (all K-gaps are zero) and the likelihood is maximized at \theta = 0.

If N_0 = 0 then all exceedances occur singly (all K-gaps are positive) and the likelihood is maximized at \theta = 1.
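
In the remaining case, where N_0 > 0 and N_1 > 0, setting the derivative of the log-likelihood to zero, -N_0 / (1 - \theta) + 2 N_1 / \theta - q (S_0 + ... + S_N) = 0, and multiplying through by \theta (1 - \theta) gives a quadratic in \theta whose smaller root lies in (0, 1). The sketch below works through this and checks it against a numerical maximization. The function names are hypothetical and the values of the statistics are made up; this is not presented as the code that exdex uses.

```r
# Hypothetical helpers for illustration; not functions from exdex.
kgaps_loglik_sketch <- function(theta, N0, N1, sum_qs) {
  N0 * log(1 - theta) + 2 * N1 * log(theta) - theta * sum_qs
}

# Interior maximizer only: assumes N0 > 0, N1 > 0 and sum_qs > 0.
kgaps_mle_sketch <- function(N0, N1, sum_qs) {
  stopifnot(N0 > 0, N1 > 0, sum_qs > 0)
  # Quadratic: sum_qs * theta^2 - b * theta + 2 * N1 = 0
  b <- N0 + 2 * N1 + sum_qs
  # Smaller root, which is the one lying in (0, 1)
  (b - sqrt(b^2 - 8 * N1 * sum_qs)) / (2 * sum_qs)
}

theta_hat <- kgaps_mle_sketch(N0 = 2, N1 = 3, sum_qs = 4)

# Numerical check using stats::optimize on the same log-likelihood
theta_num <- optimize(kgaps_loglik_sketch, interval = c(0.001, 0.999),
                      N0 = 2, N1 = 3, sum_qs = 4,
                      maximum = TRUE, tol = 1e-10)$maximum
c(closed_form = theta_hat, numerical = theta_num)
```

For these made-up values the closed form gives (12 - sqrt(48)) / 8, approximately 0.634, and the numerical maximizer should agree to several decimal places.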

Value

A list containing the sufficient statistics, with components

`N0`	the number of zero `K`-gaps.
`N1`	contribution from non-zero `K`-gaps (see Details).
`sum_qs`	the sum of the (scaled) `K`-gaps, that is, `q (S_0 + \cdots + S_N)`, where `q` is estimated by the proportion of threshold exceedances.
`n_kgaps`	the number of `K`-gaps that contribute to the log-likelihood.

References

Suveges, M. and Davison, A. C. (2010) Model misspecification in peaks over threshold analysis, Annals of Applied Statistics, 4(1), 203-221. doi:10.1214/09-AOAS292

Attalides, N. (2015) Threshold-based extreme value modelling, PhD thesis, University College London. https://discovery.ucl.ac.uk/1471121/1/Nicolas_Attalides_Thesis.pdf

Examples

library(exdex)
# Set the threshold at the 90% sample quantile of the newlyn data
u <- quantile(newlyn, probs = 0.90)
kgaps_stat(newlyn, u)

[Package exdex version 1.2.3 Index]
