R: Decluster Data Above a Threshold

decluster {extRemes}

R Documentation

Decluster Data Above a Threshold

Description

Decluster data above a given threshold to try to make them independent.

Usage

decluster(x, threshold, ...)

## S3 method for class 'data.frame'
decluster(x, threshold, ..., which.cols, method = c("runs", "intervals"), 
    clusterfun = "max")

## Default S3 method:
decluster(x, threshold, ..., method = c("runs", "intervals"),
    clusterfun = "max")

## S3 method for class 'intervals'
decluster(x, threshold, ..., clusterfun = "max", groups = NULL, replace.with, 
    na.action = na.fail)

## S3 method for class 'runs'
decluster(x, threshold, ..., data, r = 1, clusterfun = "max", groups = NULL, 
    replace.with, na.action = na.fail)

## S3 method for class 'declustered'
plot(x, which.plot = c("scatter", "atdf"), qu = 0.85, xlab = NULL, 
    ylab = NULL, main = NULL, col = "gray", ...)

## S3 method for class 'declustered'
print(x, ...)

Arguments

`x`	An R data set to be declustered. Can be a data frame or a numeric vector. If a data frame, then `which.cols` must be specified. `plot` and `print`: an object returned by `decluster`.
`data`	A data frame containing the data.
`threshold`	numeric of length one or of the same length as the data, giving the threshold. Values strictly above the threshold (non-inclusive) are declustered.
`qu`	quantile for `u` argument in the call to `atdf`.
`which.cols`	numeric of length one or two. The first component tells which column is the one to decluster, and the second component tells which, if any, column is to serve as groups.
`which.plot`	character string naming the type of plot to make.
`method`	character string naming the declustering method to employ.
`clusterfun`	character string naming a function to be applied to the clusters (the returned value is used). Typically, for extreme value analysis (EVA), this will be the cluster maximum (default), but other options are ok as long as they return a single number.
`groups`	numeric vector of the same length as `x` giving natural groupings that should be considered as separate clusters. For example, suppose data cover only summer months across several years. It would probably not make sense to decluster the data across years (i.e., a new cluster should be defined if values occur in different years).
`r`	integer run length stating how many threshold deficits should be used to define a new cluster.
`replace.with`	number, NaN, Inf, -Inf, or NA. What should the remaining values in the cluster be replaced with? The default replaces them with `threshold`, which for most EVA purposes is ideal.
`na.action`	function to be called to handle missing values.
`xlab`, `ylab`, `main`, `col`	optional arguments to the `plot` function. If not supplied, reasonable default values are used.
`...`	optional arguments to `decluster.runs` or `clusterfun`. `plot`: optional arguments to `plot`. Not used by `print`.
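
The effect of the `groups` argument can be pictured with a small base R sketch. This is an illustration only, not the extRemes source: the function name `cluster_labels_sketch` is hypothetical, and it assumes that a new cluster starts either after `r` or more non-exceedances or whenever the group label changes.

```r
# Hypothetical helper (not part of extRemes): label clusters of exceedances,
# optionally forcing a new cluster whenever the group changes.
cluster_labels_sketch <- function(x, threshold, r = 1, groups = NULL) {
  idx <- which(x > threshold)             # positions of threshold exceedances
  if (length(idx) == 0) return(integer(0))
  new_cluster <- diff(idx) - 1 >= r       # r or more non-extremes in between
  if (!is.null(groups)) {
    stopifnot(length(groups) == length(x))
    # Assumption: a change in group always separates clusters.
    new_cluster <- new_cluster | (diff(groups[idx]) != 0)
  }
  cumsum(c(TRUE, new_cluster))            # running cluster label per exceedance
}

x  <- c(1, 5, 6, 1, 7, 8, 1)
yr <- c(1, 1, 1, 1, 1, 2, 2)              # e.g., a year index
cluster_labels_sketch(x, threshold = 2)               # 1 1 2 2
cluster_labels_sketch(x, threshold = 2, groups = yr)  # 1 1 2 3
```

In the second call, the adjacent exceedances 7 and 8 fall in different groups and are therefore placed in separate clusters.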

Details

Runs declustering (see Coles, 2001 sec. 5.3.2): Extremes separated by fewer than r non-extremes belong to the same cluster.
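
The runs rule can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the package implementation: the name `runs_decluster_sketch` is hypothetical, the cluster function is fixed to the maximum, and non-maximal exceedances are replaced with the threshold (the documented default for `replace.with`).

```r
# Illustrative sketch of runs declustering (not the extRemes source).
runs_decluster_sketch <- function(x, threshold, r = 1) {
  idx <- which(x > threshold)             # positions of threshold exceedances
  if (length(idx) == 0) {
    return(list(declustered = x, cluster = integer(0), maxima = numeric(0)))
  }
  gaps <- diff(idx) - 1                   # non-extremes between exceedances
  cluster <- cumsum(c(TRUE, gaps >= r))   # fewer than r non-extremes: same cluster
  declustered <- x
  for (k in unique(cluster)) {
    members <- idx[cluster == k]
    keep <- members[which.max(x[members])]            # first cluster maximum
    declustered[setdiff(members, keep)] <- threshold  # replace the rest
  }
  list(declustered = declustered, cluster = cluster,
       maxima = as.numeric(tapply(x[idx], cluster, max)))
}

x <- c(1, 5, 6, 1, 7, 1, 1, 9, 8, 1)
res <- runs_decluster_sketch(x, threshold = 2, r = 1)
res$cluster      # 1 1 2 3 3
res$maxima       # 6 7 9
res$declustered  # 1 2 6 1 7 1 1 9 2 1
```

With `r = 2`, the single non-extreme between 6 and 7 no longer separates them, so the first three exceedances form one cluster.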

Intervals declustering (Ferro and Segers, 2003): Extremes separated by fewer than r non-extremes belong to the same cluster, where r is the nc-th largest interexceedance time and nc, the number of clusters, is estimated from the extremal index, theta, and the times between extremes. Setting theta = 1 causes each extreme to form a separate cluster.
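
As a rough illustration of the intervals approach, the sketch below computes the Ferro and Segers (2003) intervals estimate of theta from the interexceedance times and then derives a run length from it. The function name `intervals_sketch` is hypothetical, and the rule `nc = floor(theta * N) + 1` as well as the handling of the case where every exceedance is its own cluster are assumptions made for this example. The package may use different conventions.

```r
# Illustrative sketch of the intervals estimator (not the extRemes source).
intervals_sketch <- function(x, threshold) {
  idx <- which(x > threshold)       # positions of threshold exceedances
  N <- length(idx)
  stopifnot(N >= 2)
  Tt <- diff(idx)                   # interexceedance times (N - 1 of them)
  if (max(Tt) <= 2) {
    theta <- 2 * sum(Tt)^2 / ((N - 1) * sum(Tt^2))
  } else {
    theta <- 2 * sum(Tt - 1)^2 / ((N - 1) * sum((Tt - 1) * (Tt - 2)))
  }
  theta <- min(1, theta)
  nc <- floor(theta * N) + 1        # assumed convention for the cluster count
  # Run length: the nc-th largest interexceedance time. If nc exceeds the
  # number of interexceedance times, use 0 so every exceedance stands alone.
  r <- if (nc > length(Tt)) 0 else sort(Tt, decreasing = TRUE)[nc]
  list(theta = theta, nc = nc, r = r)
}

# Three tight bursts of exceedances separated by long quiet stretches.
x <- rep(0, 29)
x[c(1:4, 14:17, 27:29)] <- 1
est <- intervals_sketch(x, threshold = 0.5)
est$theta  # 0.45
est$r      # 1
```

Here the estimated run length of 1 splits the series at the two long gaps, which recovers the three bursts even though `nc` itself is only a rough estimate.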

The print method reports the extremal index estimate (the runs or the intervals estimate, according to the method argument), the number of clusters, and the run length. For runs declustering, the run length is the value supplied by the user; for the intervals method, it is the run length estimated for the resulting declustered data. Note that if the declustered data are independent, the extremal index should be equal or close to one.
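
To see why the extremal index should approach one after declustering, consider the simple runs estimate, taken here as the number of clusters divided by the number of exceedances. The sketch below is a base R illustration with a hypothetical function name, `runs_theta_sketch`. It is not claimed to match the estimate that the package prints.

```r
# Illustrative runs estimate of the extremal index (not the extRemes source):
# number of clusters divided by number of threshold exceedances.
runs_theta_sketch <- function(x, threshold, r = 1) {
  idx <- which(x > threshold)
  if (length(idx) == 0) return(NA_real_)
  n_clusters <- 1 + sum(diff(idx) - 1 >= r)  # a gap of r or more starts a cluster
  n_clusters / length(idx)
}

x    <- c(1, 5, 6, 1, 7, 1, 1, 9, 8, 1)  # raw series: 3 clusters, 5 exceedances
x_dc <- c(1, 2, 6, 1, 7, 1, 1, 9, 2, 1)  # same series with non-maxima set to 2
runs_theta_sketch(x, threshold = 2)      # 0.6
runs_theta_sketch(x_dc, threshold = 2)   # 1
```

After declustering, each cluster contributes a single exceedance, so the estimate equals one.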

Value

A numeric vector of class “declustered” is returned with various attributes including:

`call`	the function call.
`data.name`	character string giving the name of the data.
`decluster.function`	value of `clusterfun` argument. This is a function.
`method`	character string naming the method. Same as input argument.
`threshold`	threshold used for declustering.
`groups`	character string naming the data used for the groups when applicable.
`run.length`	the run length used (or estimated if “intervals” method employed).
`na.action`	function used to handle missing values. Same as input argument.
`clusters`	numeric giving the clusters of threshold exceedances.

Author(s)

Eric Gilleland

References

Coles, S. (2001) An introduction to statistical modeling of extreme values, London, U.K.: Springer-Verlag, 208 pp.

Ferro, C. A. T. and Segers, J. (2003). Inference for clusters of extreme values. Journal of the Royal Statistical Society B, 65, 545–556.

Examples

y <- rnorm(100, mean=40, sd=20)
y <- apply(cbind(y[1:99], y[2:100]), 1, max)
bl <- rep(1:3, each=33)

ydc <- decluster(y, quantile(y, probs=c(0.75)), r=1, groups=bl)
ydc

plot(ydc)

## Not run: 
data(Tphap)
look <- decluster(-Tphap$MinT, threshold=-73)
look
plot(look)

# The code cannot currently handle data supplied as an expression like the one above.
# Better:
y <- -Tphap$MinT
look <- decluster(y, threshold=-73)
look
plot(look)

# Even better.  Use a non-constant threshold.
u <- -70 - 7 *(Tphap$Year - 48)/42
look <- decluster(y, threshold=u)
look
plot(look)

# Better still: account for the fact that there are huge
# gaps in data from one year to another.
bl <- Tphap$Year - 47
look <- decluster(y, threshold=u, groups=bl)
look
plot(look)


# Now try the above with intervals declustering and compare 
look2 <- decluster(y, threshold=u, method="intervals", groups=bl)
look2
dev.new()
plot(look2)
# Looks about the same,
# but note that the run length is estimated to be 5.
# Same resulting number of clusters, however.
# May result in different estimate of the extremal
# index.


# Fit a generalized Pareto (GP) model to the declustered data.
fit <- fevd(look, threshold=u, type="GP", time.units="62/year")
fit
plot(fit)

# cf. the same model fit to the original (non-declustered) data.
fit2 <- fevd(-MinT~1, Tphap, threshold=u, type="GP", time.units="62/year")
fit2
dev.new()
plot(fit2)

# Repeat with a point process (PP) model fit to the declustered data.
fit <- fevd(look, threshold=u, type="PP", time.units="62/year")
fit
plot(fit)

# cf. the PP model fit to the original (non-declustered) data.
fit2 <- fevd(-MinT~1, Tphap, threshold=u, type="PP", time.units="62/year")
fit2
dev.new()
plot(fit2)



## End(Not run)

[Package extRemes version 2.1-4 Index]