gevcdn.fit {GEVcdn} | R Documentation |
Fit a GEV CDN model
Description
Fit a GEV CDN model via nonlinear optimization of the generalized maximum likelihood cost function.
Usage
gevcdn.fit(x, y, iter.max = 1000, n.hidden = 2,
Th = gevcdn.logistic, fixed = NULL,
init.range = c(-0.25, 0.25),
scale.min = .Machine$double.eps, beta.p = 3.3,
beta.q = 2, sd.norm = Inf, n.trials = 5,
method = c("BFGS", "Nelder-Mead"), max.fails = 100, ...)
Arguments
x |
covariate matrix with number of rows equal to the number of samples and number of columns equal to the number of variables. |
y |
column matrix of target values with number of rows equal to the number of samples. |
iter.max |
maximum number of iterations of optimization algorithm. |
number of hidden nodes in the GEV CDN model. | |
Th |
hidden layer transfer function; defaults to |
fixed |
vector indicating GEV parameters to be held constant; elements chosen from |
init.range |
range for random weights on [ |
scale.min |
minimum allowable value for the GEV scale parameter. |
beta.p |
|
beta.q |
|
sd.norm |
|
n.trials |
number of repeated trials used to avoid shallow local minima during optimization |
method |
optimization method for |
max.fails |
maximum number of repeated exceptions allowed during optimization |
... |
additional arguments passed to the |
Details
Fit a nonstationary GEV CDN model (Cannon, 2010) by minimizing a cost function
based on the generalized maximum likelihood of Martins and Stedinger (2000).
The hidden layer transfer function Th
should be set to
gevcdn.logistic
for a nonlinear model and to
gevcdn.identity
for a linear model. In the nonlinear case, the
number of hidden nodes n.hidden
controls the overall complexity of the
model. GEV parameters can be held constant (i.e., stationary) via the
fixed
argument. The form of the shifted beta distribution prior for the
GEV shape parameter is controlled by the beta.p
and beta.q
arguments. By default, these are set to values used in Cannon (2010).
Other alternatives include values recommended by Martins and Stedinger (2000)
(beta.p = 9
and beta.q = 6
) or values following from
the normal distribution reported by Papalexiou and Koutsoyiannis (2013)
(beta.p = 71.1
and beta.q = 44.7
). Weight penalty
regularization for the magnitude of the input-hidden layer weights can
be applied by setting sd.norm
to a value less than Inf
.
Values of the Akaike information criterion (AIC), Akaike information criterion
with small sample size correction (AICc), and Bayesian information criterion
(BIC) are calculated to assist in model selection. It is possible for such
criteria to fail in the face of overfitting, for example with a nonlinear model
and n.hidden
set too high, as the GEV distribution may converge on one or
more samples. This can usually be diagnosed by inspecting the scale parameter
for near zero values. In this case, one can apply a weight penalty
(via sd.norm
), although this rules out the use of AIC/AICc/BIC for model
selection as the effective number of model parameters will no longer equal the
number of weights in the GEV CDN model. Alternatively, a lower threshold (as a
proportion of the range of y
) for the scale parameter can be set
via scale.min
. Finally, bootstrap aggregation is available via
gevcdn.bag
as a second method for fitting GEV CDN models.
Note: values of x
and y
need not be standardized or rescaled
by the user. All variables are automatically scaled to range between 0 and 1
prior to fitting and parameters are automatically rescaled by
gevcdn.evaluate
.
Value
a list consisting of
W1 |
input-hidden layer weights |
W2 |
hidden-output layer weights |
Attributes indicating the minimum/maximum values of x
and y
; the
values of Th
, fixed
, scale.min
; the negative of the
logarithm of the generalized maximum likelihood GML
, the negative of
the logarithm of the likelihood NLL
, the value of the penalty term
penalty
, the Bayesian information criterion BIC
, the Akaike
information criterion with (AICc
) and without (AIC
) small sample
size correction; and the number of model parameters k
are also
returned.
References
Cannon, A.J., 2010. A flexible nonlinear modelling framework for nonstationary generalized extreme value analysis in hydroclimatology. Hydrological Processes, 24: 673-685. DOI: 10.1002/hyp.7506
Martins, E.S. and J.R. Stedinger, 2000. Generalized maximum-likelihood generalized extreme-value quantile estimators for hydrologic data. Water Resources Research, 36:737-744. DOI: 10.1029/1999WR900330
Papalexiou, S.M. and Koutsoyiannis, D., 2013. Battle of extreme value distributions: A global survey on extreme daily rainfall. Water Resources Research, 49(1), 187-201. DOI: 10.1029/2012WR012557
See Also
gevcdn.cost
, gevcdn.evaluate
,
gevcdn.bag
, gevcdn.bootstrap
,
dgev
, optim
Examples
## Generate synthetic data, quantiles
x <- as.matrix(seq(0.1, 1, length = 50))
loc <- x^2
scl <- x/2
shp <- seq(-0.1, 0.3, length = length(x))
set.seed(100)
y <- as.matrix(rgev(length(x), location = loc, scale = scl,
shape = shp))
q <- sapply(c(0.1, 0.5, 0.9), qgev, location = loc, scale = scl,
shape = shp)
## Define a hierarchy of models of increasing complexity
models <- vector("list", 4)
# Stationary model
models[[1]] <- list(Th = gevcdn.identity,
fixed = c("location", "scale", "shape"))
# Linear model
models[[2]] <- list(Th = gevcdn.identity)
# Nonlinear, 1 hidden node
models[[3]] <- list(n.hidden = 1, Th = gevcdn.logistic)
# Nonlinear, 2 hidden nodes
models[[4]] <- list(n.hidden = 2, Th = gevcdn.logistic)
## Fit models
weights.models <- vector("list", length(models))
for(i in seq_along(models)){
weights.models[[i]] <- gevcdn.fit(x = x, y = y, n.trials = 1,
n.hidden = models[[i]]$n.hidden,
Th = models[[i]]$Th,
fixed = models[[i]]$fixed)
}
## Select model with minimum AICc
models.AICc <- sapply(weights.models, attr, which = "AICc")
weights.best <- weights.models[[which.min(models.AICc)]]
parms.best <- gevcdn.evaluate(x, weights.best)
## 10th, 50th, and 90th percentiles
q.best <- sapply(c(0.1, 0.5, 0.9), qgev,
location = parms.best[,"location"],
scale = parms.best[,"scale"],
shape = parms.best[,"shape"])
## Plot data and quantiles
matplot(x, cbind(y, q, q.best), type = c("b", rep("l", 6)),
lty = c(1, rep(c(1, 2, 1), 2)),
lwd = c(1, rep(c(3, 2, 3), 2)),
col = c("red", rep("orange", 3), rep("blue", 3)),
pch = 19, xlab = "x", ylab = "y", main = "gevcdn.fit")