R: Model Selection Criteria

MSClaio2008 {nsRFA}

R Documentation

Model Selection Criteria

Description

Model selection criteria for the frequency analysis of hydrological extremes, from Laio et al. (2008).

Usage

 MSClaio2008 (sample, dist=c("NORM","LN","GUMBEL","EV2","GEV","P3","LP3"), 
              crit=c("AIC", "AICc", "BIC", "ADC"))
 ## S3 method for class 'MSClaio2008'
 print(x, digits=max(3, getOption("digits") - 3), ...)
 ## S3 method for class 'MSClaio2008'
 summary(object, ...)
 ## S3 method for class 'MSClaio2008'
 plot(x, ...)

Arguments

`sample`	data sample
`dist`	distributions: normal `"NORM"`, 2 parameter log-normal `"LN"`, Gumbel `"GUMBEL"`, Frechet `"EV2"`, Generalized Extreme Value `"GEV"`, Pearson type III `"P3"`, log-Pearson type III `"LP3"`
`crit`	Model-selection criteria: Akaike Information Criterion `"AIC"`, Akaike Information Criterion corrected `"AICc"`, Bayesian Information Criterion `"BIC"`, Anderson-Darling Criterion `"ADC"`
`x`	object of class `MSClaio2008`, output of `MSClaio2008()`
`object`	object of class `MSClaio2008`, output of `MSClaio2008()`
`digits`	minimal number of "significant" digits, see 'print.default'
`...`	other arguments

Details

The following lines are extracted from Laio et al. (2008). See the paper for more details and references.

Model selection criteria

The problem of model selection can be formalized as follows: a sample of n data, D=(x_1, \dots, x_n), arranged in ascending order is available, sampled from an unknown parent distribution f(x); N_m operating models, M_j, j=1,\dots, N_m, are used to represent the data. The operating models are in the form of probability distributions, M_j = g_j(x,\hat{\theta}), with parameters \hat{\theta} estimated from the available data sample D. The scope of model selection is to identify the model M_{opt} which is better suited to represent the data, i.e. the model which is closer in some sense to the parent distribution f(x).

Three different model selection criteria are considered here, namely, the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), and the Anderson-Darling Criterion (ADC). Of the three methods, the first two belong to the category of classical literature approaches, while the third derives from a heuristic interpretation of the results of a standard goodness-of-fit test (see Laio, 2004).

Akaike Information Criterion

The Akaike Information Criterion (AIC) for the j-th operating model can be computed as

AIC_j = -2 ln (L_j(\hat{\theta})) + 2 p_j

where

L_j(\hat{\theta}) = \prod_{i=1}^n g_j(x_i, \hat{\theta})

is the likelihood function, evaluated at the point \theta=\hat{\theta} corresponding to the maximum likelihood estimator of the parameter vector \theta, and p_j is the number of estimated parameters of the j-th operating model. In practice, after computing AIC_j for all of the operating models, one selects the model with the minimum AIC value, AIC_{min}.
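As a minimal sketch of the AIC computation, the lines below fit two of the candidate models (normal and 2-parameter log-normal) with their closed-form maximum likelihood estimators in base R and pick the one with the smaller AIC. The sample `x` is made up for illustration, and the code does not reproduce the internals of `MSClaio2008()`.

```r
# Illustrative sample (made-up values, not taken from any data set)
x <- c(12.1, 15.3, 9.8, 20.4, 17.6, 11.2, 14.9, 25.7, 13.3, 18.0)

# Normal model: closed-form maximum likelihood estimates
mu_hat    <- mean(x)
sigma_hat <- sqrt(mean((x - mu_hat)^2))   # ML estimate divides by n, not n - 1
loglik_norm <- sum(dnorm(x, mean = mu_hat, sd = sigma_hat, log = TRUE))

# Log-normal model: the same estimators applied to log(x)
meanlog_hat <- mean(log(x))
sdlog_hat   <- sqrt(mean((log(x) - meanlog_hat)^2))
loglik_ln <- sum(dlnorm(x, meanlog = meanlog_hat, sdlog = sdlog_hat, log = TRUE))

# AIC_j = -2 ln(L_j) + 2 p_j, with p_j = 2 for both models
p <- 2
aic <- c(NORM = -2 * loglik_norm + 2 * p,
         LN   = -2 * loglik_ln   + 2 * p)

# The selected model is the one with the minimum AIC
best <- names(which.min(aic))
```

Summing log-densities instead of taking the log of the product in L_j avoids numerical underflow for larger samples.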

When the sample size n is small relative to the number of estimated parameters p, the AIC may perform inadequately. In those cases a second-order variant of AIC, called AICc, should be used:

AICc_j = -2 ln (L_j(\hat{\theta})) + 2 p_j (n/(n - p_j - 1))

Indicatively, AICc should be used when n/p < 40.
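The small-sample correction can be sketched as follows. The helper `aicc()` and the numbers passed to it are hypothetical, chosen only to show that the formula above equals AIC plus the term 2p(p+1)/(n-p-1).

```r
# AICc_j = -2 ln(L_j) + 2 p_j * n / (n - p_j - 1)
aicc <- function(loglik, p, n) {
  stopifnot(n > p + 1)   # the correction is undefined otherwise
  -2 * loglik + 2 * p * n / (n - p - 1)
}

# Hypothetical maximized log-likelihood, parameter count and sample size
loglik <- -30
p <- 3
n <- 20

aic_value  <- -2 * loglik + 2 * p      # plain AIC
aicc_value <- aicc(loglik, p, n)       # corrected AIC

# Extra penalty added by the correction: 2 p (p + 1) / (n - p - 1)
extra_penalty <- aicc_value - aic_value

# Indicative rule from the text: prefer AICc when n / p < 40
use_aicc <- n / p < 40
```

As n grows with p fixed, the extra penalty tends to zero and AICc converges to AIC.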

Bayesian Information Criterion

The Bayesian Information Criterion (BIC) for the j-th operating model reads

BIC_j = -2 ln (L_j(\hat{\theta})) + ln(n) p_j

In practice, after computing BIC_j for all of the operating models, one selects the model with the minimum BIC value, BIC_{min}.
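The sketch below contrasts the two penalties. Because ln(n) exceeds 2 as soon as n is at least 8, BIC penalizes each additional parameter more heavily than AIC for all but very short samples, so the two criteria can disagree. The log-likelihood values are hypothetical and stand in for a two-parameter and a three-parameter model fitted to the same sample.

```r
# Hypothetical maximized log-likelihoods for two candidate models
loglik <- c(two_par = -50.0, three_par = -48.5)
p      <- c(two_par = 2,     three_par = 3)
n      <- 30

aic <- -2 * loglik + 2 * p        # AIC_j = -2 ln(L_j) + 2 p_j
bic <- -2 * loglik + log(n) * p   # BIC_j = -2 ln(L_j) + ln(n) p_j

# Each criterion selects the model with its own minimum value
best_aic <- names(which.min(aic))
best_bic <- names(which.min(bic))
```

With these inputs AIC prefers the three-parameter model, while BIC, with its larger per-parameter penalty, prefers the two-parameter one.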

Anderson-Darling Criterion

The Anderson-Darling criterion has the form:

ADC_j = 0.0403 + 0.116 ((\Delta_{AD,j} - \epsilon_j)/\beta_j)^{(\eta_j/0.851)}

if 1.2 \epsilon_j < \Delta_{AD,j},

ADC_j = [0.0403 + 0.116 ((0.2 \epsilon_j)/\beta_j)^{(\eta_j/0.851)}] ((\Delta_{AD,j} - 0.2 \epsilon_j) / \epsilon_j)

if 1.2 \epsilon_j \ge \Delta_{AD,j}, where \Delta_{AD,j} is the discrepancy measure characterizing the criterion (the Anderson-Darling statistic A2 in GOFlaio2004), and \epsilon_j, \beta_j and \eta_j are distribution-dependent coefficients tabulated by Laio [2004, Tables 3 and 5] for a set of seven distributions commonly employed for the frequency analysis of extreme events. In practice, after computing ADC_j for all of the operating models, one selects the model with the minimum ADC value, ADC_{min}.
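The piecewise definition can be written as a small function. This is a sketch, not the package code: the name `adc()` is hypothetical, the last factor of the second branch is read as (\Delta_{AD,j} - 0.2 \epsilon_j)/\epsilon_j (the reading under which the two branches agree at \Delta_{AD,j} = 1.2 \epsilon_j), and the coefficient values used in the example are placeholders, not the tabulated values from Laio (2004).

```r
# ADC for one model, given the Anderson-Darling discrepancy `delta`
# and the distribution-dependent coefficients eps, beta, eta
adc <- function(delta, eps, beta, eta) {
  expo <- eta / 0.851
  if (1.2 * eps < delta) {
    # upper branch: delta above 1.2 * eps
    0.0403 + 0.116 * ((delta - eps) / beta)^expo
  } else {
    # lower branch: constant in brackets, scaled linearly in delta
    (0.0403 + 0.116 * (0.2 * eps / beta)^expo) * (delta - 0.2 * eps) / eps
  }
}

# Placeholder coefficients (NOT taken from Laio, 2004)
eps  <- 0.2
beta <- 0.2
eta  <- 1

adc_low   <- adc(0.10, eps, beta, eta)            # lower branch
adc_high  <- adc(0.60, eps, beta, eta)            # upper branch
adc_joint <- adc(1.2 * eps, eps, beta, eta)       # value at the break point
adc_above <- adc(1.2 * eps + 1e-9, eps, beta, eta)  # just above the break point
```

Evaluating just below and just above 1.2 \epsilon_j is a quick consistency check on the reading of the formula: the two values should coincide up to rounding.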

Value

MSClaio2008 returns the values of the chosen criteria crit (see Details), computed on the sample for every distribution in dist.

plot.MSClaio2008 plots the empirical distribution function of sample (Weibull plotting position) on a log-normal probability plot, plots the candidate distributions dist (whose parameters are estimated with the maximum likelihood technique, see MLlaio2004), and highlights the ones chosen by the criteria crit.
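For orientation, the Weibull plotting position assigns the i-th smallest of n observations the non-exceedance frequency i/(n+1). The sketch below computes these frequencies in base R and draws them on axes where a log-normal distribution appears as a straight line. The sample is made up, and the axis construction is an assumption about what a log-normal probability plot looks like, not a copy of the code in `plot.MSClaio2008`.

```r
# Illustrative sample (made-up values)
x <- c(12.1, 15.3, 9.8, 20.4, 17.6, 11.2, 14.9, 25.7, 13.3, 18.0)

xs <- sort(x)                 # ascending order, as in the Details section
n  <- length(xs)
Fi <- seq_len(n) / (n + 1)    # Weibull plotting position i / (n + 1)

# Standard normal quantiles of the empirical frequencies
z <- qnorm(Fi)

# On (log(x), z) axes a log-normal sample falls close to a straight line
if (interactive()) {
  plot(log(xs), z, xlab = "log(x)", ylab = "standard normal quantile")
}
```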

Note

For information on the package and the Author, and for all the references, see nsRFA.

Examples

data(FEH1000)

sitedata <- am[am[,1]==53004, ] # data of site 53004
serieplot(sitedata[,4], sitedata[,3])
MSC <- MSClaio2008(sitedata[,4])
MSC
summary(MSC)
plot(MSC)

sitedata <- am[am[,1]==69023, ]	# data of site 69023
serieplot(sitedata[,4], sitedata[,3])
MSC <- MSClaio2008(sitedata[,4], crit=c("AIC", "ADC"))
MSC
summary(MSC)
plot(MSC)

sitedata <- am[am[,1]==83802, ] # data of site 83802
serieplot(sitedata[,4], sitedata[,3])
MSC <- MSClaio2008(sitedata[,4], dist=c("GEV", "P3", "LP3"))
MSC
summary(MSC)
plot(MSC)

# short sample, high positive L-CA
sitedata <- am[am[,1]==40012, ] # data of site 40012
serieplot(sitedata[,4], sitedata[,3])
MSC <- MSClaio2008(sitedata[,4])
MSC
summary(MSC)
plot(MSC)

# negative L-CA
sitedata <- am[am[,1]==68002, ] # data of site 68002
serieplot(sitedata[,4], sitedata[,3])
MSC <- MSClaio2008(sitedata[,4])
MSC
summary(MSC)
plot(MSC)

[Package nsRFA version 0.7-17 Index]