R: Mixed estimation of an ETAS model (renewed in version 2.0)

etasclass {etasFLP}

R Documentation

Mixed estimation of an ETAS model (renewed in version 2.0)

Description

etasclass is the main function of the package etasFLP.

etassclass objects of previous versions are not compatible with the current version

Performs the estimation of the components of the ETAS (Epidemic Type Aftershock Sequence) model for the description of the seismicity in a space-time region. Background seismicity is estimated non-parametrically, while triggered seismicity is estimated by MLE. In particular also the bandwidth for a kernel smoothing can be estimated through the Forward Likelihood Predictive (FLP) approach. For each event the probability of being a background event or a triggered one is estimated.

New in version 2.0.0: Covariates have been introduced to explain the effects of external factors on the induced seismicity. Since the parametrization is changed, the etasclass object created with the previous versions are not compatible with the one obtained with the current version.

New in version 2.2.0: New algoruthm for starting values. A new argument (n.iterweight) and an update method and a timeupdate option

An ETAS with up to 7+ncov parameters can be estimated, with several options and different methods.

Returns an etasclass object, for which plot, summary, print and profile methods are defined.

Usage

etasclass(cat.orig,
           time.update=FALSE,
           magn.threshold	=2.5,
           magn.threshold.back=magn.threshold+2,	
           tmax		=max(cat.orig$time),
           long.range=range(cat.orig$long),
           lat.range=range(cat.orig$lat),
           ##### starting values for parameters
           mu		=1,
           k0		=1,
           c		=0.5,
           p		=1.01,
           gamma	=.5,
           d		=1.,
           q		=1.5,
           betacov         =0.7,
           ### indicators: if params.ind[i] i-th parameter will be estimated
           params.ind=replicate(7,TRUE),
#           params.lim=c(0,0,0,1.0,0,0,0),
           ### formula for covariates (magnitude should always be included):
           formula1            ="time~magnitude-1",
           offset              =0,
           hdef=c(1,1),
           w		=replicate(nrow(cat.orig),1),
           hvarx  =replicate(nrow(cat.orig),1),
           hvary  =replicate(nrow(cat.orig),1),
           ### flags for the kind of declustering and smoothing:
           declustering	   =TRUE,
           thinning	       =FALSE,
           flp		         =TRUE,
           m1		           =NULL,
           ndeclust        =5,
           n.iterweight    =1,
           onlytime	=FALSE,
           is.backconstant	=FALSE,
           ##### end of  main input arguments. 
           ##### Control and secondary arguments:
           description	="",
           cat.back   	=NULL,
           back.smooth	=1.0,
           sectoday	=FALSE,
           longlat.to.km   =TRUE,
#           fastML=FALSE, #### not yet implemented
#           fast.eps=0.001, #### not yet implemented
           usenlm		=TRUE,
           method		="BFGS",
           compsqm 	=TRUE,
           epsmax		= 0.0001,
           iterlim		=50,      
           ntheta		=36)

Arguments

`cat.orig`	An earthquake catalog, possibly an object of class `eqcat`, or however a `data.frame` with variables of names `time`, `lat`, `long`, `z`, `magn1`. No missing values are allowed.
`time.update`	Logical. It is `TRUE` if the excution is called by `time.update` to update to new time maximum. Default value = `FALSE`.
`magn.threshold`	Threshold magnitude (only events with a magnitude at least `magn.threshold` will be used). Default value = 2.5.
`magn.threshold.back`	Threshold magnitude used to build the catalog `cat.back` for the first estimation of the background seismicity. Default value = `magn.threshold+2`.
`tmax`	Maximum value of time. Only observations before `tmax` will be used for estimation. Default value = `max(cat.orig$time)`.
`long.range`	Longitude range. Only observations with `long` in the range `long.range` will be used for estimation. Default value = `range(cat.orig$long)`.
`lat.range`	Latitude range. Only observations with `lat` in the range `lat.range` will be used for estimation. Default value = `range(cat.orig$lat)`.

Values for the 7 parameters of the ETAS model (starting values or fixed values according to params.ind):

`mu`	Parameter 1 (`\mu`) of the ETAS model: background general intensity; see details. Default value = 1.
`k0`	Parameter 2 (`\kappa_0`) of the ETAS model: measures the strength of the aftershock activity; see details. Default value = 1.
`c`	Parameter 3 of the ETAS model; a shift parameter of the Omori law for temporal decay rate of aftershocks; see details. Default value = 0.5.
`p`	Parameter 4 of the ETAS model; the exponent of the Omori law for temporal decay rate of aftershocks; see details. Default value = 1.01.
`gamma`	Parameter 5 (`\gamma`) of the ETAS model; together with `a` is related to the efficiency of an event of given magnitude in generating aftershocks; see details. Default value = 0.5.
`d`	Parameter 6 of the ETAS model; parameter related to the spatial influence of the mainshock; see details. Default value = 1.
`q`	Parameter 7 of the ETAS model; parameter related to the spatial influence of the mainshock; see details. Default value = 1.5.
`betacov`	Numerical array. Parameters of the covariates ETAS model (the parameters `\beta_j`); see details. Default value = 0.7. Parameters in `betacov` are not limited

End of model pararameter input

`params.ind`	vector of 7 logical values: `params.ind[i] = TRUE` means that the i-th parameter must be estimated. `params.ind[i] = FALSE` means that the i-th parameter is fixed to its input value (the order of parametrs is: `mu`, `k0`, `c`, `p`, `gamma`, `d`, `q`). Default value = `replicate(7,TRUE)`, that is, `etasclass` estimates all parameters.
`params.lim`	vector of 7 numerical values: `params.lim[i] = theta0` means that the i-th parameter must be greater than `theta0` (the default limits of parametrs are: 0 for `mu`, `k0`, `c`, 1 for `p`, 0 for `gamma`, `d`, `q`). Default value = `replicate(7,TRUE)`, that is, `etasclass` estimates all parameters.
`formula1`	a character variable: Formula which defines the covariates acting on the induced seismicity. In classical etas model the covariate is the magnitude. The left side (dummy) element must be the time, which is a variable certainly present in the data set. The right part of the formula determines `ncov` the number of covariates. Default value=`"time~magnitude-1"`; input must be a character value: it is converted in a formula inside the program
`offset`	An offset, for which no parameter will be estimated. Default value=0

Flags for the kind of declustering and smoothing:

`hdef`	Starting values for the `x,y` bandwidths used in the kernel estimator of background seismicity. Default value = `1,1`.
`w`	Starting values for the weigths used in the kernel estimator of background seismicity. The length must be equal to the number of events of the catalog after event selection (can be less than `nrow(cat.orig)`). Default value = `replicate(nrow(cat.orig),1).`
`hvarx`	Longitude bandwidths adjustement used in the kernel estimator of background seismicity. The length must be equal to the number of events of the catalog after event selection (can be less than `nrow(cat.orig)`). Default value = NULL
`hvary`	Longitude bandwidths adjustement used in the kernel estimator of background seismicity. The length must be equal to the number of events of the catalog after event selection (can be less than `nrow(cat.orig)`). Default value = NULL
`declustering`	if `TRUE` the catalog is iteratively declustered to optimally estimate the background intensity (through thinning, if `thinning=TRUE`, or through weighting if `thinning=FALSE`). Default value = `TRUE`.
`thinning`	if `thinning=TRUE` a background catalog is obtained sampling from the original catalog with probabilities estimated during the iterations. Default value =`FALSE`.
`flp`	if `flp=TRUE` then background seismicity is estimated through Forward Likelihood Predictive (see details). Otherwise the Silverman rule is used. Default value =`TRUE`.
`m1`	Used only if `flp=TRUE`. Indicates the range of points used for the FLP steps. See details. If missing it is set to `nrow(cat)/2`.
`ndeclust`	maximum number of iterations for the general declustering procedure. Default=5.
`n.iterweight`	New in version 2.2. The weighting and the density computations will be alternated `n.iterweight` times after each maximum likelihood step: in many situations this improves the general convergence procedure. Default=1.
`onlytime`	if `TRUE` then a time process is fitted to data , regardless to space location (in this case `is.backconstant` is set to `TRUE` and `declustering`, `flp` are set to `FALSE`). Default value = `FALSE`.
`is.backconstant`	if `TRUE` then background seismicity is assumed to be homogeneous in space (and `declustering, flp` are set to `FALSE`). Default value = `FALSE`.

Other control parameters:

`description`	a description string used for the output. Default value = "".
`cat.back`	external catalog used for the estimation of the background seismicity. Default value = `NULL`.
`back.smooth`	Controls the level of smoothing for the background seismicity (meaningful only if `flp=FALSE`). Default value = 1.
`sectoday`	if `TRUE`, then `time` variable of `cat.orig` is converted from seconds to days. Default value = `FALSE`.
`longlat.to.km`	if `TRUE`, then `long` and `lat` variables of `cat.orig` are treated as geographical coordinates and converted to kilometers. Default value = `TRUE`.
`usenlm`	if `TRUE`, then `nlm` function (gauss-newton method) is used in the maximum likelihood steps; if `FALSE`, then `optim` function is used (with `method` `=method` ). Default value = `TRUE`.
`method`	used if `usenlm=FALSE`: method used by `optim`. Default value = `"BFGS"`.
`compsqm`	if `TRUE`, then standard errors are computed. Default value = `TRUE`.
`epsmax`	maximum allowed difference between estimates in subsequent iterations (default = 0.0001).
`iterlim`	maximum number of iterations in the maximum likelihood steps (used in `nlm` or `optim`). Default value = 100.
`ntheta`	number of subdivisions of the round angle, used in the approximation of the integral involved in the likelihood computation of the ETAS model. Default value = 100.

Details

Estimates the components of an ETAS (Epidemic type aftershock sequence) model for the description of the seismicity of a space-time region. Background seismicity is estimated nonparametrically, while triggered seismicity is estimated by MLE.

From version 2.0 of package etasFLP covariates are allowed to improve the fitting of the triggered part, through the input formula1, which as a default values of "time ~ magnitude - 1", which corresponds to the previous version of package etasFLP, that is, magnitude as the only covariate which influence the average number of aftershocks.

The bandwidth of the kernel density estimator is estimated through the Forward Likelihood Predictive approach (FLP), (theoretical reference on Adelfio and Chiodi, 2013) if flp is set to TRUE. Otherwise the bandwidth is estimated trough the Silverman's rule. FLP steps for the estimation of nonparametric background component is alternated with the Maximum Likelihood step for the estimation of parametric components (only if declustering=TRUE). For each event the probability of being a background event or a triggered one is estimated, according to a declustering procedure in a way similar to the proposal of Zhuang, Ogata, and Vere-Jones (2002).

The ETAS model for conditional space time intensity \lambda(x,y,t) is given by:

\lambda(x,y,t)=\mu f(x,y)+\kappa_0 \sum_{t_j<t}\frac{ e^{\eta_j}}{(t-t_j +c)^p} \left\{ \frac{(x-x_j)^2+(y-y_j)^2}{e^{\gamma \ (m_j-m_0)}}+d \right\}^{-q}

where \eta_j=\sum_{j=1,ncov}\beta_j cov_{ij}

parameters \beta_j are the elements of the array variable betacov

f(x,y) is estimated through a weighted kernel gaussian estimator; if flp is set to TRUE then the bandwidth is estimated through a FLP step.

Weights (computed only if declustering=TRUE) are given by the estimated probabilities of being a background event; for the i-th event this is given by \rho_i=\frac{\mu f(x_i,y_i)}{\lambda(x_i,y_i,t_i)}. The weights \rho_i are updated after a whole iteration.

mu (\mu) measures the background general intensity (which is assumed temporally homogeneous);

k0 (\kappa_0) is a scale parameter related to the importance of the induced seismicity;

c and p are the characteristic parameters of the seismic activity of the given region; c is a shift parameter while p, which characterizes the pattern of seismicity, is the exponent parameter of the modified Omori law for temporal decay rate of aftershocks;

\eta_j=\sum_{j=1,ncov}\beta_j cov_{ij} measures the efficiency of an event of a given magnitude in generating aftershock sequences;

d and q are two parameters related to the spatial influence of the mainshocks.

Many kinds of ETAS models can be estimated, managing some control input arguments. The eight ETAS parameters can be fixed to some input value, or can be estimated, according to params.ind: if params.ind[i]=FALSE the i-th parameter is kept fixed to its input value, otherwise, if params.ind[i] = TRUE, the i-th parameter is estimated and the input value is used as a starting value.

By default params.ind=c(TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE), and so a full 7+ncov parameters ETAS model will be estimated.

The seven parameters are internally ordered in this way: params = (mu, k0, c, p, gamma, d, q); for example a model with a fixed value p=1 (and params.ind[4] = FALSE) can be estimated and compared with the model where p is estimated (params.ind[4]=TRUE);

for example a 6+ncov parameters model can be fitted with gamma=0 and params.ind[5]=FALSE, so that input must be in this case: params.ind=c(TRUE,TRUE,TRUE,TRUE,FALSE,TRUE,TRUE);

if onlytime=TRUE a time process is fitted to data (with a maximum of 5 parameters), regardless to space location (however the input catalog cat.orig must contain three columns named long, lat, z);

if is.backconstant=TRUE a process (space-time or time) with a constant background intensity \mu is fitted;

if mu is fixed to a very low value a process with very low background intensity is fitted, that is with only clustered intensity (useful to fit a model to a single cluster of events).

If flp=TRUE the bandwidth for the kernel estimation of the background intensity is evaluated maximizing the Forward Likelihood Predictive (FLP) quantity, given by (Chiodi, Adelfio, 2011; Adelfio, Chiodi, 2013):

FLP_{k_1,k_2}(\hat{\boldsymbol{\psi}})\equiv\sum_{k=k_1}^{n-1}\delta_{k,{k+1}}(\hat{\boldsymbol{\psi}}(H_{t_k}); H_{t_{k+1}})

with k_1=\frac{n}{2},k_2=n-1 and where \delta_{k,k+1}(\hat{\boldsymbol{\psi}}(H_{t_k}); H_{t_{k+1}}) is the predictive information of the first k observations on the k+1-th observation, and is so defined:

\delta_{k,k+1}(\hat{\boldsymbol{\psi}}(H_{t_k}); H_{t_{k+1}})\equiv \log L(\hat{\boldsymbol{\psi}}(H_{t_k}); H_{t_{k+1}} )-\log L(\hat{\boldsymbol{\psi}}(H_{t_k});H_{t_k})

where H_k is the history of the process until time t_k and \hat{\boldsymbol{\psi}}(H_{t_k}) is an estimate based only on history until the k-th observation.

In the ML step, the vector of parameter \theta=(\mu, \kappa_0, c , p, \alpha, \gamma, d, q) is estimated maximizing the sample log-likelihood given by:

\log L(\boldsymbol{\theta}; H_{t_n}) = \sum_{i=1}^{n} \log \lambda(x_i,y_i,t_i; \boldsymbol{\theta})- \int_{T_0}^{T_{max}} \int \int_{\Omega_{(x,y)}}\, \lambda(x,y,t;\boldsymbol{\theta})\,d x \, d y \,d t

Value

returns an object of class etasclass.

The main items of the output are:

`this.call`	reports the exact call of the function
`params.ind`	indicates which parameters have been estimated (see details)
`params`	ML estimates of the ETAS parameters.
`sqm`	Estimates of standard errors of the ML estimates of the ETAS parameters (`sqm[i]`=0 if `params.ind[i]`=`FALSE` or where the hessian is not computed or near to singularity).
`AIC.iter`	AIC values at each iteration.
`hdef`	final bandwidth used for the kernel estimation of background spatial intensity (however estimated, with `flp=TRUE` or `flp=FALSE`).
`rho.weights`	Estimated probability for each event to be a background event (`\rho`).
`time.res`	rescaled time residuals (for time processes only).
`params.iter`	A matrix with estimates values at each iteration.
`sqm.iter`	A matrix with the estimates of the standard errors at each iteration.
`rho.weights.iter`	A matrix with the values of `rho.weights` at each iteration.
`l`	A vector with estimated intensities, corresponding to observed points

summary, print and plot methods are defined for an object of class etasclass to obtain main output.

A profile method (profile.etasclass ) is also defined to make approximate inference on a single parameter

Note

In this version the x-y space region, where the point process is defined, is a rectangle embedding the catalog values.

The optimization algorithm depends on the choice of initial values. Some default guess choice is performed inside the function for parameters without input starting values; the function etas.starting gives rough first guess for initial values. If convergence problem are experienced, a useful strategy can be starting with an higher magnitude threshold value m_0 (that is, with a smaller catalog with bigger earthquakes), and then using this first output as starting guess for a running with a lower magnitude threshold value m_0. In this trial executions avoid declustering (declustering=FALSE) or at least use a small value of ndeclust; small values of iterlim and ntheta can speed first executions.

Quicker executions are obtained using smaller values of iterlim and ntheta in the input.

Also a first execution with is.backconstant = TRUE, to fit a first approximation model with constant background, can be useful.

Some other useful information can be obtained estimating a pure time process, that can give a good guess at least for some parameters, like \mu, \kappa_0, \alpha,c,p.

Input times are expected in days, and so final intensities are expected number of events per day. If input values are in seconds, then set sectoday=TRUE

Author(s)

Marcello Chiodi, Giada Adelfio

References

Adelfio, G. and Chiodi, M. (2013) Mixed estimation technique in semi-parametric space-time point processes for earthquake description. Proceedings of the 28th International Workshop on Statistical Modelling 8-13 July, 2013, Palermo (Muggeo V.M.R., Capursi V., Boscaino G., Lovison G., editors). Vol. 1 pp.65-70.

Adelfio G, Chiodi M (2015). Alternated Estimation in Semi-Parametric Space-Time Branching-Type Point Processes with Application to Seismic Catalogs. Stochastic Environmental Research and Risk Assessment, 29(2), 443-450. doi:10.1007/s00477-014-0873-8.

Adelfio G, Chiodi M (2015). FLP Estimation of Semi-Parametric Models for Space-Time Point Processes and Diagnostic Tools. Spatial Statistics, 14(B), 119-132. doi:10.1016/j.spasta.2015.06.004.

Adelfio G., Chiodi, M. (2020). Including covariates in a space-time point process with application to seismicity. Statistical Methods and Applications, doi:10.1007/s10260-020-00543-5.

Chiodi, M. and Adelfio, G., (2011) Forward Likelihood-based predictive approach for space-time processes. Environmetrics, vol. 22 (6), pp. 749-757. DOI:10.1002/env.1121.

Chiodi, M. and Adelfio, G., (2017) Mixed Non-Parametric and Parametric Estimation Techniques in R Package etasFLP for Earthquakes' Description. Journal of Statistical Software, vol. 76 (3), pp. 1-29. DOI: 10.18637/jss.v076.i03.

Zhuang, J., Ogata, Y. and Vere-Jones, D. Stochastic declustering of space-time earthquake occurrences. Journal of the American Statistical Association, 97, 369–379 (2002). DOI:10.1198/016214502760046925.

Examples

## Not run: 
data("italycatalog")
# load a sample catalog of the italian seismicity
esecov1<-etasclass(cat.orig = catalog.withcov, magn.threshold = 2.5, magn.threshold.back = 3.9, 
mu = 0.3, k0 = 0.02, c = 0.015, p = 0.99, gamma = 0, d = 1, 
q = 1.5, params.ind = c(TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, 
TRUE), formula1 = "time ~  magnitude- 1", declustering = TRUE, 
thinning = FALSE, flp = TRUE, ndeclust = 15, onlytime = FALSE, 
is.backconstant = FALSE, sectoday = FALSE, usenlm = TRUE, 
compsqm = TRUE, epsmax = 1e-04, iterlim = 100, ntheta = 36)

# execution of etasclass for events with minimum magnitude of 3.0. 
# The events with magnitude at least 3.5 are used to build a first approximation
# for the background intensity function
# (magn.threshold.back=3.5)
# The magnitude effect is given by the covariate magnitude 
# in the formula "time ~  magnitude- 1"
# magnitude is the internal name for magn1-magn.threshold
# print method for the etasclass object

print(esecov1)

> print(esecov1)
Call: 
 
etasclass(cat.orig = catalog.withcov, magn.threshold = 2.5, magn.threshold.back = 3.9, 
    mu = 0.3, k0 = 0.02, c = 0.015, p = 0.99, gamma = 0, d = 1, 
    q = 1.5, params.ind = c(TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, 
        TRUE), formula1 = "time ~  magnitude- 1", declustering = TRUE, 
    thinning = FALSE, flp = TRUE, ndeclust = 15, onlytime = FALSE, 
    is.backconstant = FALSE, sectoday = FALSE, usenlm = TRUE, 
    compsqm = TRUE, epsmax = 1e-04, iterlim = 100, ntheta = 36)

 
 
 
Number of observations             2226 
ETAS Parameters: 
        mu         k0          c          p      gamma          d          q 
  0.667509   0.022393   0.014769   1.110059   0.000000   1.905461   1.947223 
magnitude 
  0.740109 
# summary method for more informative output etasclass object

summary(esecov1)
# plot results with maps of intensities and diagnostic tools 

plot(esecov1)

## an application with 5 covariates
esecov5<-etasclass(cat.orig = catalog.withcov, magn.threshold = 2.5, magn.threshold.back = 3.9, 
mu = 0.3, k0 = 0.02, c = 0.015, p = 0.99, gamma = 0, d = 1, 
q = 1.5, params.ind = c(TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE), 
formula1 = "time ~ z + magnitude +nstaloc_rev +min_distance_rev+distmin- 1", 
declustering = TRUE, thinning = FALSE, flp = TRUE, ndeclust = 15, 
onlytime = FALSE, is.backconstant = FALSE, sectoday = FALSE, 
usenlm = TRUE, compsqm = TRUE, epsmax = 1e-04, iterlim = 100, 
ntheta = 36)

## print results, more out put with summary
 print(esecov5)
Call: 
 
etasclass(cat.orig = catalog.withcov, magn.threshold = 2.5, magn.threshold.back = 3.9, 
    mu = 0.3, k0 = 0.02, c = 0.015, p = 0.99, gamma = 0, d = 1, 
    q = 1.5, params.ind = c(TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, 
        TRUE), formula1 = "time ~ z + magnitude +nstaloc_rev +min_distance_rev+distmin- 1", 
    declustering = TRUE, thinning = FALSE, flp = TRUE, ndeclust = 15, 
    onlytime = FALSE, is.backconstant = FALSE, sectoday = FALSE, 
    usenlm = TRUE, compsqm = TRUE, epsmax = 1e-04, iterlim = 100, 
    ntheta = 36)

 
 
 
Number of observations             2226 
ETAS Parameters: 
              mu               k0                c                p 
        0.705351         0.073070         0.019396         1.154186 
           gamma                d                q                z 
        0.000000         1.942929         2.004915        -0.041256 
      magnitude      nstaloc_rev min_distance_rev          distmin 
        1.157698        -0.009010        -0.011020        -1.826717 


## End(Not run)

[Package etasFLP version 2.2.2 Index]