R: Bayesian regression model fitting for point referenced...

Bsptime {bmstdr}

R Documentation

Bayesian regression model fitting for point referenced spatio-temporal data. Calculates parameter estimates, validation statistics, and estimated values of several Bayesian model choice criteria.

Description

Bayesian regression model fitting for point referenced spatio-temporal data. Calculates parameter estimates, validation statistics, and estimated values of several Bayesian model choice criteria.

Usage

Bsptime(
  formula,
  data,
  package = "none",
  model = "GP",
  coordtype = NULL,
  coords = NULL,
  validrows = NULL,
  scale.transform = "NONE",
  prior.beta0 = 0,
  prior.M = 1e-04,
  prior.sigma2 = c(2, 1),
  prior.tau2 = c(2, 0.1),
  prior.sigma.eta = c(2, 0.001),
  phi.s = NULL,
  phi.t = NULL,
  prior.phi = "Gamm",
  prior.phi.param = NULL,
  phi.tuning = NULL,
  phi.npoints = NULL,
  prior.range = c(1, 0.5),
  prior.sigma = c(1, 0.005),
  offset = c(10, 140),
  max.edge = c(50, 1000),
  rhotp = 0,
  time.data = NULL,
  truncation.para = list(at = 0, lambda = 2),
  newcoords = NULL,
  newdata = NULL,
  annual.aggrn = "NONE",
  cov.model = "exponential",
  g_size = NULL,
  knots.coords = NULL,
  tol.dist = 0.005,
  N = 2000,
  burn.in = 1000,
  rseed = 44,
  n.report = 2,
  no.chains = 1,
  ad.delta = 0.8,
  t.depth = 15,
  s.size = 0.01,
  verbose = FALSE,
  plotit = TRUE,
  mchoice = FALSE,
  ...
)

Arguments

`formula`	An object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted.
`data`	The data frame for which the model formula is to be fitted. The data frame should be in long format having one row for each location and time combination. The data frame must be ordered by time within each site, and should optionally have a column, named s.index, providing the site indices. Thus the data, with n sites and T times within each site, should be organized in the order: (s1, t1), (s1, t2), ... (s1, T), ... (sn, t1), ... (sn, T). The data frame should also contain two columns giving the coordinates of the locations for spatio temporal model fitting.
`package`	Which package is to be used in model fitting? Currently available packages are: "spBayes": The model implemented is the dynamic spatio-temporal model fitted using the `spDynLM` function in the `spBayes` package. "stan": The model implemented is the marginal independent GP model. "inla" The only model implemented is the AR model. "spTimer": All possible models in this package can be fitted. "sptDyn": All possible models in this package can be fitted. "none".: In this case case, the argument `model` must be specified either as "lm" or "separable". See below. Further details and more examples are provided in Chapters 7-9 of the book Sahu (2022).
`model`	The model to be fitted. This argument is passed to the fitting package. In case the package is none, then it can be either "lm" or "separable". The "lm" option is for an independent error regression model while the other option fits a separable model without any nugget effect. The separable model fitting method cannot handle missing data. All missing data points in the response variable will be replaced by the grand mean of the available observations.
`coordtype`	Type of coordinates: utm, lonlat or plain with utm (supplied in meters) as the default. Distance will be calculated in units of kilometer if this argument is either utm or lonlat. Euclidean distance will be calculated if this is given as the third type plain. If distance in meter is to be calculated then coordtype should be passed on as plain although the coords are supplied in UTM.
`coords`	A vector of size two identifying the two column numbers of the data frame to take as coordinates. Or this can be given as a matrix of number of sites by 2 providing the coordinates of all the data locations.
`validrows`	A vector of row numbers of the supplied data frame which should be used for validation. When the model is "separable" this argument must include all the time points for the sites to be validated. Otherwise, the user is allowed to select the row numbers of the data frame validation as they wish. The default NULL value instructs that validation will not be performed. lm model: If package is "none" and the model is "lm", this argument is a vector of row indices of the data frame which should be used for validation. separable model: If package is "none" and the model is "separable" this argument is a vector of site indices which should be used for validation. The "separable" model does not allow some sites to be used for both fitting and validation. Thus it is not possible to validate at selected time points using the separable model. Further details are provided in Chapter 7 of Sahu (2022).
`scale.transform`	Transformation of the response variable. It can take three values: SQRT, LOG or NONE. Default value is "NONE".
`prior.beta0`	A scalar value or a vector providing the prior mean for beta parameters.
`prior.M`	Prior precision value (or matrix) for beta. Defaults to a diagonal matrix with diagonal values 10^(-4).
`prior.sigma2`	Shape and scale parameter value for the gamma prior on 1/sigma^2, the precision.
`prior.tau2`	Shape and scale parameter value for the gamma prior on tau^2, the nugget effect.
`prior.sigma.eta`	Shape and scale parameter value for the inverse gamma prior distribution for sigma^2 eta; only used in the spBayes package.
`phi.s`	Only used if the model is "separable". The value of the fixed spatial decay parameter for the exponential covariance function. If this is not provided then a value is chosen which corresponds to an effective range which is the maximum distance between the data locations.
`phi.t`	Only used if the model is "separable". The fixed decay parameter for the exponential covariance function in the temporal domain. If this is not provided then a value is chosen which corresponds to an effective temporal range which is the maximum time of the data set.
`prior.phi`	Specifies the prior distribution for `\phi` only when package is one of Stan, spTimer or spTDyn. Distribution options uniform specified by "Unif" and gamma specified by "Gamm" have been implemented in both Stan and spTimer. Additionally a half-Cauchy prior distribution specified as "Cauchy" has been implemented in Stan. In the case of spTimer the uniform distribution is discrete while in the case of Stan the uniform distribution is continuous. In the case of spTimer the option "FIXED" can be used to keep the value fixed. In that case the fixed value can be given by by a scalar value as the argument `prior.phi.param` below or it can be left unspecified in which case the fixed value of `\phi` is chosen as 3/maximum distance between the data locations. The "FIXED" option is not available for the Stan package.
`prior.phi.param`	Lower and upper limits of the uniform prior distribution for phi the spatial decay parameter. For the default uniform distribution the values correspond to an effective range that is between 1% and 100% of the maximum distance between the data locations. For the Gamma distribution the default values are 2 and 1 and for the Cauchy distribution the default values are 0, 1 which specifies a half-Cauchy distribution in `(0, \infty)`.
`phi.tuning`	Only relevant for spTimer and spTDyn models. Tuning parameter fo sampling phi. See the help file for spT.Gibbs
`phi.npoints`	Only relevant for spTimer and spTDyn models. Number of points for the discrete uniform prior distribution on phi. See the help file for spT.Gibbs
`prior.range`	A length 2 vector, with (range0, Prange) specifying that `P(\rho < \rho_0)=p_{\rho}`, where `\rho` is the spatial range of the random field. If Prange is NA, then range0 is used as a fixed range value. If this parameter is unspecified then range0=0.90 * maximum distance and Prange =0.95. If instead a single value is specified then the range is set at the single value.
`prior.sigma`	A length 2 vector, with (sigma0, Psigma) specifying that `P(\sigma > \sigma_0)=p_{\sigma}`, where `\sigma` is the marginal standard deviation of the field. If Psigma is NA, then sigma0 is taken as the fixed value of this parameter.
`offset`	Only used in INLA based modeling. Offset parameter. See documentation for `inla.mesh.2d`.
`max.edge`	Only used in INLA based modeling. See documentation for `inla.mesh.2d`.
`rhotp`	Only relevant for models fitted by spTDyn. Initial value for the rho parameters in the temporal dynamic model. The default is rhotp=0 for which parameters are sampled from the full conditional distribution via MCMC with initial value 0. If rhotp=1,parameters are not sampled and fixed at value 1.
`time.data`	Defining the segments of the time-series set up using the function `spT.time`. Only used with the spTimer package.
`truncation.para`	Provides truncation parameter lambda and truncation point "at" using list. Only used with the spTimer package for a truncated model.
`newcoords`	The locations of the prediction sites in similar format to the `coords` argument, only required if fit and predictions are to be performed simultaneously. If omitted, no predictions will be performed.
`newdata`	The covariate values at the prediction sites specified by `newcoords`. This should have same space-time structure as the original data frame.
`annual.aggrn`	This provides the options for calculating annual summary statistics by aggregating different time segments (e.g., annual mean). Currently implemented options are: "NONE", "ave" and "an4th", where "ave" = annual average, "an4th"= annual 4th highest. Only applicable if spT.time inputs more than one segment and when fit and predict are done simultaneously. Only used in the spTimer package.
`cov.model`	Model for the covariance function. Only relevant for the spBayes, spTimer and the spTDyn packages. Default is the exponential model. See the documentation for `spLM` in the package spBayes.
`g_size`	Only relevant for GPP models fitted by either spTimer or spTDyn. The grid size c(m, n) for the knots for the GPP model. A square grid is assumed if this is passed on as a scalar. This does not need to be given if knots.coords is given instead.
`knots.coords`	Only relevant for GPP models fitted by either spTimer or spTDyn. Optional two column matrix of UTM-X and UTM-Y coordinates of the knots in kilometers. It is preferable to specify the g_size parameter instead.
`tol.dist`	Minimum separation distance between any two locations out of those specified by coords, knots.coords and pred.coords. The default is 0.005. The program will exit if the minimum distance is less than the non-zero specified value. This will ensure non-singularity of the covariance matrices.
`N`	MCMC sample size.
`burn.in`	How many initial iterations to discard. Only relevant for MCMC based model fitting, i.e., when package is spBayes or Stan.
`rseed`	Random number seed that controls the starting point for the random number stream. A set value is required to help reproduce the results.
`n.report`	How many times to report in MCMC progress. This is only used when the package is spBayes or spTimer.
`no.chains`	Number of parallel chains to run in Stan.
`ad.delta`	Adaptive delta controlling the behavior of Stan during fitting.
`t.depth`	Maximum allowed tree depth in the fitting process of Stan.
`s.size`	step size in the fitting process of Stan.
`verbose`	Logical scalar value: whether to print various estimates and statistics.
`plotit`	Logical scalar value: whether to plot the predictions against the observed values.
`mchoice`	Logical scalar value: whether model choice statistics should be calculated.
`...`	Any additional arguments that may be passed on to the fitting package.

Value

A list containing:

params - A table of parameter estimates
fit - The fitted model object. This is present only if a named package, e.g. spTimer has been used.
max.d - Maximum distance between data locations. This is in unit of kilometers unless the coordtype argument is set as plain.
fitteds - A vector of fitted values.
mchoice - Calculated model choice statistics if those have been requested by the input argument mchoice=TRUE. Not all model fits will contain all the model choice statistics.
stats - The four validation statistics: rmse, mae, crps and coverage. This is present only if model validation has been performed.
yobs_preds - A data frame containing the validation rows of the model fitting data frame. The last five columns of this data frame contains the validation prediction summaries: mean, sd, median, and 95% prediction interval. This is present only if model validation has been performed.
valpreds - A matrix containing the MCMC samples of the validation predictions. The dimension of this matrix is the number of validations times the number of retained MCMC samples. This is present only if model validation has been performed.
validationplots - Present only if validation has been performed. Contains three validation plots with or without segment and an ordinary plot. See obs_v_pred_plot for more.
residuals - A vector of residual values.
sn - The number of data locations used in fitting.
tn - The number of time points used in fitting.
phi.s, phi.t - Adopted value of the spatial and temporal decay parameters if those were fixed during model fitting.
prior.phi - If present this contains the name of the prior distribution for the spatial decay parameter phi used to fit the model.
prior.phi.param - If present this contains the values of the hyperparameters of the prior distribution for the spatial decay parameter phi.
prior.range - Present only if the INLA package has been used in model fitting. This contains the values of the hyperparameters of the prior distribution for the range.
logliks - A list containing the log-likelihood values used in calculation of the model choice statistics if those have been requested in the first place.
knots.coords - The locations of the knots if the model has been fitted using the GPP method.
formula - The input formula for the regression part of the model.
scale.transform - The transformation adopted by the input argument with the same name.
package - The name of the package used for model fitting.
model - The name of the fitted model.
call - The command used to call the model fitting function.
computation.time - Computation time required to run the model fitting.

References

Sahu SK (2022). Bayesian Modeling of Spatio Temporal Data with R, 1st edition. Chapman and Hall, Boca Raton. https://doi.org/10.1201/9780429318443.

Examples


# Set the total number of iterations 
N <- 45
# Set the total number of burn-in iterations 
burn.in <- 5
# How many times to report progress 
n.report <- 2
# Model formula used in most model fitting  
f2 <- y8hrmax ~ xmaxtemp + xwdsp + xrh
# Check out the data set 
head(nysptime)
## Fit linear regression model 
M1 <- Bsptime(model = "lm", data = nysptime, formula = f2,
    scale.transform = "SQRT", N = N, burn.in = burn.in, mchoice = TRUE)
names(M1)
plot(M1)
print(M1)
summary(M1)
a <- residuals(M1, numbers = list(sn = 28, tn = 62))
M2 <- Bsptime(model = "separable", data = nysptime, formula = f2,
    coordtype = "utm", coords = 4:5, mchoice = TRUE, scale.transform = "SQRT",
    N = N, burn.in = burn.in)
names(M2)
plot(M2)
print(M2)
summary(M2)
b <- residuals(M2)
# Spatio-temporal model fitting and validation
valids <- c(8, 11)
vrows <- which(nysptime$s.index %in% valids)
## Fit separable spatio-temporal model 
M2.1 <- Bsptime(model = "separable", formula = f2, data = nysptime,
    validrows = vrows, coordtype = "utm", coords = 4:5, phi.s = 0.005,
    phi.t = 0.05, scale.transform = "SQRT", N = N)
summary(M2.1)
plot(M2.1)
# Use spTimer to fit independent GP model 
M3 <- Bsptime(package = "spTimer", formula = f2, data = nysptime,
    coordtype = "utm", coords = 4:5, scale.transform = "SQRT", mchoice = TRUE,
    N = N, burn.in = burn.in, n.report = 2)
summary(M3)

valids <- c(1, 5, 10)
validt <- sort(sample(1:62, size = 31))
vrows <- getvalidrows(sn = 28, tn = 62, valids = valids, validt = validt)
ymat <- matrix(nysptime$y8hrmax, byrow = TRUE, ncol = 62)
yholdout <- ymat[valids, validt]
# Perform validation 
M31 <- Bsptime(package = "spTimer", formula = f2, data = nysptime,
    coordtype = "utm", coords = 4:5, validrows = vrows, model = "GP",
    scale.transform = "NONE", N = N, burn.in = burn.in, n.report = 2)
summary(M31)
modfit <- M31$fit
## Extract the fits for the validation sites
fitall <- data.frame(modfit$fitted)
head(fitall)
tn <- 62
fitall$s.index <- rep(1:28, each = tn)
library(spTimer)
vdat <- spT.subset(data = nysptime, var.name = c("s.index"), s = valids)
fitvalid <- spT.subset(data = fitall, var.name = c("s.index"), s = valids)
head(fitvalid)
fitvalid$low <- fitvalid$Mean - 1.96 * fitvalid$SD
fitvalid$up <- fitvalid$Mean + 1.96 * fitvalid$SD
fitvalid$yobs <- sqrt(vdat$y8hrmax)
fitvalid$yobs <- vdat$y8hrmax
yobs <- matrix(fitvalid$yobs, byrow = TRUE, ncol = tn)
y.valids.low <- matrix(fitvalid$low, byrow = TRUE, ncol = tn)
y.valids.med <- matrix(fitvalid$Mean, byrow = TRUE, ncol = tn)
y.valids.up <- matrix(fitvalid$up, byrow = TRUE, ncol = tn)
library(ggplot2)
p1 <- fig11.13.plot(yobs[1, ], y.valids.low[1, ], y.valids.med[1, ],
    y.valids.up[1, ], misst = validt)
p1 <- p1 + ggtitle("Validation for Site 1")
p1
p2 <- fig11.13.plot(yobs[2, ], y.valids.low[2, ], y.valids.med[2, ],
    y.valids.up[2, ], misst = validt)
p2 <- p2 + ggtitle("Validation for Site 5")
p2
p3 <- fig11.13.plot(yobs[3, ], y.valids.low[3, ], y.valids.med[3, ],
    y.valids.up[3, ], misst = validt)
p3 <- p3 + ggtitle("Validation for Site 10")
p3

## Independent marginal GP model fitting using rstan

M4 <- Bsptime(package = "stan", formula = f2, data = nysptime,
    coordtype = "utm", coords = 4:5, N = N, burn.in = burn.in,
    verbose = FALSE)
summary(M4)

# Spatio-temporal hierarchical auto-regressive modeling useing spTimer 
M5 <- Bsptime(package = "spTimer", model = "AR", formula = f2, data = nysptime,
    coordtype = "utm", coords = 4:5, scale.transform = "SQRT", mchoice = TRUE,
    n.report = n.report, N = N, burn.in = burn.in)
summary(M5)
a <- residuals(M5)

## Spatio-temporal dynamic model fitting using spTDyn
library(spTDyn)

f3 <- y8hrmax ~ xmaxtemp + sp(xmaxtemp) + tp(xwdsp) + xrh
M7 <- Bsptime(package = "sptDyn", model = "GP", formula = f3, data = nysptime,
    coordtype = "utm", coords = 4:5, scale.transform = "SQRT", mchoice = TRUE,
    N = N, burn.in = burn.in, n.report = n.report)
summary(M7)

# Dynamic Model fitting using spBayes 
M8 <- Bsptime(package = "spBayes", formula = f2, data = nysptime,
    prior.sigma2 = c(2, 25), prior.tau2 = c(2, 25), prior.sigma.eta = c(2,
        0.001), coordtype = "utm", coords = 4:5, scale.transform = "SQRT",
    N = N, burn.in = burn.in, n.report = n.report)
summary(M8)

## Gussian Predictive Process based model fitting using spTimer 
M9 <- Bsptime(package = "spTimer", model = "GPP", g_size = 5, formula = f2,
    data = nysptime, coordtype = "utm", coords = 4:5, scale.transform = "SQRT",
    N = N, burn.in = burn.in, n.report = n.report)
summary(M9)

# This INLA run may take a long time
if (require(INLA) & require(inlabru)) {
    f2 <- y8hrmax ~ xmaxtemp + xwdsp + xrh
    M6 <- Bsptime(package = "inla", model = "AR", formula = f2, data = nysptime,
        coordtype = "utm", coords = 4:5, scale.transform = "SQRT", 
        offset = c(100, 200), max.edge = c(500, 10000),
        mchoice = TRUE, plotit=TRUE)
    # Takes a minute
    summary(M6)
}

[Package bmstdr version 0.7.9 Index]