bootstrap.geoGAM {geoGAM}R Documentation

Bootstrapped predictive distribution

Description

Method for class geoGAM to compute model based bootstrap for point predictions. Returns complete predictive distribution of which prediction intervals can be computed.

Usage

## Default S3 method:
bootstrap(object, ...)

## S3 method for class 'geoGAM'
bootstrap(object, newdata, R = 100,
          back.transform = c("none", "log", "sqrt"),
          seed = NULL, cores = detectCores(), ...)

Arguments

object

geoGAM object

newdata

data frame in which to look for covariates with which to predict.

R

number of bootstrap replicates, single positive integer.

back.transform

sould to log or sqrt transformed responses unbiased back transformation be applied? Default is none.

seed

seed for simulation of new response. Set seed for reproducible results.

cores

number of cores to be used for parallel computing.

...

further arguments.

Details

Soil properties are predicted for new locations \mathbf{s_{+}} from the final geoGAM fit by \tilde{Y}(\mathbf{s_+})=\hat f(\mathbf{x}\mathbf{(s_+)}), see function predict.geoGAM. To model the predictive distributions for continuous responses bootstrap.geoGAM uses a non-parametric, model-based bootstrapping approach (Davison and Hinkley 1997, pp. 262, 285) as follows:

  1. New values of the response are simulated according to Y(\mathbf{s})^{*} = \hat f(\mathbf{x}(\mathbf{s}))+\mathbf{\epsilon}, where \hat f(\mathbf{x}(\mathbf{s})) are the fitted values of the final model and \epsilon are errors randomly sampled with replacement from the centred, homoscedastic residuals of the final model Wood 2006, p. 129).

  2. geoGAM is fitted to Y(\mathbf{s})^{*}.

  3. Prediction errors are computed according to \delta_{+}^{*} = \hat f(\mathbf{x}(\mathbf{s_{+}}))^{*} - (\, \hat f(\mathbf{x}(\mathbf{s_{+}})) + \mathbf{\epsilon} \,), where \hat f(\mathbf{x}(\mathbf{s_{+}}))^{*} are predicted values at new locations \mathbf{s_{+}} of the model built with the simulated response Y(\mathbf{s})^{*} in step B above, and the errors \epsilon are again randomly sampled from the centred, homoscedastic residuals of the final model (see step A).

Prediction intervals are computed according to

[\hat f(\mathbf{x}(\mathbf{s_{+})}) - \delta_{+\,(1-\alpha)}^{*}\,; \hat f(\mathbf{x}\mathbf{(s_{+}})) - \delta_{+\,(\alpha)}^{*}]

where \delta_{+\,(\alpha)}^{*} and \delta_{+\,(1-\alpha)}^{*} are the \alpha- and (1-\alpha)-quantiles of \delta_{+}^{*}, pooled over all 1000 bootstrap repetitions.

Predictive distributions for binary and ordinal responses are directly obtained from a final geoGAM fit by predicting probabilities of occurrence \mathrm{\widetilde{Prob}}(Y(\mathbf{s})=r\,|\,\mathbf{x}(\mathbf{s)}) (Davison and Hinkley 1997, p. 358).

Value

Data frame of nrows(newdata) rows and R + 2 columns with x and y indicating coordinates of the location and P1 to P...R the prediction at this location from 1...R replications.

Author(s)

M. Nussbaum

References

Nussbaum, M., Walthert, L., Fraefel, M., Greiner, L., and Papritz, A.: Mapping of soil properties at high resolution in Switzerland using boosted geoadditive models, SOIL, 3, 191-210, doi:10.5194/soil-3-191-2017, 2017.

Davison, A. C. and Hinkley, D. V., 2008. Bootstrap Methods and Their Applications. Cambridge University Press.

See Also

To create geoGAM objects see geoGAM and to predict without simulation of the predictive distribution see predict.geoGAM.

Examples



data(quakes)

# group stations to ensure min 20 observations per factor level
# and reduce number of levels for speed
quakes$stations <- factor( cut( quakes$stations, breaks = c(0,15,19,23,30,39,132)) )

# Artificially split data to create prediction data set
set.seed(1)
quakes.pred <- quakes[ ss <- sample(1:nrow(quakes), 500), ]
quakes <- quakes[ -ss, ]

quakes.geogam <- geoGAM(response = "mag",
                        covariates = c("stations", "depth"),
                        coords = c("lat", "long"),
                        data = quakes,
                        max.stop = 20,
                        cores = 1)


## compute model based bootstrap with 10 repetitions (use at least 100)
quakes.boot <- bootstrap(quakes.geogam,
                         newdata = quakes.pred,
                         R = 10, cores = 1)


# plot predictive distribution for site in row 9
hist( as.numeric( quakes.boot[ 9, -c(1:2)] ), col = "grey",
      main = paste("Predictive distribution at", paste( quakes.boot[9, 1:2], collapse = "/" )),
      xlab = "predicted magnitude")

# compute 95 % prediction interval and add to plot
quant95 <- quantile( as.numeric( quakes.boot[ 9, -c(1:2)] ), probs = c(0.025, 0.975) )
abline(v = quant95[1], lty = "dashed")
abline(v = quant95[2], lty = "dashed")


[Package geoGAM version 0.1-3 Index]