bootstrap.geoGAM {geoGAM} | R Documentation |
Bootstrapped predictive distribution
Description
Method for class geoGAM
to compute model based bootstrap for point predictions. Returns complete predictive distribution of which prediction intervals can be computed.
Usage
## Default S3 method:
bootstrap(object, ...)
## S3 method for class 'geoGAM'
bootstrap(object, newdata, R = 100,
back.transform = c("none", "log", "sqrt"),
seed = NULL, cores = detectCores(), ...)
Arguments
object |
geoGAM object |
newdata |
data frame in which to look for covariates with which to predict. |
R |
number of bootstrap replicates, single positive integer. |
back.transform |
sould to |
seed |
seed for simulation of new response. Set seed for reproducible results. |
cores |
number of cores to be used for parallel computing. |
... |
further arguments. |
Details
Soil properties are predicted for new locations \mathbf{s_{+}}
from the final geoGAM
fit by \tilde{Y}(\mathbf{s_+})=\hat f(\mathbf{x}\mathbf{(s_+)})
, see function predict.geoGAM
.
To model the predictive distributions for continuous responses bootstrap.geoGAM
uses a
non-parametric, model-based bootstrapping approach (Davison and Hinkley 1997, pp. 262, 285) as follows:
New values of the response are simulated according to
Y(\mathbf{s})^{*} = \hat f(\mathbf{x}(\mathbf{s}))+\mathbf{\epsilon}
, where\hat f(\mathbf{x}(\mathbf{s}))
are the fitted values of the final model and\epsilon
are errors randomly sampled with replacement from the centred, homoscedastic residuals of the final model Wood 2006, p. 129).-
geoGAM
is fitted toY(\mathbf{s})^{*}
. Prediction errors are computed according to
\delta_{+}^{*} = \hat f(\mathbf{x}(\mathbf{s_{+}}))^{*} - (\, \hat f(\mathbf{x}(\mathbf{s_{+}})) + \mathbf{\epsilon} \,)
, where\hat f(\mathbf{x}(\mathbf{s_{+}}))^{*}
are predicted values at new locations\mathbf{s_{+}}
of the model built with the simulated responseY(\mathbf{s})^{*}
in step B above, and the errors\epsilon
are again randomly sampled from the centred, homoscedastic residuals of the final model (see step A).
Prediction intervals are computed according to
[\hat f(\mathbf{x}(\mathbf{s_{+})}) - \delta_{+\,(1-\alpha)}^{*}\,; \hat f(\mathbf{x}\mathbf{(s_{+}})) - \delta_{+\,(\alpha)}^{*}]
where \delta_{+\,(\alpha)}^{*}
and \delta_{+\,(1-\alpha)}^{*}
are the \alpha
- and (1-\alpha)
-quantiles of \delta_{+}^{*}
, pooled over all 1000 bootstrap repetitions.
Predictive distributions for binary and ordinal responses are
directly obtained from a final geoGAM
fit by predicting probabilities
of occurrence \mathrm{\widetilde{Prob}}(Y(\mathbf{s})=r\,|\,\mathbf{x}(\mathbf{s)})
(Davison and Hinkley 1997, p. 358).
Value
Data frame of nrows(newdata)
rows and R + 2
columns with x
and y
indicating coordinates of the location and P1
to P...R
the prediction at this location from 1...R
replications.
Author(s)
M. Nussbaum
References
Nussbaum, M., Walthert, L., Fraefel, M., Greiner, L., and Papritz, A.: Mapping of soil properties at high resolution in Switzerland using boosted geoadditive models, SOIL, 3, 191-210, doi:10.5194/soil-3-191-2017, 2017.
Davison, A. C. and Hinkley, D. V., 2008. Bootstrap Methods and Their Applications. Cambridge University Press.
See Also
To create geoGAM objects see geoGAM
and to predict without simulation of the predictive distribution see predict.geoGAM
.
Examples
data(quakes)
# group stations to ensure min 20 observations per factor level
# and reduce number of levels for speed
quakes$stations <- factor( cut( quakes$stations, breaks = c(0,15,19,23,30,39,132)) )
# Artificially split data to create prediction data set
set.seed(1)
quakes.pred <- quakes[ ss <- sample(1:nrow(quakes), 500), ]
quakes <- quakes[ -ss, ]
quakes.geogam <- geoGAM(response = "mag",
covariates = c("stations", "depth"),
coords = c("lat", "long"),
data = quakes,
max.stop = 20,
cores = 1)
## compute model based bootstrap with 10 repetitions (use at least 100)
quakes.boot <- bootstrap(quakes.geogam,
newdata = quakes.pred,
R = 10, cores = 1)
# plot predictive distribution for site in row 9
hist( as.numeric( quakes.boot[ 9, -c(1:2)] ), col = "grey",
main = paste("Predictive distribution at", paste( quakes.boot[9, 1:2], collapse = "/" )),
xlab = "predicted magnitude")
# compute 95 % prediction interval and add to plot
quant95 <- quantile( as.numeric( quakes.boot[ 9, -c(1:2)] ), probs = c(0.025, 0.975) )
abline(v = quant95[1], lty = "dashed")
abline(v = quant95[2], lty = "dashed")