predict.geoGAM {geoGAM} | R Documentation |
Prediction from fitted geoGAM model
Description
Takes a fitted geoGAM
object and produces point predictions for a new set of covariate values. If no new data is provided fitted values are returned. Centering and scaling is applied with the same parameters as for the calibration data
set given to geoGAM
. Factor levels are aggregated according to the final model fit.
Usage
## S3 method for class 'geoGAM'
predict(object, newdata,
type = c("response", "link", "probs", "class"),
back.transform = c("none", "log", "sqrt"),
threshold = 0.5, se.fit = FALSE, ...)
Arguments
object |
an object of class |
newdata |
An optional data frame in which to look for variables with which to predict. If omitted, the fitted values are used. If newdata is provided then it should contain all the variables needed for prediction: a warning is generated if not. Factors aggregated by the function |
type |
Type of prediction. |
back.transform |
Should to |
threshold |
Ignored for |
se.fit |
logical. Default is FALSE. |
... |
further arguments to |
Details
Returns point predictions for new locations s
from linear and smooth trends \hat f(\mathbf{x},s)
estimated
by penalized least squares geoGAM by calling the function predict.gam
.
Back transformation of log and sqrt
For lognormal responses (back.transform = 'log'
) in full analogy to lognormal kriging (Cressie-2006, Eq. 20) the predictions are backtransformed by
\mathrm{E}[ Y(\mathbf{s})\,|\,\mathbf{x}] = \exp\left(~ \hat
f(\mathbf{x}(\mathbf{s})) + \frac{1}{2} \hat \sigma^2 -
\frac{1}{2} \mbox{Var}[ \hat
f(\mathbf{x}(\mathbf{s}) ] \right)
with \hat f(\mathbf{x}(\mathbf{s}))
being the prediction of the log-transformed response,
\hat \sigma^2
the estimated residual variance of the final geoGAM
fit (see predict.gam
with se.fit=TRUE
) and
\mbox{Var}[ \hat f(\mathbf{x}(\mathbf{s}) ) ]
the variance of \hat f(\mathbf{x}(\mathbf{s}))
as provided again by the final geoGAM
.
For responses with square root transformation (back.transform = 'sqrt'
) unbiased backtransform is computed by (Nussbaum et al. 2017b)
\tilde{Y}(s) = \hat{f}(\mathbf{x}(\mathbf{s}))^2 + \hat{\sigma}^2 - Var[ \hat{f}(\mathbf{x}(\mathbf{s}))]
with \hat{f}(\mathbf{x}(\mathbf{s}))^2
being the prediction of the sqrt-transformed response, \hat{\sigma}^2
the estimated residual variance of the fitted model and Var[ \hat{f}(\mathbf{x}(\mathbf{s}))]
the variance of \hat{f}(\mathbf{x}(\mathbf{s}))
as provided again by geoGAM
.
Discretization of probability predictions
For binary and ordered responses predictions yield
predicted occurrence probabilities \tilde
P(Y(\mathbf{s})=\mathbf{r}|\mathbf{x},s)
for response classes \mathbf{r}
.
To obtain binary class predictions a threshold
can be given. A threshold of 0.5 (default) maximizes percentage correct of predicted classes. For binary responses of rare events this threshold may not be optimal. Maximizing on e.g. Gilbert Skill Score (GSS, Wilks, 2011, chap. 8) on cross-validation predictions of the final geoGAM might be a better strategy. GSS is excluding the correct predictions of the more abundant class and is preferably used in case of unequal distribution of binary responses
(direct implementation of such a cross validation procedure planed.)
For ordered responses predict
with type = 'class'
selects the class to which the median of the
probability distribution over the ordered categories is assigned (Tutz 2012, p. 475).
Value
Vector of point predictions for the sites in newdata
is returned, with unbiased back transformation applied according to option back.transform
.
If se.fit = TRUE
then a 2 item list is returned with items fit
and se.fit
containing predictions and associated standard error estimates as computed by predict.gam
.
Author(s)
M. Nussbaum
References
Cressie, N. A. C., 1993. Statistics for Spatial Data, John Wiley and Sons.
Nussbaum, M., Walthert, L., Fraefel, M., Greiner, L., and Papritz, A.: Mapping of soil properties at high resolution in Switzerland using boosted geoadditive models, SOIL, 3, 191-210, doi:10.5194/soil-3-191-2017, 2017.
Nussbaum, M., Spiess, K., Baltensweiler, A., Grob, U., Keller, A., Greiner, L., Schaepman, M. E., and Papritz, A.: Evaluation of digital soil mapping approaches with large sets of environmental covariates, SOIL, 4, 1-22, doi:10.5194/soil-4-1-2018, 2018.
Tutz, G., 2012. Regression for Categorical Data, Cambridge University Press.
Wilks, D. S., 2011. Statistical Methods in the Atmospheric Sciences, Academic Press.
See Also
geoGAM
, gam
, predict.gam
, summary.geoGAM
, plot.geoGAM
Examples
data(quakes)
set.seed(2)
quakes <- quakes[ ss <- sample(1:nrow(quakes), 50), ]
# Artificially split data to create prediction data set
quakes.pred <- quakes[ -ss, ]
quakes.geogam <- geoGAM(response = "mag",
covariates = c("depth", "stations"),
data = quakes,
max.stop = 5,
cores = 1)
predicted <- predict(quakes.geogam, newdata = quakes.pred, type = "response" )
## Use soil data set of soil mapping study area near Berne
data(berne)
data(berne.grid)
# Split data sets and
# remove rows with missing values in response and covariates
d.cal <- berne[ berne$dataset == "calibration" & complete.cases(berne), ]
### Model selection for numeric response
ph10.geogam <- geoGAM(response = "ph.0.10",
covariates = names(d.cal)[14:ncol(d.cal)],
coords = c("x", "y"),
data = d.cal,
seed = 1,
cores = 1)
# Create GRID output with predictions
sp.grid <- berne.grid[, c("x", "y")]
sp.grid$pred.ph.0.10 <- predict(ph10.geogam, newdata = berne.grid)
if(requireNamespace("raster")){
require("sp")
# transform to sp object
coordinates(sp.grid) <- ~ x + y
# assign Swiss CH1903 / LV03 projection
proj4string(sp.grid) <- CRS("EPSG:21781")
# transform to grid
gridded(sp.grid) <- TRUE
plot(sp.grid)
# optionally save result to GeoTiff
# writeRaster(raster(sp.grid, layer = "pred.ph.0.10"),
# filename= "raspH10.tif", datatype = "FLT4S", format ="GTiff")
}