R: Prediction from fitted geoGAM model

predict.geoGAM {geoGAM}

R Documentation

Prediction from fitted geoGAM model

Description

Takes a fitted geoGAM object and produces point predictions for a new set of covariate values. If no new data is provided fitted values are returned. Centering and scaling is applied with the same parameters as for the calibration data set given to geoGAM. Factor levels are aggregated according to the final model fit.

Usage

## S3 method for class 'geoGAM'
predict(object, newdata,
        type = c("response", "link", "probs", "class"),
        back.transform = c("none", "log", "sqrt"),
        threshold = 0.5, se.fit = FALSE, ...)

Arguments

`object`	an object of class `geoGAM`
`newdata`	An optional data frame in which to look for variables with which to predict. If omitted, the fitted values are used. If newdata is provided then it should contain all the variables needed for prediction: a warning is generated if not. Factors aggregated by the function `geoGAM` will be aggregated in the same way for prediction within this function.
`type`	Type of prediction.
`back.transform`	Should to `log` or `sqrt` transformed responses unbiased back transformation be applied? Default is `none`. Ignored for categorical responses.
`threshold`	Ignored for `type = c("response", "link", "probs")` and for `type = "class"` for responses with more than two levels.
`se.fit`	logical. Default is FALSE.
`...`	further arguments to `predict()`.

Details

Returns point predictions for new locations s from linear and smooth trends \hat f(\mathbf{x},s) estimated by penalized least squares geoGAM by calling the function predict.gam.

Back transformation of log and sqrt

For lognormal responses (back.transform = 'log') in full analogy to lognormal kriging (Cressie-2006, Eq. 20) the predictions are backtransformed by

\mathrm{E}[ Y(\mathbf{s})\,|\,\mathbf{x}] = \exp\left(~ \hat f(\mathbf{x}(\mathbf{s})) + \frac{1}{2} \hat \sigma^2 - \frac{1}{2} \mbox{Var}[ \hat f(\mathbf{x}(\mathbf{s}) ] \right)

with \hat f(\mathbf{x}(\mathbf{s})) being the prediction of the log-transformed response, \hat \sigma^2 the estimated residual variance of the final geoGAM fit (see predict.gam with se.fit=TRUE) and \mbox{Var}[ \hat f(\mathbf{x}(\mathbf{s}) ) ] the variance of \hat f(\mathbf{x}(\mathbf{s})) as provided again by the final geoGAM.

For responses with square root transformation (back.transform = 'sqrt') unbiased backtransform is computed by (Nussbaum et al. 2017b)

\tilde{Y}(s) = \hat{f}(\mathbf{x}(\mathbf{s}))^2 + \hat{\sigma}^2 - Var[ \hat{f}(\mathbf{x}(\mathbf{s}))]

with \hat{f}(\mathbf{x}(\mathbf{s}))^2 being the prediction of the sqrt-transformed response, \hat{\sigma}^2 the estimated residual variance of the fitted model and Var[ \hat{f}(\mathbf{x}(\mathbf{s}))] the variance of \hat{f}(\mathbf{x}(\mathbf{s})) as provided again by geoGAM.

Discretization of probability predictions

For binary and ordered responses predictions yield predicted occurrence probabilities \tilde P(Y(\mathbf{s})=\mathbf{r}|\mathbf{x},s) for response classes \mathbf{r}.

To obtain binary class predictions a threshold can be given. A threshold of 0.5 (default) maximizes percentage correct of predicted classes. For binary responses of rare events this threshold may not be optimal. Maximizing on e.g. Gilbert Skill Score (GSS, Wilks, 2011, chap. 8) on cross-validation predictions of the final geoGAM might be a better strategy. GSS is excluding the correct predictions of the more abundant class and is preferably used in case of unequal distribution of binary responses (direct implementation of such a cross validation procedure planed.)

For ordered responses predict with type = 'class' selects the class to which the median of the probability distribution over the ordered categories is assigned (Tutz 2012, p. 475).

Value

Vector of point predictions for the sites in newdata is returned, with unbiased back transformation applied according to option back.transform.

If se.fit = TRUE then a 2 item list is returned with items fit and se.fit containing predictions and associated standard error estimates as computed by predict.gam.

Author(s)

M. Nussbaum

References

Cressie, N. A. C., 1993. Statistics for Spatial Data, John Wiley and Sons.

Nussbaum, M., Walthert, L., Fraefel, M., Greiner, L., and Papritz, A.: Mapping of soil properties at high resolution in Switzerland using boosted geoadditive models, SOIL, 3, 191-210, doi:10.5194/soil-3-191-2017, 2017.

Nussbaum, M., Spiess, K., Baltensweiler, A., Grob, U., Keller, A., Greiner, L., Schaepman, M. E., and Papritz, A.: Evaluation of digital soil mapping approaches with large sets of environmental covariates, SOIL, 4, 1-22, doi:10.5194/soil-4-1-2018, 2018.

Tutz, G., 2012. Regression for Categorical Data, Cambridge University Press.

Wilks, D. S., 2011. Statistical Methods in the Atmospheric Sciences, Academic Press.

Examples


data(quakes)
set.seed(2)

quakes <- quakes[ ss <- sample(1:nrow(quakes), 50), ]

# Artificially split data to create prediction data set
quakes.pred <- quakes[ -ss, ]

quakes.geogam <- geoGAM(response = "mag",
                        covariates = c("depth", "stations"),
                        data = quakes,
                        max.stop = 5,
                        cores = 1)

predicted <- predict(quakes.geogam, newdata = quakes.pred, type = "response" )





## Use soil data set of soil mapping study area near Berne

data(berne)
data(berne.grid)

# Split data sets and
# remove rows with missing values in response and covariates

d.cal <- berne[ berne$dataset == "calibration" & complete.cases(berne), ]

### Model selection for numeric response
ph10.geogam <- geoGAM(response = "ph.0.10",
                      covariates = names(d.cal)[14:ncol(d.cal)],
                      coords = c("x", "y"),
                      data = d.cal,
                      seed = 1,
                      cores = 1)

# Create GRID output with predictions
sp.grid <- berne.grid[, c("x", "y")]

sp.grid$pred.ph.0.10 <- predict(ph10.geogam, newdata = berne.grid)

if(requireNamespace("raster")){

  require("sp")

  # transform to sp object
  coordinates(sp.grid) <- ~ x + y

  # assign Swiss CH1903 / LV03 projection
  proj4string(sp.grid) <- CRS("EPSG:21781")

  # transform to grid
  gridded(sp.grid) <- TRUE

  plot(sp.grid)

  # optionally save result to GeoTiff
  # writeRaster(raster(sp.grid, layer = "pred.ph.0.10"),
  #             filename= "raspH10.tif", datatype = "FLT4S", format ="GTiff")

}

[Package geoGAM version 0.1-3 Index]