R: Prediction at Unobserved Locations in Gaussian Copula Models...

predgc {gcKrig}

R Documentation

Prediction at Unobserved Locations in Gaussian Copula Models for Geostatistical Count Data

Description

Computes the plug-in prediction at unobserved sites. Two methods are implemented. If method = 'GHK' then the maximum simulated likelihood estimates are computed and the sequential importance sampling method is used in the integral evaluation. If method = 'GQT' then the maximum surrogate likelihood estimates are computed and the generalized quantile transform method is used in integral approximation.

Usage

predgc(obs.y, obs.x = NULL, obs.locs, pred.x = NULL, pred.locs,
       longlat = FALSE, distscale = 1, marginal, corr, obs.effort = 1,
       pred.effort = 1, method = "GHK", estpar = NULL, corrpar0 = NULL,
       pred.interval = NULL, parallel = FALSE,
       ghkoptions = list(nrep = c(100,1000), reorder = FALSE, seed = 12345),
       paralleloptions = list(n.cores = 2, cluster.type = "SOCK"))

Arguments

`obs.y`	a non-negative integer vector of observed response with its length equals to the number of observed locations.
`obs.x`	a numeric matrix or data frame of covariates at observed locations, with its number of rows equals to the number of observed locations. If no covariates then `obs.x = NULL`.
`obs.locs`	a numeric matrix or data frame of observed locations.obs.effort The first column is x or longitude, the second column is y or latitude. The number of observed locations is equal to the number of rows.
`pred.x`	a numeric matrix or data frame of covariates at prediction locations, with its number of rows equals to the number of prediction locations. If no covariates then `pred.x = NULL`.
`pred.locs`	a numeric matrix or data frame of prediction locations. First column is x or longitude, second column is y or latitude. The number of prediction locations equals to the number of rows.
`longlat`	if FALSE, use Euclidean distance, if TRUE use great circle distance. The default is FALSE.
`distscale`	a numeric scaling factor for computing distance. If original distance is in kilometers, then `distscale = 1000` will convert it to meters.
`marginal`	an object of class `marginal.gc` specifying the marginal distribution.
`corr`	an object of class `corr.gc` specifying the correlation function.
`obs.effort`	sampling effort at observed locations. For binomial marginal it is the size parameter (number of trials). See details.
`pred.effort`	sampling effort at prediction locations. For binomial marginal it is the size parameter (number of trials). See details.
`method`	two methods are implemented. If `method = 'GHK'` then the maximum simulated likelihood estimates are computed, if `method = 'GQT'` then the maximum surrogate likelihood estimates are computed.
`estpar`	if `estpar = NULL` then the likelihood estimates will be computed first, then plug-in into the predictive density. When all estimates are available, it is suggested to specify `estpar` with the parameter estimates so re-fitting is not needed. If so, the sequence of the input values for `estpar` is: regression parameters, overdispersion (if any), and correlation parameters (range and nugget, if applicable).
`corrpar0`	the starting value of correlation parameters in optimization procedure. If `corrpar0 = NULL` then initial range is set to be half of the median distance in distance matrix, and initial nugget (if `nugget = TRUE`) is 0.2.
`pred.interval`	a number between `0` and `1` representing confidence level of the prediction interval. The program will output two types of the prediction intervals, see detail. If `pred.interval = NULL` then no prediction interval will be computed.
`parallel`	if `TRUE` then parallel computing is used to predict multiple prediction locations simultaneously. If `FALSE` then a serial version will be called.
`ghkoptions`	a list of three elements that only need to be specified if `method = 'GHK'`. `nrep` is the Monte Carlo size of the importance sampling algorithm for likelihood approximation. It can be a vector with increasing positive integers so that the model is fitted with a sequence of different Monte Carlo sizes, and the starting values for optimization are taken from the previous fitting. The default value is 100 for the first optimization and 1000 for the second and definitive optimization. `reorder` indicates whether the integral will be reordered every iteration in computation according to the algorithm in Gibson, etal (1994), default is FALSE. `seed` is seed of the pseudorandom generator used in Monte Carlo simulation.
`paralleloptions`	a list of two elements that only need to be specified if `parallel = TRUE`. `n.cores` is the number of cores to be used in parallel prediction. `cluster.type` is type of cluster to be used for parallel computing; can be "SOCK", "MPI", "PVM", or "NWS".

Details

This program implemented two methods in predicting the response at unobserved sites. See mlegc.

The argument obs.effort and pred.effort are the sampling effort (known). It can be used to consider heterogeneity of the measurement time or area at different locations. The default is 1 for all locations. See Han and De Oliveira (2016) for more details.

The program computes two types of prediction intervals at a given confidence level. The shortest prediction interval is obtained from evaluating the highest to lowest prediction densities; the equal tail prediction interval has equal tail probabilities.

Value

A list of class "predgc" with the following elements:

`obs.locs`	observed locations.
`obs.y`	observed values at observed locations.
`pred.locs`	prediction locations.
`predValue`	the expectation of the conditional predictive distribution.
`predCount`	predicted counts; the closest integer that `predValue` rounded to.
`predVar`	estimated variance of the prediction at prediction locations.
`ConfidenceLevel`	confidence level (between 0 to 1) if prediction interval is computed.
`predInterval.EqualTail`	equal-tail prediction interval.
`predInterval.Shortest`	shortest length prediction interval.

Author(s)

Zifei Han hanzifei1@gmail.com

References

Han, Z. and De Oliveira, V. (2016) On the correlation structure of Gaussian copula models for geostatistical count data. Australian and New Zealand Journal of Statistics, 58:47-69.

Kazianka, H. and Pilz, J. (2010) Copula-based geostatistical modeling of continuous and discrete data including covariates. Stoch Environ Res Risk Assess 24:661-673.

Kazianka, H. (2013) Approximate copula-based estimation and prediction of discrete spatial data. Stoch Environ Res Risk Assess 27:2015-2026.

Masarotto, G. and Varin, C. (2012) Gaussian copula marginal regression. Electronic Journal of Statistics 6:1517-1549. https://projecteuclid.org/euclid.ejs/1346421603.

Masarotto, G. and Varin C. (2017). Gaussian Copula Regression in R. Journal of Statistical Software, 77(8), 1–26. doi: 10.18637/jss.v077.i08.

Han, Z. and De Oliveira, V. (2018) gcKrig: An R Package for the Analysis of Geostatistical Count Data Using Gaussian Copulas. Journal of Statistical Software, 87(13), 1–32. doi: 10.18637/jss.v087.i13.

Examples

## Not run: 
## For fast check predict at four locations only
data(Weed95)
weedobs <- Weed95[Weed95$dummy==1, ]
weedpred <- Weed95[Weed95$dummy==0, ]
predweed1 <- predgc(obs.y = weedobs$weedcount, obs.x = weedobs[,4:5], obs.locs = weedobs[,1:2],
                     pred.x = weedpred[1:4,4:5], pred.locs = weedpred[1:4,1:2],
                     marginal = negbin.gc(link = 'log'), pred.interval = 0.9,
                     corr = matern.gc(kappa = 0.5, nugget = TRUE), method = 'GHK')
#summary(predweed1)
#plot(predweed1)

## Time consuming examples
## Weed prediction at 200 locations using parallel programming
predweed2 <- predgc(obs.y = weedobs$weedcount, obs.x = weedobs[,4:5], obs.locs = weedobs[,1:2],
                     pred.x = weedpred[,4:5], pred.locs = weedpred[,1:2],
                     marginal = negbin.gc(link = 'log'),
                     corr = matern.gc(kappa = 0.5, nugget = TRUE), method = 'GHK',
                     pred.interval = 0.95, parallel = TRUE,
                     paralleloptions = list(n.cores = 4))
#summary(predweed2)
#plot(predweed2)


## A more time consuming example for generating a prediction map at a fine grid
data(OilWell)
gridstep <- seq(0.5, 30.5, length = 40)
locOilpred <- data.frame(Easting = expand.grid(gridstep, gridstep)[,1],
                        Northing = expand.grid(gridstep, gridstep)[,2])
PredOil <- predgc(obs.y = OilWell[,3], obs.locs = OilWell[,1:2],  pred.locs = locOilpred,
                       marginal = binomial.gc(link = 'logit'),
                       corr = matern.gc(nugget = FALSE), obs.effort = 1,
                       pred.effort = 1, method = 'GHK',
                       parallel = TRUE, paralleloptions = list(n.cores = 4))
PredMat <- summary(PredOil)

## To generate better prediction maps
library(colorspace)
filled.contour(seq(0.5,30.5,length=40), seq(0.5,30.5,length=40),
               matrix(PredMat$predMean,40,), zlim = c(0, 1), col=rev(heat_hcl(12)),
               nlevels=12, xlab = "Eastings", ylab = "Northings",
               plot.axes = {axis(1); axis(2); points(OilWell[,1:2], col = 1,
               cex = 0.25 + 0.25*OilWell[,3])})

filled.contour(seq(0.5,30.5,length=40), seq(0.5,30.5,length=40),
               matrix(PredMat$predVar,40,),
               zlim = c(0, 0.3), col = rev(heat_hcl(12)), nlevels = 10,
               xlab = "Eastings", ylab = "Northings",
               plot.axes = {axis(1); axis(2); points(OilWell[,1:2], col = 1,
               cex = 0.25 + 0.25*OilWell[,3])})


## End(Not run)

[Package gcKrig version 1.1.8 Index]