predgc {gcKrig}R Documentation

Prediction at Unobserved Locations in Gaussian Copula Models for Geostatistical Count Data

Description

Computes the plug-in prediction at unobserved sites. Two methods are implemented. If method = 'GHK' then the maximum simulated likelihood estimates are computed and the sequential importance sampling method is used in the integral evaluation. If method = 'GQT' then the maximum surrogate likelihood estimates are computed and the generalized quantile transform method is used in integral approximation.

Usage

predgc(obs.y, obs.x = NULL, obs.locs, pred.x = NULL, pred.locs,
       longlat = FALSE, distscale = 1, marginal, corr, obs.effort = 1,
       pred.effort = 1, method = "GHK", estpar = NULL, corrpar0 = NULL,
       pred.interval = NULL, parallel = FALSE,
       ghkoptions = list(nrep = c(100,1000), reorder = FALSE, seed = 12345),
       paralleloptions = list(n.cores = 2, cluster.type = "SOCK"))

Arguments

obs.y

a non-negative integer vector of observed response with its length equals to the number of observed locations.

obs.x

a numeric matrix or data frame of covariates at observed locations, with its number of rows equals to the number of observed locations. If no covariates then obs.x = NULL.

obs.locs

a numeric matrix or data frame of observed locations.obs.effort The first column is x or longitude, the second column is y or latitude. The number of observed locations is equal to the number of rows.

pred.x

a numeric matrix or data frame of covariates at prediction locations, with its number of rows equals to the number of prediction locations. If no covariates then pred.x = NULL.

pred.locs

a numeric matrix or data frame of prediction locations. First column is x or longitude, second column is y or latitude. The number of prediction locations equals to the number of rows.

longlat

if FALSE, use Euclidean distance, if TRUE use great circle distance. The default is FALSE.

distscale

a numeric scaling factor for computing distance. If original distance is in kilometers, then distscale = 1000 will convert it to meters.

marginal

an object of class marginal.gc specifying the marginal distribution.

corr

an object of class corr.gc specifying the correlation function.

obs.effort

sampling effort at observed locations. For binomial marginal it is the size parameter (number of trials). See details.

pred.effort

sampling effort at prediction locations. For binomial marginal it is the size parameter (number of trials). See details.

method

two methods are implemented. If method = 'GHK' then the maximum simulated likelihood estimates are computed, if method = 'GQT' then the maximum surrogate likelihood estimates are computed.

estpar

if estpar = NULL then the likelihood estimates will be computed first, then plug-in into the predictive density. When all estimates are available, it is suggested to specify estpar with the parameter estimates so re-fitting is not needed. If so, the sequence of the input values for estpar is: regression parameters, overdispersion (if any), and correlation parameters (range and nugget, if applicable).

corrpar0

the starting value of correlation parameters in optimization procedure. If corrpar0 = NULL then initial range is set to be half of the median distance in distance matrix, and initial nugget (if nugget = TRUE) is 0.2.

pred.interval

a number between 0 and 1 representing confidence level of the prediction interval. The program will output two types of the prediction intervals, see detail. If pred.interval = NULL then no prediction interval will be computed.

parallel

if TRUE then parallel computing is used to predict multiple prediction locations simultaneously. If FALSE then a serial version will be called.

ghkoptions

a list of three elements that only need to be specified if method = 'GHK'.

nrep is the Monte Carlo size of the importance sampling algorithm for likelihood approximation. It can be a vector with increasing positive integers so that the model is fitted with a sequence of different Monte Carlo sizes, and the starting values for optimization are taken from the previous fitting. The default value is 100 for the first optimization and 1000 for the second and definitive optimization.

reorder indicates whether the integral will be reordered every iteration in computation according to the algorithm in Gibson, etal (1994), default is FALSE.

seed is seed of the pseudorandom generator used in Monte Carlo simulation.

paralleloptions

a list of two elements that only need to be specified if parallel = TRUE.

n.cores is the number of cores to be used in parallel prediction.

cluster.type is type of cluster to be used for parallel computing; can be "SOCK", "MPI", "PVM", or "NWS".

Details

This program implemented two methods in predicting the response at unobserved sites. See mlegc.

The argument obs.effort and pred.effort are the sampling effort (known). It can be used to consider heterogeneity of the measurement time or area at different locations. The default is 1 for all locations. See Han and De Oliveira (2016) for more details.

The program computes two types of prediction intervals at a given confidence level. The shortest prediction interval is obtained from evaluating the highest to lowest prediction densities; the equal tail prediction interval has equal tail probabilities.

Value

A list of class "predgc" with the following elements:

obs.locs

observed locations.

obs.y

observed values at observed locations.

pred.locs

prediction locations.

predValue

the expectation of the conditional predictive distribution.

predCount

predicted counts; the closest integer that predValue rounded to.

predVar

estimated variance of the prediction at prediction locations.

ConfidenceLevel

confidence level (between 0 to 1) if prediction interval is computed.

predInterval.EqualTail

equal-tail prediction interval.

predInterval.Shortest

shortest length prediction interval.

Author(s)

Zifei Han hanzifei1@gmail.com

References

Han, Z. and De Oliveira, V. (2016) On the correlation structure of Gaussian copula models for geostatistical count data. Australian and New Zealand Journal of Statistics, 58:47-69.

Kazianka, H. and Pilz, J. (2010) Copula-based geostatistical modeling of continuous and discrete data including covariates. Stoch Environ Res Risk Assess 24:661-673.

Kazianka, H. (2013) Approximate copula-based estimation and prediction of discrete spatial data. Stoch Environ Res Risk Assess 27:2015-2026.

Masarotto, G. and Varin, C. (2012) Gaussian copula marginal regression. Electronic Journal of Statistics 6:1517-1549. https://projecteuclid.org/euclid.ejs/1346421603.

Masarotto, G. and Varin C. (2017). Gaussian Copula Regression in R. Journal of Statistical Software, 77(8), 1–26. doi: 10.18637/jss.v077.i08.

Han, Z. and De Oliveira, V. (2018) gcKrig: An R Package for the Analysis of Geostatistical Count Data Using Gaussian Copulas. Journal of Statistical Software, 87(13), 1–32. doi: 10.18637/jss.v087.i13.

See Also

gcmr; mlegc

Examples

## Not run: 
## For fast check predict at four locations only
data(Weed95)
weedobs <- Weed95[Weed95$dummy==1, ]
weedpred <- Weed95[Weed95$dummy==0, ]
predweed1 <- predgc(obs.y = weedobs$weedcount, obs.x = weedobs[,4:5], obs.locs = weedobs[,1:2],
                     pred.x = weedpred[1:4,4:5], pred.locs = weedpred[1:4,1:2],
                     marginal = negbin.gc(link = 'log'), pred.interval = 0.9,
                     corr = matern.gc(kappa = 0.5, nugget = TRUE), method = 'GHK')
#summary(predweed1)
#plot(predweed1)

## Time consuming examples
## Weed prediction at 200 locations using parallel programming
predweed2 <- predgc(obs.y = weedobs$weedcount, obs.x = weedobs[,4:5], obs.locs = weedobs[,1:2],
                     pred.x = weedpred[,4:5], pred.locs = weedpred[,1:2],
                     marginal = negbin.gc(link = 'log'),
                     corr = matern.gc(kappa = 0.5, nugget = TRUE), method = 'GHK',
                     pred.interval = 0.95, parallel = TRUE,
                     paralleloptions = list(n.cores = 4))
#summary(predweed2)
#plot(predweed2)


## A more time consuming example for generating a prediction map at a fine grid
data(OilWell)
gridstep <- seq(0.5, 30.5, length = 40)
locOilpred <- data.frame(Easting = expand.grid(gridstep, gridstep)[,1],
                        Northing = expand.grid(gridstep, gridstep)[,2])
PredOil <- predgc(obs.y = OilWell[,3], obs.locs = OilWell[,1:2],  pred.locs = locOilpred,
                       marginal = binomial.gc(link = 'logit'),
                       corr = matern.gc(nugget = FALSE), obs.effort = 1,
                       pred.effort = 1, method = 'GHK',
                       parallel = TRUE, paralleloptions = list(n.cores = 4))
PredMat <- summary(PredOil)

## To generate better prediction maps
library(colorspace)
filled.contour(seq(0.5,30.5,length=40), seq(0.5,30.5,length=40),
               matrix(PredMat$predMean,40,), zlim = c(0, 1), col=rev(heat_hcl(12)),
               nlevels=12, xlab = "Eastings", ylab = "Northings",
               plot.axes = {axis(1); axis(2); points(OilWell[,1:2], col = 1,
               cex = 0.25 + 0.25*OilWell[,3])})

filled.contour(seq(0.5,30.5,length=40), seq(0.5,30.5,length=40),
               matrix(PredMat$predVar,40,),
               zlim = c(0, 0.3), col = rev(heat_hcl(12)), nlevels = 10,
               xlab = "Eastings", ylab = "Northings",
               plot.axes = {axis(1); axis(2); points(OilWell[,1:2], col = 1,
               cex = 0.25 + 0.25*OilWell[,3])})


## End(Not run)


[Package gcKrig version 1.1.8 Index]