mlegc {gcKrig} | R Documentation |
Maximum Likelihood Estimation in Gaussian Copula Models for Geostatistical Count Data
Description
Computes the maximum likelihood estimates. Two methods are implemented. If
method = 'GHK'
then the maximum simulated likelihood estimates are computed, if
method = 'GQT'
then the maximum surrogate likelihood estimates are computed.
Usage
mlegc(y, x = NULL, locs, marginal, corr, effort = 1, longlat = FALSE,
distscale = 1, method = "GHK", corrpar0 = NULL, ghkoptions = list(nrep
= c(100, 1000), reorder = FALSE, seed = 12345))
Arguments
y |
a non-negative integer vector of response with its length equals to the number of sampling locations. |
x |
a numeric matrix or data frame of covariates,
with its number of rows equals to the number of sampling locations.
If no covariates then |
locs |
a numeric matrix or data frame of n-D points with row denoting points. The first column is x or longitude, the second column is y or latitude. The number of locations is equal to the number of rows. |
marginal |
an object of class |
corr |
an object of class |
effort |
the sampling effort. For binomial marginal it is the size parameter (number of trials). See details. |
longlat |
if FALSE, use Euclidean distance, if TRUE use great circle distance. The default is FALSE. |
distscale |
a numeric scaling factor for computing distance. If original distance is in kilometers, then
|
method |
two methods are implemented. If
|
corrpar0 |
the starting value of correlation parameter in the optimization procedure.
If |
ghkoptions |
a list of three elements that only need to be specified if
|
Details
This program implemented one simulated likelihood method via sequential importance
sampling (see Masarotto and Varin 2012), which is same as the method implemented in package
gcmr
(Masarotto and Varin 2016) except an antithetic variable is used. It also implemented
one surrogate likelihood method via distributional transform (see Kazianka and Pilz 2010), which is
generally faster.
The argument effort
is the sampling effort (known). It can be used to consider the heterogeneity
of the measurement time or area at different locations.
The default is 1 for all locations. See Han and De Oliveira (2016) for more details.
Value
A list of class "mlegc" with the following elements:
MLE |
the maximum likelihood estimate. |
x |
the design matrix. |
nug |
1 if |
nreg |
number of regression parameters. |
log.lik |
the value of the maximum log-likelihood. |
AIC |
the Akaike information criterion. |
AICc |
the AICc information criterion; essentially AIC with a greater penalty for extra parameters. |
BIC |
the Bayesian information criterion. |
kmarg |
number of marginal parameters. |
par.df |
number of parameters. |
N |
number of observations. |
D |
the distance matrix. |
optlb |
lower bound in optimization. |
optub |
upper bound in optimization. |
hessian |
the hessian matrix evaluated at the final estimates. |
args |
arguments passed in function evaluation. |
Author(s)
Zifei Han hanzifei1@gmail.com
References
Han, Z. and De Oliveira, V. (2016) On the correlation structure of Gaussian copula models for geostatistical count data. Australian and New Zealand Journal of Statistics, 58:47-69.
Kazianka, H. and Pilz, J. (2010) Copula-based geostatistical modeling of continuous and discrete data including covariates. Stoch Environ Res Risk Assess 24:661-673.
Masarotto, G. and Varin, C. (2012) Gaussian copula marginal regression. Electronic Journal of Statistics 6:1517-1549. https://projecteuclid.org/euclid.ejs/1346421603.
Masarotto, G. and Varin C. (2017). Gaussian Copula Regression in R. Journal of Statistical Software, 77(8), 1–26. doi: 10.18637/jss.v077.i08.
Han, Z. and De Oliveira, V. (2018) gcKrig: An R Package for the Analysis of Geostatistical Count Data Using Gaussian Copulas. Journal of Statistical Software, 87(13), 1–32. doi: 10.18637/jss.v087.i13.
See Also
Examples
## Not run:
## Fit a Simulated Dataset with 100 locations
grid <- seq(0.05, 0.95, by = 0.1)
xloc <- expand.grid(x = grid, y = grid)[,1]
yloc <- expand.grid(x = grid, y = grid)[,2]
set.seed(123)
simData1 <- simgc(locs = cbind(xloc,yloc), sim.n = 1,
marginal = negbin.gc(mu = exp(1+xloc), od = 1),
corr = matern.gc(range = 0.4, kappa = 0.5, nugget = 0))
simFit1 <- mlegc(y = simData1$data, x = xloc, locs = cbind(xloc,yloc),
marginal = negbin.gc(link = 'log'),
corr = matern.gc(kappa = 0.5, nugget = FALSE), method = 'GHK')
simFit2 <- mlegc(y = simData1$data, x = xloc, locs = cbind(xloc,yloc),
marginal = negbin.gc(link = 'log'),
corr = matern.gc(kappa = 0.5, nugget = FALSE), method = 'GQT')
#summary(simFit1);summary(simFit2)
#plot(simFit1);plot(simFit2)
## Time consuming examples
## Fit a real dataset with 70 sampling locations.
data(Weed95)
weedobs <- Weed95[Weed95$dummy==1, ]
weedpred <- Weed95[Weed95$dummy==0, ]
Weedfit1 <- mlegc(y = weedobs$weedcount, x = weedobs[,4:5], locs = weedobs[,1:2],
marginal = poisson.gc(link='log'),
corr = matern.gc(kappa = 0.5, nugget = TRUE),
method = 'GHK')
summary(Weedfit1)
plot(Weedfit1)
## Fit a real dataset with 256 locations
data(LansingTrees)
Treefit1 <- mlegc(y = LansingTrees[,3], x = LansingTrees[,4], locs = LansingTrees[,1:2],
marginal = negbin.gc(link = 'log'),
corr = matern.gc(kappa = 0.5, nugget = FALSE), method = 'GHK')
summary(Treefit1)
plot(Treefit1)
# Try to use GQT method
Treefit2<- mlegc(y = LansingTrees[,3], x = LansingTrees[,4],
locs = LansingTrees[,1:2], marginal = poisson.gc(link='log'),
corr = matern.gc(kappa = 0.5, nugget = TRUE), method = 'GQT')
summary(Treefit2)
plot(Treefit2)
## Fit a real dataset with randomized locations
data(AtlanticFish)
Fitfish <- mlegc(y = AtlanticFish[,3], x = AtlanticFish[,4:6], locs = AtlanticFish[,1:2],
longlat = TRUE, marginal = negbin.gc(link='log'),
corr = matern.gc(kappa = 0.5, nugget = TRUE), method = 'GHK')
summary(Fitfish)
## Fit a real dataset with binomial counts; see Masarotto and Varin (2016).
library(gcmr)
data(malaria)
malariax <- data.frame(netuse = malaria$netuse,
green = malaria$green/100,
phc = malaria$phc)
Fitmalaria <- mlegc(y = malaria$cases, x = malariax, locs = malaria[,1:2],
marginal = binomial.gc(link='logit'), corrpar0 = 1.5,
corr = matern.gc(kappa = 0.5, nugget = FALSE),
distscale = 0.001, effort = malaria$size, method = 'GHK')
summary(Fitmalaria)
## Fit a real spatial binary dataset with 333 locations using probit link
data(OilWell)
Oilest1 <- mlegc(y = OilWell[,3], x = NULL, locs = OilWell[,1:2],
marginal = binomial.gc(link = 'probit'),
corr = matern.gc(nugget = TRUE), method = 'GHK')
summary(Oilest1)
plot(Oilest1, col = 2)
## End(Not run)