R: Monte Carlo Maximum Likelihood estimation for the binomial...

binomial.logistic.MCML {PrevMap}

R Documentation

Monte Carlo Maximum Likelihood estimation for the binomial logistic model

Description

This function performs Monte Carlo maximum likelihood (MCML) estimation for the geostatistical binomial logistic model.

Usage

binomial.logistic.MCML(
  formula,
  units.m,
  coords,
  times = NULL,
  data,
  ID.coords = NULL,
  par0,
  control.mcmc,
  kappa,
  kappa.t = NULL,
  sst.model = NULL,
  fixed.rel.nugget = NULL,
  start.cov.pars,
  method = "BFGS",
  low.rank = FALSE,
  SPDE = FALSE,
  knots = NULL,
  mesh = NULL,
  messages = TRUE,
  plot.correlogram = TRUE
)

Arguments

`formula`	an object of class `formula` (or one that can be coerced to that class): a symbolic description of the model to be fitted.
`units.m`	an object of class `formula` indicating the binomial denominators in the data.
`coords`	an object of class `formula` indicating the spatial coordinates in the data.
`times`	an object of class `formula` indicating the times in the data, used in the spatio-temporal model.
`data`	a data frame containing the variables in the model.
`ID.coords`	vector of ID values for the unique set of spatial coordinates obtained from `create.ID.coords`. These must be provided if, for example, spatial random effects are defined at household level but some of the covariates are at individual level. Warning: the household coordinates must all be distinct otherwise see `jitterDupCoords`. Default is `NULL`.
`par0`	parameters of the importance sampling distribution: these should be given in the following order `c(beta,sigma2,phi,tau2)`, where `beta` are the regression coefficients, `sigma2` is the variance of the Gaussian process, `phi` is the scale parameter of the spatial correlation and `tau2` is the variance of the nugget effect (if included in the model).
`control.mcmc`	output from `control.mcmc.MCML`.
`kappa`	fixed value for the shape parameter of the Matern covariance function.
`kappa.t`	fixed value for the shape parameter of the Matern covariance function in the separable double-Matern spatio-temporal model.
`sst.model`	a character value that specifies the spatio-temporal correlation function. `sst.model="DM"` separable double-Matern. `sst.model="GN1"` separable correlation functions. Temporal correlation: `f(x) = 1/(1+x/\psi)`; Spatial correaltion: Matern function. Deafault is `sst.model=NULL`, which is used when a purely spatial model is fitted.
`fixed.rel.nugget`	fixed value for the relative variance of the nugget effect; `fixed.rel.nugget=NULL` if this should be included in the estimation. Default is `fixed.rel.nugget=NULL`.
`start.cov.pars`	a vector of length two with elements corresponding to the starting values of `phi` and the relative variance of the nugget effect `nu2`, respectively, that are used in the optimization algorithm. If `nu2` is fixed through `fixed.rel.nugget`, then `start.cov.pars` represents the starting value for `phi` only.
`method`	method of optimization. If `method="BFGS"` then the `maxBFGS` function is used; otherwise `method="nlminb"` to use the `nlminb` function. Default is `method="BFGS"`.
`low.rank`	logical; if `low.rank=TRUE` a low-rank approximation of the Gaussian spatial process is used when fitting the model. Default is `low.rank=FALSE`.
`SPDE`	logical; if `SPDE=TRUE` the SPDE approximation for the Gaussian spatial model is used. Default is `SPDE=FALSE`.
`knots`	if `low.rank=TRUE`, `knots` is a matrix of spatial knots that are used in the low-rank approximation. Default is `knots=NULL`.
`mesh`	an object obtained as result of a call to the function `inla.mesh.2d`.
`messages`	logical; if `messages=TRUE` then status messages are printed on the screen (or output device) while the function is running. Default is `messages=TRUE`.
`plot.correlogram`	logical; if `plot.correlogram=TRUE` the autocorrelation plot of the samples of the random effect is displayed after completion of conditional simulation. Default is `plot.correlogram=TRUE`.

Details

This function performs parameter estimation for a geostatistical binomial logistic model. Conditionally on a zero-mean stationary Gaussian process S(x) and mutually independent zero-mean Gaussian variables Z with variance tau2, the observations y are generated from a binomial distribution with probability p and binomial denominators units.m. A canonical logistic link is used, thus the linear predictor assumes the form

\log(p/(1-p)) = d'\beta + S(x) + Z,

where d is a vector of covariates with associated regression coefficients \beta. The Gaussian process S(x) has isotropic Matern covariance function (see matern) with variance sigma2, scale parameter phi and shape parameter kappa. In the binomial.logistic.MCML function, the shape parameter is treated as fixed. The relative variance of the nugget effect, nu2=tau2/sigma2, can also be fixed through the argument fixed.rel.nugget; if fixed.rel.nugget=NULL, then the relative variance of the nugget effect is also included in the estimation.

Monte Carlo Maximum likelihood. The Monte Carlo maximum likelihood method uses conditional simulation from the distribution of the random effect T(x) = d(x)'\beta+S(x)+Z given the data y, in order to approximate the high-dimensiional intractable integral given by the likelihood function. The resulting approximation of the likelihood is then maximized by a numerical optimization algorithm which uses analytic epression for computation of the gradient vector and Hessian matrix. The functions used for numerical optimization are maxBFGS (method="BFGS"), from the maxLik package, and nlminb (method="nlminb").

Using a two-level model to include household-level and individual-level information. When analysing data from household sruveys, some of the avilable information information might be at household-level (e.g. material of house, temperature) and some at individual-level (e.g. age, gender). In this case, the Gaussian spatial process S(x) and the nugget effect Z are defined at hosuehold-level in order to account for extra-binomial variation between and within households, respectively.

Low-rank approximation. In the case of very large spatial data-sets, a low-rank approximation of the Gaussian spatial process S(x) might be computationally beneficial. Let (x_{1},\dots,x_{m}) and (t_{1},\dots,t_{m}) denote the set of sampling locations and a grid of spatial knots covering the area of interest, respectively. Then S(x) is approximated as \sum_{i=1}^m K(\|x-t_{i}\|; \phi, \kappa)U_{i}, where U_{i} are zero-mean mutually independent Gaussian variables with variance sigma2 and K(.;\phi, \kappa) is the isotropic Matern kernel (see matern.kernel). Since the resulting approximation is no longer a stationary process (but only approximately), the parameter sigma2 is then multiplied by a factor constant.sigma2 so as to obtain a value that is closer to the actual variance of S(x).

Value

An object of class "PrevMap". The function summary.PrevMap is used to print a summary of the fitted model. The object is a list with the following components:

estimate: estimates of the model parameters; use the function coef.PrevMap to obtain estimates of covariance parameters on the original scale.

covariance: covariance matrix of the MCML estimates.

log.lik: maximum value of the log-likelihood.

y: binomial observations.

units.m: binomial denominators.

D: matrix of covariates.

coords: matrix of the observed sampling locations.

method: method of optimization used.

ID.coords: set of ID values defined through the argument ID.coords.

kappa: fixed value of the shape parameter of the Matern function.

kappa.t: fixed value for the shape parameter of the Matern covariance function in the separable double-Matern spatio-temporal model.

knots: matrix of the spatial knots used in the low-rank approximation.

mesh: the mesh used in the SPDE approximation.

const.sigma2: adjustment factor for sigma2 in the low-rank approximation.

h: vector of the values of the tuning parameter at each iteration of the Langevin-Hastings MCMC algorithm; see Laplace.sampling, or Laplace.sampling.lr if a low-rank approximation is used.

samples: matrix of the random effects samples from the importance sampling distribution used to approximate the likelihood function.

fixed.rel.nugget: fixed value for the relative variance of the nugget effect.

call: the matched call.

Author(s)

Emanuele Giorgi e.giorgi@lancaster.ac.uk

Peter J. Diggle p.diggle@lancaster.ac.uk

References

Diggle, P.J., Giorgi, E. (2019). Model-based Geostatistics for Global Public Health. CRC/Chapman & Hall.

Giorgi, E., Diggle, P.J. (2017). PrevMap: an R package for prevalence mapping. Journal of Statistical Software. 78(8), 1-29. doi: 10.18637/jss.v078.i08

Christensen, O. F. (2004). Monte carlo maximum likelihood in model-based geostatistics. Journal of Computational and Graphical Statistics 13, 702-718.

Higdon, D. (1998). A process-convolution approach to modeling temperatures in the North Atlantic Ocean. Environmental and Ecological Statistics 5, 173-190.