MCAR_INLA {bigDM}

R Documentation

Fit a (scalable) spatial multivariate Poisson mixed model to areal count data where dependence between spatial patterns of the diseases is addressed through the use of M-models (Botella-Rocamora et al. 2015).

Description

Fit a spatial multivariate Poisson mixed model to areal count data. The linear predictor is modelled as

\log{r_{ij}}=\alpha_j + \theta_{ij}, \quad \mbox{for} \quad i=1,\ldots,n; \quad j=1,\ldots,J

where \alpha_j is a disease-specific intercept and \theta_{ij} is the spatial main effect of area i for the j-th disease. Following Botella-Rocamora et al. (2015), we rearrange the spatial effects into the matrix \mathbf{\Theta} = \{ \theta_{ij}: i=1, \ldots, n; j=1, \ldots, J \}, whose columns are the disease-specific spatial random effects and whose joint distribution defines the dependence within and between diseases. Several conditional autoregressive (CAR) prior distributions can be specified to model the within-disease spatial dependence: the intrinsic CAR prior (Besag et al. 1991), the CAR prior proposed by Leroux et al. (1999), and the proper CAR prior.
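The linear predictor above can be illustrated with a small base-R simulation. This is only a sketch under stated assumptions: the counts are taken to follow O_ij ~ Poisson(E_ij * r_ij), the toy sizes and parameter values are invented, and the independent Gaussian columns of `Phi` stand in for the CAR-distributed spatial effects. The product `Phi %*% M` mimics the M-model idea of inducing between-disease dependence through a J x J matrix; it is not the code used internally by the package.

```r
set.seed(1)
n <- 5   # number of areas (toy value)
J <- 3   # number of diseases (toy value)

# Disease-specific intercepts alpha_j (invented values)
alpha <- c(-0.1, 0.0, 0.2)

# Stand-in for the spatial effects: independent columns (no CAR structure here)
Phi <- matrix(rnorm(n * J, sd = 0.3), n, J)

# M-model idea: mix the independent columns to induce between-disease dependence
M <- matrix(c(1.0, 0.5, 0.2,
              0.0, 1.0, 0.4,
              0.0, 0.0, 1.0), J, J, byrow = TRUE)
Theta <- Phi %*% M                      # n x J matrix of effects theta_ij

# Linear predictor: log r_ij = alpha_j + theta_ij (alpha_j added to column j)
log_r <- sweep(Theta, 2, alpha, "+")
r <- exp(log_r)

# Assumed likelihood: O_ij ~ Poisson(E_ij * r_ij)
E <- matrix(runif(n * J, 5, 50), n, J)  # expected cases
O <- matrix(rpois(n * J, lambda = E * r), n, J)
```

Each column of `O` then holds the simulated counts of one disease across the n areas.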

As in the CAR_INLA function, three main modelling approaches can be considered:

the usual model with a global spatial random effect whose dependence structure is based on the whole neighbourhood graph of the areal units (model="global" argument)
a Disjoint model based on a partition of the whole spatial domain where independent spatial CAR models are simultaneously fitted in each partition (model="partition" and k=0 arguments)
a modelling approach where k-order neighbours are added to each partition to avoid border effects in the Disjoint model (model="partition" and k>0 arguments).
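The difference between the Disjoint model (k=0) and the k-order neighbourhood models can be sketched on a toy adjacency matrix. The helper `expand_partition` below is hypothetical and is not part of bigDM; it only illustrates the idea of adding neighbours up to order k to the areas of one partition.

```r
# Toy binary adjacency matrix for 6 areas arranged in a line: 1-2-3-4-5-6
W <- matrix(0, 6, 6)
for (i in 1:5) {
  W[i, i + 1] <- 1
  W[i + 1, i] <- 1
}

# Hypothetical helper (not a bigDM function): areas of one partition plus
# their neighbours up to order k
expand_partition <- function(W, areas, k) {
  current <- areas
  for (step in seq_len(k)) {
    # areas adjacent to at least one area already included
    nb <- which(colSums(W[current, , drop = FALSE]) > 0)
    current <- sort(union(current, nb))
  }
  current
}

group <- c(1, 1, 1, 2, 2, 2)   # partition labels (the role played by ID.group)
part1 <- which(group == 1)

expand_partition(W, part1, k = 0)   # 1 2 3      (Disjoint model)
expand_partition(W, part1, k = 1)   # 1 2 3 4    (1st-order neighbours added)
expand_partition(W, part1, k = 2)   # 1 2 3 4 5  (2nd-order neighbours added)
```

With k>0 the sub-domains overlap, so some areas are estimated in more than one submodel; the `merge.strategy` argument controls how those estimates are combined.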

For both the Disjoint and k-order neighbour models, parallel or distributed computation strategies based on the 'future' package (Bengtsson 2021) can be used to speed up model fitting.

Inference is conducted in a fully Bayesian setting using the integrated nested Laplace approximation (INLA; Rue et al. 2009) technique through the R-INLA package (https://www.r-inla.org/). For the scalable model proposals (Orozco-Acosta et al. 2021), approximate values of the Deviance Information Criterion (DIC) and Watanabe-Akaike Information Criterion (WAIC) can also be computed.
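To make the idea of sample-based DIC and WAIC values concrete, the sketch below applies the generic textbook formulas to simulated draws of a Poisson linear predictor. It is an illustration only: the data, the Gaussian stand-in for the posterior draws, and the choice of plug-in estimate are all assumptions, and the computation performed inside the package may differ in its details.

```r
set.seed(42)
y <- c(3, 0, 5, 2)            # toy observed counts
E <- c(2.5, 1.0, 4.0, 2.0)    # toy expected counts
S <- 1000                     # number of draws (the role played by n.sample)
m <- length(y)

# Stand-in for posterior draws of the linear predictor log(r_i): S x m matrix
eta <- matrix(rnorm(S * m, mean = 0, sd = 0.2), S, m)

# Poisson mean for every draw and observation
mu <- exp(eta) * matrix(E, S, m, byrow = TRUE)

# Pointwise log-likelihood of each observation under each draw
loglik <- matrix(dpois(matrix(y, S, m, byrow = TRUE), lambda = mu, log = TRUE), S, m)

# DIC: mean deviance plus effective number of parameters
mean_dev    <- mean(-2 * rowSums(loglik))
dev_at_mean <- -2 * sum(dpois(y, lambda = colMeans(mu), log = TRUE))  # plug-in: posterior mean of mu
p_D <- mean_dev - dev_at_mean
DIC <- mean_dev + p_D

# WAIC: log pointwise predictive density minus a variance-based penalty
lppd   <- sum(log(colMeans(exp(loglik))))
p_waic <- sum(apply(loglik, 2, var))
WAIC   <- -2 * (lppd - p_waic)

c(DIC = DIC, WAIC = WAIC)
```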

The function also supports the hybrid approximation method that combines the Laplace method with a low-rank Variational Bayes correction to the posterior mean (van Niekerk et al. 2023); it is enabled by setting the inla.mode="compact" argument.

Usage

MCAR_INLA(
  carto = NULL,
  data = NULL,
  ID.area = NULL,
  ID.disease = NULL,
  ID.group = NULL,
  O = NULL,
  E = NULL,
  W = NULL,
  prior = "intrinsic",
  model = "partition",
  k = 0,
  strategy = "simplified.laplace",
  merge.strategy = "original",
  compute.intercept = NULL,
  compute.DIC = TRUE,
  n.sample = 1000,
  compute.fitted.values = FALSE,
  save.models = FALSE,
  plan = "sequential",
  workers = NULL,
  inla.mode = "classic",
  num.threads = NULL
)

Arguments

`carto`	object of class `SpatialPolygonsDataFrame` or `sf`. This object must contain at least the variable with the identifiers of the spatial areal units specified in the argument `ID.area`.
`data`	object of class `data.frame` that must contain the target variables of interest specified in the arguments `ID.area`, `ID.disease`, `O` and `E`.
`ID.area`	character; name of the variable that contains the IDs of spatial areal units. The values of this variable must match those given in the `carto` and `data` objects.
`ID.disease`	character; name of the variable that contains the IDs of the diseases.
`ID.group`	character; name of the variable that contains the IDs of the spatial partition (grouping variable). Only required if `model="partition"`.
`O`	character; name of the variable that contains the observed number of cases for each areal unit and disease.
`E`	character; name of the variable that contains either the expected number of cases or the population at risk for each areal unit and disease.
`W`	optional argument with the binary adjacency matrix of the spatial areal units. If `NULL` (default), this object is computed from the `carto` argument (two areas are considered as neighbours if they share a common border).
`prior`	one of either `"intrinsic"` (default), `"Leroux"`, `"proper"`, or `"iid"` which specifies the prior distribution considered for the spatial random effect.
`model`	one of either `"global"` or `"partition"` (default), which specifies the Global model or one of the scalable model proposals (the Disjoint model or the k-order neighbourhood model).
`k`	numeric value with the neighbourhood order used for the partition model. Usually k=2 or 3 is enough to get good results. If k=0 (default) the Disjoint model is considered. Only required if `model="partition"`.
`strategy`	one of either `"gaussian"`, `"simplified.laplace"` (default), `"laplace"` or `"adaptive"`, which specifies the approximation strategy considered in the `inla` function.
`merge.strategy`	one of either `"mixture"` or `"original"` (default), which specifies the merging strategy to compute posterior marginal estimates of relative risks. See `mergeINLA` for further details.
`compute.intercept`	CAUTION! This argument has been deprecated since version 0.5.2.
`compute.DIC`	logical value; if `TRUE` (default) then approximate values of the Deviance Information Criterion (DIC) and Watanabe-Akaike Information Criterion (WAIC) are computed.
`n.sample`	numeric; number of samples to generate from the posterior marginal distribution of the linear predictor when computing approximate DIC/WAIC values. Defaults to 1000.
`compute.fitted.values`	logical value (default `FALSE`); if `TRUE` transforms the posterior marginal distribution of the linear predictor to the exponential scale (risks or rates).
`save.models`	logical value (default `FALSE`); if `TRUE` then a list with all the `inla` submodels is saved in the '/temp/' folder, which can be used as input argument for the `mergeINLA` function.
`plan`	one of either `"sequential"` or `"cluster"`, which specifies the computation strategy used for model fitting using the 'future' package. If `plan="sequential"` (default) the models are fitted sequentially in the current R session (local machine). If `plan="cluster"` the models are fitted in parallel on external R sessions (local machine) or distributed across remote compute nodes.
`workers`	character or vector (default `NULL`) containing the identifications of the local or remote workers where the models are going to be processed. Only required if `plan="cluster"`.
`inla.mode`	one of either `"classic"` (default) or `"compact"`, which specifies the approximation method used by INLA. See `help(inla)` for further details.
`num.threads`	maximum number of threads the inla-program will use. See `help(inla)` for further details.
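
As an illustration of the inputs described above, the following base-R sketch builds a toy long-format data set and a binary adjacency matrix. The column names (`ID`, `disease`, `region`, `obs`, `exp`) follow those used in the Examples section; the area labels, counts and chain-shaped neighbourhood are invented, and the assumption that each area-disease combination occupies exactly one row is ours rather than a documented requirement.

```r
areas    <- c("A1", "A2", "A3", "A4")
diseases <- c("d1", "d2")

# One row per (area, disease) combination
toy_data <- expand.grid(ID = areas, disease = diseases, stringsAsFactors = FALSE)
toy_data$region <- ifelse(toy_data$ID %in% c("A1", "A2"), "R1", "R2")  # grouping variable
toy_data$obs <- c(4, 0, 7, 2, 1, 3, 5, 0)                  # observed cases (invented)
toy_data$exp <- c(3.2, 1.1, 5.0, 2.4, 1.5, 2.8, 4.1, 0.9)  # expected cases (invented)

# Binary adjacency matrix for the chain A1 - A2 - A3 - A4
W <- matrix(0, 4, 4, dimnames = list(areas, areas))
W[cbind(1:3, 2:4)] <- 1   # fill the upper off-diagonal
W <- W + t(W)             # make it symmetric

# Basic sanity checks on the adjacency matrix
stopifnot(isSymmetric(W), all(W %in% c(0, 1)), all(diag(W) == 0))
```

In a call to the function these names would be passed as character strings, for example `ID.area="ID"`, `ID.disease="disease"`, `ID.group="region"`, `O="obs"` and `E="exp"`.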

Details

For a full model specification and further details see the vignettes accompanying this package.

Value

This function returns an object of class inla. See the mergeINLA function for details.

References

Bengtsson H (2021). “A unifying framework for parallel and distributed processing in R using futures.” The R Journal, 13(2), 273–291. doi:10.32614/RJ-2021-048.

Besag J, York J, Mollié A (1991). “Bayesian image restoration, with two applications in spatial statistics.” Annals of the Institute of Statistical Mathematics, 43(1), 1–20. doi:10.1007/bf00116466.

Botella-Rocamora P, Martinez-Beneito MA, Banerjee S (2015). “A unifying modeling framework for highly multivariate disease mapping.” Statistics in Medicine, 34(9), 1548–1559. doi:10.1002/sim.6423.

Leroux BG, Lei X, Breslow N (1999). “Estimation of disease rates in small areas: A new mixed model for spatial dependence.” In Halloran ME, Berry D (eds.), Statistical Models in Epidemiology, the Environment, and Clinical Trials, 179–191. Springer-Verlag: New York.

Rue H, Martino S, Chopin N (2009). “Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations.” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(2), 319–392. doi:10.1111/j.1467-9868.2008.00700.x.

Vicente G, Adin A, Goicoa T, Ugarte MD (2023). “High-dimensional order-free multivariate spatial disease mapping.” Statistics and Computing, 33(5), 104. doi:10.1007/s11222-023-10263-x.

van Niekerk J, Krainski E, Rustand D, Rue H (2023). “A new avenue for Bayesian inference with INLA.” Computational Statistics & Data Analysis, 181, 107692. doi:10.1016/j.csda.2023.107692.

Examples

## Not run: 
if(require("INLA", quietly=TRUE)){

  ## Load the sf object that contains the spatial polygons of the municipalities of Spain ##
  data(Carto_SpainMUN)
  str(Carto_SpainMUN)

  ## Load the simulated cancer mortality data (three diseases) ##
  data(Data_MultiCancer)
  str(Data_MultiCancer)

  ## Fit the Global model with an iCAR prior for the within-disease random effects ##
  Global <- MCAR_INLA(carto=Carto_SpainMUN, data=Data_MultiCancer,
                      ID.area="ID", ID.disease="disease", O="obs", E="exp",
                      prior="intrinsic", model="global", strategy="gaussian")
  summary(Global)

  ## Fit the Disjoint model with an iCAR prior for the within-disease random effects ##
  ## using 4 local clusters to fit the models in parallel                            ##
  Disjoint <- MCAR_INLA(carto=Carto_SpainMUN, data=Data_MultiCancer,
                        ID.area="ID", ID.disease="disease", O="obs", E="exp", ID.group="region",
                        prior="intrinsic", model="partition", k=0, strategy="gaussian",
                        plan="cluster", workers=rep("localhost",4))
  summary(Disjoint)

  ## 1st-order neighbourhood model with an iCAR prior for the within-disease random effects ##
  ## using 4 local clusters to fit the models in parallel                                   ##
  order1 <- MCAR_INLA(carto=Carto_SpainMUN, data=Data_MultiCancer,
                      ID.area="ID", ID.disease="disease", O="obs", E="exp", ID.group="region",
                      prior="intrinsic", model="partition", k=1, strategy="gaussian",
                      plan="cluster", workers=rep("localhost",4))
  summary(order1)
}

## End(Not run)

[Package bigDM version 0.5.4 Index]