about.lvs {boral}

R Documentation

Correlation structure for latent variables

Description

This help file provides more information on how (non-independence) correlation structures can be assumed for latent variables.

Details

In the main boral function, when latent variables are included, the default option is to assume that the latent variables are independent across the rows (sites) of the response matrix, i.e., lv.type = "independent". That is, \bm{u}_i \sim N(\bm{0},\bm{I}_d) where d = num.lv. This is useful when we want to model between-species correlations (in a parsimonious manner), but it does assume that the sites are independent.

If one a-priori believes that the sites are, in fact, correlated, e.g., due to spatial correlation, and that this cannot be sufficiently well accounted for by row effects (see comment below), then we can account for it by assuming a non-independence correlation structure for the latent variables across sites. Note, however, that we continue to assume that the d latent variables are independent of one another. That is, if we let \bm{u}_i = (u_{i1}, \ldots, u_{id}), then we assume that for l = 1,\ldots,d,

(u_{1l}, u_{2l}, \ldots, u_{nl}) \sim N(\bm{0}, \bm{\Sigma}),

where \bm{\Sigma} is some correlation matrix. When \bm{\Sigma} = \bm{I}_n, we are back in the independence case. However, if we allow the off-diagonals to be non-zero, then we allow the latent variables to be correlated across sites, with \Sigma_{ij} = Cov(u_{il}, u_{jl}). This in turn induces correlation across both sites and species, i.e., two species at two different sites are now correlated because of the correlation across sites.
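As a rough illustration of the distinction above, the following base R sketch draws d latent variables that are independent of one another but correlated across n sites. The values of n and d, and the particular correlation matrix (decaying with |i - j|), are assumptions made purely for illustration; this is not how boral samples the latent variables internally.

```r
set.seed(1)
n <- 5   # number of sites (hypothetical)
d <- 2   # number of latent variables, i.e., num.lv (hypothetical)

# Hypothetical correlation matrix across sites: correlation decays with |i - j|
Sigma <- 0.6^abs(outer(1:n, 1:n, "-"))

# Upper-triangular Cholesky factor, satisfying t(chol_upper) %*% chol_upper = Sigma
chol_upper <- chol(Sigma)

# Each column of u is one latent variable (u_{1l}, ..., u_{nl}) ~ N(0, Sigma);
# the d columns are drawn independently of one another
z <- matrix(rnorm(n * d), nrow = n, ncol = d)
u <- t(chol_upper) %*% z

# Setting Sigma to the identity matrix recovers the independence case
u_indep <- t(chol(diag(n))) %*% z
```

Row i of u then plays the role of \bm{u}_i for site i, while each column shares the same between-site correlation matrix.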

While there are more sophisticated structures and approaches for accounting for correlations between sites (Cressie and Wikle, 2015), in boral we assume relatively simple structures. Specifically, we can assume that sites further apart are less correlated, so that \bm{\Sigma} can be characterized by a distance matrix distmat and associated spatial covariance parameters which require estimation. Indeed, such simple spatial latent variable models have become rather popular in community ecology of late, at least as a first attempt at accounting for spatial (and also temporal) correlation, e.g., Thorson et al. (2015, 2016); Ovaskainen et al. (2016, 2017).

At the moment, several correlation structures are permitted. Let D_{ij} denote the distance between sites i and j, i.e., entry (i,j) in distmat. Also, let (\vartheta_1,\vartheta_2) denote the two spatial covariance parameters (noting that the second parameter is not required for some of the structures). Then we have: 1) lv.type = "exponential", such that \Sigma_{ij} = \exp(-D_{ij}/\vartheta_1); 2) lv.type = "squared.exponential", such that \Sigma_{ij} = \exp(-D_{ij}/\vartheta_1^2); 3) lv.type = "power.exponential", such that \Sigma_{ij} = \exp(-(D_{ij}/\vartheta_1)^{\vartheta_2}) where \vartheta_2 \in (0,2]; 4) lv.type = "spherical", such that \Sigma_{ij} = (D_{ij} < \vartheta_1)*(1 - 1.5*D_{ij}/\vartheta_1 + 0.5*(D_{ij}/\vartheta_1)^3). We refer the reader to the geoR package and its function cov.spatial for more information on simple spatial covariance functions (Ribeiro Jr and Diggle, 2016).
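To make the four structures concrete, the sketch below evaluates them in base R for a given distance matrix. The helper lv_corr and its arguments theta1 and theta2 are hypothetical names introduced here for illustration and are not part of boral; the formulas simply follow the expressions stated in this help file, and the parameterization used inside the fitted JAGS model may differ.

```r
# Hypothetical helper (not part of boral): build the correlation matrix Sigma
# from a distance matrix, following the formulas stated in this help file
lv_corr <- function(distmat, type, theta1, theta2 = NULL) {
  if (type == "power.exponential" && is.null(theta2))
    stop("theta2 is required for type = \"power.exponential\"")
  switch(type,
    "exponential" = exp(-distmat / theta1),
    "squared.exponential" = exp(-distmat / theta1^2),
    "power.exponential" = exp(-(distmat / theta1)^theta2),
    "spherical" = (distmat < theta1) *
      (1 - 1.5 * distmat / theta1 + 0.5 * (distmat / theta1)^3),
    stop("unknown type: ", type)
  )
}

# Fake distance matrix for four equally spaced sites on a line
distmat <- as.matrix(dist(1:4))

Sigma_exp <- lv_corr(distmat, "exponential", theta1 = 2)
Sigma_sph <- lv_corr(distmat, "spherical", theta1 = 2)
```

In every case the diagonal equals one, since D_{ii} = 0. For the spherical structure, the correlation is exactly zero for any pair of sites at distance \vartheta_1 or more.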

It is important to keep in mind that moving away from an independence correlation structure for the latent variables massively increases computation time for MCMC sampling (and indeed for any estimation method for latent variable models). Given that JAGS is not the fastest of methods when it comes to MCMC sampling, one should be cautious about moving away from independence. For example, if you a-priori have a nested experimental design which is inducing spatial correlation, then it is much faster and more effective to include (multiple) row effects in the model to account for this spatial correlation instead.

Author(s)

Francis K.C. Hui [aut, cre], Wade Blanchard [aut]

Maintainer: Francis K.C. Hui <fhui28@gmail.com>

References

Cressie, N. and Wikle, C. K. (2015) Statistics for Spatio-temporal Data. John Wiley & Sons.
Ovaskainen et al. (2016). Uncovering hidden spatial structure in species communities with spatially explicit joint species distribution models. Methods in Ecology and Evolution, 7, 428-436.
Ovaskainen et al. (2017). How to make more out of community data? A conceptual framework and its implementation as models and software. Ecology Letters, 20, 561-576.
Ribeiro Jr, P. J., and Diggle P. J., (2016). geoR: Analysis of Geostatistical Data. R package version 1.7-5.2. https://CRAN.R-project.org/package=geoR.
Thorson et al. (2016). Joint dynamic species distribution models: a tool for community ordination and spatio-temporal monitoring. Global Ecology and Biogeography, 25, 1144-1158.
Thorson et al. (2015). Spatial factor analysis: a new tool for estimating joint species distributions and correlations in species range. Methods in Ecology and Evolution, 6, 627-63

Examples

library(boral) ## Provides the boral function used below
library(mvabund) ## Load a dataset from the mvabund package
data(spider)
y <- spider$abun
X <- scale(spider$x)
n <- nrow(y)
p <- ncol(y)

## NOTE: The example below is taken directly from the boral help file

example_mcmc_control <- list(n.burnin = 10, n.iteration = 100, 
     n.thin = 1)

testpath <- file.path(tempdir(), "jagsboralmodel.txt")

## Not run: 
## Example 2d - model with environmental covariates and 
##  two structured latent variables using fake distance matrix
fakedistmat <- as.matrix(dist(1:n))
spiderfit_lvstruc <- boral(y, X = X, family = "negative.binomial", 
    lv.control = list(num.lv = 2, type = "exponential", distmat = fakedistmat), 
     mcmc.control = example_mcmc_control, model.name = testpath)

summary(spiderfit_lvstruc)

## End(Not run)