predict.boral {boral} | R Documentation |

## Predict using a model

### Description

Construct predictions and associated intervals (lower and upper limits) from a fitted boral object. Predictions can be made either conditionally on the predicted latent variables and any random row effects/response-specific random intercepts included in the model, or marginally (averaged) on the latent variables and these other effects (note integration is done on the linear predictor scale).

### Usage

```
## S3 method for class 'boral'
predict(object, newX = NULL, newrow.ids = NULL, newranef.ids = NULL,
distmat = NULL, predict.type = "conditional", scale = "link",
est = "median", prob = 0.95, lv.mc = 1000, return.alllinpred = FALSE,
...)
```

### Arguments

`object` |
An object of class "boral". |

`newX` |
An optional model matrix of covariates for extrapolation to the same sites (under different environmental conditions) or extrapolation to new sites. No intercept column should be included in |

`newrow.ids` |
An optional matrix with the number of columns equal to the number of row effects included in the model. Element |

`newranef.ids` |
An optional matrix with the number of columns equal to the number of response-specific random intercepts included in the model. Element |

`distmat` |
A distance matrix required to calculate correlations across sites when a non-independence correlation structure on the latent variables is imposed. |

`predict.type` |
The type of prediction to be made. Either takes value |

`scale` |
The type of prediction required. The default is on the scale of the linear predictors; the alternative Note things are slightly more complicated for zero truncated distributions because the log-link connects the mean of the |

`est` |
A choice of either whether to print the posterior median ( |

`prob` |
A numeric scalar in the interval (0,1) giving the target probability coverage of the intervals. Defaults to 0.95. |

`lv.mc` |
If the predictions are made marginalizing over the latent variables, then number of Monte-Carlo samples to take when performing the relevant integration. |

`return.alllinpred` |
If |

`...` |
Not used. |

### Details

In the Bayesian MCMC framework, predictions are based around the posterior predictive distribution, which is the integral of the quantity one wants to predict on, integrated or averaged over the posterior distribution of the parameters and latent variables. For example, on the linear predictor scale, predictions are made as,

`\eta_{ij} = \alpha_i + \beta_{0j} + \bm{x}^\top_i\bm{\beta}_j + \bm{z}^\top_i\bm{b}_j + \bm{u}^\top_i\bm{\theta}_j; \quad i = 1,\ldots,n; j = 1,\ldots,p,`

where `\beta_{0j} + \bm{x}^\top_i\bm{\beta}_j`

is the component of the linear predictor due to the covariates `\bm{X}`

plus an intercept, `\bm{z}^\top_i\bm{b}_j`

is the component due to response-specific random intercept, `\bm{u}^\top_i\bm{\theta}_j`

is the component due to the latent variables, and `\alpha_i`

is the component due to one or more fixed or random row effects. Not all of these components may be included in the model, and the above is just representing the general case.

Note that for the above to work, one must have saved the MCMC samples in the fitted boral object, that is, set `save.model = TRUE`

when fitting.

Two types of predictions are possible using this function:

The first type is

`predict.type = "conditional"`

, meaning predictions are made conditionally on the predicted latent variables and any (random) row effects and response-specific random intercepts included in the model. This is mainly used when predictions are made onto the*same*set of sites that the model was fitted to, although a`newX`

can be supplied in this case if we want to extrapolate on to the same set of sites but under different environmental conditions.The second type of prediction is

`predict.type = "marginal"`

, meaning predictions are made marginally or averaging over the latent variables and any (random) row effects and response-specific random intercepts included in the model. This is mainly used when predictions are made onto a*new*set of sites where the latent variables/row effects/response-specific random intercepts are unknown. Consequently, arguments`newX`

,`newrow.ids`

and`newranef.ids`

are often supplied in such a setting since we are extrapolating to new observational units. The integration over the latent variables and random row effects is done via Monte-Carlo integration. Please note however that the integration is done on the linear predictor scale.

More information on conditional versus marginal predictions in latent variable models can be found in Warton et al., (2015). In both cases, and if `return.alllinpred = FALSE`

, the function returns a point prediction (either the posterior mean or median depending on `est`

) and the lower and upper bounds of a (100`\times`

`prob`

) % interval of the posterior prediction. All of these quantities are calculated empirically based across the MCMC samples.

### Value

A list containing the following components:

`linpred` |
A matrix containing posterior point predictions (either posterior mean or median depending on |

`lower` |
A matrix containing the lower bound of the (100 |

`upper` |
A matrix containing the upper bound of the (100 |

`all.linpred` |
If |

### Warnings

Marginal predictions can take quite a while to construct due to the need to perform Monte-Carlo integration to marginalize over the latent variables and any random row effects in the model.

### Author(s)

Francis K.C. Hui [aut, cre], Wade Blanchard [aut]

Maintainer: Francis K.C. Hui <fhui28@gmail.com>

### References

Gelman et al. (2013) Bayesian Data Analysis. CRC Press.

Warton et al. (2015). So Many Variables: Joint Modeling in Community Ecology. Trends in Ecology and Evolution, 30, 766-779.

### Examples

```
## Not run:
library(mvabund) ## Load a dataset from the mvabund package
library(mvtnorm)
data(spider)
y <- spider$abun
X <- scale(spider$x)
n <- nrow(y)
p <- ncol(y)
## NOTE: The values below MUST NOT be used in a real application;
## they are only used here to make the examples run quick!!!
example_mcmc_control <- list(n.burnin = 10, n.iteration = 100,
n.thin = 1)
testpath <- file.path(tempdir(), "jagsboralmodel.txt")
## Example 1 - model with two latent variables, random site effects,
## and environmental covariates
spiderfit_nb <- boral(y, X = X, family = "negative.binomial",
row.eff = "random", lv.control = list(num.lv = 2),
mcmc.control = example_mcmc_control, save.model = TRUE,
model.name = testpath)
## Predictions conditional on predicted latent variables
getcondpreds <- predict(spiderfit_nb)
## Predictions marginal on latent variables, random row effects
## The intervals for these will generally be wider than the
## conditional intervals.
getmargpreds <- predict(spiderfit_nb, predict.type = "marginal")
## Now suppose you extrpolate to new sites
newX <- rmvnorm(100, mean = rep(0,ncol(X)))
## Below won't work since conditional predictions are made to the same sites
#getcondpreds <- predict(spiderfit_nb, newX = newX)
## Marginal predictions will work though, provided newrow.ids is set up
## properly. For example,
new_row_ids <- matrix(sample(1:28,100,replace=TRUE), 100, 1)
while(length(table(new_row_ids)) != 28) {
new_row_ids <- matrix(sample(1:28,100,replace=TRUE), 100, 1)
}
getmargpreds <- predict(spiderfit_nb, newX = newX, predict.type = "marginal",
newrow.ids = new_row_ids)
## Example 1b - Similar to 1 except with no random site effects,
## and a non-independence correlation structure for the latent variables
## based on a fake distance matrix
fakedistmat <- as.matrix(dist(1:n))
spiderfit_nb <- boral(y, X = X, family = "negative.binomial",
lv.control = list(type = "squared.exponential", num.lv = 2,
distmat = fakedistmat), model.name = testpath,
mcmc.control = example_mcmc_control, save.model = TRUE)
getmargpreds <- predict(spiderfit_nb, predict.type = "marginal",
distmat = fakedistmat)
## Now suppose you extrpolate to new sites
newfakedistmat <- as.matrix(dist(1:100))
getmargpreds <- predict(spiderfit_nb, newX = newX,
predict.type = "marginal", distmat = newfakedistmat)
## Example 1c - similar to 1 except instead of random site effects,
## there are species-specific random intercepts at a so-called
## "region" level
y <- spider$abun
X <- scale(spider$x)
spiderfit_nb <- boral(y, X = X, family = "negative.binomial",
lv.control = list(num.lv = 2),
ranef.ids = data.frame(region = rep(1:7,each=4)),
mcmc.control = example_mcmc_control, model.name = testpath,
save.model = TRUE)
## Predictions conditional on predicted latent variables and
## random intercepts
getcondpreds <- predict(spiderfit_nb)
## Predictions marginal on latent variables, random intercepts
## The intervals for these will generally be wider than the
## conditional intervals.
getmargpreds <- predict(spiderfit_nb, predict.type = "marginal")
## Now suppose you extrpolate to new sites
newX <- rmvnorm(100, mean = rep(0,ncol(X)))
## Marginal predictions will work though, provided newranef.ids is set up
## properly. For example,
new_ranef_ids <- matrix(sample(1:7,100,replace=TRUE), 100, 1)
getmargpreds <- predict(spiderfit_nb, newX = newX, predict.type = "marginal",
newranef.ids = new_ranef_ids)
## Example 2 - simulate count data, based on a model with two latent variables,
## no site variables, with two traits and one environmental covariates
library(mvtnorm)
n <- 100; s <- 50
X <- as.matrix(scale(1:n))
colnames(X) <- c("elevation")
traits <- cbind(rbinom(s,1,0.5), rnorm(s))
## one categorical and one continuous variable
colnames(traits) <- c("thorns-dummy","SLA")
simfit <- list(true.lv = rmvnorm(n, mean = rep(0,2)),
lv.coefs = cbind(rnorm(s), rmvnorm(s, mean = rep(0,2)), 1),
traits.coefs = matrix(c(0.1,1,-0.5,0.1,0.5,0,-1,0.1), 2, byrow = TRUE))
rownames(simfit$traits.coefs) <- c("beta0","elevation")
colnames(simfit$traits.coefs) <- c("kappa0","thorns-dummy","SLA","sigma")
simy = create.life(true.lv = simfit$true.lv, lv.coefs = simfit$lv.coefs, X = X,
traits = traits, traits.coefs = simfit$traits.coefs, family = "normal")
example_which_traits <- vector("list",ncol(X)+1)
for(i in 1:length(example_which_traits))
example_which_traits[[i]] <- 1:ncol(traits)
fit_traits <- boral(y = simy, X = X, traits = traits,
which.traits = example_which_traits, family = "normal",
lv.control = list(num.lv = 2), save.model = TRUE,
mcmc.control = example_mcmc_control,
model.name = testpath)
## Predictions conditional on predicted latent variables
getcondpreds <- predict(fit_traits)
## Predictions marginal on latent variables
## The intervals for these will generally be wider than the
## conditional intervals.
getmargpreds <- predict(fit_traits, predict.type = "marginal")
## End(Not run)
```

*boral*version 2.0.2 Index]