get.measures {boral}R Documentation

Information Criteria for models

Description

[Defunct]

Calculates some information criteria for a fitted model, which could be used for model selection. WARNING: As of version 1.6, this function is no longer maintained (and probably doesn't work properly, if at all)!

Usage

get.measures(y, X = NULL, family, trial.size = 1, row.eff = "none", 
	row.ids = NULL, offset = NULL, num.lv, fit.mcmc)

Arguments

y

The response matrix that the model was fitted to.

X

The covariate matrix used in the model. Defaults to NULL, in which case it is assumed no model matrix was used.

family

Either a single element, or a vector of length equal to the number of columns in the response matrix. The former assumes all columns of the response matrix come from this distribution. The latter option allows for different distributions for each column of the response matrix. Elements can be one of "binomial" (with probit link), "poisson" (with log link), "negative.binomial" (with log link), "normal" (with identity link), "lnormal" for lognormal (with log link), "tweedie" (with log link), "exponential" (with log link), "gamma" (with log link), "beta" (with logit link), "ordinal" (cumulative probit regression), "ztpoisson" (zero truncated Poisson with log link), "ztnegative.binomial" (zero truncated negative binomial with log link).

Please see about.distributions for information on distributions available in boral overall.

trial.size

Either equal to a single element, or a vector of length equal to the number of columns in y. If a single element, then all columns assumed to be binomially distributed will have trial size set to this. If a vector, different trial sizes are allowed in each column of y. The argument is ignored for all columns not assumed to be binomially distributed. Defaults to 1, i.e. Bernoulli distribution.

row.eff

Single element indicating whether row effects are included as fixed effects ("fixed"), random effects ("random") or not included ("none") in the fitted model. If fixed effects, then for parameter identifiability the first row effect is set to zero, which analogous to acting as a reference level when dummy variables are used. If random effects, they are drawn from a normal distribution with mean zero and estimated standard deviation. Defaults to "none".

row.ids

A matrix with the number of rows equal to the number of rows in the response matrix, and the number of columns equal to the number of row effects to be included in the model. Element (i,j) indicates the cluster ID of row i in the response matrix for random effect j; please see boral for details. Defaults to NULL, so that if row.eff = "none" then the argument is ignored, otherwise if
row.eff = "fixed" or "random",
then row.ids = matrix(1:nrow(y), ncol = 1) i.e., a single, row effect unique to each row.

offset

A matrix with the same dimensions as the response matrix, specifying an a-priori known component to be included in the linear predictor during fitting. Defaults to NULL.

num.lv

The number of latent variables used in the model.

fit.mcmc

All MCMC samples for the fitted model. These can be extracted by fitting a model using boral with save.model = TRUE, and then applying get.mcmcsamples(fit).

Details

The following information criteria are currently calculated, when permitted: 1) Widely Applicable Information Criterion (WAIC, Watanabe, 2010) based on the conditional log-likelihood; 2) expected AIC (EAIC, Carlin and Louis, 2011); 3) expected BIC (EBIC, Carlin and Louis, 2011); 4) AIC (using the marginal likelihood) evaluated at the posterior median; 5) BIC (using the marginal likelihood) evaluated at the posterior median.

1) WAIC has been argued to be more natural and extension of AIC to the Bayesian and hierarchical modeling context (Gelman et al., 2013), and is based on the conditional log-likelihood calculated at each of the MCMC samples.

2 & 3) EAIC and EBIC were suggested by (Carlin and Louis, 2011). Both criteria are of the form -2*mean(conditional log-likelihood) + penalty*(no. of parameters in the model), where the mean is averaged all the MCMC samples. EAIC applies a penalty of 2, while EBIC applies a penalty of log(n).

4 & 5) AIC and BIC take the form -2*(marginal log-likelihood) + penalty*(no. of parameters in the model), where the log-likelihood is evaluated at the posterior median. If the parameter-wise posterior distributions are unimodal and approximately symmetric, these will produce similar results to an AIC and BIC where the log-likelihood is evaluated at the posterior mode. EAIC applies a penalty of 2, while EBIC applies a penalty of log(n).

Intuitively, comparing models with and without latent variables (using information criteria such as those returned) amounts to testing whether the columns of the response matrix are correlated. With multivariate abundance data for example, where the response matrix comprises of n sites and p species, comparing models with and without latent variables tests whether there is any evidence of correlation between species.

Please note that criteria 4 and 5 are not calculated all the time. In models where traits are included in the model (such that the regression coefficients \beta_{0j}, \bm{\beta}_j are random effects), or more than two columns are ordinal responses (such that the intercepts \beta_{0j} for these columns are random effects), then criteria 4 and 5 are will not calculated. This is because the calculation of the marginal log-likelihood in such cases currently fail to marginalize over such random effects; please see the details in calc.logLik.lv0 and calc.marglogLik.

Value

A list with the following components:

waic

WAIC based on the conditional log-likelihood.

eaic

EAIC based on the mean of the conditional log-likelihood.

ebic

EBIC based on the mean of the conditional log-likelihood.

all.cond.logLik

The conditional log-likelihood evaluated at all MCMC samples. This is done via repeated application of calc.condlogLik.

cond.num.params

Number of estimated parameters used in the fitted model, when all parameters are treated as "fixed" effects.

do.marglik.ics

A boolean indicating whether marginal log-likelihood based information criteria are calculated.

If do.marglik.ics = TRUE, then we also have:

median.logLik

The marginal log-likelihood evaluated at the posterior median.

marg.num.params

Number of estimated parameters used in the fitted model, when all parameters are treated as "fixed" effects.

aic.median

AIC (using the marginal log-likelihood) evaluated at the posterior median.

bic.median

BIC (using the marginal log-likelihood) evaluated at the posterior median.

Warning

As of version 1.6, this function is no longer maintained (and probably doesn't work properly, if at all)!

Using information criterion for variable selection should be done with extreme caution, for two reasons: 1) The implementation of these criteria are both heuristic and experimental. 2) Deciding what model to fit for ordination purposes should be driven by the science. For example, it may be the case that a criterion suggests a model with 3 or 4 latent variables. However, if we interested in visualizing the data for ordination purposes, then models with 1 or 2 latent variables are far more appropriate. As an another example, whether or not we include row effects when ordinating multivariate abundance data depends on if we are interested in differences between sites in terms of relative species abundance (row.eff = FALSE) or in terms of species composition (row.eff = "fixed").

Also, the use of information criterion in the presence of variable selection using SSVS is questionable.

Note

When a model is fitted using boral with calc.ics = TRUE, then this function is applied and the information criteria are returned as part of the model output.

Author(s)

Francis K.C. Hui [aut, cre], Wade Blanchard [aut]

Maintainer: Francis K.C. Hui <fhui28@gmail.com>

References

See Also

get.dic for calculating the Deviance Information Criterion (DIC) based on the conditional log-likelihood; get.more.measures for even more information criteria.

Examples

## Not run: 
## NOTE: The values below MUST NOT be used in a real application;
## they are only used here to make the examples run quick!!!
example_mcmc_control <- list(n.burnin = 10, n.iteration = 100, 
     n.thin = 1)
     
testpath <- file.path(tempdir(), "jagsboralmodel.txt")


library(mvabund) ## Load a dataset from the mvabund package
data(spider)
y <- spider$abun
n <- nrow(y)
p <- ncol(y)

spiderfit_pois <- boral(y, family = "poisson", 
    lv.control = list(num.lv = 2), row.eff = "random",
    mcmc.control = example_mcmc_control)

spiderfit_pois$ics ## Returns information criteria

spiderfit_nb <- boral(y, family = "negative.binomial", 
    lv.control = list(num.lv = 2), row.eff = "random",
    mcmc.control = example_mcmc_control, model.name = testpath)

spiderfit_nb$ics ## Returns the information criteria 

## End(Not run)

[Package boral version 2.0.2 Index]