R: Extract information on default prior distributions for an...

get_mvgam_priors {mvgam}

R Documentation

Extract information on default prior distributions for an mvgam model

Description

This function lists the parameters that can have their prior distributions changed for a given mvgam model, as well listing their default distributions

Usage

get_mvgam_priors(
  formula,
  trend_formula,
  data,
  data_train,
  family = "poisson",
  knots,
  use_lv = FALSE,
  n_lv,
  use_stan = TRUE,
  trend_model = "None",
  trend_map,
  drift = FALSE
)

Arguments

`formula`	A `character` string specifying the GAM observation model formula. These are exactly like the formula for a GLM except that smooth terms, `s()`, `te()`, `ti()`, `t2()`, as well as time-varying `dynamic()` terms, can be added to the right hand side to specify that the linear predictor depends on smooth functions of predictors (or linear functionals of these). In `nmix()` family models, the `formula` is used to set up a linear predictor for the detection probability. Details of the formula syntax used by mvgam can be found in `mvgam_formulae`
`trend_formula`	An optional `character` string specifying the GAM process model formula. If supplied, a linear predictor will be modelled for the latent trends to capture process model evolution separately from the observation model. Should not have a response variable specified on the left-hand side of the formula (i.e. a valid option would be `~ season + s(year)`). Also note that you should not use the identifier `series` in this formula to specify effects that vary across time series. Instead you should use `trend`. This will ensure that models in which a `trend_map` is supplied will still work consistently (i.e. by allowing effects to vary across process models, even when some time series share the same underlying process model). This feature is only currently available for `RW()`, `AR()` and `VAR()` trend models. In `nmix()` family models, the `trend_formula` is used to set up a linear predictor for the underlying latent abundance
`data`	A `dataframe` or `list` containing the model response variable and covariates required by the GAM `formula` and optional `trend_formula`. Should include columns: #' `series` (a `factor` index of the series IDs; the number of levels should be identical to the number of unique series labels (i.e. `n_series = length(levels(data$series))`)) `time` (`numeric` or `integer` index of the time point for each observation). For most dynamic trend types available in `mvgam` (see argument `trend_model`), time should be measured in discrete, regularly spaced intervals (i.e. `c(1, 2, 3, ...)`). However you can use irregularly spaced intervals if using `trend_model = CAR(1)`, though note that any temporal intervals that are exactly `0` will be adjusted to a very small number (`1e-12`) to prevent sampling errors. See an example of `CAR()` trends in `CAR` Should also include any other variables to be included in the linear predictor of `formula`
`data_train`	Deprecated. Still works in place of `data` but users are recommended to use `data` instead for more seamless integration into `R` workflows
`family`	`family` specifying the exponential observation family for the series. Currently supported families are: `gaussian()` for real-valued data `betar()` for proportional data on `⁠(0,1)⁠` `lognormal()` for non-negative real-valued data `student_t()` for real-valued data `Gamma()` for non-negative real-valued data `bernoulli()` for binary data `poisson()` for count data `nb()` for overdispersed count data `binomial()` for count data with imperfect detection when the number of trials is known; note that the `cbind()` function must be used to bind the discrete observations and the discrete number of trials `beta_binomial()` as for `binomial()` but allows for overdispersion `nmix()` for count data with imperfect detection when the number of trials is unknown and should be modeled via a State-Space N-Mixture model. The latent states are Poisson, capturing the 'true' latent abundance, while the observation process is Binomial to account for imperfect detection. See `mvgam_families` for an example of how to use this family Note that only `nb()` and `poisson()` are available if using `JAGS` as the backend. Default is `poisson()`. See `mvgam_families` for more details
`knots`	An optional `list` containing user specified knot values to be used for basis construction. For most bases the user simply supplies the knots to be used, which must match up with the k value supplied (note that the number of knots is not always just `k`). Different terms can use different numbers of knots, unless they share a covariate
`use_lv`	`logical`. If `TRUE`, use dynamic factors to estimate series' latent trends in a reduced dimension format. Only available for `RW()`, `AR()` and `GP()` trend models. Defaults to `FALSE`
`n_lv`	`integer` the number of latent dynamic factors to use if `use_lv == TRUE`. Cannot be `> n_series`. Defaults arbitrarily to `min(2, floor(n_series / 2))`
`use_stan`	Logical. If `TRUE`, the model will be compiled and sampled using Hamiltonian Monte Carlo with a call to `cmdstan_model` or a call to `stan`. Note that there are many more options when using `Stan` vs `JAGS`
`trend_model`	`character` or `function` specifying the time series dynamics for the latent trend. Options are: `None` (no latent trend component; i.e. the GAM component is all that contributes to the linear predictor, and the observation process is the only source of error; similarly to what is estimated by `gam`) `'RW'` or `RW()` `'AR1'` or `AR(p = 1)` `'AR2'` or `AR(p = 2)` `'AR3'` or `AR(p = 3)` `'CAR1'` or `CAR(p = 1)` `'VAR1'` or `VAR()`(only available in `Stan`) `⁠'PWlogistic⁠`, `'PWlinear'` or `PW()` (only available in `Stan`) `'GP'` or `GP()` (Gaussian Process with squared exponential kernel; only available in `Stan`) For all trend types apart from `GP()`, `CAR()` and `PW()`, moving average and/or correlated process error terms can also be estimated (for example, `RW(cor = TRUE)` will set up a multivariate Random Walk if `n_series > 1`). See mvgam_trends for more details
`trend_map`	Optional `data.frame` specifying which series should depend on which latent trends. Useful for allowing multiple series to depend on the same latent trend process, but with different observation processes. If supplied, a latent factor model is set up by setting `use_lv = TRUE` and using the mapping to set up the shared trends. Needs to have column names `series` and `trend`, with integer values in the `trend` column to state which trend each series should depend on. The `series` column should have a single unique entry for each series in the data (names should perfectly match factor levels of the `series` variable in `data`). See examples for details
`drift`	`logical` estimate a drift parameter in the latent trend components. Useful if the latent trend is expected to broadly follow a non-zero slope. Only available for `RW()` and `AR()` trend models. Note that if the latent trend is more or less stationary, the drift parameter can become unidentifiable, especially if an intercept term is included in the GAM linear predictor (which it is by default when calling `jagam`). Drift parameters will also likely be unidentifiable if using dynamic factor models. Therefore this defaults to `FALSE`

Details

Users can supply a model formula, prior to fitting the model, so that default priors can be inspected and altered. To make alterations, change the contents of the prior column and supplying this data.frame to the mvgam function using the argument priors. If using Stan as the backend, users can also modify the parameter bounds by modifying the new_lowerbound and/or new_upperbound columns. This will be necessary if using restrictive distributions on some parameters, such as a Beta distribution for the trend sd parameters for example (Beta only has support on (0,1)), so the upperbound cannot be above 1. Another option is to make use of the prior modification functions in brms (i.e. prior) to change prior distributions and bounds (just use the name of the parameter that you'd like to change as the class argument; see examples below)

Value

either a data.frame containing the prior definitions (if any suitable priors can be altered by the user) or NULL, indicating that no priors in the model can be modified through the mvgam interface

Note

Only the prior, new_lowerbound and/or new_upperbound columns of the output should be altered when defining the user-defined priors for the mvgam model. Use only if you are familiar with the underlying probabilistic programming language. There are no sanity checks done to ensure that the code is legal (i.e. to check that lower bounds are smaller than upper bounds, for example)

Author(s)

Nicholas J Clark

Examples


# Simulate three integer-valued time series
library(mvgam)
dat <- sim_mvgam(trend_rel = 0.5)

# Get a model file that uses default mvgam priors for inspection (not always necessary,
# but this can be useful for testing whether your updated priors are written correctly)
mod_default <- mvgam(y ~ s(series, bs = 're') +
              s(season, bs = 'cc') - 1,
              family = nb(),
              data = dat$data_train,
              trend_model = AR(p = 2),
              run_model = FALSE)

# Inspect the model file with default mvgam priors
code(mod_default)

# Look at which priors can be updated in mvgam
test_priors <- get_mvgam_priors(y ~ s(series, bs = 're') +
                              s(season, bs = 'cc') - 1,
                              family = nb(),
                              data = dat$data_train,
                              trend_model = AR(p = 2))
test_priors

# Make a few changes; first, change the population mean for the series-level
# random intercepts
test_priors$prior[2] <- 'mu_raw ~ normal(0.2, 0.5);'

# Now use stronger regularisation for the series-level AR2 coefficients
test_priors$prior[5] <- 'ar2 ~ normal(0, 0.25);'

# Check that the changes are made to the model file without any warnings by
# setting 'run_model = FALSE'
mod <- mvgam(y ~ s(series, bs = 're') +
            s(season, bs = 'cc') - 1,
            family = nb(),
            data = dat$data_train,
            trend_model = AR(p = 2),
            priors = test_priors,
            run_model = FALSE)
code(mod)

# No warnings, the model is ready for fitting now in the usual way with the addition
# of the 'priors' argument

# The same can be done using 'brms' functions; here we will also change the ar1 prior
# and put some bounds on the ar coefficients to enforce stationarity; we set the
# prior using the 'class' argument in all brms prior functions
brmsprior <- c(prior(normal(0.2, 0.5), class = mu_raw),
              prior(normal(0, 0.25), class = ar1, lb = -1, ub = 1),
              prior(normal(0, 0.25), class = ar2, lb = -1, ub = 1))
brmsprior

mod <- mvgam(y ~ s(series, bs = 're') +
            s(season, bs = 'cc') - 1,
          family = nb(),
          data = dat$data_train,
          trend_model = AR(p = 2),
          priors = brmsprior,
          run_model = FALSE)
code(mod)

# Look at what is returned when an incorrect spelling is used
test_priors$prior[5] <- 'ar2_bananas ~ normal(0, 0.25);'
mod <- mvgam(y ~ s(series, bs = 're') +
            s(season, bs = 'cc') - 1,
            family = nb(),
            data = dat$data_train,
            trend_model = AR(p = 2),
            priors = test_priors,
            run_model = FALSE)
code(mod)

# Example of changing parametric (fixed effect) priors
simdat <- sim_mvgam()

# Add a fake covariate
simdat$data_train$cov <- rnorm(NROW(simdat$data_train))

priors <- get_mvgam_priors(y ~ cov + s(season),
                          data = simdat$data_train,
                          family = poisson(),
                          trend_model = AR())

# Change priors for the intercept and fake covariate effects
priors$prior[1] <- '(Intercept) ~ normal(0, 1);'
priors$prior[2] <- 'cov ~ normal(0, 0.1);'

mod2 <- mvgam(y ~ cov + s(season),
             data = simdat$data_train,
             trend_model = AR(),
             family = poisson(),
             priors = priors,
             run_model = FALSE)
code(mod2)

# Likewise using 'brms' utilities (note that you can use
# Intercept rather than `(Intercept)`) to change priors on the intercept
brmsprior <- c(prior(normal(0.2, 0.5), class = cov),
              prior(normal(0, 0.25), class = Intercept))
brmsprior

mod2 <- mvgam(y ~ cov + s(season),
             data = simdat$data_train,
             trend_model = AR(),
             family = poisson(),
             priors = brmsprior,
             run_model = FALSE)
code(mod2)

# The "class = 'b'" shortcut can be used to put the same prior on all
# 'fixed' effect coefficients (apart from any intercepts)
set.seed(0)
dat <- mgcv::gamSim(1, n = 200, scale = 2)
dat$time <- 1:NROW(dat)
mod <- mvgam(y ~ x0 + x1 + s(x2) + s(x3),
            priors = prior(normal(0, 0.75), class = 'b'),
            data = dat,
            family = gaussian(),
            run_model = FALSE)
code(mod)

[Package mvgam version 1.1.2 Index]