bamlss {bamlss}    R Documentation

Fit Bayesian Additive Models for Location Scale and Shape (and Beyond)

Description

This is the main model fitting function of the package. Function bamlss() is a wrapper function that parses the data and the model formula, or extended bamlss.formula, as well as the bamlss.family into a bamlss.frame. The bamlss.frame then holds all model matrices and information that is needed for setting up estimation engines. The model matrices are based on mgcv infrastructures, i.e., smooth terms are constructed using smooth.construct and smoothCon. Therefore, all mgcv model term constructors like s, te, t2 and ti can be used. Identifiability conditions are imposed using function gam.side.

After the bamlss.frame is set up, function bamlss() applies optimizer and/or sampling functions. These functions can also be provided by the user. See the details below on how to create new engines to be used with function bamlss().

Finally, the estimated parameters and/or samples are used to create model output results like summary statistics or effect plots. The computation of results may also be controlled by the user.

Usage

bamlss(formula, family = "gaussian", data = NULL,
  start = NULL, knots = NULL, weights = NULL,
  subset = NULL, offset = NULL, na.action = na.omit,
  contrasts = NULL, reference = NULL, transform = NULL,
  optimizer = NULL, sampler = NULL, samplestats = NULL,
  results = NULL, cores = NULL, sleep = NULL,
  combine = TRUE, model = TRUE, x = TRUE,
  light = FALSE, ...)

Arguments

formula

A formula or extended formula, i.e., the formula can be a list of formulas where each list entry specifies the details of one parameter of the modeled response distribution, see bamlss.formula. For incorporating smooth terms, all model term constructors implemented in mgcv such as s, te and ti can be used, amongst others.
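As a brief illustration of the extended formula, the sketch below specifies one formula per distribution parameter for the default Gaussian family. It assumes the simulated data from GAMart() (as used in the Examples) and that the second parameter of the Gaussian family is named sigma; the sampler is switched off only to keep the run short.

```r
library("bamlss")

## Simulated data, as in the Examples section.
d <- GAMart()

## Extended formula: one list entry per distribution parameter.
## The first entry carries the response; the second is labelled
## with the parameter name (assumed here to be "sigma").
f <- list(
  num ~ s(x1) + s(x2) + s(x3) + te(lon, lat),
  sigma ~ s(x1) + s(x2)
)

## Posterior mode estimation only; no MCMC sampling.
b <- bamlss(f, family = "gaussian", data = d, sampler = FALSE)
summary(b)
```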

family

A bamlss.family object, specifying the details of the modeled distribution such as the parameter names, the density function, link functions, etc. Can also be given as a character string naming the family without the "_bamlss" extension of the bamlss.family name, e.g., "gaussian" for gaussian_bamlss.
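The two ways of specifying the family can be sketched as follows. This is a minimal example assuming the GAMart() data from the Examples; both calls are expected to select the same Gaussian family, and the sampler is disabled only to keep the comparison fast.

```r
library("bamlss")

d <- GAMart()
f <- num ~ s(x1)

## Family given as a character string (no "_bamlss" extension).
b_chr <- bamlss(f, family = "gaussian", data = d, sampler = FALSE)

## Family given as a bamlss.family object.
b_obj <- bamlss(f, family = gaussian_bamlss(), data = d, sampler = FALSE)

## Compare the posterior mode estimates of the two fits.
all.equal(coef(b_chr), coef(b_obj))
```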

data

A data.frame or list containing the model response variable(s) and covariates specified in the formula. By default the variables are taken from environment(formula): typically the environment from which bamlss is called.

start

A named numeric vector containing starting values to be sent to the optimizer and/or sampler function. For a possible naming convention for the parameters see function parameters; this convention is not binding, because naming is engine specific.

knots

An optional list containing user specified knots, see the documentation of function gam.

weights

Prior weights on the data.

subset

An optional vector specifying a subset of observations to be used in the fitting process.

offset

Can be used to supply model offsets for use in fitting.

na.action

A function which indicates what should happen when the data contain NA's. The default is set by the na.action setting of options, and is na.omit if set to NULL.

contrasts

An optional list. See the contrasts.arg of model.matrix.default.

reference

A character specifying a reference category, e.g., when fitting a multinomial model.

transform

A transformer function that is applied on the bamlss.frame. See, e.g., function randomize and bamlss.engine.setup.

optimizer

An optimizer function that returns, e.g., posterior mode estimates of the parameters as a named numeric vector. The default optimizer function is opt_bfit. If set to FALSE, no optimizer function will be used.

sampler

A sampler function that returns a matrix of samples, the columns represent the parameters, the rows the iterations. The returned matrix must be coerced to an object of class "mcmc", see as.mcmc. The default sampler function is sam_GMCMC. If set to FALSE, no sampler function will be used.

samplestats

A function computing statistics from samples; by default, function samplestats is used. If set to FALSE, no samplestats function will be used. Note that this option is important for very large datasets, because computing statistics from samples this way may be very time consuming!

results

A function computing results from the parameters and/or samples, e.g., for creating effect plots, see function results.bamlss.default. If set to FALSE, no results function will be used.

cores

An integer specifying the number of cores that should be used for the sampler function. This is based on function mclapply of the parallel package.

sleep

Time the system should sleep before the next core is started.

combine

If samples are computed on multiple cores, should the samples be combined into one mcmc matrix?

model

If set to FALSE the model frame used for modeling is not part of the return value.

x

If set to FALSE the model matrices are not part of the return value.

light

Should the returned object be lighter? If light = TRUE, the returned object will not contain the model.frame, and the design and penalty matrices are deleted.

...

Arguments passed to the transformer, optimizer, sampler, results and samplestats functions.

Details

The main idea of this function is to provide infrastructures that make it relatively easy to create estimation engines for new problems, or write interfaces to existing software packages.

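The steps that are performed within the function are, in order: the data, the formula (or extended bamlss.formula) and the bamlss.family are parsed into a bamlss.frame; the transform function, if supplied, is applied to the bamlss.frame; the optimizer and/or sampler functions are run; statistics are computed from the samples by the samplestats function; and the results function creates the model output.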

Note that functions transform(), optimizer(), sampler(), samplestats() and results() can be provided from the bamlss.family object, e.g., if a bamlss.family object has an element named "optimizer", which represents a valid optimizer function such as opt_bfit, exactly this optimizer function will be used as a default when using the family.
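A family-supplied default engine might be set up as in the sketch below. It assumes that a bamlss.family object can be modified like an ordinary list and uses opt_bfit as the attached optimizer, so the fit itself should not differ from the usual default; the point is only to show where the element goes.

```r
library("bamlss")

d <- GAMart()

## Start from an existing family object.
fam <- gaussian_bamlss()

## Attach an optimizer under the element name "optimizer"
## (assumption: the family object behaves like a list).
fam$optimizer <- opt_bfit

## No optimizer argument is given, so the one stored in the
## family is expected to be picked up as the default.
b <- bamlss(num ~ s(x1), family = fam, data = d, sampler = FALSE)
print(b)
```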

Value

An object of class "bamlss". The object is in principle only a slight extension of a bamlss.frame, i.e., if an optimizer is applied it will hold the estimated parameters in an additional element named "parameters". If a sampler function is applied it will additionally hold the samples in an element named "samples". The same mechanism is used for the results function.

If the optimizer function computes additional output next to the parameters, this will be saved in an element named "model.stats". If a samplestats function is applied, the output will also be saved in the "model.stats" element.

Additionally, all functions that are called are saved as attribute "functions" in the returned object.
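The elements described above can be inspected directly on a fitted object. The following is a minimal sketch assuming the GAMart() data from the Examples; the short chain (n.iter, burnin and thin are passed on to the sampler) is chosen only to keep the run quick, not as a recommendation.

```r
library("bamlss")

d <- GAMart()

## Fit with the default optimizer and sampler, using a short chain.
b <- bamlss(num ~ s(x1), data = d, n.iter = 400, burnin = 100, thin = 1)

## Top-level elements of the returned "bamlss" object.
names(b)

## Estimated parameters from the optimizer.
str(b$parameters, max.level = 1)

## Samples from the sampler.
class(b$samples)

## Additional optimizer output and sample statistics.
names(b$model.stats)

## Functions that were called, stored as an attribute.
names(attr(b, "functions"))
```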

Author(s)

Nikolaus Umlauf, Nadja Klein, Achim Zeileis.

References

Umlauf N, Klein N, Zeileis A (2018). BAMLSS: Bayesian Additive Models for Location, Scale and Shape (and Beyond). Journal of Computational and Graphical Statistics, 27(3), 612–627. doi:10.1080/10618600.2017.1407325

Umlauf N, Klein N, Simon T, Zeileis A (2021). bamlss: A Lego Toolbox for Flexible Bayesian Regression (and Beyond). Journal of Statistical Software, 100(4), 1–53. doi:10.18637/jss.v100.i04

See Also

bamlss.frame, family.bamlss, bamlss.formula, randomize, bamlss.engine.setup, opt_bfit, sam_GMCMC, continue, coef.bamlss, parameters, predict.bamlss, plot.bamlss

Examples

## Not run: 
## Simulated data example.
d <- GAMart()
f <- num ~ s(x1) + s(x2) + s(x3) + te(lon, lat)
b <- bamlss(f, data = d)
summary(b)
plot(b)
plot(b, which = 3:4)
plot(b, which = "samples")

## Use of optimizer and sampler functions:
## * first run optimizer,
b1 <- bamlss(f, data = d, optimizer = opt_bfit, sampler = FALSE)
print(b1)
summary(b1)

## * afterwards, start sampler with starting values,
b2 <- bamlss(f, data = d, start = coef(b1), optimizer = FALSE, sampler = sam_GMCMC)
print(b2)
summary(b2)

## Continue sampling.
b3 <- continue(b2, n.iter = 12000, burnin = 0, thin = 10)
plot(b3, which = "samples")
plot(b3, which = "max-acf")
plot(b3, which = "max-acf", burnin = 500, thin = 4)

## End(Not run)

[Package bamlss version 1.2-4 Index]