| bamlss {bamlss} | R Documentation |
Fit Bayesian Additive Models for Location Scale and Shape (and Beyond)
Description
This is the main model fitting function of the package. Function bamlss()
is a wrapper function that parses the data and the model formula, or
extended bamlss.formula, as well as the bamlss.family
into a bamlss.frame. The bamlss.frame then holds all model
matrices and information that is needed for setting up estimation engines.
The model matrices are based on mgcv infrastructures, i.e.,
smooth terms are constructed using smooth.construct and
smoothCon. Therefore, all mgcv model term constructors like
s, te, t2 and ti
can be used. Identifiability conditions are imposed using function gam.side.
After the bamlss.frame is set up, function bamlss() applies optimizer
and/or sampling functions. These functions can also be provided by the user. See the details
below on how to create new engines to be used with function bamlss().
Finally, the estimated parameters and/or samples are used to create model output results like summary statistics or effect plots. The computation of results may also be controlled by the user.
Usage
bamlss(formula, family = "gaussian", data = NULL,
start = NULL, knots = NULL, weights = NULL,
subset = NULL, offset = NULL, na.action = na.omit,
contrasts = NULL, reference = NULL, transform = NULL,
optimizer = NULL, sampler = NULL, samplestats = NULL,
results = NULL, cores = NULL, sleep = NULL,
combine = TRUE, model = TRUE, x = TRUE,
light = FALSE, ...)
Arguments
formula |
A formula or extended formula, i.e., the |
family |
A |
data |
A |
start |
A named numeric vector containing starting values to be sent to the |
knots |
An optional list containing user specified knots, see the documentation of
function |
weights |
Prior weights on the data. |
subset |
An optional vector specifying a subset of observations to be used in the fitting process. |
offset |
Can be used to supply model offsets for use in fitting. |
na.action |
A function which indicates what should happen when the data
contain |
contrasts |
An optional list. See the |
reference |
A |
transform |
A transformer function that is applied on the |
optimizer |
An optimizer function that returns, e.g., posterior mode estimates
of the parameters as a named numeric vector. The default optimizer function is
|
sampler |
A sampler function that returns a matrix of samples, the columns represent the
parameters, the rows the iterations. The returned matrix must be coerced to an object of
class |
samplestats |
A function computing statistics from samples, per default function
|
results |
A function computing results from the parameters and/or samples, e.g., for
creating effect plots, see function |
cores |
An integer specifying the number of cores that should be used for the sampler
function. This is based on function |
sleep |
Time the system should sleep before the next core is started. |
combine |
If samples are computed on multiple cores, should the samples be combined into
one |
model |
If set to |
x |
If set to |
light |
Should the returned object be lighter, i.e., if |
... |
Arguments passed to the |
Details
The main idea of this function is to provide infrastructures that make it relatively easy to create estimation engines for new problems, or write interfaces to existing software packages.
The steps that are performed within the function are:
First, the function parses the data, the formula or the extended bamlss.formula, as well as the bamlss.family into a model-frame-like object, the bamlss.frame. This object holds all necessary model matrices and information that is needed for subsequent model fitting engines. By default, all smooth term constructor functions of package mgcv, such as s, te, t2 and ti, can be used (see also function smooth.construct). Special user-defined constructors can be included as well, see the manual of bamlss.frame.
In a second step, the bamlss.frame can be transformed, e.g., if a mixed model representation of smooth terms is needed, see function randomize.
Then an optimizer function is started, e.g., a function that finds posterior mode estimates of the parameters. A convention for model fitting engines is that such functions should have the following arguments:
optimizer(x, y, family, start, weights, offset, ...)
Internally, function bamlss() will send the x object that holds all model matrices, the response y object, the family object, start (the starting values for the parameters), and any weights and offsets of the created bamlss.frame to the optimizer function (see the manual of bamlss.frame for more details on the x, y and other objects). The job of the optimizer is to return a named numeric vector of optimum parameters. The names of the parameters should be such that they can be uniquely mapped to the corresponding model matrices in x. See function parameters for more details on parameter names. The default optimizer function is opt_bfit. The optimizer can return more information than only the optimum parameters. It is possible to return a list; the convention here is that an element named "parameters" then holds the named vector of estimated parameters. Possible other return values could be fitted values, the Hessian matrix, information criteria or information about convergence of the algorithm, etc. Note that the parameters are simply added to the bamlss.frame in a list entry named parameters.
After the optimization step, a sampler function is started. The arguments of such sampler functions are the same as for the optimizer functions:
sampler(x, y, family, start, weights, offset, ...)
Sampler functions must return a matrix of samples in which each row represents one iteration and which can be coerced to an mcmc object. The function may also return a list of samples, e.g., if multiple chains are returned, each list entry then holds the sample matrix of one chain. The column names of the sample matrix should be the same as the names of the estimated parameters. For a possible naming convention see function parameters, which ensures a unique mapping of samples to the model matrices in the x object of the bamlss.frame. The samples are added to the bamlss.frame in a list entry named samples.
Next, the samplestats function is applied. This function can compute any quantity from the samples and the x object. The arguments of such functions are:
samplestats(samples, x, y, family, ...)
Here, argument samples holds the samples returned from the sampler function, and x, y and family are the same objects as passed to the optimizer and/or sampler functions. For example, the default function in bamlss() for this task is also called samplestats and returns the mean of the log-likelihood and of the log-posterior computed over all samples, as well as the DIC.
The last step is to compute more complex information about the model using the results function. The arguments of such results functions are:
results(bamlss.frame, ...)
Here, the full bamlss.frame, including possible parameters and samples, is passed to the function within bamlss(). The default function for this task is results.bamlss.default, which returns an object of class "bamlss.results" for which generic plotting functions and a summary function are provided. Hence, the user can also control the output of the model, the plotting and the summary statistics.
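The engine conventions described in the steps above can be illustrated with a small base R sketch. It does not use package bamlss: the object x is a mock stand-in for the x object of a bamlss.frame, and the assumed layout (x$mu$model.matrix), the parameter names (prefix "mu.p."), the fixed sigma, and the functions toy_optimizer(), toy_sampler() and toy_samplestats() are all hypothetical and only for illustration. See bamlss.frame and parameters for the actual structures and naming.

```r
## Mock data and a mock 'x' object with one model matrix for parameter "mu".
## (Assumed layout; a real bamlss.frame holds more information.)
set.seed(1)
n <- 200
z <- runif(n)
y <- 1 + 2 * z + rnorm(n, sd = 0.3)
x <- list(mu = list(model.matrix = cbind("(Intercept)" = 1, z = z)))

## Toy optimizer: weighted least squares, following the argument convention
## optimizer(x, y, family, start, weights, offset, ...).
toy_optimizer <- function(x, y, family = NULL, start = NULL,
                          weights = NULL, offset = NULL, ...) {
  X <- x$mu$model.matrix
  if (is.null(weights)) weights <- rep(1, nrow(X))
  if (is.null(offset)) offset <- rep(0, nrow(X))
  fit <- lm.wfit(X, y - offset, w = weights)
  beta <- fit$coefficients
  ## Names must map uniquely back to the columns of the model matrix.
  names(beta) <- paste0("mu.p.", colnames(X))
  ## Return a list; the element "parameters" holds the named estimates.
  list(parameters = beta,
       fitted.values = list(mu = drop(X %*% beta) + offset))
}

## Toy sampler: random-walk Metropolis with a flat prior and a fixed,
## assumed error standard deviation 'sigma'. Weights and offsets are
## ignored here to keep the sketch short.
toy_sampler <- function(x, y, family = NULL, start = NULL,
                        weights = NULL, offset = NULL,
                        n.iter = 500, step = 0.05, sigma = 0.3, ...) {
  X <- x$mu$model.matrix
  nm <- paste0("mu.p.", colnames(X))
  beta <- if (is.null(start)) rep(0, ncol(X)) else as.numeric(start[nm])
  logpost <- function(b) {
    sum(dnorm(y, mean = drop(X %*% b), sd = sigma, log = TRUE))
  }
  ## One row per iteration, one named column per parameter.
  samples <- matrix(NA_real_, nrow = n.iter, ncol = ncol(X),
                    dimnames = list(NULL, nm))
  lp <- logpost(beta)
  for (i in seq_len(n.iter)) {
    prop <- beta + rnorm(length(beta), sd = step)
    lp_prop <- logpost(prop)
    if (log(runif(1)) < lp_prop - lp) {
      beta <- prop
      lp <- lp_prop
    }
    samples[i, ] <- beta
  }
  samples
}

## Toy samplestats: mean log-likelihood over all samples, following
## samplestats(samples, x, y, family, ...).
toy_samplestats <- function(samples, x, y, family = NULL, sigma = 0.3, ...) {
  X <- x$mu$model.matrix
  ll <- apply(samples, 1, function(b) {
    sum(dnorm(y, mean = drop(X %*% b), sd = sigma, log = TRUE))
  })
  list(logLik = mean(ll))
}

## Chain the steps: optimizer first, then sampler with starting values.
opt <- toy_optimizer(x, y)
sam <- toy_sampler(x, y, start = opt$parameters)
stats <- toy_samplestats(sam, x, y)
```

The point of the sketch is the shared interface: every engine receives the same x, y and family objects, the optimizer returns named parameters, and the sampler returns a matrix whose column names match those parameter names.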
Note that the functions transform(), optimizer(), sampler(), samplestats()
and results() can also be provided by the bamlss.family object, e.g.,
if a bamlss.family object has an element named "optimizer", which
represents a valid optimizer function such as opt_bfit, exactly this optimizer
function will be used as a default when using the family.
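How a family-supplied engine might act as a default can be sketched in base R as follows. The helper resolve_engine() is hypothetical and is not the internal code of bamlss(); the family is mocked as a plain list. The sketch assumes that an explicitly passed function takes precedence over the family element, that FALSE switches a step off (as in the Examples), and that the package default is used only when neither is given.

```r
## Hypothetical helper: choose the engine function for one processing step.
resolve_engine <- function(user, family, name, fallback) {
  if (identical(user, FALSE)) return(NULL)   # step switched off by the user
  if (is.function(user)) return(user)        # explicit argument (assumed to win)
  if (is.function(family[[name]])) return(family[[name]])  # family default
  fallback                                   # package default
}

## Mock engines that only report where they came from.
package_default <- function(...) "package default"
family_engine <- function(...) "family optimizer"
user_engine <- function(...) "user optimizer"

## Mock family objects: one with and one without an "optimizer" element.
fam_with <- list(family = "mock", optimizer = family_engine)
fam_without <- list(family = "mock")

r1 <- resolve_engine(NULL, fam_with, "optimizer", package_default)()
r2 <- resolve_engine(NULL, fam_without, "optimizer", package_default)()
r3 <- resolve_engine(user_engine, fam_with, "optimizer", package_default)()
r4 <- resolve_engine(FALSE, fam_with, "optimizer", package_default)
```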
Value
An object of class "bamlss". The object is in principle only a slight extension
of a bamlss.frame, i.e., if an optimizer is applied it will hold the
estimated parameters in an additional element named "parameters". If a sampler function
is applied it will additionally hold the samples in an element named "samples".
The same mechanism is used for the output of the results function.
If the optimizer function computes additional output next to the parameters, this will
be saved in an element named "model.stats". If a samplestats function is applied,
the output will also be saved in the "model.stats" element.
Additionally, all functions that are called are saved as attribute "functions" in the
returned object.
Author(s)
Nikolaus Umlauf, Nadja Klein, Achim Zeileis.
References
Umlauf N, Klein N, Zeileis A (2018). BAMLSS: Bayesian Additive Models for Location, Scale and Shape (and Beyond). Journal of Computational and Graphical Statistics, 27(3), 612–627. doi:10.1080/10618600.2017.1407325
Umlauf N, Klein N, Simon T, Zeileis A (2021). bamlss: A Lego Toolbox for Flexible Bayesian Regression (and Beyond). Journal of Statistical Software, 100(4), 1–53. doi:10.18637/jss.v100.i04
See Also
bamlss.frame, family.bamlss, bamlss.formula,
randomize, bamlss.engine.setup,
opt_bfit, sam_GMCMC, continue,
coef.bamlss, parameters, predict.bamlss,
plot.bamlss
Examples
## Not run:
## Simulated data example.
d <- GAMart()
f <- num ~ s(x1) + s(x2) + s(x3) + te(lon, lat)
b <- bamlss(f, data = d)
summary(b)
plot(b)
plot(b, which = 3:4)
plot(b, which = "samples")
## Use of optimizer and sampler functions:
## * first run optimizer,
b1 <- bamlss(f, data = d, optimizer = opt_bfit, sampler = FALSE)
print(b1)
summary(b1)
## * afterwards, start sampler with starting values,
b2 <- bamlss(f, data = d, start = coef(b1), optimizer = FALSE, sampler = sam_GMCMC)
print(b2)
summary(b2)
## Continue sampling.
b3 <- continue(b2, n.iter = 12000, burnin = 0, thin = 10)
plot(b3, which = "samples")
plot(b3, which = "max-acf")
plot(b3, which = "max-acf", burnin = 500, thin = 4)
## End(Not run)