R: Bayesian generalized additive modeling with...

gamlps {blapsr}

R Documentation

Bayesian generalized additive modeling with Laplace-P-splines.

Description

Fits a generalized additive model (GAM) to data using an approximate Bayesian inference technique based on penalized regression splines and Laplace approximations. Smooth additive terms are specified as a linear combination of a large number of cubic B-splines. To counterbalance the roughness of the fit, a discrete penalty on neighboring spline coefficients is imposed in the spirit of Eilers and Marx (1996). The effective degrees of freedom of the smooth terms are also estimated.

The optimal amount of smoothing is determined by a grid-based exploration of the posterior penalty space when the number of smooth terms is small to moderate. When the dimension of the penalty space is large, the optimal smoothing parameter is chosen to be the value that maximizes the (log-)posterior of the penalty vector. Approximate Bayesian credible intervals for latent model variables and functions of latent model variables are available.

Usage

gamlps(formula, data, K = 30, family = c("gaussian", "poisson", "bernoulli", "binomial"),
       gauss.scale, nbinom, penorder = 2, cred.int = 0.95)

Arguments

`formula`	A formula object where the ~ operator separates the response from the covariates of the linear part `z1,z2,..` and the smooth terms. A smooth term is specified by using the notation `sm(.)`. For instance, the formula `y ~ z1+sm(x1)+sm(x2)` specifies a generalized additive model of the form g(mu) = b0+b1z1+f1(x1)+f2(x2), where b0, b1 are the regression coefficients of the linear part and f1(.) and f2(.) are smooth functions of the continuous covariates x1 and x2 respectively. The function g(.) is the canonical link function.
`data`	Optional. A data frame to match the variables names provided in formula.
`K`	A positive integer specifying the number of cubic B-spline functions in the basis used to model the smooth terms. Default is `K = 30` and allowed values are `15 <= K <= 60`. The same basis dimension is used for each smooth term in the model. Also, the computational cost to fit the model increases with `K`.
`family`	The error distribution of the model. It is a character string that must partially match either `"gaussian"` for Normal data with an identity link, `"poisson"` for Poisson data with a log link, `"bernoulli"` or `"binomial"` for Bernoulli or Binomial data with a logit link.
`gauss.scale`	The scale parameter to be specified when `family = "gaussian"`. It corresponds to the variance of the response.
`nbinom`	The number of experiments in the Binomial family.
`penorder`	The penalty order used on finite differences of the coefficients of contiguous B-splines. Can be either 2 for a second-order penalty (the default) or 3 for a third-order penalty.
`cred.int`	The level of the pointwise credible interval to be computed for the coefficients in the linear part of the model.

Details

The B-spline basis used to approximate a smooth additive component is computed with the function cubicbs. The lower (upper) bound of the B-spline basis is taken to be the minimum (maximum) value of the covariate associated to the smooth. For identifiability purposes, the B-spline matrices (computed over the observed covariates) are centered. The centering consists is subtracting from each column of the B-spline matrix, the corresponding column average of another B-spline matrix computed on a fine grid of equidistant values in the domain of the smooth term.

A hierarchical Gamma prior is imposed on the roughness penalty vector. A Newton-Raphson algorithm is used to compute the posterior mode of the (log-)posterior penalty vector. The latter algorithm uses analytically derived versions of the gradient and Hessian. When the number of smooth terms in the model is smaller or equal to 4, a grid-based strategy is used for posterior exploration of the penalty space. Above that threshold, the optimal amount of smoothness is determined by the posterior maximum value of the penalty vector. This strategy allows to keep the computational burden to fit the model relatively low and conserve good statistical performance.

Value

An object of class gamlps containing several components from the fit. Details can be found in gamlps.object. Details on the output printed by gamlps can be found in print.gamlps. Fitted smooth terms can be visualized with the plot.gamlps routine.

Author(s)

Oswaldo Gressani oswaldo_gressani@hotmail.fr.

References

Hastie, T. J. and Tibshirani., RJ (1990). Generalized additive models. Monographs on statistics and applied probability, 43, 335.

Eilers, P.H.C. and Marx, B.D. (1996). Flexible smoothing with B-splines and penalties. Statistical Science, 11(2): 89-121.

Gressani, O. and Lambert, P. (2018). Fast Bayesian inference using Laplace approximations in a flexible promotion time cure model based on P-splines. Computational Statistical & Data Analysis 124: 151-167.

Examples


set.seed(14)
sim <- simgamdata(n = 300, setting = 2, dist = "binomial", scale = 0.25)
dat <- sim$data
fit <- gamlps(y ~ z1 + z2 + sm(x1) + sm(x2), K = 15, data = dat,
              penorder = 2, family = "binomial", nbinom = 15)
fit

# Check fit and compare with target (in red)
opar <- par(no.readonly = TRUE)
par(mfrow = c(1, 2))
domx <- seq(-1, 1 ,length = 200)
plot(fit, smoo.index = 1, cred.int = 0.95, ylim = c(-2, 2))
lines(domx, sim$f[[1]](domx), type= "l", lty = 2, lwd = 2, col = "red")
plot(fit, smoo.index = 2, cred.int = 0.95, ylim = c(-3, 3))
lines(domx, sim$f[[2]](domx), type= "l", lty = 2, lwd = 2, col = "red")
par(opar)

[Package blapsr version 0.6.1 Index]