gamselBayes {gamselBayes}    R Documentation
Bayesian generalized additive model selection including a fast variational option
Description
Selection of predictors, and of the nature of their impact on the mean response (linear versus non-linear), is a fundamental problem in regression analysis. This function uses the generalized additive models framework for estimating predictor effects. It takes an approximate Bayesian inference approach and has two options for achieving this: (1) Markov chain Monte Carlo and (2) mean field variational Bayes.
Usage
gamselBayes(y, Xlinear = NULL, Xgeneral = NULL, method = "MCMC",
            lowerMakesSparser = NULL, family = "gaussian", verbose = TRUE,
            control = gamselBayes.control())
Arguments
y
Vector containing the response data. If family = "gaussian" then the response data are modelled as being continuous with a Gaussian distribution. If family = "binomial" then the response data must be binary with 0/1 coding.
Xlinear
Data frame with number of rows equal to the length of y, with each column containing data on a predictor that can only have a linear effect.
Xgeneral
A data frame with number of rows equal to the length of y, with each column containing data on a predictor that potentially has a non-linear effect.
method
Character string for specifying the approximate Bayesian inference method to be used: "MCMC" for Markov chain Monte Carlo (the default) or "MFVB" for mean field variational Bayes.
lowerMakesSparser |
A threshold parameter between 0 and 1, such that lower values lead to sparser fits.
family
Character string for specifying the response family: "gaussian" (the default) or "binomial".
verbose
Boolean variable for specifying whether or not progress messages are printed to the console. The default is TRUE.
control
Function for controlling the spline bases, Markov chain Monte Carlo sampling, mean field variational Bayes and other specifications. |
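To make the role of lowerMakesSparser concrete, here is a hypothetical base R sketch, not the package's internal rule. The only documented behaviour it preserves is the monotonicity: lower values of the threshold give sparser fits. The specific rule shown, retaining an effect when its posterior inclusion probability exceeds 1 - lowerMakesSparser, and the function and object names keepEffect and probs, are illustrative assumptions.

```r
# Hypothetical sketch (NOT the package's internal rule): one way a
# threshold with the documented behaviour -- lower values give sparser
# fits -- can act on posterior inclusion probabilities.  An effect is
# retained only when its inclusion probability exceeds
# 1 - lowerMakesSparser, so a lower threshold sets a higher bar.
keepEffect <- function(inclusionProb, lowerMakesSparser) {
  inclusionProb > 1 - lowerMakesSparser
}

probs <- c(x1 = 0.95, x2 = 0.60, x3 = 0.20)
keepEffect(probs, lowerMakesSparser = 0.5)   # x1 and x2 retained
keepEffect(probs, lowerMakesSparser = 0.1)   # only x1 retained (sparser)
```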
Details
Generalized additive model selection via approximate Bayesian inference is provided. Bayesian mixed model-based penalized splines with spike-and-slab-type coefficient prior distributions are used to facilitate fitting and selection. The approximate Bayesian inference engine options are: (1) Markov chain Monte Carlo and (2) mean field variational Bayes. Markov chain Monte Carlo has better Bayesian inferential accuracy, but requires a longer run-time. Mean field variational Bayes is faster, but less accurate. The methodology is described in He and Wand (2021) <arXiv:2201.00412>.
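The spike-and-slab coefficient prior mentioned above can be illustrated with a minimal base R simulation. This is a generic sketch of the prior family, not the package's actual specification; the inclusion probability rho, the Gaussian slab form, and the value sigmaSlab are all assumed for illustration.

```r
# Minimal sketch of a spike-and-slab coefficient prior (illustrative
# only; the package's actual hyperparameters and slab form may differ).
# Each coefficient is exactly zero with probability 1 - rho (the
# "spike") and is drawn from a Gaussian "slab" otherwise.
set.seed(1)
rho       <- 0.3      # prior inclusion probability (assumed value)
sigmaSlab <- 2        # slab standard deviation (assumed value)
n         <- 10000

gamma <- rbinom(n, 1, rho)               # binary inclusion indicators
beta  <- gamma * rnorm(n, 0, sigmaSlab)  # zero, or a Gaussian draw

mean(beta == 0)       # roughly 1 - rho = 0.7: the point mass at zero
```

Averaging over the indicators gives exact zeros with prior probability 1 - rho, which is what allows whole predictor effects to be switched off during selection.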
Value
An object of class gamselBayes, which is a list with the following components:
method
the value of the method argument.
family
the value of the family argument.
Xlinear
the inputted design matrix containing predictors that can only have linear effects. |
Xgeneral
the inputted design matrix containing predictors that potentially have non-linear effects.
rangex
the value of the control parameter rangex.
intKnots
the value of the control parameter intKnots.
truncateBasis
the value of the control parameter truncateBasis.
numBasis
the value of the control parameter numBasis.
MCMC
a list in which each component is the retained Markov chain Monte Carlo (MCMC) sample for a model parameter.
MFVB
a list in which each component contains the parameters of a mean field variational Bayes approximate posterior density function (q-density).
effectTypeHat
an array of character strings, with entry either "zero", "linear" or "nonlinear", signifying the estimated effect type for each candidate predictor. |
meanXlinear
an array containing the sample means of each column of Xlinear.
sdXlinear
an array containing the sample standard deviations of each column of Xlinear.
meanXgeneral
an array containing the sample means of each column of Xgeneral.
sdXgeneral
an array containing the sample standard deviations of each column of Xgeneral.
Author(s)
Virginia X. He virginia.x.he@student.uts.edu.au and Matt P. Wand matt.wand@uts.edu.au
References
Chouldechova, A. and Hastie, T. (2015). Generalized additive model selection. <arXiv:1506.03850v2>.
He, V.X. and Wand, M.P. (2021). Bayesian generalized additive model selection including a fast variational option. <arXiv:2201.00412>.
Examples
library(gamselBayes)
# Generate some simple regression-type data:
set.seed(1) ; n <- 1000 ; x1 <- rbinom(n,1,0.5) ;
x2 <- runif(n) ; x3 <- runif(n) ; x4 <- runif(n)
y <- x1 + sin(2*pi*x2) - x3 + rnorm(n)
Xlinear <- data.frame(x1) ; Xgeneral <- data.frame(x2,x3,x4)
# Obtain a gamselBayes() fit for the data, using Markov chain Monte Carlo:
fitMCMC <- gamselBayes(y,Xlinear,Xgeneral)
summary(fitMCMC) ; plot(fitMCMC) ; checkChains(fitMCMC)
# Obtain a gamselBayes() fit for the data, using mean field variational Bayes:
fitMFVB <- gamselBayes(y,Xlinear,Xgeneral,method = "MFVB")
summary(fitMFVB) ; plot(fitMFVB)
if (require("Ecdat"))
{
# Obtain a gamselBayes() fit for data on schools in California, U.S.A.:
Caschool$log.avginc <- log(Caschool$avginc)
mathScore <- Caschool$mathscr
Xgeneral <- Caschool[,c("mealpct","elpct","calwpct","compstu","log.avginc")]
# Obtain a gamselBayes() fit for the data, using Markov chain Monte Carlo:
fitMCMC <- gamselBayes(y = mathScore,Xgeneral = Xgeneral)
summary(fitMCMC) ; plot(fitMCMC) ; checkChains(fitMCMC)
# Obtain a gamselBayes() fit for the data, using mean field variational Bayes:
fitMFVB <- gamselBayes(y = mathScore,Xgeneral = Xgeneral,method = "MFVB")
summary(fitMFVB) ; plot(fitMFVB)
}