bayesreg {bayesreg}		R Documentation
Fitting Bayesian Regression Models with Continuous Shrinkage Priors
Description
Fit a linear or generalized linear regression model using Bayesian continuous shrinkage prior distributions. Handles ridge, lasso, horseshoe and horseshoe+ regression with Gaussian,
Laplace, Student-t, logistic, Poisson or geometric distributed targets. See bayesreg-package
for more details on the features available in this package.
Usage
bayesreg(
formula,
data,
model = "normal",
prior = "ridge",
n.samples = 1000,
burnin = 1000,
thin = 5,
t.dof = 5
)
Arguments
formula
An object of class "formula": a symbolic description of the model to be fitted.
data
A data frame containing the variables in the model.
model
The distribution of the target (y) variable. Continuous or numeric targets can be distributed as per a Gaussian ("gaussian" or "normal"), Laplace ("laplace") or Student-t ("t") distribution. Binary targets (two-level factors) are fit using logistic regression ("logistic"), and count targets using Poisson ("poisson") or geometric ("geometric") regression.
prior
Which continuous shrinkage prior distribution over the regression coefficients to use. Options include ridge regression ("ridge"), lasso regression ("lasso"), horseshoe regression ("hs" or "horseshoe") and horseshoe+ regression ("hs+").
n.samples
Number of posterior samples to generate.
burnin
Number of burn-in samples.
thin
Desired level of thinning.
t.dof
Degrees of freedom for the Student-t distribution.
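As a rough illustration of how these arguments fit together (a sketch only; the data frame df and the target y are hypothetical placeholders):
# Hypothetical data frame 'df' with a numeric target 'y' and covariates.
# Horseshoe-regularised Gaussian regression: keep 2,000 posterior draws
# after 1,000 burn-in iterations, thinning every 5th draw.
rv <- bayesreg(y ~ ., data = df, model = "gaussian", prior = "horseshoe",
               n.samples = 2e3, burnin = 1e3, thin = 5)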
Value
An object with S3 class "bayesreg"
containing the results of the sampling process, plus some additional information.
beta
Posterior samples of the regression coefficients.
beta0
Posterior samples of the intercept parameter.
sigma2
Posterior samples of the square of the scale parameter; for Gaussian distributed targets this is equal to the variance. For binary targets this is empty.
mu.beta
The mean of the posterior samples for the regression coefficients.
mu.beta0
The mean of the posterior samples for the intercept parameter.
mu.sigma2
The mean of the posterior samples for the squared scale parameter.
tau2
Posterior samples of the global shrinkage parameter.
t.stat
Posterior t-statistics for each regression coefficient.
var.ranks
Ranking of the covariates by their importance, with "1" denoting the most important covariate.
log.l
The log-likelihood at the posterior means of the model parameters.
waic
The Widely Applicable Information Criterion (WAIC) score for the model.
waic.dof
The effective degrees of freedom of the model, as estimated by the WAIC.
The returned object also stores the parameters/options used to run bayesreg:
formula
The object of class "formula" describing the fitted model.
model
The distribution of the target (y) variable.
prior
The shrinkage prior used to fit the model.
n.samples
The number of samples generated from the posterior distribution.
burnin
The number of burn-in samples that were generated.
thin
The level of thinning.
n
The sample size of the data used to fit the model.
p
The number of covariates in the fitted model.
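For instance, the stored components can be accessed directly from the fitted object (a sketch; rv is assumed to be the result of an earlier call to bayesreg):
rv$mu.beta      # posterior means of the regression coefficients
rv$mu.beta0     # posterior mean of the intercept
rv$waic         # WAIC score of the fitted model
rv$var.ranks    # importance ranking of the covariates
rv$n; rv$p      # sample size and number of covariates used in the fit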
Details
Draws a series of samples from the posterior distribution of a linear (Gaussian, Laplace or Student-t) or generalized linear (logistic binary, Poisson, geometric) regression model with specified continuous shrinkage prior distribution (ridge regression, lasso, horseshoe and horseshoe+) using Gibbs sampling. The intercept parameter is always included, and is never penalised.
While only n.samples samples are returned, the total number of samples generated is equal to burnin + n.samples * thin. To generate the samples
of the regression coefficients, the code uses either Rue's algorithm (when the number of observations is at least twice the number of covariates) or the algorithm of
Bhattacharya et al., as appropriate. Factor variables are automatically grouped together and
additional shrinkage is applied to the set of indicator variables to which they expand.
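For example, with the default settings a call such as the following (a sketch; the data frame df and the factor covariate grp are hypothetical placeholders) runs burnin + n.samples * thin = 1000 + 1000 * 5 = 6000 Gibbs iterations in total and returns 1,000 thinned post-burn-in draws:
# 'df' and 'grp' are hypothetical; 'grp' is a factor, so its indicator
# variables are grouped and receive additional shared shrinkage.
rv <- bayesreg(y ~ x1 + grp, data = df, prior = "hs",
               n.samples = 1000, burnin = 1000, thin = 5)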
Note
To cite this toolbox please reference:
Makalic, E. & Schmidt, D. F., High-Dimensional Bayesian Regularised Regression with the BayesReg Package, arXiv:1611.06649 [stat.CO], 2016. https://arxiv.org/pdf/1611.06649.pdf
A MATLAB implementation of the bayesreg function is also available.
Copyright (C) Daniel F. Schmidt and Enes Makalic, 2016-2021
References
Makalic, E. & Schmidt, D. F., High-Dimensional Bayesian Regularised Regression with the BayesReg Package, arXiv:1611.06649 [stat.CO], 2016. https://arxiv.org/pdf/1611.06649.pdf
Park, T. & Casella, G., The Bayesian Lasso, Journal of the American Statistical Association, Vol. 103, pp. 681-686, 2008.
Carvalho, C. M.; Polson, N. G. & Scott, J. G., The horseshoe estimator for sparse signals, Biometrika, Vol. 97, pp. 465-480, 2010.
Makalic, E. & Schmidt, D. F., A Simple Sampler for the Horseshoe Estimator, IEEE Signal Processing Letters, Vol. 23, pp. 179-182, 2016. https://arxiv.org/pdf/1508.03884v4.pdf
Bhadra, A.; Datta, J.; Polson, N. G. & Willard, B., The Horseshoe+ Estimator of Ultra-Sparse Signals, Bayesian Analysis, 2016.
Polson, N. G.; Scott, J. G. & Windle, J., Bayesian inference for logistic models using Polya-Gamma latent variables, Journal of the American Statistical Association, Vol. 108, pp. 1339-1349, 2013.
Rue, H., Fast sampling of Gaussian Markov random fields, Journal of the Royal Statistical Society (Series B), Vol. 63, pp. 325-338, 2001.
Bhattacharya, A.; Chakraborty, A. & Mallick, B. K., Fast sampling with Gaussian scale-mixture priors in high-dimensional regression, arXiv:1506.04778, 2016.
Schmidt, D. F. & Makalic, E., Bayesian Generalized Horseshoe Estimation of Generalized Linear Models, ECML PKDD 2019: Machine Learning and Knowledge Discovery in Databases, pp. 598-613, 2019.
Stan Development Team, Stan Reference Manual (Version 2.26), Section 15.4, "Effective Sample Size". https://mc-stan.org/docs/2_18/reference-manual/effective-sample-size-section.html
See Also
The prediction function predict.bayesreg and the summary function summary.bayesreg.
Examples
# -----------------------------------------------------------------
# Example 1: Gaussian regression
X = matrix(rnorm(100*20),100,20)
b = matrix(0,20,1)
b[1:5] = c(5,4,3,2,1)
y = X %*% b + rnorm(100, 0, 1)
df <- data.frame(X,y)
rv.lm <- lm(y~.,df) # Regular least-squares
summary(rv.lm)
rv.hs <- bayesreg(y~.,df,prior="hs") # Horseshoe regression
rv.hs.s <- summary(rv.hs)
# Expected squared prediction error for least-squares
coef_ls = coef(rv.lm)
as.numeric(sum( (as.matrix(coef_ls[-1]) - b)^2 ) + coef_ls[1]^2)
# Expected squared prediction error for horseshoe
as.numeric(sum( (rv.hs$mu.beta - b)^2 ) + rv.hs$mu.beta0^2)
# -----------------------------------------------------------------
# Example 2: Gaussian v Student-t robust regression
X = 1:10
y = c(-0.6867, 1.7258, 1.9117, 6.1832, 5.3636, 7.1139, 9.5668, 10.0593, 11.4044, 6.1677)
df = data.frame(X,y)
# Gaussian ridge
rv.G <- bayesreg(y~., df, model = "gaussian", prior = "ridge", n.samples = 1e3)
# Student-t ridge
rv.t <- bayesreg(y~., df, model = "t", prior = "ridge", t.dof = 5, n.samples = 1e3)
# Plot the different estimates with credible intervals
plot(df$X, df$y, xlab="x", ylab="y")
yhat_G <- predict(rv.G, df, bayes.avg=TRUE)
lines(df$X, yhat_G[,1], col="blue", lwd=2.5)
lines(df$X, yhat_G[,3], col="blue", lwd=1, lty="dashed")
lines(df$X, yhat_G[,4], col="blue", lwd=1, lty="dashed")
yhat_t <- predict(rv.t, df, bayes.avg=TRUE)
lines(df$X, yhat_t[,1], col="darkred", lwd=2.5)
lines(df$X, yhat_t[,3], col="darkred", lwd=1, lty="dashed")
lines(df$X, yhat_t[,4], col="darkred", lwd=1, lty="dashed")
legend(1,11,c("Gaussian","Student-t (dof=5)"),lty=c(1,1),col=c("blue","darkred"),
lwd=c(2.5,2.5), cex=0.7)
## Not run:
# -----------------------------------------------------------------
# Example 3: Poisson/geometric regression example
X = matrix(rnorm(100*5),100,5)
b = c(0.5,-1,0,0,1)
nu = X%*%b + 1
y = rpois(n=length(nu), lambda=exp(nu))
df <- data.frame(X,y)
# Fit a Poisson regression
rv.pois=bayesreg(y~.,data=df,model="poisson",prior="hs", burnin=1e4, n.samples=1e4)
summary(rv.pois)
# Fit a geometric regression
rv.geo=bayesreg(y~.,data=df,model="geometric",prior="hs", burnin=1e4, n.samples=1e4)
summary(rv.geo)
# Compare the two models in terms of their WAIC scores
cat(sprintf("Poisson regression WAIC=%g vs geometric regression WAIC=%g",
rv.pois$waic, rv.geo$waic))
# Poisson is clearly preferred to geometric, which is good, as the data was generated from a Poisson!
# -----------------------------------------------------------------
# Example 4: Logistic regression on spambase
data(spambase)
# bayesreg expects binary targets to be factors
spambase$is.spam <- factor(spambase$is.spam)
# First take a subset of the data (1/10th) for training, reserve the rest for testing
spambase.tr = spambase[seq(1,nrow(spambase),10),]
spambase.tst = spambase[-seq(1,nrow(spambase),10),]
# Fit a model using logistic horseshoe for 2,000 samples
rv <- bayesreg(is.spam ~ ., spambase.tr, model = "logistic", prior = "horseshoe", n.samples = 2e3)
# Summarise, sorting variables by their importance rank
rv.s <- summary(rv,sort.rank=TRUE)
# Make predictions about testing data -- get class predictions and class probabilities
y_pred <- predict(rv, spambase.tst, type='class')
# Check how well our predictions did by generating a confusion matrix
table(y_pred, spambase.tst$is.spam)
# Calculate logarithmic loss on test data
y_prob <- predict(rv, spambase.tst, type='prob')
cat('Neg Log-Like for no Bayes average, posterior mean estimates: ', sum(-log(y_prob[,1])), '\n')
y_prob <- predict(rv, spambase.tst, type='prob', sum.stat="median")
cat('Neg Log-Like for no Bayes average, posterior median estimates: ', sum(-log(y_prob[,1])), '\n')
y_prob <- predict(rv, spambase.tst, type='prob', bayes.avg=TRUE)
cat('Neg Log-Like for Bayes average: ', sum(-log(y_prob[,1])), '\n')
## End(Not run)