GibbsBvsF {BayesVarSel}    R Documentation
Bayesian Variable Selection with Factors for linear regression models using Gibbs sampling.
Description
Numerical and factor variable selection from a Bayesian perspective. The posterior distribution is approximated with Gibbs sampling.
Usage
GibbsBvsF(
formula,
data,
null.model = paste(as.formula(formula)[[2]], " ~ 1", sep = ""),
prior.betas = "Robust",
prior.models = "SBSB",
n.iter = 10000,
init.model = "Full",
n.burnin = 500,
n.thin = 1,
time.test = TRUE,
seed = runif(1, 0, 16091956)
)
Arguments
formula
Formula defining the most complex linear model in the analysis. See details.
data
data frame containing the data.
null.model
A formula defining the simplest (null) model, which must be nested in the full model. The null model must contain the intercept and, by default, it is the model with just the intercept.
prior.betas
Prior distribution for the regression parameters within each model (to be literally specified). Possible choices include "Robust", "Robust.G", "Liangetal", "gZellner" and "ZellnerSiow" (see details in Bvs).
prior.models
Prior distribution over the model space (to be literally specified). Possible choices (see details) are "Const", "SB", "ConstConst", "SBConst" and "SBSB" (the default).
n.iter
The total number of iterations performed after the burn-in process.
init.model
The model at which the simulation process starts. Options include "Null" (the model containing only the covariates in the null model) and "Full" (the model defined by formula; the default).
n.burnin
Length of the burn-in, i.e. the number of iterations discarded at the beginning.
n.thin
Thinning rate. Must be a positive integer. Set n.thin > 1 to save memory and computation time if n.iter is large. Default is 1. This parameter, jointly with n.iter, determines the number of simulations that are kept and used to compute the estimates.
time.test
If TRUE and the number of variables is large (>= 21), a preliminary test is performed to estimate the computational time.
seed
A seed to initialize the random number generator.
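As a quick illustration of how these MCMC control arguments interact (a sketch with made-up values, assuming the usual convention that one draw out of every n.thin is stored after the burn-in):
n.iter   <- 10000   # iterations performed after the burn-in
n.burnin <- 500     # iterations discarded at the beginning
n.thin   <- 5       # keep one draw out of every 5
n.burnin + n.iter   # total iterations run: 10500
n.iter / n.thin     # approximate number of draws kept: 2000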
Details
In practical terms, GibbsBvsF can be understood as a version of GibbsBvs in the presence of factors. The methodology implemented in GibbsBvsF to handle variable selection problems with factors was proposed in Garcia-Donato and Paulo (2021) and leads to a method whose results do not depend on how the factors are coded (e.g. via contrasts).
Internally, a rank-deficient representation of the factors via dummy variables is used, and the number of competing models considered is
2^(pnum + sum_j L_j),
where pnum is the number of numerical variables and L_j is the number of levels of factor j.
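For instance, with made-up counts (purely illustrative, not tied to any particular dataset):
pnum <- 3            # number of numerical variables
L    <- c(4, 3)      # number of levels of each of two factors
2^(pnum + sum(L))    # 2^10 = 1024 competing models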
A main difference with Bvs and GibbsBvs (due to the presence of factors) concerns the prior probabilities on the model space:
The options prior.models="SBSB", prior.models="ConstConst" and prior.models="SBConst" acknowledge the "grouped" nature of the dummy variables representing factors through the use of the two-stage priors described in Garcia-Donato and Paulo (2021). In the first stage, probabilities over factors and numerical variables are specified and, conditional on these, in the second stage the probabilities are apportioned over the different submodels defined by the dummies. The default (and recommended, for the reasons argued in Garcia-Donato and Paulo, 2021) option is "SBSB", which uses in both stages a Scott-Berger type of assignment, that is, inversely proportional to the number of models of the same dimension. The option "ConstConst" implements a uniform prior in both stages, while "SBConst" uses a Scott-Berger prior in the first stage and a uniform prior in the second stage. Under all these priors, the prior inclusion probabilities of factors and numerical variables are 1/2.
The options prior.models="Const" and prior.models="SB" do not have a staged structure: "Const" apportions the prior probabilities uniformly over all possible models (2^(pnum + sum_j L_j)), while with "SB" the probability of each model is inversely proportional to the number of models of the same dimension. In these cases, the prior inclusion probabilities of factors and numerical variables depend on the number of levels of the factors and, in general, are not 1/2.
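A sketch of how an alternative prior over the model space would be requested (reusing the diabetes data from the Examples below; the object name fitCC and the shorter chain length are arbitrary choices made here for illustration):
data(diabetes, package = "faraway")
diabetes2 <- na.omit(diabetes)[, -1]
# Uniform prior in both stages instead of the default Scott-Berger ("SBSB")
fitCC <- GibbsBvsF(formula = glyhb ~ ., data = diabetes2,
                   prior.models = "ConstConst",
                   n.iter = 10000, n.burnin = 500)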
Value
GibbsBvsF returns an object of class Bvs with the following elements:
time
The internal time consumed in solving the problem.
lmfull
The lm object resulting from fitting the full model (the one defined by formula).
lmnull
The lm object resulting from fitting the null model (the one defined by null.model).
variables
The names of all the potential explanatory variables (numerical or factors).
n
Number of observations.
p
Number of explanatory variables (both numerical and factors) to select from.
k
Number of fixed variables.
HPMbin
The binary expression of the most probable model found.
inclprob
A named vector with the estimates of the inclusion probabilities of all the variables.
jointinclprob
A data.frame with estimates of the joint inclusion probabilities of all pairs of variables.
postprobdim
Estimates of the posterior probabilities of the number of active variables in the true model (hence ranging from k to k + p).
modelslogBF
A matrix with both the binary representation of the active variables in the MCMC after the burn-in period and the Bayes factor (in log scale) of that model against the null model.
modelswllogBF
A matrix with both the binary representation of the active variables (at the level of the levels of the factors) in the MCMC after the burn-in period and the Bayes factor (in log scale) of that model against the null model.
call
The call to the function.
C
An estimate of the normalizing constant (C = sum_i B_i Pr(M_i), with M_i ranging over the model space) obtained with the method in George and McCulloch (1997).
positions
A binary matrix relating each potential variable (numerical or factor) with the columns (dummies) of the design matrix that represent it.
positionsx
The analogous matrix restricted to the numerical variables.
prior.betas
The prior.betas argument used in the call.
prior.models
The prior.models argument used in the call.
method
The method used to explore the model space (Gibbs sampling).
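A sketch of how some of these elements might be inspected after a fit (here diabetesVS is the object created in the Examples below; the listed components follow the description above):
diabetesVS$inclprob           # estimated inclusion probabilities of each variable/factor
diabetesVS$HPMbin             # binary expression of the most probable model found
diabetesVS$postprobdim        # posterior probabilities of the model dimension
head(diabetesVS$modelslogBF)  # visited models with their log Bayes factors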
Author(s)
Gonzalo Garcia-Donato and Anabel Forte
References
Garcia-Donato, G. and Martinez-Beneito, M.A. (2013) On sampling strategies in Bayesian variable selection problems with large model spaces. Journal of the American Statistical Association, 108: 340-352. <DOI:10.1080/01621459.2012.742443>
Garcia-Donato, G. and Paulo, R. (2021) Variable selection in the presence of factors: a model selection perspective. Journal of the American Statistical Association, ahead-of-print, 1-11.
George, E. and McCulloch, R. (1997) Approaches for Bayesian variable selection. Statistica Sinica, 7, 339-372.
See Also
plot.Bvs for several plots of the result.
Under construction: BMAcoeff for obtaining model-averaged simulations of the regression coefficients and predict.Bvs for predictions.
See GibbsBvs and Bvs when no factors are involved.
Examples
## Not run:
data(diabetes, package = "faraway")
# Remove NAs and the column with the id of the samples:
diabetes2 <- na.omit(diabetes)[, -1]
# For reproducibility:
set.seed(16091956)
# Now run the main instruction
diabetesVS <- GibbsBvsF(formula = glyhb ~ ., data = diabetes2,
                        n.iter = 100000, n.burnin = 5000)
summary(diabetesVS)
# A plot of the dimension of the true model:
plot(diabetesVS, option = "dimension")
# A joint inclusion plot:
plot(diabetesVS, option = "joint")
# Now a similar exercise but with fixed numerical variables:
diabetesVS2 <- GibbsBvsF(formula = glyhb ~ ., null.model = glyhb ~ chol + stab.glu,
                         data = diabetes2, n.iter = 100000, n.burnin = 5000)
# ...and with a fixed factor as well:
diabetesVS3 <- GibbsBvsF(formula = glyhb ~ ., null.model = glyhb ~ chol + stab.glu + location,
                         data = diabetes2, n.iter = 100000, n.burnin = 5000)
## End(Not run)