bic.glm {BMA}  R Documentation 
Bayesian Model Averaging accounts for the model uncertainty inherent in the variable selection problem by averaging over the best models in the model class according to approximate posterior model probability.
bic.glm(x, ...)
## S3 method for class 'matrix'
bic.glm(x, y, glm.family, wt = rep(1, nrow(x)),
strict = FALSE, prior.param = c(rep(0.5, ncol(x))), OR = 20,
maxCol = 30, OR.fix = 2, nbest = 150, dispersion = NULL,
factor.type = TRUE, factor.prior.adjust = FALSE,
occam.window = TRUE, call = NULL, ...)
## S3 method for class 'data.frame'
bic.glm(x, y, glm.family, wt = rep(1, nrow(x)),
strict = FALSE, prior.param = c(rep(0.5, ncol(x))), OR = 20,
maxCol = 30, OR.fix = 2, nbest = 150, dispersion = NULL,
factor.type = TRUE, factor.prior.adjust = FALSE,
occam.window = TRUE, call = NULL, ...)
## S3 method for class 'formula'
bic.glm(f, data, glm.family, wt = rep(1, nrow(data)),
strict = FALSE, prior.param = c(rep(0.5, ncol(x))), OR = 20,
maxCol = 30, OR.fix = 2, nbest = 150, dispersion = NULL,
factor.type = TRUE, factor.prior.adjust = FALSE,
occam.window = TRUE, na.action = na.omit, ...)
x 
a matrix or data.frame of independent variables. 
y 
a vector of values for the dependent variable. 
f 
a formula 
data 
a data frame containing the variables in the model. 
glm.family 
a description of the error distribution and link function to be used in the model. This can be a character string naming a family function, a family function or the result of a call to a family function. (See 'family' for details of family functions.) 
wt 
an optional vector of weights to be used. 
strict 
a logical indicating whether models with more likely submodels are
eliminated. 
prior.param 
a vector of values specifying the prior weights for each variable. 
OR 
a number specifying the maximum ratio for excluding models in Occam's window 
maxCol 
a number specifying the maximum number of columns in design matrix (including intercept) to be kept. 
OR.fix 
width of the window which keeps models after the leaps approximation
is done.
Because the leaps and bounds gives only an approximation to BIC,
there is a need to increase the window at this first "cut" so as to
assure that no good models are deleted.
The level of this cut is at 
nbest 
a number specifying the number of models of each size returned to

dispersion 
a logical value specifying whether dispersion should be
estimated or not. Default is 
factor.type 
a logical value specifying how variables of class "factor" are
handled.
A factor variable with d levels is turned into (d1) dummy variables using a
treatment contrast.
If 
factor.prior.adjust 
a logical value specifying whether
the prior distribution on dummy variables for factors
should be adjusted when 
occam.window 
a logical value specifying if Occam's window should be used.
If set to 
call 
used internally 
na.action 
a function which indicates what should happen when data contain 
... 
unused 
Bayesian Model Averaging accounts for the model uncertainty inherent in the variable selection problem by averaging over the best models in the model class according to approximate posterior model probability.
bic.glm
returns an object of class bic.glm
The function summary
is used to print a summary of the results.
The function plot
is used to plot posterior distributions for the coefficients.
The function imageplot
generates an image of the models which were averaged over.
An object of class bic.glm
is a list containing at least the following components:
postprob 
the posterior probabilities of the models selected 
deviance 
the estimated model deviances 
label 
labels identifying the models selected 
bic 
values of BIC for the models 
size 
the number of independent variables in each of the models 
which 
a logical matrix with one row per model and one column per variable indicating whether that variable is in the model 
probne0 
the posterior probability that each variable is nonzero (in percent) 
postmean 
the posterior mean of each coefficient (from model averaging) 
postsd 
the posterior standard deviation of each coefficient (from model averaging) 
condpostmean 
the posterior mean of each coefficient conditional on the variable being included in the model 
condpostsd 
the posterior standard deviation of each coefficient conditional on the variable being included in the model 
mle 
matrix with one row per model and one column per variable giving the maximum likelihood estimate of each coefficient for each model 
se 
matrix with one row per model and one column per variable giving the standard error of each coefficient for each model 
reduced 
a logical indicating whether any variables were dropped before model averaging 
dropped 
a vector containing the names of those variables dropped before model averaging 
call 
the matched call that created the bma.lm object 
If more than maxcol
variables are supplied, then bic.glm does stepwise
elimination of variables until maxcol
variables are reached.
bic.glm
handles factor variables according to the factor.type
parameter. If this is true then factor variables are kept in the model or dropped in
entirety. If false, then each dummy variable can be kept or dropped independently.
If bic.glm
is used with a formula that includes interactions between factor
variables, then bic.glm
will create a new factor variable to represent that
interaction, and this factor variable will be kept or dropped in entirety if
factor.type
is true.
This can create interpretation problems if any of the corresponding main effects are
dropped.
Many thanks to Sanford Weisberg for making source code for leaps available.
Chris Volinsky volinsky@research.att.com, Adrian Raftery raftery@stat.washington.edu, and Ian Painter ian.painter@gmail.com
Raftery, Adrian E. (1995). Bayesian model selection in social research (with Discussion). Sociological Methodology 1995 (Peter V. Marsden, ed.), pp. 111196, Cambridge, Mass.: Blackwells.
An earlier version, issued as Working Paper 9412, Center for Studies in Demography and Ecology, University of Washington (1994) is available as a technical report from the Department of Statistics, University of Washington.
summary.bic.glm
,
print.bic.glm
,
plot.bic.glm
## Not run:
### logistic regression
library("MASS")
data(birthwt)
y< birthwt$lo
x< data.frame(birthwt[,1])
x$race< as.factor(x$race)
x$ht< (x$ht>=1)+0
x< x[,9]
x$smoke < as.factor(x$smoke)
x$ptl< as.factor(x$ptl)
x$ht < as.factor(x$ht)
x$ui < as.factor(x$ui)
glm.out.FT < bic.glm(x, y, strict = FALSE, OR = 20,
glm.family="binomial", factor.type=TRUE)
summary(glm.out.FT)
imageplot.bma(glm.out.FT)
glm.out.FF < bic.glm(x, y, strict = FALSE, OR = 20,
glm.family="binomial", factor.type=FALSE)
summary(glm.out.FF)
imageplot.bma(glm.out.FF)
glm.out.TT < bic.glm(x, y, strict = TRUE, OR = 20,
glm.family="binomial", factor.type=TRUE)
summary(glm.out.TT)
imageplot.bma(glm.out.TT)
glm.out.TF < bic.glm(x, y, strict = TRUE, OR = 20,
glm.family="binomial", factor.type=FALSE)
summary(glm.out.TF)
imageplot.bma(glm.out.TF)
## End(Not run)
## Not run:
### Gamma family
library(survival)
data(veteran)
surv.t< veteran$time
x< veteran[,c(3,4)]
x$celltype< factor(as.character(x$celltype))
sel< veteran$status == 0
x< x[!sel,]
surv.t< surv.t[!sel]
glm.out.va < bic.glm(x, y=surv.t, glm.family=Gamma(link="inverse"),
factor.type=FALSE)
summary(glm.out.va)
imageplot.bma(glm.out.va)
plot(glm.out.va)
## End(Not run)
### Poisson family
### Yates (teeth) data.
x< rbind(
c(0, 0, 0),
c(0, 1, 0),
c(1, 0, 0),
c(1, 1, 1))
y<c(4, 16, 1, 21)
n<c(1,1,1,1)
models< rbind(
c(1, 1, 0),
c(1, 1, 1))
glm.out.yates < bic.glm( x, y, n, glm.family = poisson(),
factor.type=FALSE)
summary(glm.out.yates)
## Not run:
### Gaussian
library(MASS)
data(UScrime)
f < formula(log(y) ~ log(M)+So+log(Ed)+log(Po1)+log(Po2)+log(LF)+
log(M.F)+ log(Pop)+log(NW)+log(U1)+log(U2)+
log(GDP)+log(Ineq)+log(Prob)+log(Time))
glm.out.crime < bic.glm(f, data = UScrime, glm.family = gaussian())
summary(glm.out.crime)
# note the problems with the estimation of the posterior standard
# deviation (compare with bicreg example)
## End(Not run)