MC3.REG {BMA}        R Documentation

Bayesian simultaneous variable selection and outlier identification

Description

Performs Bayesian simultaneous variable selection and outlier identification (SVO) via Markov chain Monte Carlo model composition (MC3).

Usage

MC3.REG(all.y, all.x, num.its, M0.var= , M0.out= , outs.list= , 
        outliers = TRUE, PI=.1*(length(all.y) <50) + 
        .02*(length(all.y) >= 50),  K=7, nu= , lambda= , phi= )

Arguments

all.y

a vector of responses

all.x

a matrix of covariates

num.its

the number of iterations of the Markov chain sampler

M0.var

a logical vector specifying the starting model. For example, if you have 3 predictors and the starting model is X1 and X3, then M0.var would be c(TRUE,FALSE,TRUE). The default is a logical vector of TRUEs. NOTE: the starting predictor model cannot be the null model.

M0.out

a logical vector specifying the starting model outlier set. The default value is a logical vector of TRUEs of the same length as outs.list. This can be NULL only if outs.list is NULL; otherwise it must be the same length as outs.list (though it may be a vector of all FALSE values). See the sketch following this argument list.

outs.list

a vector of all potential outlier locations (e.g. c(10,12) means the 10th and 12th points are potential outliers). If NULL and if outliers is TRUE, then potential outliers are estimated using the out.ltsreg function.

outliers

a logical value indicating whether outliers are to be included. If outs.list is non-NULL, this argument is ignored. If outs.list is NULL and outliers is TRUE, potential outliers are estimated as described above.

PI

a hyperparameter indicating the prior probability of an outlier. The default value is 0.1 if the data set has fewer than 50 observations and 0.02 otherwise.

K

a hyperparameter indicating the outlier inflation factor

nu

regression hyperparameter. The default value is 2.58 if R2 for the full model is less than 0.9, or 0.2 if R2 for the full model is greater than 0.9.

lambda

regression hyperparameter. The default value is 0.28 if R2 for the full model is less than 0.9, or 0.1684 if R2 for the full model is greater than 0.9.

phi

regression hyperparameter. The default value is 2.85 if R2 for the full model is less than 0.9, or 9.2 if R2 for the full model is greater than 0.9.
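
As an illustration of how M0.var, M0.out and outs.list work together, the sketch below (using hypothetical objects y and X, where X has four columns) starts the chain at the model containing the first and third predictors, with observations 10 and 12 as candidate outliers that are both in the starting outlier set:

start.vars <- c(TRUE, FALSE, TRUE, FALSE)  # starting predictor model: X1 and X3 (the null model is not allowed)
cand.outs <- c(10, 12)                     # observations 10 and 12 are the candidate outliers
start.outs <- c(TRUE, TRUE)                # both candidates are included in the starting outlier set
fit <- MC3.REG(y, X, num.its = 10000, M0.var = start.vars,
               M0.out = start.outs, outs.list = cand.outs)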

Details

Performs Bayesian simultaneous variable and outlier selection using Markov chain Monte Carlo model composition (MC3). Potential models are visited using a Metropolis-Hastings algorithm on the integrated likelihood. At the end of the chain, exact posterior probabilities are calculated for each model visited.

Value

An object of class mc3. Print and summary methods exist for this class. Objects of class mc3 are lists consisting of at least the following components (a short sketch of accessing them follows the list):

post.prob

The posterior probabilities of each model visited.

variables

An indicator matrix of the variables in each model.

outliers

An indicator matrix of the outliers in each model, if outliers were selected.

visit.count

The number of times each model was visited.

outlier.numbers

An index showing which observations were eligible for selection as outliers.

var.names

The names of the variables.

n.models

The number of models visited.

PI

The value of PI used.

K

The value of K used.

nu

The value of nu used.

lambda

The value of lambda used.

phi

The value of phi used.

call

The function call.
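
For example, assuming fit is an object returned by MC3.REG (the name is illustrative), the components listed above can be inspected directly:

ord <- order(fit$post.prob, decreasing = TRUE)  # rank visited models by posterior probability
fit$post.prob[ord][1:5]                         # the five largest posterior model probabilities
fit$variables[ord[1], ]                         # variable indicators for the highest-probability model
fit$visit.count[ord[1]]                         # how often the chain visited that model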

Note

The default values for nu, lambda and phi are recommended when the R2 value for the full model with all outliers is less than 0.9.
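
A rough way to check which case applies is to compute R2 for an ordinary least-squares fit of the full model before calling MC3.REG. The sketch below assumes a response vector all.y and a covariate matrix all.x, and ignores the outlier terms:

r2.full <- summary(lm(all.y ~ ., data = as.data.frame(all.x)))$r.squared
r2.full < 0.9  # TRUE means the recommended defaults for nu, lambda and phi apply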

If PI is set too high, it is possible to generate sub-models which are singular, at which point the function will crash.

The implementation of this function is different from that used in the Splus function. In particular, variables which were global are now passed between functions.

Author(s)

Jennifer Hoeting jennifer.hoeting@gmail.com with the assistance of Gary Gadbury. Translation from Splus to R by Ian S. Painter.

References

Adrian E. Raftery, David Madigan, and Jennifer A. Hoeting (1997). Bayesian Model Averaging for Linear Regression Models. Journal of the American Statistical Association, 92, 179-191.

Jennifer Hoeting, Adrian E. Raftery and David Madigan (2002). A Method for Simultaneous Variable and Transformation Selection in Linear Regression. Journal of Computational and Graphical Statistics, 11, 485-507.

Jennifer Hoeting, Adrian E. Raftery and David Madigan (1996). A Method for Simultaneous Variable Selection and Outlier Identification in Linear Regression. Computational Statistics and Data Analysis, 22, 251-270.

Earlier versions of these papers are available at https://www.stat.colostate.edu/~jah/papers/

See Also

out.ltsreg, as.data.frame.mc3

Examples


## Not run: 
# Example 1:   Scottish hill racing data.

data(race)
b <- out.ltsreg(race[,-1], race[,1], 2)
races.run1 <- MC3.REG(race[,1], race[,-1], num.its = 20000, c(FALSE,TRUE),
                      rep(TRUE, length(b)), b, PI = .1, K = 7, nu = .2,
                      lambda = .1684, phi = 9.2)
races.run1
summary(races.run1)

## End(Not run)

# Example 2: Crime data
library(MASS)
data(UScrime)

y.crime.log <- log(UScrime$y)
x.crime.log <- UScrime[,-ncol(UScrime)]
x.crime.log[,-2] <- log(x.crime.log[,-2])
crime.run1 <- MC3.REG(y.crime.log, x.crime.log, num.its = 2000,
                      rep(TRUE,15), outliers = FALSE)
crime.run1[1:25,]
summary(crime.run1)
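
# As a further sketch, the visited models can be coerced to a data frame using
# the as.data.frame method for mc3 objects (see as.data.frame.mc3):
head(as.data.frame(crime.run1))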




