logreg_bvs {BVSNLP}    R Documentation

Non-parallel version of Bayesian variable selector for logistic regression data using nonlocal priors

Description

This function performs Bayesian variable selection for logistic regression data in a non-parallel fashion. It performs no pre-processing and no variable initialization, and it cannot be run in parallel to perform the coupling algorithm. Its use is therefore NOT recommended unless the user knows how to initialize all the parameters. It is, however, called by the bvs function, which is the recommended way to run Bayesian variable selection for such datasets.

Usage

logreg_bvs(
  exmat,
  chain1,
  nf,
  tau,
  r,
  nlptype,
  a,
  b,
  in_cons,
  loopcnt,
  cplng,
  chain2
)

Arguments

exmat

An extended matrix whose first column is the binary response vector and whose remaining columns form the design matrix. The first column of the design matrix is all 1s to account for the intercept in the model. The design matrix is the output of the PreProcess code, where the fixed columns are moved to the beginning.
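As a minimal sketch of the layout described above, assuming a simulated response y and covariate matrix X (both hypothetical here) and no fixed covariates, the extended matrix could be assembled by hand as follows. The reordering of fixed columns done by the PreProcess code is not reproduced.

```r
set.seed(1)
n <- 20; p <- 5
X <- matrix(rnorm(n * p), ncol = p)   # covariates (hypothetical data)
y <- rbinom(n, 1, 0.5)                # binary response (hypothetical data)

# Column 1: response; column 2: intercept column of ones; then the covariates
exmat <- cbind(y, 1, X)
dim(exmat)   # n rows and p + 2 columns
```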

chain1

The first chain or initial model from which the MCMC algorithm starts. Note that the first nf+1 elements are 1, where nf is the number of fixed covariates that do not enter the selection procedure and the additional 1 accounts for the intercept.

nf

The number of fixed covariates that do not enter the selection procedure.
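To illustrate how chain1 and nf fit together, the sketch below builds a starting model whose first nf+1 entries are fixed at 1. The helper make_chain and its argument init_prob are hypothetical names, not part of the package, and p is assumed to count all covariates except the intercept.

```r
# Hypothetical helper: build a starting model in which the intercept and the
# nf fixed covariates are always included and the rest are drawn at random.
make_chain <- function(p, nf, init_prob) {
  free <- rbinom(p - nf, 1, init_prob)   # covariates that enter the selection
  as.numeric(c(rep(1, nf + 1), free))    # intercept + fixed, then the rest
}

set.seed(42)
chain1 <- make_chain(p = 40, nf = 2, init_prob = 3 / 40)
length(chain1)   # p + 1
chain1[1:3]      # intercept and the two fixed covariates
```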

tau

The parameter tau of the iMOM prior.

r

The parameter r of the iMOM prior.
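For intuition about tau and r, the sketch below evaluates the univariate iMOM density as given in Johnson and Rossell (2010). The function name dimom is hypothetical, and it is an assumption that the package uses this same parameterization internally. The density is zero at 0 and nearly zero close to it, which is the defining feature of a nonlocal prior.

```r
# Univariate iMOM density with parameters tau and r:
#   f(beta) = tau^(r/2) / gamma(r/2) * |beta|^-(r+1) * exp(-tau / beta^2)
dimom <- function(beta, tau, r) {
  dens <- tau^(r / 2) / gamma(r / 2) * abs(beta)^(-(r + 1)) * exp(-tau / beta^2)
  ifelse(beta == 0, 0, dens)   # the density vanishes at the null value 0
}

beta_grid <- c(-1, -0.1, 0, 0.1, 1)
dimom(beta_grid, tau = 0.5, r = 1)
```

Larger values of tau push prior mass further away from zero, so small effects are penalized more heavily.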

nlptype

Determines the type of nonlocal prior that is used in the analyses. 0 is for piMOM and 1 is for pMOM.

a

The first parameter of the beta distribution used as the prior on model size. This parameter is equal to 1 when the uniform-binomial prior is used.

b

The second parameter of the beta distribution used as the prior on model size. This parameter is equal to 1 when the uniform-binomial prior is used.

in_cons

The average model size. Under certain conditions, and when p is large, this value is equal to the parameter a of the beta-binomial prior.
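One way to see the link between in_cons and a: under a standard beta-binomial prior on the model size, the prior mean is p*a/(a+b), which equals a when b = p - a, the choice made in the Examples. The sketch below checks this numerically. The helper dbetabinom is hypothetical, and it is an assumption that the package's model-size prior has exactly this form.

```r
# Beta-binomial prior on the model size k = 0, ..., p with parameters a and b
dbetabinom <- function(k, p, a, b) {
  exp(lchoose(p, k) + lbeta(k + a, p - k + b) - lbeta(a, b))
}

p <- 40; a <- 3; b <- p - a
k <- 0:p
prior <- dbetabinom(k, p, a, b)

sum(prior)       # total prior mass
sum(k * prior)   # prior mean model size, compare with p * a / (a + b)
```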

loopcnt

Number of iterations for the MCMC procedure.

cplng

A boolean variable indicating whether the coupling algorithm is to be performed.

chain2

The second chain or model for starting the MCMC procedure. This parameter is only used when cplng=TRUE, so it can simply be set to chain1 when it is not used.

Value

It returns a list containing the following objects:

max_chain

A 1 by p+1 binary vector showing the selected model with maximum probability. A 1 means that the corresponding variable is selected. The first variable is always the intercept.
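Because the first position is the intercept, the selected positions can be mapped back to covariate names by dropping that first element. The sketch below uses a made-up max_chain and made-up names (gene_names is hypothetical), and assumes there are no fixed covariates and that the column order of the design matrix was left unchanged.

```r
# Hypothetical output and covariate names, for illustration only
gene_names <- paste0("gene_", 1:6)
max_chain  <- c(1, 1, 0, 0, 1, 0, 0)   # intercept, then 6 covariates

selected <- which(max_chain[-1] > 0)   # drop the intercept position
gene_names[selected]
```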

beta_hat

The coefficient vector for the selected model. The first one is always for the intercept.

max_prob

The unnormalized probability of the model with highest posterior probability.

num_iterations

The number of MCMC iterations that were executed. This is used when cplng=TRUE to check whether all of the designated MCMC iterations were used or the two chains coupled sooner than that.

cplng_flag

This is used when cplng=TRUE and indicates whether the two chains coupled or not.

num_vis_models

Number of models visited in the search for the highest probability model. This count includes repeated visits to the same model, so it is not the number of unique models.

hash_key

This is only used when cplng = FALSE. This is a vector containing real numbers uniquely assigned to each model for distinguishing them.

hash_prob

This is only used when cplng = FALSE. This is a vector of probabilities for each visited model.

vis_covs

This is only used when cplng = FALSE. This is a list where each element contains indices of covariates for each visited model.

Author(s)

Amir Nikooienejad

References

Nikooienejad, A., Wang, W., and Johnson, V. E. (2016). Bayesian variable selection for binary outcomes in high dimensional genomic studies using nonlocal priors. Bioinformatics, 32(9), 1338-1345.

Nikooienejad, A., Wang, W., and Johnson, V. E. (2017). Bayesian Variable Selection in High Dimensional Survival Time Cancer Genomic Datasets using Nonlocal Priors. arXiv preprint, arXiv:1712.02964.

Johnson, V. E., and Rossell, D. (2010). On the use of non-local prior densities in Bayesian hypothesis tests. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(2), 143-170.

Johnson, V. E. (1998). A coupling-regeneration scheme for diagnosing convergence in Markov chain Monte Carlo algorithms. Journal of the American Statistical Association, 93(441), 238-248.

See Also

bvs

Examples

### Initializing parameters
n <- 200
p <- 40
set.seed(123)
Sigma <- diag(p)
full <- matrix(c(rep(0.5, p*p)), ncol=p)
Sigma <- full + 0.5*Sigma
cholS <- chol(Sigma)
Beta <- c(-1.7,1.8,2.5)
X <- matrix(rnorm(n*p), ncol=p)
X <- X%*%cholS
colnames(X) <- paste("gene_",c(1:p),sep="")
beta <- numeric(p)
beta[c(1:length(Beta))] <- Beta
XB <- X%*%beta
probs <- as.vector(exp(XB)/(1+exp(XB)))
y <- rbinom(n,1,probs)
exmat <- cbind(y,X)
tau <- 0.5; r <- 1; a <- 3; b <- p-a; in_cons <- a;
loopcnt <- 100; cplng <- FALSE;
initProb <- in_cons/p

### Initializing Chains
schain <- p
while (schain > in_cons || schain == 0) {
  chain1 <- rbinom(p, 1, initProb)
  schain <- sum(chain1)
}
chain1 <- as.numeric(c(1, chain1))
chain2 <- chain1
nlptype <- 0 ## piMOM nonlocal prior
nf <- 0 ### No fixed columns

### Running the function
bvsout <- logreg_bvs(exmat,chain1,nf,tau,r,nlptype,a,b,in_cons,loopcnt,cplng,chain2)

### Number of visited models for this specific run:
bvsout$num_vis_models

### The selected model:
which(bvsout$max_chain > 0)

### Estimated coefficients:
bvsout$beta_hat

### The unnormalized probability of the selected model:
bvsout$max_prob

[Package BVSNLP version 1.1.9 Index]