logreg_bvs {BVSNLP}

R Documentation

Non-parallel version of Bayesian variable selector for logistic regression data using nonlocal priors

Description

This function performs Bayesian variable selection for logistic regression data in a non-parallel fashion. It performs no pre-processing or variable initialization, and it cannot be run in parallel to perform the coupling algorithm. It is therefore NOT recommended for general use unless the user knows how to initialize all of its parameters. The `bvs` function, which calls this function internally, is the recommended way to run Bayesian variable selection for such datasets.

Usage

logreg_bvs(
  exmat,
  chain1,
  nf,
  tau,
  r,
  nlptype,
  a,
  b,
  in_cons,
  loopcnt,
  cplng,
  chain2
)

Arguments

`exmat`	An extended matrix whose first column is the binary response vector and whose remaining columns form the design matrix. The first column of the design matrix is all 1s to account for the intercept in the model. This matrix is the output of the `PreProcess` code, in which the fixed columns are moved to the beginning.
`chain1`	The first chain, that is, the initial model from which the MCMC algorithm starts. Its first `nf+1` elements must be `1`, where `nf` is the number of fixed covariates that do not enter the selection procedure and the additional `1` is for the intercept.
`nf`	The number of fixed covariates that do not enter the selection procedure.
`tau`	The parameter `tau` of the iMOM prior.
`r`	The parameter `r` of the iMOM prior.
`nlptype`	Determines the type of nonlocal prior that is used in the analyses. `0` is for piMOM and `1` is for pMOM.
`a`	The first parameter of the beta distribution used as the prior on model size. This parameter equals 1 when the uniform-binomial prior is used.
`b`	The second parameter of the beta distribution used as the prior on model size. This parameter equals 1 when the uniform-binomial prior is used.
`in_cons`	The average model size. Under certain conditions, and when `p` is large, this value equals the parameter `a` of the beta-binomial prior.
`loopcnt`	Number of iterations for MCMC procedure.
`cplng`	A boolean variable indicating whether the coupling algorithm is performed.
`chain2`	The second chain or model for starting the MCMC procedure. This parameter is only used when `cplng=TRUE`, so it can simply be set to `chain1` when it is not used.
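
The constraints on `chain1`, `nf`, and the model-size prior can be illustrated with a short sketch. The helper `init_chain` and the name `p_sel` (the number of covariates that enter selection) are hypothetical and not part of BVSNLP. The sketch also assumes that the beta-binomial prior places a Beta(`a`, `b`) distribution on the inclusion probability, so that the prior mean model size is `p_sel * a / (a + b)`.

```r
# Hypothetical helper (not part of BVSNLP): build a starting chain with
# `nf` fixed covariates and `p_sel` covariates that enter selection.
init_chain <- function(p_sel, nf, in_cons) {
  init_prob <- in_cons / p_sel
  size <- 0
  # Redraw until the model is non-empty and no larger than in_cons.
  while (size == 0 || size > in_cons) {
    free_part <- rbinom(p_sel, 1, init_prob)
    size <- sum(free_part)
  }
  # The intercept and the fixed covariates are always included,
  # so the first nf + 1 entries are 1.
  as.numeric(c(rep(1, nf + 1), free_part))
}

set.seed(1)
p_sel <- 40
nf <- 2
a <- 3
b <- p_sel - a

# Assumed prior mean model size under a Beta(a, b) inclusion probability.
prior_mean_size <- p_sel * a / (a + b)

chain1 <- init_chain(p_sel, nf, in_cons = prior_mean_size)
```

With `b = p_sel - a`, the assumed prior mean equals `a`, which is consistent with the Examples section setting `in_cons <- a`.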

Value

It returns a list containing the following objects:

`max_chain`	A `1` by `p+1` binary vector showing the selected model with maximum probability. `1` means a specific variable is selected. The first variable is always the intercept.
`beta_hat`	The coefficient vector for the selected model. The first one is always for the intercept.
`max_prob`	The unnormalized probability of the model with highest posterior probability.
`num_iterations`	The number of MCMC iterations that are executed. This is used when `cplng=TRUE` to check whether all of the designated MCMC iterations were used or the two chains coupled sooner than that.
`cplng_flag`	This is used when `cplng=TRUE` and indicates whether the two chains coupled or not.
`num_vis_models`	The number of models visited in the search for the highest probability model. This count includes repeated visits and is not the number of unique models.
`hash_key`	This is only used when `cplng = FALSE`. This is a vector containing real numbers uniquely assigned to each model for distinguishing them.
`hash_prob`	This is only used when `cplng = FALSE`. This is a vector of probabilities for each visited model.
`vis_covs`	This is only used when `cplng = FALSE`. This is a list where each element contains indices of covariates for each visited model.
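
When `cplng = FALSE`, the three hash outputs can be combined to summarize the visited models. The sketch below uses a mock list in place of a real `logreg_bvs` result. It assumes, without confirmation from this page, that `hash_prob` holds unnormalized log probabilities and that repeated visits to a model share the same `hash_key`. If the values are on another scale, the normalization step would need to change accordingly.

```r
# Mock output standing in for a logreg_bvs result (all values are made up).
mock_out <- list(
  hash_key  = c(0.12, 0.57, 0.12, 0.91),
  hash_prob = c(-10.2, -8.4, -10.2, -12.0),
  vis_covs  = list(c(1, 3), c(1, 2, 3), c(1, 3), c(1, 5))
)

# Keep only the first visit to each distinct model.
keep <- !duplicated(mock_out$hash_key)
log_prob <- mock_out$hash_prob[keep]
models <- mock_out$vis_covs[keep]

# Normalize over the visited models only, shifting by the maximum
# for numerical stability (assumes hash_prob is on the log scale).
shifted <- log_prob - max(log_prob)
post_prob <- exp(shifted) / sum(exp(shifted))

# Rank the unique models from most to least probable.
ord <- order(post_prob, decreasing = TRUE)
summary_tab <- data.frame(
  covariates = vapply(models[ord], paste, character(1), collapse = ","),
  prob = post_prob[ord]
)
```

The resulting probabilities are relative to the visited models only, not to the full model space.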

Author(s)

Amir Nikooienejad

References

Nikooienejad, A., Wang, W., and Johnson, V. E. (2016). Bayesian variable selection for binary outcomes in high dimensional genomic studies using nonlocal priors. Bioinformatics, 32(9), 1338-1345.

Nikooienejad, A., Wang, W., and Johnson, V. E. (2017). Bayesian Variable Selection in High Dimensional Survival Time Cancer Genomic Datasets using Nonlocal Priors. arXiv preprint, arXiv:1712.02964.

Johnson, V. E., and Rossell, D. (2010). On the use of non-local prior densities in Bayesian hypothesis tests. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(2), 143-170.

Johnson, V. E. (1998). A coupling-regeneration scheme for diagnosing convergence in Markov chain Monte Carlo algorithms. Journal of the American Statistical Association, 93(441), 238-248.

Examples

### Initializing parameters
n <- 200
p <- 40
set.seed(123)
Sigma <- diag(p)
full <- matrix(c(rep(0.5, p*p)), ncol=p)
Sigma <- full + 0.5*Sigma
cholS <- chol(Sigma)
Beta <- c(-1.7,1.8,2.5)
X <- matrix(rnorm(n*p), ncol=p)
X <- X%*%cholS
colnames(X) <- paste("gene_",c(1:p),sep="")
beta <- numeric(p)
beta[c(1:length(Beta))] <- Beta
XB <- X%*%beta
probs <- as.vector(exp(XB)/(1+exp(XB)))
y <- rbinom(n,1,probs)
exmat <- cbind(y,X)
tau <- 0.5; r <- 1; a <- 3; b <- p-a; in_cons <- a;
loopcnt <- 100; cplng <- FALSE;
initProb <- in_cons/p

### Initializing Chains
schain <- p
while (schain > in_cons || schain == 0) {
  chain1 <- rbinom(p, 1, initProb)
  schain <- sum(chain1)
}
chain1 <- as.numeric(c(1, chain1))
chain2 <- chain1
nlptype <- 0 ## piMOM nonlocal prior
nf <- 0 ### No fixed columns

### Running the function
bvsout <- logreg_bvs(exmat,chain1,nf,tau,r,nlptype,a,b,in_cons,loopcnt,cplng,chain2)

### Number of visited models for this specific run:
bvsout$num_vis_models

### The selected model:
which(bvsout$max_chain > 0)

### Estimated coefficients:
bvsout$beta_hat

### The unnormalized probability of the selected model:
bvsout$max_prob

[Package BVSNLP version 1.1.9 Index]