BayesSUR {BayesSUR}R Documentation

Fitting BayesSUR models

Description

Main function of the package. Fits a range of models introduced in the package vignette BayesSUR.pdf. Returns an object of S3 class BayesSUR. There are three options for the prior on the residual covariance matrix (i.e., independent inverse-Gamma, inverse-Wishart and hyper-inverse Wishart) and three options for the prior on the latent indicator variable (i.e., independent Bernoulli, hotspot and Markov random field). So there are nine models in total. See details for their combinations.

Usage

BayesSUR(
  data = NULL,
  Y,
  X,
  X_0 = NULL,
  covariancePrior = "HIW",
  gammaPrior = "hotspot",
  betaPrior = "independent",
  nIter = 10000,
  burnin = 5000,
  nChains = 2,
  outFilePath = "",
  gammaSampler = "bandit",
  gammaInit = "R",
  mrfG = NULL,
  standardize = TRUE,
  standardize.response = TRUE,
  maxThreads = 1,
  tick = 1000,
  output_gamma = TRUE,
  output_beta = TRUE,
  output_Gy = TRUE,
  output_sigmaRho = TRUE,
  output_pi = TRUE,
  output_tail = TRUE,
  output_model_size = TRUE,
  output_model_visit = FALSE,
  output_CPO = FALSE,
  output_Y = TRUE,
  output_X = TRUE,
  hyperpar = list(),
  tmpFolder = "tmp/"
)

Arguments

data

a numeric matrix with variables on the columns and observations on the rows, if arguments Y and X (and possibly X_0) are vectors. Can be NULL if arguments Y and X (and possibly X_0) are numeric matrices

Y, X

vectors of indices (with respect to the data matrix) for the outcomes (Y) and the predictors to select (X) respectively; if the data argument is NULL, these needs to be numeric matrices containing the data instead, with variables on the columns and observations on the rows

X_0

vectors of indices (with respect to the data matrix) for the fixed predictors that are not selected, i.e. always included in the model; if the data argument is not provided, this needs to be a numeric matrix containing the data instead, with variables on the columns and observations on the rows

covariancePrior

string indicating the prior for the covariance $C$; it has to be either HIW for the hyper-inverse-Wishar (which will result in a sparse covariance matrix), IW for the inverse-Wishart prior (dense covariance) or IG for independent inverse-Gamma on all the diagonal elements and 0 otherwise. See the details for the model specification

gammaPrior

string indicating the gamma prior to use, either hotspot (default) for the Hotspot prior of Bottolo (2011), MRF for the Markov Random Field prior or hierarchical for a simpler hierarchical prior. See the details for the model specification

betaPrior

string indicating the prior for regression coefficients; it has to be either independent for independent spike-and-slab priors (only slab part for X_0 if specified), or reGroup for weakly normal priors for mandatory variables (random effects) and spike-and-slab priors for other variables of Zhao (2023)

nIter

number of iterations for the MCMC procedure. Default 10000

burnin

number of iterations to discard at the start of the chain. Default is 5000

nChains

number of parallel tempered chains to run (default 2). The temperature is adapted during the burnin phase

outFilePath

path to where the output files are to be written

gammaSampler

string indicating the type of sampler for gamma, either bandit for the Thompson sampling inspired samper or MC3 for the usual MC^3 sampler. See Russo et al.(2018) or Madigan and York (1995) for details

gammaInit

gamma initialisation to either all-zeros (0), all ones (1), MLE-informed (MLE) or (default) randomly (R)

mrfG

either a matrix or a path to the file containing (the edge list of) the G matrix for the MRF prior on gamma (if necessary)

standardize

logical flag for X variable standardization. Default is standardize=TRUE. Coefficients are returned on the standardized scale

standardize.response

logical flag for Y standardization. Default is standardize.response=TRUE

maxThreads

maximum threads used for parallelization. Default is 1. Reproducibility of results with set.seed() is only guaranteed if maxThreads=1

tick

an integer used for printing the iteration index and some updated parameters every tick-th iteration. Default is 1000

output_gamma

allow (TRUE) or suppress (FALSE) the output for gamma. See the return value below for more information

output_beta

allow (TRUE) or suppress (FALSE) the output for beta. See the return value below for more information

output_Gy

allow (TRUE) or suppress (FALSE) the output for Gy. See the return value below for more information

output_sigmaRho

allow (TRUE) or suppress (FALSE) the output for sigmaRho. See the return value below for more information

output_pi

allow (TRUE) or suppress (FALSE) the output for pi. See the return value below for more information

output_tail

allow (TRUE) or suppress (FALSE) the output for tail (hotspot tail probability). See the return value below for more information

output_model_size

allow (TRUE) or suppress (FALSE) the output for model_size. See the return value below for more information

output_model_visit

allow (TRUE) or suppress (FALSE) the output for all visited models over the MCMC iterations. Default is FALSE. See the return value below for more information

output_CPO

allow (TRUE) or suppress (FALSE) the output for (scaled) conditional predictive ordinates (*_CPO_out.txt), CPO with joint posterior predictive of the response variables (*_CPOsumy_out.txt) and widely applicable information criterion (*_WAIC_out.txt). See the return value below for more information

output_Y

allow (TRUE) or suppress (FALSE) the output for responses dataset Y

output_X

allow (TRUE) or suppress (FALSE) the output for predictors dataset X

hyperpar

a list of named hypeparameters to use instead of the default values. Valid names are mrf_d, mrf_e, a_sigma, b_sigma, a_tau, b_tau, nu, a_eta, b_eta, a_o, b_o, a_pi, b_pi, a_w and b_w. Their default values are a_w=2, b_w=5, a_omega=2, b_omega=1, a_o=2, b_o=p-2, a_pi=2, b_pi=1, nu=s+2, a_tau=0.1, b_tau=10, a_eta=0.1, b_eta=1, a_sigma=1, b_sigma=1, mrf_d=-3 and mrf_e=0.03. See the vignette for more information

tmpFolder

the path to a temporary folder where intermediate data files are stored (will be erased at the end of the chain). It is specified relative to outFilePath

Details

The arguments covariancePrior and gammaPrior specify the model HRR, dSUR or SSUR with different gamma prior. Let \gamma_{jk} be latent indicator variable of each coefficient and C be covariance matrix of response variables. The nine models specified through the arguments covariancePrior and gammaPrior are as follows.

\gamma_{jk}~Bernoulli \gamma_{jk}~hotspot \gamma~MRF
C~indep HRR-B HRR-H HRR-M
C~IW dSUR-B dSUR-H dSUR-M
C~HIW SSUR-B SSUR-H SSUR-M

Value

An object of class BayesSUR is saved as obj_BayesSUR.RData in the output file, including the following components:

References

Russo D, Van Roy B, Kazerouni A, Osband I, Wen Z (2018). A tutorial on Thompson sampling. Foundations and Trends in Machine Learning, 11: 1-96.

Madigan D, York J (1995). Bayesian graphical models for discrete data. International Statistical Review, 63: 215–232.

Bottolo L, Banterle M, Richardson S, Ala-Korpela M, Jarvelin MR, Lewin A (2020). A computationally efficient Bayesian seemingly unrelated regressions model for high-dimensional quantitative trait loci discovery. Journal of Royal Statistical Society: Series C, 70: 886-908.

Zhao Z, Banterle M, Bottolo L, Richardson S, Lewin A, Zucknick M (2021). BayesSUR: An R package for high-dimensional multivariate Bayesian variable and covariance selection in linear regression. Journal of Statistical Software, 100: 1–32.

Zhao Z, Banterle M, Lewin A, Zucknick M (2023). Multivariate Bayesian structured variable selection for pharmacogenomic studies. Journal of the Royal Statistical Society: Series C (Applied Statistics), qlad102.

Examples

data("exampleEQTL", package = "BayesSUR")
hyperpar <- list(a_w = 2, b_w = 5)
set.seed(9173)
fit <- BayesSUR(
  Y = exampleEQTL[["blockList"]][[1]],
  X = exampleEQTL[["blockList"]][[2]],
  data = exampleEQTL[["data"]], outFilePath = tempdir(),
  nIter = 5, burnin = 0, nChains = 1, gammaPrior = "hotspot",
  hyperpar = hyperpar, tmpFolder = "tmp/", output_CPO = TRUE
)

## check output
# show the summary information
summary(fit)

# show the estimated beta, gamma and graph of responses Gy
plot(fit, estimator = c("beta", "gamma", "Gy"), type = "heatmap")

## Not run: 
## Set up temporary work directory for saving a pdf figure
# td <- tempdir()
# oldwd <- getwd()
# setwd(td)

## Produce authentic math formulas in the graph
# plot(fit, estimator = c("beta", "gamma", "Gy"), type = "heatmap", fig.tex = TRUE)
# system(paste(getOption("pdfviewer"), "ParamEstimator.pdf"))
# setwd(oldwd)

## End(Not run)


[Package BayesSUR version 2.1-7 Index]