BayesSUR {BayesSUR} | R Documentation |
Fitting BayesSUR models
Description
Main function of the package. Fits a range of models introduced in the
package vignette BayesSUR.pdf
. Returns an object of S3 class
BayesSUR
. There are three options for the prior on the residual
covariance matrix (i.e., independent inverse-Gamma, inverse-Wishart and
hyper-inverse Wishart) and three options for the prior on the latent
indicator variable (i.e., independent Bernoulli, hotspot and Markov random
field). So there are nine models in total. See details for their combinations.
Usage
BayesSUR(
data = NULL,
Y,
X,
X_0 = NULL,
covariancePrior = "HIW",
gammaPrior = "hotspot",
betaPrior = "independent",
nIter = 10000,
burnin = 5000,
nChains = 2,
outFilePath = "",
gammaSampler = "bandit",
gammaInit = "R",
mrfG = NULL,
standardize = TRUE,
standardize.response = TRUE,
maxThreads = 1,
tick = 1000,
output_gamma = TRUE,
output_beta = TRUE,
output_Gy = TRUE,
output_sigmaRho = TRUE,
output_pi = TRUE,
output_tail = TRUE,
output_model_size = TRUE,
output_model_visit = FALSE,
output_CPO = FALSE,
output_Y = TRUE,
output_X = TRUE,
hyperpar = list(),
tmpFolder = "tmp/"
)
Arguments
data |
a numeric matrix with variables on the columns and observations
on the rows, if arguments |
Y , X |
vectors of indices (with respect to the data matrix) for the
outcomes ( |
X_0 |
vectors of indices (with respect to the data matrix) for the fixed predictors that are not selected, i.e. always included in the model; if the data argument is not provided, this needs to be a numeric matrix containing the data instead, with variables on the columns and observations on the rows |
covariancePrior |
string indicating the prior for the covariance $C$;
it has to be either |
gammaPrior |
string indicating the gamma prior to use, either
|
betaPrior |
string indicating the prior for regression coefficients; it
has to be either |
nIter |
number of iterations for the MCMC procedure. Default 10000 |
burnin |
number of iterations to discard at the start of the chain. Default is 5000 |
nChains |
number of parallel tempered chains to run (default 2). The temperature is adapted during the burnin phase |
outFilePath |
path to where the output files are to be written |
gammaSampler |
string indicating the type of sampler for gamma, either
|
gammaInit |
gamma initialisation to either all-zeros ( |
mrfG |
either a matrix or a path to the file containing (the edge list of) the G matrix for the MRF prior on gamma (if necessary) |
standardize |
logical flag for X variable standardization. Default is
|
standardize.response |
logical flag for Y standardization. Default is
|
maxThreads |
maximum threads used for parallelization. Default is 1.
Reproducibility of results with |
tick |
an integer used for printing the iteration index and some updated parameters every tick-th iteration. Default is 1000 |
output_gamma |
allow ( |
output_beta |
allow ( |
output_Gy |
allow ( |
output_sigmaRho |
allow ( |
output_pi |
allow ( |
output_tail |
allow ( |
output_model_size |
allow ( |
output_model_visit |
allow ( |
output_CPO |
allow ( |
output_Y |
allow ( |
output_X |
allow ( |
hyperpar |
a list of named hypeparameters to use instead of the default values. Valid names are mrf_d, mrf_e, a_sigma, b_sigma, a_tau, b_tau, nu, a_eta, b_eta, a_o, b_o, a_pi, b_pi, a_w and b_w. Their default values are a_w=2, b_w=5, a_omega=2, b_omega=1, a_o=2, b_o=p-2, a_pi=2, b_pi=1, nu=s+2, a_tau=0.1, b_tau=10, a_eta=0.1, b_eta=1, a_sigma=1, b_sigma=1, mrf_d=-3 and mrf_e=0.03. See the vignette for more information |
tmpFolder |
the path to a temporary folder where intermediate data
files are stored (will be erased at the end of the chain). It is specified
relative to |
Details
The arguments covariancePrior
and gammaPrior
specify
the model HRR, dSUR or SSUR with different gamma prior. Let
\gamma_{jk}
be latent indicator variable of each coefficient and
C
be covariance matrix of response variables. The nine models
specified through the arguments covariancePrior
and
gammaPrior
are as follows.
\gamma_{jk} ~Bernoulli | \gamma_{jk} ~hotspot | \gamma ~MRF |
|
C ~indep | HRR-B | HRR-H | HRR-M |
C ~IW | dSUR-B | dSUR-H | dSUR-M |
C ~HIW | SSUR-B | SSUR-H | SSUR-M |
Value
An object of class BayesSUR
is saved as
obj_BayesSUR.RData
in the output file, including the following
components:
status - the running status
input - a list of all input parameters by the user
output - a list of the all output filenames:
"
*_logP_out.txt
" - contains each row for the1000t
-th iteration's log-likelihoods of parameters, i.e., Tau, Eta, JunctionTree, SigmaRho, O, Pi, Gamma, W, Beta and data conditional log-likelihood depending on the models."
*_gamma_out.txt
" - posterior mean of the latent indicator matrix."
*_pi_out.txt
" - posterior mean of the predictor effects (prospensity) by decomposing the probability of the latent indicator."
*_hotspot_tail_p_out.txt
" - posterior mean of the hotspot tail probability. Only available for the hotspot prior on the gamma."
*_beta_out.txt
" - posterior mean of the coefficients matrix."
*_Gy_out.txt
" - posterior mean of the response graph. Only available for the HIW prior on the covariance."
*_sigmaRho_out.txt
" - posterior mean of the transformed parameters. Not available for the IG prior on the covariance."
*_model_size_out.txt
" - contains each row for the1000t
-th iteration's model sizes of the multiple response variables."
*_model_visit_gy_out.txt
" - contains each row for the nonzero indices of the vectorized estimated graph matrix for each iteration."
*_model_visit_gamma_out.txt
" - contains each row for the nonzero indices of the vectorized estimated gamma matrix for each iteration."
*_CPO_out.txt
" - the (scaled) conditional predictive ordinates (CPO)."
*_CPOsumy_out.txt
" - the (scaled) conditional predictive ordinates (CPO) with joint posterior predictive of the response variables."
*_WAIC_out.txt
" - the widely applicable information criterion (WAIC)."
*_Y.txt
" - responses dataset."
*_X.txt
" - predictors dataset."
*_X0.txt
" - fixed predictors dataset.
call - the matched call.
References
Russo D, Van Roy B, Kazerouni A, Osband I, Wen Z (2018). A tutorial on Thompson sampling. Foundations and Trends in Machine Learning, 11: 1-96.
Madigan D, York J (1995). Bayesian graphical models for discrete data. International Statistical Review, 63: 215–232.
Bottolo L, Banterle M, Richardson S, Ala-Korpela M, Jarvelin MR, Lewin A (2020). A computationally efficient Bayesian seemingly unrelated regressions model for high-dimensional quantitative trait loci discovery. Journal of Royal Statistical Society: Series C, 70: 886-908.
Zhao Z, Banterle M, Bottolo L, Richardson S, Lewin A, Zucknick M (2021). BayesSUR: An R package for high-dimensional multivariate Bayesian variable and covariance selection in linear regression. Journal of Statistical Software, 100: 1–32.
Zhao Z, Banterle M, Lewin A, Zucknick M (2023). Multivariate Bayesian structured variable selection for pharmacogenomic studies. Journal of the Royal Statistical Society: Series C (Applied Statistics), qlad102.
Examples
data("exampleEQTL", package = "BayesSUR")
hyperpar <- list(a_w = 2, b_w = 5)
set.seed(9173)
fit <- BayesSUR(
Y = exampleEQTL[["blockList"]][[1]],
X = exampleEQTL[["blockList"]][[2]],
data = exampleEQTL[["data"]], outFilePath = tempdir(),
nIter = 5, burnin = 0, nChains = 1, gammaPrior = "hotspot",
hyperpar = hyperpar, tmpFolder = "tmp/", output_CPO = TRUE
)
## check output
# show the summary information
summary(fit)
# show the estimated beta, gamma and graph of responses Gy
plot(fit, estimator = c("beta", "gamma", "Gy"), type = "heatmap")
## Not run:
## Set up temporary work directory for saving a pdf figure
# td <- tempdir()
# oldwd <- getwd()
# setwd(td)
## Produce authentic math formulas in the graph
# plot(fit, estimator = c("beta", "gamma", "Gy"), type = "heatmap", fig.tex = TRUE)
# system(paste(getOption("pdfviewer"), "ParamEstimator.pdf"))
# setwd(oldwd)
## End(Not run)