R: Bayesian linear regression models

gbayes {qgg}

R Documentation

Bayesian linear regression models

Description

Bayesian linear regression (BLR) models:

- unified mapping of genetic variants, estimation of genetic parameters (e.g. heritability) and prediction of disease risk)

- handles different genetic architectures (few large, many small effects)

- scale to large data (e.g. sparse LD)

In the Bayesian multiple regression model the posterior density of the model parameters depend on the likelihood of the data given the parameters and a prior probability for the model parameters

The prior density of marker effects defines whether the model will induce variable selection and shrinkage or shrinkage only. Also, the choice of prior will define the extent and type of shrinkage induced. Ideally the choice of prior for the marker effect should reflect the genetic architecture of the trait, and will vary (perhaps a lot) across traits.

The following prior distributions are provided:

Bayes N: Assigning a Gaussian prior to marker effects implies that the posterior means are the BLUP estimates (same as Ridge Regression).

Bayes L: Assigning a double-exponential or Laplace prior is the density used in the Bayesian LASSO

Bayes A: similar to ridge regression but t-distribution prior (rather than Gaussian) for the marker effects ; variance comes from an inverse-chi-square distribution instead of being fixed. Estimation via Gibbs sampling.

Bayes C: uses a “rounded spike” (low-variance Gaussian) at origin many small effects can contribute to polygenic component, reduces the dimensionality of the model (makes Gibbs sampling feasible).

Bayes R: Hierarchical Bayesian mixture model with 4 Gaussian components, with variances scaled by 0, 0.0001 , 0.001 , and 0.01 .

Usage

gbayes(
  y = NULL,
  X = NULL,
  W = NULL,
  stat = NULL,
  covs = NULL,
  trait = NULL,
  fit = NULL,
  Glist = NULL,
  chr = NULL,
  rsids = NULL,
  b = NULL,
  bm = NULL,
  seb = NULL,
  LD = NULL,
  n = NULL,
  formatLD = "dense",
  vg = NULL,
  vb = NULL,
  ve = NULL,
  ssg_prior = NULL,
  ssb_prior = NULL,
  sse_prior = NULL,
  lambda = NULL,
  scaleY = TRUE,
  h2 = NULL,
  pi = 0.001,
  updateB = TRUE,
  updateG = TRUE,
  updateE = TRUE,
  updatePi = TRUE,
  adjustE = TRUE,
  models = NULL,
  nug = 4,
  nub = 4,
  nue = 4,
  verbose = FALSE,
  msize = 100,
  mask = NULL,
  GRMlist = NULL,
  ve_prior = NULL,
  vg_prior = NULL,
  tol = 0.001,
  nit = 100,
  nburn = 0,
  nit_local = NULL,
  nit_global = NULL,
  method = "mixed",
  algorithm = "mcmc"
)

Arguments

`y`	is a vector or matrix of phenotypes
`X`	is a matrix of covariates
`W`	is a matrix of centered and scaled genotypes
`stat`	dataframe with marker summary statistics
`covs`	is a list of summary statistics (output from internal cvs function)
`trait`	is an integer used for selection traits in covs object
`fit`	is a list of results from gbayes
`Glist`	list of information about genotype matrix stored on disk
`chr`	is the chromosome for which to fit BLR models
`rsids`	is a character vector of rsids
`b`	is a vector or matrix of marginal marker effects
`bm`	is a vector or matrix of adjusted marker effects for the BLR model
`seb`	is a vector or matrix of standard error of marginal effects
`LD`	is a list with sparse LD matrices
`n`	is a scalar or vector of number of observations for each trait
`formatLD`	is a character specifying LD format (formatLD="dense" is default)
`vg`	is a scalar or matrix of genetic (co)variances
`vb`	is a scalar or matrix of marker (co)variances
`ve`	is a scalar or matrix of residual (co)variances
`ssg_prior`	is a scalar or matrix of prior genetic (co)variances
`ssb_prior`	is a scalar or matrix of prior marker (co)variances
`sse_prior`	is a scalar or matrix of prior residual (co)variances
`lambda`	is a vector or matrix of lambda values
`scaleY`	is a logical; if TRUE y is centered and scaled
`h2`	is the trait heritability
`pi`	is the proportion of markers in each marker variance class (e.g. pi=c(0.999,0.001),used if method="ssvs")
`updateB`	is a logical for updating marker (co)variances
`updateG`	is a logical for updating genetic (co)variances
`updateE`	is a logical for updating residual (co)variances
`updatePi`	is a logical for updating pi
`adjustE`	is a logical for adjusting residual variance
`models`	is a list structure with models evaluated in bayesC
`nug`	is a scalar or vector of prior degrees of freedom for prior genetic (co)variances
`nub`	is a scalar or vector of prior degrees of freedom for marker (co)variances
`nue`	is a scalar or vector of prior degrees of freedom for prior residual (co)variances
`verbose`	is a logical; if TRUE it prints more details during iteration
`msize`	number of markers used in compuation of sparseld
`mask`	is a vector or matrix of TRUE/FALSE specifying if marker should be ignored
`GRMlist`	is a list providing information about GRM matrix stored in binary files on disk
`ve_prior`	is a scalar or matrix of prior residual (co)variances
`vg_prior`	is a scalar or matrix of prior genetic (co)variances
`tol`	is tolerance, i.e. convergence criteria used in gbayes
`nit`	is the number of iterations
`nburn`	is the number of burnin iterations
`nit_local`	is the number of local iterations
`nit_global`	is the number of global iterations
`method`	specifies the methods used (method="bayesN","bayesA","bayesL","bayesC","bayesR")
`algorithm`	specifies the algorithm

Value

Returns a list structure including

`b`	vector or matrix (mxt) of posterior means for marker effects
`d`	vector or matrix (mxt) of posterior means for marker inclusion probabilities
`vb`	scalar or vector (t) of posterior means for marker variances
`vg`	scalar or vector (t) of posterior means for genomic variances
`ve`	scalar or vector (t) of posterior means for residual variances
`rb`	matrix (txt) of posterior means for marker correlations
`rg`	matrix (txt) of posterior means for genomic correlations
`re`	matrix (txt) of posterior means for residual correlations
`pi`	vector (1xnmodels) of posterior probabilities for models
`h2`	vector (1xt) of posterior means for model probability
`param`	a list current parameters (same information as item listed above) used for restart of the analysis
`stat`	matrix (mxt) of marker information and effects used for genomic risk scoring

Author(s)

Peter Sørensen

Examples



# Simulate data and test functions

W <- matrix(rnorm(100000),nrow=1000)
set1 <- sample(1:ncol(W),5)
set2 <- sample(1:ncol(W),5)
sets <- list(set1,set2)
g <- rowSums(W[,c(set1,set2)])
e <- rnorm(nrow(W),mean=0,sd=1)
y <- g + e


fitM <- gbayes(y=y, W=W, method="bayesN")
fitA <- gbayes(y=y, W=W, method="bayesA")
fitL <- gbayes(y=y, W=W, method="bayesL")
fitC <- gbayes(y=y, W=W, method="bayesC")

[Package qgg version 1.1.2 Index]