marble {marble}

R Documentation

fit a robust Bayesian variable selection model for G×E interactions.

Description

fit a robust Bayesian variable selection model for G×E interactions.

Usage

marble(
  X,
  Y,
  E,
  clin,
  max.steps = 10000,
  robust = TRUE,
  sparse = TRUE,
  debugging = FALSE
)

Arguments

`X`	the matrix of predictors (genetic factors). Each row should be an observation vector.
`Y`	the continuous response variable.
`E`	a matrix of environmental factors. E will be centered. The interaction terms between X (genetic factors) and E will be automatically created and included in the model.
`clin`	a matrix of clinical variables. Clinical variables are not subject to penalty. Clinical variables will be centered and a column of 1s will be added to the clinical matrix as the intercept.
`max.steps`	the number of MCMC iterations.
`robust`	logical flag. If TRUE, robust methods will be used.
`sparse`	logical flag. If TRUE, spike-and-slab priors will be used to shrink coefficients of irrelevant covariates to zero exactly.
`debugging`	logical flag. If TRUE, progress will be output to the console and extra information will be returned.

Details

Consider the data model described in "dat":

Y_{i} = \alpha_{0} + \sum_{k=1}^{q}\alpha_{k}E_{ik}+\sum_{t=1}^{m}\gamma_{t}clin_{it}+\beta_{j}X_{ij}+\sum_{k=1}^{q}\eta_{jk}X_{ij}E_{ik}+\epsilon_{i},

where \alpha_{0} is the intercept, and the \alpha_{k}'s and \gamma_{t}'s are the regression coefficients corresponding to the effects of environmental and clinical factors, respectively. The \beta_{j}'s and \eta_{jk}'s are the regression coefficients of the genetic variants and the G\times E interaction effects, respectively.
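As an illustration of the data model above, the following base R sketch simulates a response for a single genetic factor X_j. All sizes and coefficient values are hypothetical, chosen only for illustration, and normal errors are used here for simplicity; this is not how the bundled "dat" data set was generated.

```r
set.seed(1)

# Hypothetical dimensions: n subjects, q environmental and m clinical factors
n <- 100; q <- 2; m <- 3

E    <- matrix(rnorm(n * q), n, q)   # environmental factors E_ik
clin <- matrix(rnorm(n * m), n, m)   # clinical factors clin_it
x_j  <- rnorm(n)                     # one genetic factor X_ij

# Hypothetical true coefficients
alpha0 <- 1                          # intercept
alpha  <- c(0.5, -0.5)               # environmental effects alpha_k
gamma  <- c(0.3, 0, -0.2)            # clinical effects gamma_t
beta_j <- 0.8                        # main effect of X_j
eta_j  <- c(0.6, 0)                  # G x E interaction effects eta_jk

eps <- rnorm(n)                      # error term (normal, for simplicity)

# x_j * E multiplies every column of E by x_j, giving the q interaction terms
y <- drop(alpha0 + E %*% alpha + clin %*% gamma +
          beta_j * x_j + (x_j * E) %*% eta_j + eps)
```

Each genetic factor gets its own marginal model of this form, which is why only one `x_j` appears in the sketch.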

When sparse=TRUE (default), spike-and-slab priors are imposed to identify important main and interaction effects. If sparse=FALSE, Laplacian shrinkage will be used.

When robust=TRUE (default), the distribution of \epsilon_{i} is defined as a Laplace distribution with density f(\epsilon_{i}|\nu) = \frac{\nu}{2}\exp\left\{-\nu |\epsilon_{i}|\right\}, (i=1,\dots,n), which leads to a Bayesian formulation of LAD regression. If robust=FALSE, \epsilon_{i} follows a normal distribution.
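The link between the Laplace density and LAD regression can be checked numerically. The sketch below defines the density from the formula above (the helper names `dlaplace` and `neg_loglik` are hypothetical, not part of the package) and confirms that, for a fixed \nu, the negative log-likelihood equals a constant plus \nu times the sum of absolute residuals, so maximizing the likelihood minimizes the absolute deviations.

```r
# Laplace density f(eps | nu) = (nu / 2) * exp(-nu * |eps|)
dlaplace <- function(eps, nu) nu / 2 * exp(-nu * abs(eps))

# The density is symmetric, so integrate over (0, Inf) and double it
total_mass <- 2 * integrate(dlaplace, 0, Inf, nu = 2)$value

# Negative log-likelihood of a residual vector under Laplace errors
neg_loglik <- function(res, nu) -sum(log(dlaplace(res, nu)))

set.seed(2)
res <- rnorm(20)   # arbitrary residuals, for illustration only
nu  <- 2

# Closed form: -n * log(nu / 2) + nu * sum(|res|)
closed_form <- -length(res) * log(nu / 2) + nu * sum(abs(res))

all.equal(neg_loglik(res, nu), closed_form)
```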

Here, a rank list of the main and interaction effects is provided. For methods incorporating spike-and-slab priors, the inclusion probability is used to indicate the importance of predictors. We use a binary indicator \phi to denote membership of the non-spike distribution. Take the main effect of the jth genetic factor, X_{j}, as an example. Suppose we have collected H posterior samples from MCMC after burn-ins. The jth G factor is included in the marginal G\times E model at the hth MCMC iteration if the corresponding indicator is 1, i.e., \phi_j^{(h)} = 1. Subsequently, the posterior probability of retaining the jth genetic main effect in the final marginal model is defined as the average of all the indicators for the jth G factor among the H posterior samples. That is, p_j = \hat{\pi} (\phi_j = 1|y) = \frac{1}{H} \sum_{h=1}^{H} \phi_j^{(h)}, \; j = 1, \dots,p. A larger posterior inclusion probability p_j indicates stronger empirical evidence that the jth genetic main effect has a non-zero coefficient, i.e., a stronger association with the phenotypic trait. For methods without spike-and-slab priors, variable selection is based on credible intervals at different levels.
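The inclusion probability p_j is simply a column mean of the sampled indicators. A minimal sketch in base R, using simulated indicators rather than real MCMC output: the matrix `phi`, the probabilities in `true_prob`, and the 0.5 cutoff are all assumptions made for illustration and are not taken from the package.

```r
set.seed(42)

H <- 2000                                      # posterior samples kept after burn-in
true_prob <- c(0.95, 0.60, 0.10, 0.85, 0.02)   # hypothetical inclusion rates
p <- length(true_prob)

# phi[h, j] = 1 if the j-th G factor is in the model at iteration h
phi <- sapply(true_prob, function(pr) rbinom(H, 1, pr))

# p_j = (1 / H) * sum_h phi_j^(h)
pip <- colMeans(phi)

# Rank the factors from strongest to weakest evidence
rank_order <- order(pip, decreasing = TRUE)

# Assumed cutoff: keep factors included in more than half of the samples
selected <- which(pip > 0.5)
```

The actual `ranklist` component returned by marble may be organized differently; the sketch only shows how the averaging in the formula works.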

X, clin and E will all be standardized before the interaction terms are generated, to avoid multicollinearity between main effects and interaction terms.
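Why standardizing first helps can be seen in a small base R sketch. It is not the package's internal code: the matrices, their non-zero means, and the column names are hypothetical. It builds the interaction terms for one genetic factor with and without standardization and compares how strongly each product correlates with the main effect.

```r
set.seed(7)
n <- 200

# Hypothetical predictors with non-zero means
X <- matrix(rnorm(n * 3, mean = 5), n, 3,
            dimnames = list(NULL, paste0("G", 1:3)))
E <- matrix(rnorm(n * 2, mean = 10), n, 2,
            dimnames = list(NULL, paste0("E", 1:2)))

# Center and scale each column
Xs <- scale(X)
Es <- scale(E)

# Interaction terms for the j-th genetic factor: X_j times every column of E
j <- 1
GE_j <- Xs[, j] * Es
colnames(GE_j) <- paste0(colnames(X)[j], "x", colnames(E))

# Correlation between the main effect and its first interaction term
cor_raw <- cor(X[, j],  X[, j] * E[, 1])   # products of raw variables
cor_std <- cor(Xs[, j], GE_j[, 1])         # products of standardized variables
c(raw = cor_raw, standardized = cor_std)
```

With raw variables the product inherits most of its variation from the main effect, so the two are strongly correlated; after standardization that correlation is typically much weaker.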

Please check the references for more details about the prior distributions.

Value

An object of class ‘marble’ is returned, which is a list with the following components:

`posterior`	the posterior samples of coefficients from the MCMC.
`coefficient`	the estimated value of coefficients.
`ranklist`	the rank list of main and interaction effects.
`burn.in`	the total number of burn-ins.
`iterations`	the total number of iterations.
`design`	the design matrix of all effects.

References

Lu, X., Fan, K., Ren, J., and Wu, C. (2021). Identifying Gene–Environment Interactions With Robust Marginal Bayesian Variable Selection. Frontiers in Genetics, 12:667074 doi:10.3389/fgene.2021.667074

Examples

data(dat)

## default method
max.steps=5000
fit=marble(X, Y, E, clin, max.steps=max.steps)

## coefficients of parameters
fit$coefficient

## Estimated values of main G effects 
fit$coefficient$G

## Estimated values of interactions effects 
fit$coefficient$GE

## Rank list of main G effects and interactions 
fit$ranklist


## alternative: robust non-sparse selection
fit=marble(X, Y, E, clin, max.steps=max.steps, robust=TRUE, sparse=FALSE)
fit$coefficient
fit$ranklist

## alternative: non-robust sparse selection
fit=marble(X, Y, E, clin, max.steps=max.steps, robust=FALSE, sparse=TRUE)
fit$coefficient
fit$ranklist

[Package marble version 0.0.3 Index]
