EBglmnet-package {EBglmnet} | R Documentation |
Empirical Bayesian Lasso (EBlasso) and Elastic Net (EBEN) Methods for Generalized Linear Models
Description
Fast Empirical Bayesian Lasso (EBlasso) and Elastic Net (EBEN) are generalized linear regression methods for variable selection and effect estimation.
Similar to the lasso and elastic net implemented in the package glmnet, EBglmnet can handle p>>n data, where p is the number of variables and n is the number of samples in the regression model, and it infers a sparse solution in which irrelevant variables have regression coefficients of exactly zero. Additionally, EBglmnet has several unique features:
1) Both EBlasso and EBEN can select more than n nonzero effects.
2) EBglmnet also performs hypothesis testing for the significance of nonzero estimates.
There are three sets of hierarchical prior distributions implemented in EBglmnet:
1) EBlasso-NE is a two-level prior with (normal + exponential) distributions for the regression coefficients.
2) EBlasso-NEG is a three-level hierarchical prior with (normal + exponential + gamma) distributions.
3) EBEN implements a normal and generalized gamma hierarchical prior.
While all three sets of priors are "peaked at zero with flat tails", EBlasso-NE assigns more probability mass to the tails, resulting in more nonzero estimates with large p-values. In contrast, EBlasso-NEG has a third-level constraint on the lasso prior, which places more probability mass around zero and thus yields sparser results. Meanwhile, EBEN encourages a grouping effect such that highly correlated variables can be selected as a group.
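The two lasso-type hierarchies can be explored by simulation. The following is a minimal sketch in plain Python that does not call EBglmnet. It assumes the parameterization usually given for these priors: a coefficient is normal with mean zero and variance sigma2, sigma2 is exponential with rate lam, and for the three-level prior lam is gamma with shape a and rate b. The function names, the hyperparameter values, and the cut-offs used for "near zero" and "tail" are illustrative assumptions, and what the summaries show depends on the hyperparameters chosen.

```python
import math
import random

def draw_ne(lam, rng):
    """Draw one coefficient from the two-level normal + exponential hierarchy."""
    sigma2 = rng.expovariate(lam)              # variance ~ Exponential(rate = lam)
    return rng.gauss(0.0, math.sqrt(sigma2))   # coefficient ~ Normal(0, sigma2)

def draw_neg(a, b, rng):
    """Draw one coefficient from the three-level hierarchy (gamma prior on lam)."""
    lam = rng.gammavariate(a, 1.0 / b)         # lam ~ Gamma(shape = a, rate = b)
    return draw_ne(max(lam, 1e-300), rng)      # guard against a zero rate

def mass_summary(draws, near=0.1, far=3.0):
    """Return (fraction of draws within `near` of zero, fraction beyond `far`)."""
    n = len(draws)
    near_zero = sum(abs(x) < near for x in draws) / n
    tail = sum(abs(x) > far for x in draws) / n
    return near_zero, tail

rng = random.Random(0)                          # fixed seed for reproducibility
ne_draws = [draw_ne(1.0, rng) for _ in range(20000)]
neg_draws = [draw_neg(1.0, 1.0, rng) for _ in range(20000)]
print("NE  (near zero, tail):", mass_summary(ne_draws))
print("NEG (near zero, tail):", mass_summary(neg_draws))
```

Under this parameterization, integrating out the variance in the two-level hierarchy gives a Laplace (double exponential) marginal, which is the prior associated with the lasso. Changing lam, a, and b shifts mass between the region around zero and the tails.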
Similar to the relationship between elastic net and lasso, EBEN requires two parameters (\alpha, \lambda), and it reduces to EBlasso-NE when \alpha = 1. We recommend using EBlasso-NEG when there is a large number of candidate effects, EBlasso-NE when effect sizes are relatively small, and EBEN when groups of highly correlated variables, such as co-regulated gene expressions, are of interest.
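The elastic net and lasso relationship referred to here can be made concrete with the penalty used by glmnet, in which \alpha mixes an L1 term and a squared L2 term and \lambda scales the total. The sketch below, in plain Python, shows only that classical penalty; it is not the EBEN prior, and the function names are illustrative assumptions.

```python
import math

def elastic_net_penalty(beta, alpha, lam):
    """glmnet-style penalty: lam * (alpha * L1 + (1 - alpha) / 2 * squared L2)."""
    l1 = sum(abs(b) for b in beta)
    l2_squared = sum(b * b for b in beta)
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2_squared)

def lasso_penalty(beta, lam):
    """Plain lasso penalty: lam * L1."""
    return lam * sum(abs(b) for b in beta)

beta = [1.0, -2.0, 0.5]
# With alpha = 1 the squared L2 term vanishes and only the lasso penalty remains.
print(math.isclose(elastic_net_penalty(beta, 1.0, 0.3), lasso_penalty(beta, 0.3)))
```

Setting alpha below 1 adds the squared L2 term, which is what lets the elastic net spread weight across correlated variables instead of picking a single one.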
Two models are available for both methods: the linear regression model and the logistic regression model. Other features in this package include:
1) epistasis (two-way interactions) can be included for all models/priors;
2) models are implemented in memory-efficient C code;
3) LAPACK/BLAS are used for most linear algebra computations.
Several simulations and real data analyses in the reference papers demonstrated that EBglmnet performs better than the lasso and elastic net methods in terms of power of detection and false discovery rate, and in encouraging a grouping effect when applicable.
Key algorithms are described in the following papers:
1. EBlasso-NEG: (Cai X., Huang A., and Xu S., 2011), (Huang A., Xu S., and Cai X., 2013)
2. EBlasso-NE: (Huang A., Xu S., and Cai X., 2013)
3. group EBlasso: (Huang A., Martin E., et al. 2014)
4. EBEN: (Huang A., Xu S., and Cai X., 2015)
5. Whole-genome QTL mapping: (Huang A., Xu S., and Cai X., 2014)
EBglmnet versions after V5 will not support the following functionalities. For those, please refer to the CRAN package 'EBEN':
- Two-way interactions (epistasis);
- Group EBlasso.
Details
Package: | EBglmnet |
Type: | Package |
Version: | 6.0 |
Date: | 2016-01-15 |
License: | gpl |
Author(s)
Anhui Huang, Dianting Liu
Maintainer: Anhui Huang <anhuihuang@gmail.com>
References
Huang, A., Xu, S., and Cai, X. (2015). Empirical Bayesian elastic net for multiple quantitative trait locus mapping. Heredity 114(1): 107-115.
Huang, A., Martin, E., et al. (2014). Detecting genetic interactions in pathway-based genome-wide association studies. Genet Epidemiol 38(4): 300-309.
Huang, A., Xu, S., et al. (2014). Whole-genome quantitative trait locus mapping reveals major role of epistasis on yield of rice. PLoS ONE 9(1): e87330.
Huang, A. (2014). Sparse model learning for inferring genotype and phenotype associations. Ph.D. Dissertation, University of Miami (1186).
Huang, A., Xu, S., and Cai, X. (2013). Empirical Bayesian LASSO-logistic regression for multiple binary trait locus mapping. BMC Genetics 14(1): 5.
Cai, X., Huang, A., and Xu, S. (2011). Fast empirical Bayesian LASSO for multiple quantitative trait locus mapping. BMC Bioinformatics 12: 211.