cv.EBglmnet {EBglmnet}    R Documentation
Cross Validation (CV) Function to Determine Hyperparameters of the EBglmnet Algorithms
Description
The degree of shrinkage, or equivalently, the number of non-zero effects selected by EBglmnet, is
controlled by the hyperparameters of the prior distribution, which can be chosen
via Cross Validation (CV). This function performs k-fold CV for hyperparameter selection and
outputs the model fit obtained with the optimal hyperparameters. It therefore runs EBglmnet
(k × n_parameters + 1) times, where n_parameters is the number of candidate hyperparameter settings:
k fits for each setting, plus one final fit with the optimal setting. By default, EBlasso-NE tests 20
values of λ, EBEN tests an additional 10 values of α for each λ (a total of 200 pairs of
hyperparameters), and EBlasso-NEG tests up to 25 pairs of (a, b).
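To make the cost concrete, the number of EBglmnet runs can be worked out for the default nfolds = 5. This is a small sketch that assumes the "+ 1" is the final refit with the optimal hyperparameters and uses the default grid sizes quoted above; for EBlasso-NEG the 25 pairs are an upper bound, so its count is a maximum. The helper n_fits is introduced here for illustration and is not part of EBglmnet.

```r
# Hypothetical helper (not part of EBglmnet): number of model fits in k-fold CV,
# i.e. k fits per candidate hyperparameter setting plus one final refit
n_fits <- function(k, n_parameters) k * n_parameters + 1

# Default grid sizes from the Description (lassoNEG: "up to" 25 pairs)
grid <- c(lasso = 20, elastic_net = 20 * 10, lassoNEG = 25)
sapply(grid, n_fits, k = 5)
# lasso 101, elastic_net 1001, lassoNEG 126
```

The elastic net prior is by far the most expensive to tune, because its grid is two-dimensional.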
Usage
cv.EBglmnet(x, y, family = c("gaussian", "binomial"),
            prior = c("lassoNEG", "lasso", "elastic net"), nfolds = 5,
            foldId, verbose = 0)
Arguments
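x
input matrix of dimension n × p (n observations, p candidate variables); each row is an observation.
y
response variable: continuous for the gaussian family, binary for the binomial family.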
family
model type, taking values "gaussian" (default) or "binomial".
prior
prior distribution to be used, taking values "lassoNEG" (default), "lasso", or "elastic net". All priors produce sparse estimates of the regression coefficients; see Details for choosing priors.
nfolds
number of folds k in k-fold CV. The default is 5.
foldId
an optional vector of values between 1 and nfolds specifying the fold to which each observation is assigned.
verbose
parameter that controls the level of message output from EBglmnet. It takes values from 0 to 5; larger values display more messages. 0 is recommended for CV to avoid excessive output. The default is 0.
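When foldId is not supplied, the folds are assigned internally. Supplying it yourself makes the split reproducible and lets several calls share the same folds. The sketch below builds such a vector in base R; make_fold_id is a hypothetical helper, and the balanced random assignment is an assumption about what a reasonable foldId looks like, not a description of how cv.EBglmnet assigns folds internally.

```r
# Hypothetical helper (not part of EBglmnet): assign each of n observations
# to one of nfolds folds at random, keeping fold sizes as equal as possible
make_fold_id <- function(n, nfolds = 5) {
  sample(rep(seq_len(nfolds), length.out = n))
}

set.seed(1)                                   # reproducible assignment
foldId <- make_fold_id(nrow(state.x77), nfolds = 5)
table(foldId)                                 # 50 states, so 10 per fold

# Reusing the same foldId across calls, e.g.
#   cv.EBglmnet(x, y, prior = "lasso", nfolds = 5, foldId = foldId)
# compares different priors on identical train/test splits.
```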
Details
The three priors in EBglmnet all contain hyperparameters that control how heavy the tails of the prior distribution are. Different hyperparameter values yield different numbers of non-zero effects retained in the model.
Appropriate selection of these values is required to obtain optimal results, and CV is the most
commonly used method. For the Gaussian model, CV selects the hyperparameter values that yield
the minimum mean squared error (MSE). For the binomial model, CV computes the mean log likelihood in each
left-out fold and selects the values that yield the maximum mean log likelihood over the k folds.
See EBglmnet for details of the hyperparameters in each prior distribution.
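The selection rule above can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: a closed-form ridge regression (hypothetical helpers ridge_fit and ridge_predict) stands in for EBglmnet, and cv_select, mse, and mean_logL are names introduced here. It also assumes the per-fold MSE values are averaged over the k folds. Only the CV logic is meant to carry over: score each left-out fold, average over the folds, keep the best grid value, and refit once on all the data.

```r
# Sketch only: ridge regression stands in for EBglmnet so the CV logic can run
# with base R. None of these functions belong to the EBglmnet package.

# Closed-form ridge fit with an unpenalized intercept (stand-in estimator)
ridge_fit <- function(x, y, lambda) {
  xm <- colMeans(x)
  ym <- mean(y)
  xc <- sweep(x, 2, xm)                          # center the columns
  beta <- drop(solve(crossprod(xc) + lambda * diag(ncol(x)),
                     crossprod(xc, y - ym)))
  list(beta = beta, a0 = ym - sum(xm * beta))
}
ridge_predict <- function(fit, x) drop(x %*% fit$beta) + fit$a0

# Gaussian criterion: mean squared error on a left-out fold
mse <- function(y, yhat) mean((y - yhat)^2)

# Binomial criterion: mean Bernoulli log likelihood on a left-out fold,
# for 0/1 responses y and predicted probabilities p (not used below)
mean_logL <- function(y, p) mean(y * log(p) + (1 - y) * log(1 - p))

cv_select <- function(x, y, lambdas, nfolds = 5, foldId = NULL) {
  if (is.null(foldId)) {
    foldId <- sample(rep(seq_len(nfolds), length.out = nrow(x)))
  } else {
    nfolds <- max(foldId)
  }
  # For each candidate lambda: one fit per fold, each scored on its
  # left-out fold, then averaged over the folds
  cv_mse <- sapply(lambdas, function(lam) {
    mean(sapply(seq_len(nfolds), function(k) {
      test <- foldId == k
      fit <- ridge_fit(x[!test, , drop = FALSE], y[!test], lam)
      mse(y[test], ridge_predict(fit, x[test, , drop = FALSE]))
    }))
  })
  best <- which.min(cv_mse)  # Gaussian: smallest MSE (binomial: which.max of mean logL)
  list(cv_table = cbind(lambda = lambdas, MSE = cv_mse),
       optimal  = lambdas[best],
       fit      = ridge_fit(x, y, lambdas[best]))   # the final "+ 1" fit
}

# Usage on the data from the Examples (columns standardized for ridge)
xNames <- c("Population", "Income", "Illiteracy", "Murder", "HS Grad", "Frost", "Area")
x <- scale(state.x77[, xNames])
y <- state.x77[, "Life Exp"]
set.seed(1)
res <- cv_select(x, y, lambdas = 10^seq(-2, 2, length.out = 20))
res$optimal
```

For a binomial response the same loop would score each fold with mean_logL on predicted probabilities and keep the largest value, which is why the comparison flips from which.min to which.max.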
Value
CrossValidation
matrix of CV results, including the prediction metric for each hyperparameter setting tested. The prediction metric is the mean squared error (MSE) for the Gaussian model and the mean log likelihood (logL) for the binomial model.
optimal hyperparameter
the hyperparameters that yield the smallest MSE or the largest logL.
fit
model fit using the optimal hyperparameters selected by CV. See EBglmnet.
WaldScore
the Wald score for the posterior distribution. See Huang, Martin, et al. (2014b) for using the Wald score to identify the set of significant effects.
Intercept
model intercept. This parameter is not shrunk (a uniform prior is assumed).
residual variance
the residual variance if the Gaussian family is assumed in the GLM.
logLikelihood
the log likelihood if the binomial family is assumed in the GLM.
hyperparameters
the hyperparameter(s) used to fit the model.
family
the GLM family specified in this function call.
prior
the prior used in this function call.
call
the call that produced this object.
nobs
number of observations.
nfolds
number of folds in CV.
Author(s)
Anhui Huang and Dianting Liu
Dept of Electrical and Computer Engineering, Univ of Miami, Coral Gables, FL
References
Cai, X., Huang, A., and Xu, S. (2011). Fast empirical Bayesian LASSO for multiple quantitative trait locus mapping. BMC Bioinformatics 12, 211.
Huang, A., Xu, S., and Cai, X. (2013). Empirical Bayesian LASSO-logistic regression for multiple binary trait locus mapping. BMC Genetics 14(1), 5.
Huang, A., Xu, S., and Cai, X. (2014a). Empirical Bayesian elastic net for multiple quantitative trait locus mapping. Heredity, doi:10.1038/hdy.2014.79.
Huang, A., Martin, E., et al. (2014b). Detecting genetic interactions in pathway-based genome-wide association studies. Genet Epidemiol 38(4), 300-309.
Examples
library(EBglmnet)
# Use the built-in R data set state.x77
y = state.x77[, "Life Exp"]
xNames = c("Population", "Income", "Illiteracy", "Murder", "HS Grad", "Frost", "Area")
x = state.x77[, xNames]
#
# Gaussian model
# lassoNEG prior (default)
out = cv.EBglmnet(x, y)
out$fit
# lasso prior
out = cv.EBglmnet(x, y, prior = "lasso")
out$fit
# elastic net prior
out = cv.EBglmnet(x, y, prior = "elastic net")
out$fit
#
# Binomial model
# create a binary (0/1) response variable
yy = as.numeric(y > mean(y))
out = cv.EBglmnet(x, yy, family = "binomial")
out$fit