cv_nb_grpreg {SSGL}		R Documentation
Cross-validation for Group-Regularized Negative Binomial Regression
Description
This function implements K-fold cross-validation for group-regularized negative binomial regression with a known size parameter \alpha and the log link. The cross-validation error (CVE) and cross-validation standard error (CVSE) are computed using the deviance for negative binomial regression.
For a description of group-regularized negative binomial regression, see the documentation for the nb_grpreg function. Our implementation is based on the least squares approximation approach of Wang and Leng (2007); hence, the function does not allow the total number of covariates p to exceed \frac{K-1}{K} \times the sample size, where K is the number of folds.
Note that the nb_grpreg function also returns the generalized information criterion (GIC) of Fan and Tang (2013) for each regularization parameter in lambda, and the GIC can be used for model selection instead of cross-validation.
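The negative binomial deviance used for the CVE can be sketched in base R as follows. This is an illustrative implementation, not the package's internal code; it assumes the standard negative binomial deviance with known size parameter alpha:

```r
## Illustrative sketch (not SSGL internals): the cross-validation error is
## based on the negative binomial deviance with known size parameter alpha.
nb_deviance <- function(y, mu, alpha = 1) {
  ## Saturated-model term; y * log(y / mu) is taken to be 0 when y = 0
  term1 <- ifelse(y == 0, 0, y * log(y / mu))
  term2 <- (y + alpha) * log((y + alpha) / (mu + alpha))
  2 * sum(term1 - term2)
}

## The deviance is zero when the fitted means equal the observed counts,
## and positive otherwise
nb_deviance(c(0, 1, 3), c(0.5, 2, 3))
```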
Usage
cv_nb_grpreg(Y, X, groups, nb_size=1, penalty=c("gLASSO","gSCAD","gMCP"),
n_folds=10, group_weights, taper, n_lambda=100, lambda,
max_iter=10000, tol=1e-4)
Arguments
Y: n x 1 vector of count responses.
X: n x p design matrix, where p is the total number of covariates.
groups: p-dimensional vector of group labels. The jth entry gives the group that the jth covariate belongs to.
nb_size: known size parameter \alpha in the negative binomial distribution. Default is nb_size=1.
penalty: group regularization method to use on the groups of regression coefficients. The options are "gLASSO" (group LASSO), "gSCAD" (group SCAD), and "gMCP" (group MCP).
n_folds: number of folds K to use in K-fold cross-validation. Default is n_folds=10.
group_weights: group-specific, nonnegative weights for the penalty. Default is to use the square roots of the group sizes.
taper: tapering term in the group SCAD and group MCP penalties. Default is 4 for "gSCAD" and 3 for "gMCP". Ignored if "gLASSO" is the penalty.
n_lambda: number of regularization parameters in the grid. Default is n_lambda=100.
lambda: grid of regularization parameters. If not specified, the function chooses the grid automatically.
max_iter: maximum number of iterations in the algorithm. Default is max_iter=10000.
tol: convergence threshold for the algorithm. Default is tol=1e-4.
Value
The function returns a list containing the following components:
lambda: Grid of regularization parameters used to fit the model.
cve: Vector of mean cross-validation errors, one entry per regularization parameter in lambda.
cvse: Vector of cross-validation standard errors, one entry per regularization parameter in lambda.
lambda_min: The value in lambda that minimizes the mean cross-validation error cve.
min_index: The index of lambda_min in lambda.
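The relation between lambda_min and min_index can be illustrated with a small base-R sketch; the cve and lambda values below are made up for illustration and are not real cv_nb_grpreg output:

```r
## Toy values standing in for cv_nb_grpreg output (illustration only)
lambda <- c(1.0, 0.5, 0.25, 0.1)
cve    <- c(3.2, 2.9, 2.7, 2.8)

min_index  <- which.min(cve)      ## index of the smallest mean CV error
lambda_min <- lambda[min_index]   ## i.e. lambda_min = lambda[min_index]
```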
References
Breheny, P. and Huang, J. (2015). "Group descent algorithms for nonconvex penalized linear and logistic regression models with grouped predictors." Statistics and Computing, 25:173-187.
Fan, Y. and Tang, C. Y. (2013). "Tuning parameter selection in high dimensional penalized likelihood." Journal of the Royal Statistical Society: Series B (Statistical Methodology), 75:531-552.
Wang, H. and Leng, C. (2007). "Unified LASSO estimation by least squares approximation." Journal of the American Statistical Association, 102:1039-1048.
Yuan, M. and Lin, Y. (2006). "Model selection and estimation in regression with grouped variables." Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68:49-67.
Examples
## Load the package (needed to run this example standalone)
library(SSGL)

## Generate data
set.seed(1234)
X = matrix(runif(100*14), nrow=100)
n = dim(X)[1]
groups = c(1,1,1,2,2,2,2,3,3,4,5,5,6,6)
beta_true = c(-1,1,1,0,0,0,0,-1,1,0,0,0,-1.5,1.5)
## Generate count responses from negative binomial regression
eta = X %*% beta_true   ## linear predictor
Y = rnbinom(n, size=1, mu=exp(eta))
## 10-fold cross-validation for group-regularized negative binomial
## regression with the group MCP penalty
nb_cv = cv_nb_grpreg(Y, X, groups, penalty="gMCP")
## Plot cross-validation curve
plot(nb_cv$lambda, nb_cv$cve, type="l", xlab="lambda", ylab="CVE")
## lambda which minimizes mean CVE
nb_cv$lambda_min
## index of lambda_min in lambda
nb_cv$min_index
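## Beyond lambda_min, a common heuristic (not provided by this function) is
## the one-standard-error rule: choose the largest lambda whose CVE is within
## one CVSE of the minimum. A base-R sketch with made-up values:

```r
## Toy cve/cvse/lambda values for illustration (not cv_nb_grpreg output)
lambda <- c(1.0, 0.5, 0.25, 0.1)
cve    <- c(3.0, 2.6, 2.5, 2.7)
cvse   <- c(0.2, 0.2, 0.2, 0.2)

min_index  <- which.min(cve)
within_1se <- cve <= cve[min_index] + cvse[min_index]
lambda_1se <- max(lambda[within_1se])  ## most regularized model within 1 SE
```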