R: fit a negative binomial model with lasso (or elastic net),...

glmregNB {mpath}

R Documentation

fit a negative binomial model with lasso (or elastic net), snet and mnet regularization

Description

Fit a negative binomial linear model via penalized maximum likelihood. The regularization path is computed for the lasso (or elastic net penalty), snet and mnet penalty, at a grid of values for the regularization parameter lambda.

Usage

glmregNB(formula, data, weights, offset=NULL, nlambda = 100, lambda=NULL, 
         lambda.min.ratio = ifelse(nobs<nvars,0.05,0.001), alpha=1, gamma=3, 
         rescale=TRUE, standardize = TRUE, penalty.factor = rep(1, nvars), 
         thresh = 0.001, maxit.theta = 10, maxit=1000, eps=.Machine$double.eps,
         trace=FALSE, start = NULL, etastart = NULL, mustart = NULL, 
         theta.fixed=FALSE, theta0=NULL, init.theta=NULL, link=log, 
         penalty=c("enet","mnet","snet"), method="glmreg_fit", model=TRUE, 
         x.keep=FALSE, y.keep=TRUE, contrasts=NULL, convex=FALSE, 
         parallel=TRUE, n.cores=2, ...)

Arguments

`formula`	formula used to describe a model.
`data`	argument controlling formula processing via `model.frame`.
`weights`	an optional vector of ‘prior weights’ to be used in the fitting process. Should be `NULL` or a numeric vector. Default is a vector of 1s with equal weight for each observation.
`offset`	optional numeric vector with an a priori known component to be included in the linear predictor of the model.
`nlambda`	The number of `lambda` values - default is 100.
`lambda`	A user supplied `lambda` sequence
`lambda.min.ratio`	Smallest value for `lambda`, as a fraction of `lambda.max`, the (data derived) entry value (i.e. the smallest value for which all coefficients are zero). The default depends on the sample size `nobs` relative to the number of variables `nvars`. If `nobs > nvars`, the default is `0.001`, close to zero. If `nobs < nvars`, the default is `0.05`.
`alpha`	The L2 penalty mixing parameter, with `0\le\alpha\le 1`. `alpha=1` is lasso (mcp, scad) penalty; and `alpha=0` the ridge penalty.
`gamma`	The tuning parameter of the `snet` or `mnet` penalty.
`rescale`	logical value, if TRUE, adaptive rescaling of the penalty parameter for `penalty="mnet"` or `penalty="snet"` with `family` other than "gaussian". See reference
`standardize`	Logical flag for x variable standardization, prior to fitting the model sequence. The coefficients are always returned on the original scale. Default is `standardize=TRUE`. If variables are in the same units already, you might not wish to standardize.
`penalty.factor`	This is a number that multiplies `lambda` to allow differential shrinkage of coefficients. Can be 0 for some variables, which implies no shrinkage, and that variable is always included in the model. Default is same shrinkage for all variables.
`thresh`	Convergence threshold for coordinate descent. Defaults value is `1e-6`.
`maxit.theta`	Maximum number of iterations for estimating `theta` scaling parameter
`maxit`	Maximum number of coordinate descent iterations for each `lambda` value; default is 1000.
`eps`	If a number is less than `eps` in magnitude, then this number is considered as 0
`trace`	If `TRUE`, fitting progress is reported
`start`, `etastart`, `mustart`, `...`	arguments for the `link{glmreg}` function
`init.theta`	initial scaling parameter `theta`
`theta.fixed`	Estimate scale parameter theta? Default is FALSE. Note, the algorithm may become slow. In this case, one may use `glmreg` function with `family="negbin"`, and a fixed `theta`

`theta0`	initial scale parameter vector theta, with length `nlambda` if `theta.fixed=TRUE`. Default is NULL
`convex`	Calculate index for which objective function ceases to be locally convex? Default is FALSE and only useful if `penalty="mnet" or "snet"`.
`link`	link function, default is `log`
`penalty`	Type of regularization
`method`	estimation method
`model`, `x.keep`, `y.keep`	logicals. If `TRUE` the corresponding components of the fit (model frame, response, model matrix) are returned.
`contrasts`	the contrasts corresponding to `levels` from the respective models
`parallel`, `n.cores`	a logical value, parallel computing or not for sequence of `lambda` with the number of CPU cores to use. The `lambda` loop will attempt to send different `lambda` off to different cores.

Details

The sequence of models implied by lambda is fit by coordinate descent. This is a lasso (mcp, scad) or elastic net (mnet, snet) regularization path for fitting the negative binomial linear regression paths, by maximizing the penalized log-likelihood. Note that the objective function is

-\sum (weights * loglik) + \lambda*penalty

if standardize=FALSE and

-\frac{weights}{\sum(weights)} * loglik + \lambda*penalty

if standardize=TRUE.

Value

An object with S3 class "glmreg", "glmregNB" for the various types of models.

`call`	the call that produced the model fit
`b0`	Intercept sequence of length `length(lambda)`
`beta`	A `nvars x length(lambda)` matrix of coefficients.
`lambda`	The actual sequence of `lambda` values used
`resdev`	The computed deviance. The deviance calculations incorporate weights if present in the model. The deviance is defined to be 2*(loglike_sat - loglike), where loglike_sat is the log-likelihood for the saturated model (a model with a free parameter per observation).
`nulldev`	Null deviance (per observation). This is defined to be 2*(loglike_sat -loglike(Null)); The NULL model refers to the intercept model.
`nobs`	number of observations

Author(s)

Zhu Wang <zwang145@uthsc.edu>

References

Zhu Wang, Shuangge Ma, Michael Zappitelli, Chirag Parikh, Ching-Yun Wang and Prasad Devarajan (2014) Penalized Count Data Regression with Application to Hospital Stay after Pediatric Cardiac Surgery, Statistical Methods in Medical Research. 2014 Apr 17. [Epub ahead of print]

Examples

## Not run: 
data("bioChemists", package = "pscl")
system.time(fm_nb1 <- glmregNB(art ~ ., data = bioChemists, parallel=FALSE))
system.time(fm_nb2 <- glmregNB(art ~ ., data = bioChemists, parallel=TRUE, n.cores=2))
coef(fm_nb1)
### ridge regression
fm <- glmregNB(art ~ ., alpha=0, data = bioChemists, lambda=seq(0.001, 1, by=0.01))
fm <- cv.glmregNB(art ~ ., alpha=0, data = bioChemists, lambda=seq(0.001, 1, by=0.01))

## End(Not run)

[Package mpath version 0.4-2.26 Index]