glmreg {mpath}

R Documentation

fit a GLM with lasso (or elastic net), snet or mnet regularization

Description

Fit a generalized linear model via penalized maximum likelihood. The regularization path is computed for the lasso (or elastic net), scad (or snet) and mcp (or mnet) penalties over a grid of values of the regularization parameter lambda. Fits linear, logistic, Poisson and negative binomial (fixed scale parameter) regression models.

Usage

## S3 method for class 'formula'
glmreg(formula, data, weights, offset=NULL, contrasts=NULL, 
x.keep=FALSE, y.keep=TRUE, ...)
## S3 method for class 'matrix'
glmreg(x, y, weights, offset=NULL, ...)
## Default S3 method:
glmreg(x,  ...)

Arguments

`formula`	symbolic description of the model, see details.
`data`	argument controlling formula processing via `model.frame`.
`weights`	optional numeric vector of weights. If `standardize=TRUE`, weights are renormalized to weights/sum(weights). If `standardize=FALSE`, weights are kept as supplied.
`offset`	this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be NULL or a numeric vector of length equal to the number of cases. Currently only one offset term can be included in the formula.
`x`	input matrix, of dimension nobs x nvars; each row is an observation vector
`y`	response variable. Quantitative for `family="gaussian"`. Non-negative counts for `family="poisson"` or `family="negbin"`. For `family="binomial"` should be either a factor with two levels or a vector of proportions.
`x.keep`, `y.keep`	logical values: should the model matrix (`x.keep`) and the response variable (`y.keep`) be kept in the returned object?
`contrasts`	the contrasts corresponding to `levels` from the respective models
`...`	other arguments passed to `glmreg_fit`
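
The weight handling described for `weights` can be illustrated with a short base R sketch. It only mirrors the renormalization stated above and is not the package's internal code; the helper name `renormalize_weights` is hypothetical.

```r
# Hypothetical helper illustrating the documented behaviour of `weights`;
# this is not a function exported by mpath.
renormalize_weights <- function(weights, standardize = TRUE) {
  stopifnot(is.numeric(weights))
  if (standardize) {
    weights / sum(weights)  # renormalized so the weights sum to 1
  } else {
    weights                 # kept as supplied
  }
}

w <- c(1, 2, 2, 5)
renormalize_weights(w, standardize = TRUE)   # 0.1 0.2 0.2 0.5
renormalize_weights(w, standardize = FALSE)  # 1 2 2 5
```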

Details

The sequence of models implied by lambda is fit by coordinate descent. For family="gaussian" this is the lasso, mcp or scad sequence if alpha=1, else it is the enet, mnet or snet sequence. For the other families, this is a lasso (mcp, scad) or elastic net (mnet, snet) regularization path for the generalized linear model, obtained by maximizing the appropriate penalized log-likelihood. Note that the objective function for "gaussian" is

\frac{1}{2} * weights * RSS + \lambda * penalty,

if standardize=FALSE and

\frac{1}{2} * \frac{weights}{\sum(weights)} * RSS + \lambda * penalty,

if standardize=TRUE. For the other models it is

-\sum (weights * loglik) + \lambda*penalty

if standardize=FALSE and

-\sum (\frac{weights}{\sum(weights)} * loglik) + \lambda*penalty

if standardize=TRUE.
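
As a rough illustration of the "gaussian" objective above, the following base R sketch evaluates it for a lasso penalty under both settings of standardize. It rests on two assumptions that this page does not spell out: that "weights*RSS" denotes the weighted residual sum of squares, and that the lasso penalty is the sum of absolute coefficients. The function name `gaussian_objective` is hypothetical and not part of mpath.

```r
# Sketch of the documented "gaussian" objective with a lasso penalty.
# Assumptions: weights*RSS is the weighted residual sum of squares and
# the lasso penalty is sum(abs(beta)). Not the package's internal code.
gaussian_objective <- function(y, x, b0, beta, lambda, weights,
                               standardize = TRUE) {
  if (standardize) weights <- weights / sum(weights)
  resid <- y - (b0 + drop(x %*% beta))          # residuals of the linear fit
  0.5 * sum(weights * resid^2) + lambda * sum(abs(beta))
}

x <- cbind(c(1, 2, 3, 4), c(0, 1, 0, 1))
y <- c(1, 3, 2, 5)
w <- rep(1, 4)
gaussian_objective(y, x, b0 = 0, beta = c(1, 0), lambda = 0.5,
                   weights = w, standardize = FALSE)  # 2
gaussian_objective(y, x, b0 = 0, beta = c(1, 0), lambda = 0.5,
                   weights = w, standardize = TRUE)   # 0.875
```

With unit weights the two settings differ only in the scaling of the loss term, so the same lambda penalizes more heavily, relative to the loss, when standardize=TRUE.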

Value

An object with S3 class "glmreg" for the various types of models.

`call`	the call that produced this object
`b0`	Intercept sequence of length `length(lambda)`
`beta`	An `nvars x length(lambda)` matrix of coefficients.
`lambda`	The actual sequence of `lambda` values used
`offset`	the offset vector used.
`resdev`	The computed deviance (for `"gaussian"`, this is the R-square). The deviance calculations incorporate weights if present in the model. The deviance is defined to be 2*(loglike_sat - loglike), where loglike_sat is the log-likelihood for the saturated model (a model with a free parameter per observation).
`nulldev`	Null deviance (per observation). This is defined to be 2*(loglike_sat - loglike(Null)), where the null model is the intercept-only model.
`nobs`	number of observations
`pll`	penalized log-likelihood values for standardized coefficients in the IRLS iterations. For `family="gaussian"`, not implemented yet.
`pllres`	penalized log-likelihood value for the estimated model on the original scale of coefficients
`fitted.values`	the fitted mean values, obtained by transforming the linear predictors by the inverse of the link function.
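
The deviance definition 2*(loglike_sat - loglike) and the inverse-link step behind `fitted.values` can be sketched in base R for the Poisson case. This is a minimal illustration under stated assumptions, not the computation mpath performs internally: `poisson_deviance` is a hypothetical helper, the link is the log link, and the saturated model sets each mean equal to its observation.

```r
# Sketch of 2 * (loglike_sat - loglike) for a Poisson model, base R only.
# `poisson_deviance` is a hypothetical helper, not part of mpath.
poisson_deviance <- function(y, mu, weights = rep(1, length(y))) {
  loglik_sat <- sum(weights * dpois(y, lambda = y,  log = TRUE))  # mu_i = y_i
  loglik     <- sum(weights * dpois(y, lambda = mu, log = TRUE))
  2 * (loglik_sat - loglik)
}

y   <- c(0, 1, 3, 2)
eta <- log(c(0.5, 1, 2, 2))  # linear predictors
mu  <- exp(eta)              # fitted means via the inverse of the log link
poisson_deviance(y, mu)                        # deviance of the fitted means
poisson_deviance(y, rep(mean(y), length(y)))   # intercept-only (null) model, unit weights
```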

Author(s)

Zhu Wang <zwang145@uthsc.edu>

References

Breheny, P. and Huang, J. (2011) Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection. Ann. Appl. Statist., 5: 232-253.

Wang, Z., Ma, S., Zappitelli, M., Parikh, C., Wang, C.-Y. and Devarajan, P. (2014) Penalized count data regression with application to hospital stay after pediatric cardiac surgery. Statistical Methods in Medical Research. Epub ahead of print, 2014 Apr 17.

Examples

# binomial
x <- matrix(rnorm(100*20), 100, 20)
g2 <- sample(0:1, 100, replace=TRUE)
fit2 <- glmreg(x, g2, family="binomial")
#poisson and negative binomial
data("bioChemists", package = "pscl")
fm_pois <- glmreg(art ~ ., data = bioChemists, family = "poisson")
coef(fm_pois)
fm_nb1 <- glmreg(art ~ ., data = bioChemists, family = "negbin", theta=1)
coef(fm_nb1)
#offset
x <- matrix(rnorm(100*20),100,20)
y <- rpois(100, lambda=1)
exposure <- rep(0.5, length(y))
fit2 <- glmreg(x, y, lambda=NULL, nlambda=10, lambda.min.ratio=1e-4,
               offset=log(exposure), family="poisson")
predict(fit2, newx=x, newoffset=log(exposure))
## Not run: 
fm_nb2 <- glmregNB(art ~ ., data = bioChemists)
coef(fm_nb2)

## End(Not run)

[Package mpath version 0.4-2.26 Index]