gamsel {gamsel}    R Documentation

Fit Regularization Path for Gaussian or Binomial Generalized Additive Model

Description

Using overlap grouped-lasso penalties, gamsel selects whether each term in a GAM is zero, linear, or a non-linear spline (up to a specified maximum df per variable). It fits the entire regularization path over a grid of values of the overall penalty lambda, for both the gaussian and binomial families.

Usage

gamsel(
  x,
  y,
  num_lambda = 50,
  lambda = NULL,
  family = c("gaussian", "binomial"),
  degrees = rep(10, p),
  gamma = 0.4,
  dfs = rep(5, p),
  bases = pseudo.bases(x, degrees, dfs, parallel = parallel, ...),
  tol = 1e-04,
  max_iter = 2000,
  traceit = FALSE,
  parallel = FALSE,
  ...
)

Arguments

x

Input (predictor) matrix of dimension nobs x nvars. Each observation is a row.

y

Response variable. Quantitative for family="gaussian"; with values in {0,1} for family="binomial".

num_lambda

Number of lambda values to use. (Length of lambda sequence.)

lambda

User-supplied lambda sequence. For best performance, leave as NULL and allow the routine to automatically select lambda. Otherwise, supply a (preferably gradually) decreasing sequence.

family

Response type. "gaussian" for linear model (default). "binomial" for logistic model.

degrees

An integer vector of length nvars specifying the maximum number of spline basis functions to use for each variable.

gamma

Penalty mixing parameter, 0 <= gamma <= 1. Values gamma < 0.5 penalize the linear fit less than the non-linear fit. The default is gamma = 0.4, which encourages a linear term over a non-linear term.

dfs

Numeric vector of length nvars specifying the maximum (end-of-path) degrees of freedom for each variable.

bases

A list of orthonormal bases for the non-linear terms for each variable. The function pseudo.bases generates these, using the parameters dfs and degrees. See the documentation for pseudo.bases.

tol

Convergence threshold for coordinate descent. The coordinate descent loop continues until the total change in objective after a pass over all variables is less than tol. Default is 1e-4.

max_iter

Maximum number of coordinate descent iterations over all the variables for each lambda value. Default is 2000.

traceit

If TRUE, various information is printed during the fitting process.

parallel

Passed on to the pseudo.bases() function. Uses multiple processes if available.

...

Additional arguments passed on to pseudo.bases().
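
As a rough illustration of the lambda argument, the sketch below builds a gradually decreasing, log-spaced sequence in base R. The endpoints (10 and 0.01) are arbitrary placeholders rather than recommended values, and the commented-out call assumes that a predictor matrix x and a response y already exist.

```r
# Hypothetical endpoints, chosen only for illustration;
# suitable values depend on the data.
lambda_max <- 10
lambda_min <- 0.01
num_lambda <- 50

# Log-spaced sequence running from lambda_max down to lambda_min.
lambda_seq <- exp(seq(log(lambda_max), log(lambda_min), length.out = num_lambda))

# The sequence is strictly decreasing, as the lambda argument expects.
stopifnot(all(diff(lambda_seq) < 0))

# Assuming x and y are available:
# fit <- gamsel(x, y, lambda = lambda_seq, gamma = 0.4)
```

In most cases leaving lambda = NULL remains the simpler choice; a hand-built sequence like this is mainly useful when the same grid must be reused across several fits.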

Details

The sequence of models along the lambda path is fit by (block) coordinate descent. In the case of logistic regression, the fitting routine may terminate before all num_lambda values of lambda have been used. This occurs when the fraction of null deviance explained by the model gets too close to 1, at which point the fit becomes numerically unstable. Each of the smooth terms is computed using an approximation to the Demmler-Reinsch smoothing spline basis for that variable, and the accompanying diagonal penalty matrix.
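
One way to notice an early stop in the binomial case is to compare the number of lambda values in the fitted object with the number requested. The sketch below uses simulated data and a hypothetical helper, path_stopped_early(). It assumes that the lambdas component holds only the values actually fitted, which this page does not state explicitly, and the gamsel call is skipped if the package is not installed.

```r
# Hypothetical helper (not part of gamsel): TRUE when fewer lambda
# values were fitted than were requested.
path_stopped_early <- function(n_fitted, n_requested) {
  n_fitted < n_requested
}

if (requireNamespace("gamsel", quietly = TRUE)) {
  set.seed(1)
  n <- 200
  x <- matrix(rnorm(n * 4), n, 4)
  # Simulated 0/1 response that depends on the first predictor only.
  yb <- rbinom(n, 1, plogis(2 * x[, 1]))

  fit <- gamsel::gamsel(x, yb, family = "binomial", num_lambda = 50)

  # Assumption: fit$lambdas lists only the lambda values actually used.
  print(path_stopped_early(length(fit$lambdas), 50))
  # Fraction of null deviance explained at the end of the fitted path.
  print(tail(fit$dev.ratio, 1))
}
```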

Value

An object with S3 class gamsel.

intercept

Intercept sequence of length num_lambda

alphas

nvars x num_lambda matrix of linear coefficient estimates

betas

sum(degrees) x num_lambda matrix of non-linear coefficient estimates

lambdas

The sequence of lambda values used

degrees

Number of basis functions used for each variable

parms

A set of parameters that capture the bases used. This allows for efficient generation of the basis elements for predict.gamsel, the predict method for this class.

family

"gaussian" or "binomial"

nulldev

Null deviance (deviance of the intercept model)

dev.ratio

Vector of length num_lambda giving fraction of (null) deviance explained by each model along the lambda sequence

call

The call that produced this object
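
To make the meaning of nulldev and dev.ratio concrete, the base R sketch below computes a fraction of null deviance explained for a Gaussian response, assuming the usual definition 1 - dev/nulldev. An ordinary least-squares line stands in for one model on the path; gamsel itself is not called, and it may scale the deviances differently internally.

```r
set.seed(1)
n <- 100
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)

# Null deviance: residual sum of squares of the intercept-only model.
nulldev <- sum((y - mean(y))^2)

# Deviance of a fitted model; a least-squares line is used here purely
# as a stand-in for one model along the lambda path.
fitted_vals <- fitted(lm(y ~ x))
dev <- sum((y - fitted_vals)^2)

# Fraction of null deviance explained (assumed definition).
dev_ratio <- 1 - dev / nulldev
```

For a Gaussian linear model with an intercept this quantity coincides with the familiar R-squared.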

Author(s)

Alexandra Chouldechova and Trevor Hastie
Maintainer: Trevor Hastie hastie@stanford.edu

References

Chouldechova, A. and Hastie, T. (2015) Generalized Additive Model Selection, https://arxiv.org/abs/1506.03850

See Also

predict.gamsel, cv.gamsel, plot.gamsel, summary.gamsel, basis.gen

Examples


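## The example data were generated with:
## data = gamsel:::gendata(n=500, p=12, k.lin=3, k.nonlin=3, deg=8, sigma=0.5)
data = readRDS(system.file("extdata/gamsel_example.RDS", package = "gamsel"))
attach(data)  # makes X, y and yb available by name

## Precompute the bases once so they can be reused by gamsel and cv.gamsel
bases = pseudo.bases(X, degree = 10, df = 6)

## Gaussian gam
gamsel.out = gamsel(X, y, bases = bases)
par(mfrow = c(1, 2), mar = c(5, 4, 3, 1))
summary(gamsel.out)

## Cross-validation over the lambda path
gamsel.cv = cv.gamsel(X, y, bases = bases)
par(mfrow = c(1, 1))
plot(gamsel.cv)

## Fitted functions for the 12 variables at the 20th lambda value
par(mfrow = c(3, 4))
plot(gamsel.out, newx = X, index = 20)

## Binomial model
gamsel.out = gamsel(X, yb, family = "binomial")
par(mfrow = c(1, 2), mar = c(5, 4, 3, 1))
summary(gamsel.out)
par(mfrow = c(3, 4))
plot(gamsel.out, newx = X, index = 30)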


[Package gamsel version 1.8-4 Index]