R: The Induced Smoothed lasso path

islasso.path {islasso}

R Documentation

The Induced Smoothed lasso path

Description

islasso.path fits a generalized linear model via the induced smoothed lasso method. The regularization path is computed for the lasso or elastic-net penalty over a grid of values of the regularization parameter lambda. It fits linear, logistic, Poisson and gamma regression models.

Usage

islasso.path(formula, family = gaussian(), lambda = NULL, nlambda = 100, 
        lambda.min.ratio = ifelse(nobs < nvars, 1E-2, 1E-03), alpha = 1, data, 
        weights, subset, offset, contrasts = NULL, unpenalized, control = is.control())

Arguments

`formula`	an object of class “formula” (or one that can be coerced to that class): the ‘usual’ symbolic description of the model to be fitted.
`family`	the assumed response distribution. Gaussian, (quasi) Binomial, (quasi) Poisson, and Gamma are allowed. `family=gaussian` is implemented with `identity` link, `family=binomial` is implemented with `logit` or `probit` links, `family=poisson` is implemented with `log` link, and `family=Gamma` is implemented with `inverse`, `log` and `identity` links.
`lambda`	A user supplied lambda sequence. Typical usage is to have the program compute its own lambda sequence based on nlambda and lambda.min.ratio. Supplying a value of lambda overrides this.
`nlambda`	The number of lambda values - default is 100.
`lambda.min.ratio`	Smallest value for lambda, as a fraction of lambda.max, the (data-derived) entry value (i.e. the smallest value for which all coefficients are zero). The default depends on the sample size `nobs` relative to the number of variables `nvars`. If `nobs >= nvars`, the default is 0.001. If `nobs < nvars`, the default is 0.01. A very small value of lambda.min.ratio will lead to a saturated fit in the `nobs < nvars` case.
`alpha`	The elastic-net mixing parameter, with `0\le\alpha\le 1`. The penalty is defined as `(1-\alpha)/2\|\beta\|_2^2+\alpha\|\beta\|_1`. `alpha=1` is the lasso penalty, and `alpha=0` the ridge penalty.
`data`	an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which `islasso.path` is called.
`weights`	observation weights. Default is 1 for each observation.
`subset`	an optional vector specifying a subset of observations to be used in the fitting process.
`offset`	this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be NULL or a numeric vector of length equal to the number of cases.
`contrasts`	an optional list. See the contrasts.arg of `model.matrix.default`.
`control`	a list of parameters for controlling the fitting process (see `is.control` for more details).
`unpenalized`	optional. A vector of integers or characters indicating any covariate (in the formula) with coefficients not to be penalized. The intercept, if included in the model, is always unpenalized.
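
As a rough illustration of how `nlambda`, `lambda.min.ratio` and `alpha` interact, the sketch below builds a lambda grid and evaluates the elastic-net penalty in base R. The helper names `make_lambda_grid` and `enet_penalty`, the value of `lambda_max`, and the log-spacing of the grid are assumptions made for illustration; the grid that islasso.path computes internally may differ.

```r
# Hypothetical helper: log-spaced grid from lambda_max down to
# lambda_max * lambda_min_ratio (assumed spacing, not taken from islasso).
make_lambda_grid <- function(lambda_max, nlambda = 100, lambda_min_ratio = 1e-3) {
  exp(seq(log(lambda_max), log(lambda_max * lambda_min_ratio),
          length.out = nlambda))
}

# Elastic-net penalty as defined for the `alpha` argument:
# (1 - alpha)/2 * ||beta||_2^2 + alpha * ||beta||_1
enet_penalty <- function(beta, alpha = 1) {
  (1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta))
}

lambda_grid <- make_lambda_grid(lambda_max = 2, nlambda = 5,
                                lambda_min_ratio = 1e-2)
lambda_grid

beta <- c(1, -2, 0)
enet_penalty(beta, alpha = 1)    # lasso: sum of absolute values
enet_penalty(beta, alpha = 0)    # ridge: half the sum of squares
enet_penalty(beta, alpha = 0.5)  # equal mix of the two
```

A grid built this way could be passed through the `lambda` argument, in which case `nlambda` and `lambda.min.ratio` are overridden.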

Details

The sequence of models implied by lambda is fit by the islasso method. islasso estimates regression models by imposing a lasso-type penalty on some or all regression coefficients. However, the nonsmooth L_1 norm penalty is replaced by a smooth approximation justified under the induced smoothing paradigm. The advantage is that reliable standard errors are returned as model output, and hypothesis testing on linear combinations of the regression parameters can be carried out straightforwardly via the Wald statistic. Simulation studies provide evidence that the proposed approach controls type-I errors and exhibits good power in different scenarios.
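
To make the idea concrete, one smooth surrogate for |b| is its expectation under a Gaussian perturbation, E|b + Z| with Z ~ N(0, s^2), which has the closed form b(2Φ(b/s) − 1) + 2sφ(b/s). The base-R sketch below evaluates this surrogate and then computes Wald tests for single coefficients. The function name `smooth_abs`, the scale `s`, and the numeric estimates are illustrative assumptions; the exact approximation used inside islasso may differ in detail.

```r
# Smooth surrogate for |b|: E|b + Z| with Z ~ N(0, s^2) (folded-normal mean).
# `s` is an assumed perturbation scale, chosen here only for illustration.
smooth_abs <- function(b, s) {
  b * (2 * pnorm(b / s) - 1) + 2 * s * dnorm(b / s)
}

b <- c(-2, -0.1, 0, 0.1, 2)
cbind(b, abs_b = abs(b), smooth = smooth_abs(b, s = 0.5))

# Wald tests for single coefficients from estimates and standard errors
# (made-up numbers, standing in for one row of Coef and of SE).
est <- c(x1 = 1.20, x2 = 0.05, x3 = -0.80)
se  <- c(x1 = 0.30, x2 = 0.10, x3 = 0.25)
z   <- est / se
p   <- 2 * pnorm(-abs(z))
round(cbind(estimate = est, se = se, z = z, p.value = p), 4)
```

The surrogate is differentiable at zero and approaches |b| as |b| / s grows, which is what allows standard errors to be derived by the usual smooth-likelihood arguments.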

Value

A list of

`call`	the matched call.
`Info`	a named matrix containing information about lambda values, estimated degrees of freedom, estimated dispersion parameters, deviance, log-likelihood, number of iterations and convergence criteria.
`GoF`	a named matrix containing information criteria, i.e., AIC, BIC, AICc, eBIC, GCV, GIC.
`Coef`	a `length(lambda) x nvars` matrix of coefficients.
`SE`	a `length(lambda) x nvars` matrix of standard errors.
`Weights`	a `length(lambda) x nvars` matrix of the mixture weights used in the induced smoothed lasso.
`Linear.predictors`	a `length(lambda) x nobs` matrix of linear predictors.
`Fitted.values`	a `length(lambda) x nobs` matrix of fitted values.
`Residuals`	a `length(lambda) x nobs` matrix of working residuals.
`Input`	a named list containing several input arguments, i.e., the numbers of observations and predictors, whether an intercept has to be estimated, the model matrix and the response vector, the observation weights, the offset, the family object used, the elastic-net mixing parameter and the vector used to specify the unpenalized estimators.
`control`	the value of the control argument used.
`formula`	the formula supplied.
`model`	if requested (the default), the model frame used.
`terms`	the terms object used.
`data`	the data argument.
`xlevels`	(where relevant) a record of the levels of the factors used in fitting.
`contrasts`	(only where relevant) the contrasts used.

Author(s)

Maintainer: Gianluca Sottile <gianluca.sottile@unipa.it>

References

Cilluffo, G, Sottile, G, La Grutta, S and Muggeo, VMR (2019). The Induced Smoothed lasso: A practical framework for hypothesis testing in high dimensional regression. Statistical Methods in Medical Research, DOI: 10.1177/0962280219842890.

Sottile, G, Cilluffo, G, Muggeo, VMR (2019). The R package islasso: estimation and hypothesis testing in lasso regression. Technical Report on ResearchGate. doi:10.13140/RG.2.2.16360.11521.

Examples


set.seed(1)
n <- 100
p <- 30
p1 <- 10  #number of nonzero coefficients
coef.veri <- sort(round(c(seq(.5, 3, l=p1/2), seq(-1, -2, l=p1/2)), 2))
sigma <- 1

coef <- c(coef.veri, rep(0, p-p1))

X <- matrix(rnorm(n*p), n, p)
eta <- drop(X%*%coef)

##### gaussian ######
mu <- eta
y <- mu + rnorm(n, 0, sigma)

o <- islasso.path(y ~ ., data = data.frame(y = y, X), 
             family = gaussian(), nlambda = 30L)
o
summary(o, lambda = 10)
coef(o, lambda = 10)
fitted(o, lambda = 10)
predict(o, type="response", lambda = 10)
plot(o, xvar = "coef")
residuals(o, lambda = 10)
deviance(o, lambda = 10)
logLik(o, lambda = 10)
GoF.islasso.path(o)

## Not run: 
##### binomial ######
coef <- c(c(1,1,1), rep(0, p-3))
X <- matrix(rnorm(n*p), n, p)
eta <- drop(cbind(1, X)%*%c(-1, coef))
mu <- binomial()$linkinv(eta)
y <- rbinom(n, 100, mu)
y <- cbind(y, 100-y)

o <- islasso.path(cbind(y1, y2) ~ ., 
             data = data.frame(y1 = y[,1], y2 = y[,2], X), 
             family = binomial(), nlambda = 30L)
temp <- GoF.islasso.path(o)
summary(o, pval = .05, lambda = temp$lambda.min["BIC"])

##### poisson ######
coef <- c(c(1,1,1), rep(0, p-3))
X <- matrix(rnorm(n*p), n, p)
eta <- drop(cbind(1, X)%*%c(1, coef))
mu <- poisson()$linkinv(eta)
y <- rpois(n, mu)

o <- islasso.path(y ~ ., data = data.frame(y = y, X), 
             family = poisson(), nlambda = 30L)
temp <- GoF.islasso.path(o)
summary(o, pval = .05, lambda = temp$lambda.min["BIC"])

##### Gamma ######
coef <- c(c(1,1,1), rep(0, p-3))
X <- matrix(rnorm(n*p), n, p)
eta <- drop(cbind(1, X)%*%c(-1, coef))
mu <- Gamma(link="log")$linkinv(eta)
shape <- 10
phi <- 1 / shape
y <- rgamma(n, scale = mu / shape, shape = shape)

o <- islasso.path(y ~ ., data = data.frame(y = y, X), 
             family = Gamma(link = "log"), nlambda = 30L)
temp <- GoF.islasso.path(o)
summary(o, pval = .05, lambda = temp$lambda.min["BIC"])

## End(Not run)

[Package islasso version 1.5.2 Index]