
ahazpen {ahaz}

R Documentation

Fit penalized semiparametric additive hazards model

Description

Fit a semiparametric additive hazards model via penalized estimating equations using, for example, the lasso penalty. The complete regularization path is computed over a grid of values of the penalty parameter `lambda` using cyclic coordinate descent.
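For the lasso, the problem being solved can be summarized as follows. This is a sketch in our own notation (`D`, `d` and `w_j` are not names used by ahaz), assuming the standard linear estimating equation d - Dβ = 0 of the semiparametric additive hazards model, with the weights w_j playing the role of `penalty.wgt`. Here Y_i(t) is the at-risk indicator, N_i(t) the counting process and Z_i the covariate vector of subject i; scaling conventions inside the package may differ.

```latex
\begin{aligned}
\mathbf{D} &= \frac{1}{n}\sum_{i=1}^{n}\int_0^{\tau} Y_i(t)\,\{Z_i - \bar Z(t)\}^{\otimes 2}\,dt,
\qquad
\mathbf{d} = \frac{1}{n}\sum_{i=1}^{n}\int_0^{\tau} \{Z_i - \bar Z(t)\}\,dN_i(t),\\
\bar Z(t) &= \frac{\sum_{i} Y_i(t)\, Z_i}{\sum_{i} Y_i(t)},\\
\hat\beta(\lambda) &= \operatorname*{arg\,min}_{\beta}\;
  \tfrac{1}{2}\,\beta^{\top}\mathbf{D}\,\beta - \mathbf{d}^{\top}\beta
  + \lambda \sum_{j=1}^{p} w_j\,|\beta_j| .
\end{aligned}
```

The gradient of the quadratic part is Dβ - d, so setting λ = 0 recovers the unpenalized estimator D⁻¹d.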

Usage

ahazpen(surv, X, weights, standardize=TRUE, penalty=lasso.control(),
        nlambda=100, dfmax=nvars, pmax=min(nvars, 2*dfmax),
        lambda.minf=ifelse(nobs < nvars, 0.05, 1e-4), lambda,
        penalty.wgt=NULL, keep=NULL, control=list())

Arguments

`surv`	Response in the form of a survival object, as returned by the function `Surv()` in the package survival. Right-censored data and counting process format (left truncation) are supported. Tied survival times are not supported.
`X`	Design matrix. Missing values are not supported.
`weights`	Optional vector of observation weights. Default is 1 for each observation.
`standardize`	Logical flag for variable standardization, prior to model fitting. Estimates are always returned on the original scale. Default is `standardize=TRUE`.
`penalty`	A description of the penalty function to be used for model fitting. This can be a character string naming a penalty function (currently `"lasso"` or stepwise SCAD, `"sscad"`) or a call to the desired penalty function. See `ahazpen.pen.control` for the available penalty functions and advanced options; see also the examples.
`nlambda`	The number of `lambda` values. Default is `nlambda=100`.
`dfmax`	Limit the maximum number of variables in the model. Unless a complete regularization path is needed, it is highly recommended to start with a relatively small value of `dfmax`, which can substantially reduce computation time.
`pmax`	Limit the maximum number of variables to ever be considered by the coordinate descent algorithm.
`lambda.minf`	Smallest value of `lambda`, as a fraction of `lambda.max`, the (data-derived) smallest value of `lambda` for which all regression coefficients are zero. The default depends on the sample size `nobs` relative to the number of variables `nvars`. If `nobs >= nvars`, the default is `0.0001`, close to zero. When `nobs < nvars`, the default is `0.05`.
`lambda`	An optional user supplied sequence of penalty parameters. Typical usage is to have the program compute its own `lambda` sequence based on `nlambda` and `lambda.minf`. A user-specified lambda sequence overrides `dfmax` but not `pmax`.
`penalty.wgt`	A vector of nonnegative penalty weights, one for each regression coefficient. Each weight multiplies `lambda`, allowing differential penalization. A weight of 0 means no penalization, so the variable is always included in the model; a weight of `Inf` means the variable is never included in the model. Default is 1 for all variables.
`keep`	A vector of indices of variables which should always be included in the model (no penalization). Equivalent to specifying a `penalty.wgt` of 0.
`control`	A list of parameters for controlling the model fitting algorithm. The list is passed to `ahazpen.fit.control`.
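The interplay between `lambda.minf`, `nlambda` and `penalty.wgt` can be illustrated with a small base R sketch. It assumes the lasso loss 0.5 * b'Db - d'b + lambda * sum(w * |b|) with `D` and `d` already computed, ignores standardization, and assumes a log-spaced grid; `lambda_grid` is a hypothetical helper, not an ahaz function, and we have not verified that `ahazpen` spaces its grid this way.

```r
# Hypothetical helper (not part of ahaz): lambda.max and a log-spaced lambda grid,
# assuming the loss 0.5 * b' D b - d' b + lambda * sum(w * |b|).
lambda_grid <- function(D, d, penalty.wgt = rep(1, length(d)),
                        nlambda = 100, lambda.minf = 1e-4) {
  free <- penalty.wgt == 0                          # unpenalized (as with `keep`)
  pen  <- is.finite(penalty.wgt) & penalty.wgt > 0  # Inf-weighted variables are ignored
  grad <- d
  if (any(free)) {
    # Unpenalized coefficients are fitted exactly while all others stay at zero
    b.free <- solve(D[free, free, drop = FALSE], d[free])
    grad <- as.vector(d - D[, free, drop = FALSE] %*% b.free)
  }
  # Smallest lambda at which the KKT condition |grad_j| <= lambda * w_j
  # holds for every penalized variable, i.e. all of them remain zero
  lambda.max <- max(abs(grad[pen]) / penalty.wgt[pen])
  # Geometric sequence from lambda.max down to lambda.minf * lambda.max
  exp(seq(log(lambda.max), log(lambda.minf * lambda.max), length.out = nlambda))
}

g <- lambda_grid(diag(3), c(0.5, -2, 1), nlambda = 5, lambda.minf = 0.01)
g
```

With `D = diag(3)` and `d = c(0.5, -2, 1)`, lambda.max is max(|d_j|) = 2, so with `lambda.minf = 0.01` the grid runs from 2 down to 0.02.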

Details

Fits the sequence of models implied by the penalty function `penalty` and the sequence of penalty parameters `lambda`, using the efficient method of cyclic coordinate descent.
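The coordinate descent idea can be sketched in a few lines of base R for the lasso at a single `lambda`, assuming the quadratic loss 0.5 * b'Db - d'b + lambda * sum(w * |b|) with `D` positive definite and already computed. The functions `soft_threshold` and `cd_lasso` are hypothetical illustrations; the actual `ahazpen` solver is not shown here and also handles standardization, SCAD and the `control` options.

```r
# Hypothetical sketch (not ahaz's implementation): cyclic coordinate descent for
#   minimize 0.5 * b' D b - d' b + lambda * sum(w * |b|)
soft_threshold <- function(z, g) sign(z) * pmax(abs(z) - g, 0)

cd_lasso <- function(D, d, lambda, w = rep(1, length(d)),
                     b = rep(0, length(d)), tol = 1e-10, maxit = 1000) {
  for (it in seq_len(maxit)) {
    max.change <- 0
    for (j in seq_along(d)) {
      if (is.infinite(w[j])) { b[j] <- 0; next }  # weight Inf: never in the model
      # Partial residual: d_j minus the contribution of all other coordinates
      z <- d[j] - sum(D[j, -j] * b[-j])
      # Exact minimizer of the loss in coordinate j with the others held fixed
      b.new <- soft_threshold(z, lambda * w[j]) / D[j, j]
      max.change <- max(max.change, abs(b.new - b[j]))
      b[j] <- b.new
    }
    if (max.change < tol) break                   # a full pass changed (almost) nothing
  }
  b
}

cd_lasso(diag(3), c(0.5, -2, 1), lambda = 0.8)
```

With an identity `D` each update is plain soft-thresholding, so the call above gives `c(0, -1.2, 0.2)`; a weight of 0 gives an unpenalized coordinate, and `lambda = 0` reproduces `solve(D, d)`.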

For data sets with a very large number of covariates, it is recommended to calculate only partial paths by specifying a small value of `dfmax`.

The sequence `lambda` is computed automatically by the algorithm but can also be set (semi)manually by specifying `nlambda` or `lambda`. The stability and efficiency of the algorithm depend heavily on the grid of `lambda` values being reasonably dense, and `lambda` (and `nlambda`) should be chosen accordingly. In particular, specifying only one or a few `lambda` values is not recommended. Instead, calculate a partial regularization path and use `predict.ahazpen` or `coef.ahazpen` to extract coefficient estimates at specific `lambda` values.
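A common reason why a dense grid helps pathwise coordinate descent is warm starting: the solution at one `lambda` is a good starting point at the next, so few passes are needed per value. The hypothetical `lasso_path` below sketches this together with a `dfmax`-style early stop, assuming the same quadratic lasso loss with all penalty weights 1, no standardization and no `pmax` screening. Whether `ahazpen` keeps or drops the `lambda` value at which `dfmax` is crossed is an assumption here, not something taken from the package.

```r
# Hypothetical sketch (not ahaz's implementation): lasso path over a decreasing
# lambda grid with warm starts, stopping once more than dfmax coefficients are nonzero.
lasso_path <- function(D, d, lambda, dfmax = length(d), tol = 1e-10, maxit = 1000) {
  lambda <- sort(lambda, decreasing = TRUE)
  p <- length(d)
  b <- rep(0, p)                          # warm start carried from one lambda to the next
  beta <- matrix(0, nrow = p, ncol = 0)
  npasses <- 0
  for (lam in lambda) {
    for (it in seq_len(maxit)) {          # cyclic coordinate descent at this lambda
      b.old <- b
      for (j in seq_len(p)) {
        z <- d[j] - sum(D[j, -j] * b[-j])
        b[j] <- sign(z) * max(abs(z) - lam, 0) / D[j, j]
      }
      npasses <- npasses + 1              # total passes over all lambda values
      if (max(abs(b - b.old)) < tol) break
    }
    if (sum(b != 0) > dfmax) break        # partial path: stop once dfmax is exceeded
    beta <- cbind(beta, b, deparse.level = 0)
  }
  list(lambda = lambda[seq_len(ncol(beta))], beta = beta,
       df = colSums(beta != 0), npasses = npasses)
}

res <- lasso_path(diag(3), c(3, 2, 1), lambda = c(2.5, 1.5, 0.5), dfmax = 2)
res$df
```

Here the third `lambda` value would give three nonzero coefficients, so the path stops after two values with `df` equal to `c(1, 2)`.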

Value

An object with S3 class "ahazpen".

`call`	The call that produced this object
`beta`	An `nvars x length(lambda)` matrix (in sparse column format, class `dgCMatrix`) of penalized regression coefficients.
`lambda`	The sequence of actual `lambda` values used.
`df`	The number of nonzero coefficients for each value of `lambda`.
`nobs`	Number of observations.
`nvars`	Number of covariates.
`surv`	A copy of the argument `surv`.
`npasses`	Total number of passes by the fitting algorithm over the data, for all lambda values.
`penalty.wgt`	The actually used `penalty.wgt`.
`penalty`	An object of class `ahaz.pen.control`, as specified by `penalty`.
`dfmax`	A copy of `dfmax`.
`pmax`	A copy of `pmax`.
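To illustrate the layout of `beta` and its relation to `df`, the toy sketch below builds a small sparse coefficient matrix with the Matrix package. The numbers are made up for illustration and are not output from `ahazpen`; rows are covariates, each column holds the coefficients for one `lambda` value, and `df` is the count of nonzero entries in each column.

```r
library(Matrix)  # provides the dgCMatrix (sparse column) class

# Toy 3 x 4 coefficient matrix: 3 covariates, 4 successive lambda values
beta <- sparseMatrix(i = c(1, 1, 2, 1, 2, 3),
                     j = c(2, 3, 3, 4, 4, 4),
                     x = c(0.4, 0.9, -0.2, 1.3, -0.6, 0.1),
                     dims = c(3, 4))

# Number of nonzero coefficients for each lambda value
df <- colSums(beta != 0)
df
```

The first column is all zeros, as at the largest `lambda`, and `df` grows to `c(0, 1, 2, 3)` as variables enter.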

References

Gorst-Rasmussen, A. & Scheike, T. H. (2012). Coordinate descent methods for the penalized semiparametric additive hazards model. Journal of Statistical Software; 47(9):1-17. https://www.jstatsoft.org/v47/i09/

Gorst-Rasmussen, A. & Scheike, T. H. (2011). Independent screening for single-index hazard rate models with ultra-high dimensional features. Technical report R-2011-06, Department of Mathematical Sciences, Aalborg University.

Leng, C. & Ma, S. (2007). Path consistent model selection in additive risk model via Lasso. Statistics in Medicine; 26:3753-3770.

Martinussen, T. & Scheike, T. H. (2008). Covariate selection for the semiparametric additive risk model. Scandinavian Journal of Statistics; 36:602-619.

Zou, H. & Li, R. (2008). One-step sparse estimates in nonconcave penalized likelihood models. Annals of Statistics; 36:1509-1533.

Examples

library(ahaz)
library(survival)
data(sorlie)

# Break ties
set.seed(10101)
time <- sorlie$time+runif(nrow(sorlie))*1e-2

# Survival data + covariates
surv <- Surv(time,sorlie$status)
X <- as.matrix(sorlie[,3:ncol(sorlie)])

# Fit additive hazards regression model
fit1 <- ahazpen(surv, X,penalty="lasso", dfmax=30)
fit1
plot(fit1)

# Refit over the same lambda range as fit1, now with the full grid of 100 lambda values
lrange <- range(fit1$lambda)
fit2 <- ahazpen(surv, X,penalty="lasso", lambda.minf=lrange[1]/lrange[2])
plot(fit2)

# User-specified lambda sequence
lambda <- exp(seq(log(0.30), log(0.1), length.out = 100))
fit3 <- ahazpen(surv, X, penalty="lasso", lambda = lambda)
plot(fit3)

# Advanced usage - specify details of the penalty function
fit4 <- ahazpen(surv, X,penalty=sscad.control(nsteps=2))
fit4
fit5 <- ahazpen(surv, X,penalty=lasso.control(alpha=0.1))
plot(fit5)

[Package ahaz version 1.15 Index]