R: (Time-dependent) Cox model with structured variable selection

sox {sox}

R Documentation

(Time-dependent) Cox model with structured variable selection

Description

Fit a (time-dependent) Cox model with overlapping (including nested) group lasso penalty. The regularization path is computed at a grid of values for the regularization parameter lambda.

Usage

sox(
  x,
  ID,
  time,
  time2,
  event,
  penalty,
  lambda,
  group,
  group_variable,
  own_variable,
  no_own_variable,
  penalty_weights,
  par_init,
  stepsize_init = 1,
  stepsize_shrink = 0.8,
  tol = 1e-05,
  maxit = 1000L,
  verbose = FALSE
)

Arguments

`x`	Predictor matrix with dimension `nm * p`, where `n` is the number of subjects, `m` is the maximum observation time, and `p` is the number of predictors. See Details.
`ID`	The ID of each subject. Each subject has exactly one ID, and multiple rows in `x` can share the same `ID`.
`time`	Represents the start of each time interval.
`time2`	Represents the stop of each time interval.
`event`	Event indicator. `event = 1` when the event occurs and `event = 0` otherwise.
`penalty`	Character string, indicating whether "`overlapping`" or "`nested`" group lasso penalty is imposed.
`lambda`	Sequence of values of the regularization coefficient `\lambda`.
`group`	A `G * G` integer matrix required to describe the structure of the `overlapping` and `nested` groups. We recommend that the users generate it automatically using `overlap_structure()` and `nested_structure()`. See Examples and Details.
`group_variable`	A `p * G` integer matrix required to describe the structure of the `overlapping` groups. We recommend that the users generate it automatically using `overlap_structure()`. See Examples and Details.
`own_variable`	A non-decreasing integer vector of length `G` required to describe the structure of the `nested` groups. We recommend that the users generate it automatically using `nested_structure()`. See Examples and Details.
`no_own_variable`	An integer vector of length `G` required to describe the structure of the `nested` groups. We recommend that the users generate it automatically using `nested_structure()`. See Examples and Details.
`penalty_weights`	Optional, vector of length `G` specifying the group-specific penalty weights. We recommend that the users generate it automatically using `overlap_structure()` or `nested_structure()`. If not specified, `\mathbf{1}_G` is used.
`par_init`	Optional, vector of initial values of the optimization algorithm. Default initial value is zero for all `p` variables.
`stepsize_init`	Initial value of the stepsize of the optimization algorithm. Default is 1.0.
`stepsize_shrink`	Factor in `(0,1)` by which the stepsize shrinks in the backtracking linesearch. Default is 0.8.
`tol`	Convergence criterion. Algorithm stops when the `l_2` norm of the difference between two consecutive updates is smaller than `tol`.
`maxit`	Maximum number of iterations allowed.
`verbose`	Logical, whether progress is printed.
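To illustrate how `stepsize_init`, `stepsize_shrink`, `tol` and `maxit` typically interact, here is a minimal sketch of plain gradient descent with a backtracking line search on a toy quadratic. It is not the package's implementation: the function `backtracking_gd()`, the sufficient-decrease rule and the toy objective are assumptions made for illustration, and the penalty term is omitted entirely.

```r
# Generic gradient descent with backtracking line search (illustration only;
# backtracking_gd() is a hypothetical helper, not part of the sox package).
backtracking_gd <- function(f, grad, par_init,
                            stepsize_init = 1, stepsize_shrink = 0.8,
                            tol = 1e-5, maxit = 1000L) {
  par <- par_init
  for (it in seq_len(maxit)) {
    g <- grad(par)
    step <- stepsize_init
    # Shrink the step until a sufficient-decrease condition holds.
    while (f(par - step * g) > f(par) - 0.5 * step * sum(g^2)) {
      step <- step * stepsize_shrink
    }
    par_new <- par - step * g
    # Stop when the l2 norm of the update falls below tol.
    converged <- sqrt(sum((par_new - par)^2)) < tol
    par <- par_new
    if (converged) break
  }
  list(par = par, iterations = it)
}

# Toy objective: f(b) = 0.5 * ||b - target||^2, minimized at b = target.
target <- c(1, -2)
f <- function(b) 0.5 * sum((b - target)^2)
grad <- function(b) b - target

res <- backtracking_gd(f, grad, par_init = c(0, 0))
res$par
```

A smaller `stepsize_shrink` reduces the step more aggressively per backtracking trial, while a looser `tol` stops earlier at the cost of precision.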

Details

The predictor matrix should be of dimension nm * p. Each row records the covariate values for one subject over one time interval, for example the values on the day running from time (Start) to time2 (Stop). An example dataset sim is provided; it has the format produced by the R package PermAlgo. The specification of the arguments group, group_variable, own_variable and no_own_variable for the grouping structure is described at https://thoth.inrialpes.fr/people/mairal/spams/doc-R/html/doc_spams006.html#sec26 and https://thoth.inrialpes.fr/people/mairal/spams/doc-R/html/doc_spams006.html#sec27.
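As a rough illustration of the row-per-interval layout described above, the following base R sketch builds a tiny dataset by hand. The data frame `toy` and its values are hypothetical; the column names only mirror those used with `sim` in the Examples.

```r
# Hypothetical toy data: 2 subjects, daily intervals, one time-varying covariate.
toy <- data.frame(
  Id    = c(1, 1, 1, 2, 2),
  Start = c(0, 1, 2, 0, 1),
  Stop  = c(1, 2, 3, 1, 2),
  Event = c(0, 0, 1, 0, 0),
  A1    = c(0, 1, 1, 0, 0)
)

# The predictor matrix has one row per subject-interval.
x_toy <- as.matrix(toy[, "A1", drop = FALSE])

# Basic consistency checks on the layout.
stopifnot(nrow(x_toy) == nrow(toy))      # rows of x align with ID/time/event
stopifnot(all(toy$Start < toy$Stop))     # every interval has positive length
events_per_id <- tapply(toy$Event, toy$Id, sum)
stopifnot(all(events_per_id <= 1))       # at most one event per subject
```

Here subject 1 has the event at the end of the third interval, and subject 2 is censored after two intervals.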

In the Examples below, p = 9 and G = 5, and the group structure is:

g_1 = \{A_{1}, A_{2}, A_{1}B, A_{2}B\},

g_2 = \{B, A_{1}B, A_{2}B, C_{1}B, C_{2}B\},

g_3 = \{A_{1}B, A_{2}B\},

g_4 = \{C_1, C_2, C_{1}B, C_{2}B\},

g_5 = \{C_{1}B, C_{2}B\}.

Here g_3 is a subset of both g_1 and g_2, and g_5 is a subset of both g_2 and g_4.
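The group structure above can be checked with a few lines of base R. This is only a sketch: the indicator matrix `membership` has the same p x G shape as `group_variable`, but whether it matches what `overlap_structure()` returns is an assumption, and `is_subset()` is a hypothetical helper.

```r
# Variables in the column order used in the Examples.
vars <- c("A1", "A2", "C1", "C2", "B", "A1B", "A2B", "C1B", "C2B")

# Groups given as indices into vars.
groups <- list(
  g1 = c(1, 2, 6, 7),
  g2 = c(5, 6, 7, 8, 9),
  g3 = c(6, 7),
  g4 = c(3, 4, 8, 9),
  g5 = c(8, 9)
)

# p x G indicator matrix: entry (j, g) is 1 when variable j belongs to group g.
membership <- sapply(groups, function(idx) as.integer(seq_along(vars) %in% idx))
rownames(membership) <- vars

# is_subset(a, b): TRUE when every variable in group a is also in group b.
is_subset <- function(a, b) all(groups[[a]] %in% groups[[b]])

is_subset("g3", "g1")  # TRUE
is_subset("g3", "g2")  # TRUE
is_subset("g5", "g2")  # TRUE
is_subset("g5", "g4")  # TRUE
```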

Value

A list with the following three elements.

`lambdas`	The user-specified regularization coefficients `lambda` sorted in decreasing order.
`estimates`	A matrix in which each column holds the coefficient estimates for the corresponding `\lambda` in `lambdas`.
`iterations`	A vector giving the number of iterations needed to converge at each `\lambda` in `lambdas`.
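One way to read off the selected variables from such a list is sketched below. The object `fit_mock` is a hypothetical stand-in with made-up numbers, and the sketch assumes that the rows of `estimates` follow the column order of `x`.

```r
# Hypothetical stand-in for a fitted object with the three documented elements.
vars <- c("A1", "A2", "B")
fit_mock <- list(
  lambdas    = c(1, 0.1, 0.01),
  # Columns are filled in order: one column of estimates per lambda.
  estimates  = matrix(c(0,   0,   0,
                        0.4, 0,   0,
                        0.6, 0.1, -0.2),
                      nrow = 3),
  iterations = c(1L, 12L, 30L)
)

# Variables with non-zero estimates at each lambda (one list element per column).
selected <- lapply(seq_along(fit_mock$lambdas), function(k) {
  vars[fit_mock$estimates[, k] != 0]
})

# Number of selected variables along the path, from largest to smallest lambda.
n_selected <- lengths(selected)
```

In this mock path no variable is selected at the largest `lambda`, and more variables enter as `lambda` decreases.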

Examples

x <- as.matrix(sim[, c("A1","A2","C1","C2","B","A1B","A2B","C1B","C2B")])
lam.seq <- exp(seq(log(1e0), log(1e-3), length.out = 20))

# Variables:
## 1: A1
## 2: A2
## 3: C1
## 4: C2
## 5: B
## 6: A1B
## 7: A2B
## 8: C1B
## 9: C2B

# Overlapping groups:
## g1: A1, A2, A1B, A2B
## g2: B, A1B, A2B, C1B, C2B
## g3: A1B, A2B
## g4: C1, C2, C1B, C2B
## g5: C1B, C2B

overlapping.groups <- list(c(1, 2, 6, 7),
                           c(5, 6, 7, 8, 9),
                           c(6, 7),
                           c(3, 4, 8, 9),
                           c(8, 9))
                           
pars.overlapping <- overlap_structure(overlapping.groups)

fit.overlapping <- sox(
  x = x,
  ID = sim$Id,
  time = sim$Start,
  time2 = sim$Stop,
  event = sim$Event,
  penalty = "overlapping",
  lambda = lam.seq,
  group = pars.overlapping$groups,
  group_variable = pars.overlapping$groups_var,
  penalty_weights = pars.overlapping$group_weights,
  tol = 1e-4,
  maxit = 1e3,
  verbose = FALSE
)

str(fit.overlapping)

# Nested groups (misspecified, for the demonstration of the software only.)
## g1: A1, A2, C1, C2, B, A1B, A2B, C1B, C2B
## g2: A1, A2, A1B, A2B
## g3: C1, C2, C1B, C2B
## g4: 1
## g5: 2
## ...
## g12: 9

nested.groups <- list(1:9,
                      c(1, 2, 6, 7),
                      c(3, 4, 8, 9),
                      1, 2, 3, 4, 5, 6, 7, 8, 9)

pars.nested <- nested_structure(nested.groups)

fit.nested <- sox(
  x = x,
  ID = sim$Id,
  time = sim$Start,
  time2 = sim$Stop,
  event = sim$Event,
  penalty = "nested",
  lambda = lam.seq,
  group = pars.nested$groups,
  own_variable = pars.nested$own_variables,
  no_own_variable = pars.nested$N_own_variables,
  penalty_weights = pars.nested$group_weights,
  tol = 1e-4,
  maxit = 1e3,
  verbose = FALSE
)

str(fit.nested)

[Package sox version 1.2 Index]