sox {sox}    R Documentation

(Time-dependent) Cox model with structured variable selection

Description

Fit a (time-dependent) Cox model with overlapping (including nested) group lasso penalty. The regularization path is computed at a grid of values for the regularization parameter lambda.

Usage

sox(
  x,
  ID,
  time,
  time2,
  event,
  penalty,
  lambda,
  group,
  group_variable,
  own_variable,
  no_own_variable,
  penalty_weights,
  par_init,
  stepsize_init = 1,
  stepsize_shrink = 0.8,
  tol = 1e-05,
  maxit = 1000L,
  verbose = FALSE
)

Arguments

x

Predictor matrix with dimension nm * p, where n is the number of subjects, m is the maximum observation time, and p is the number of predictors. See Details.

ID

The ID of each subject. Each subject has exactly one ID, and multiple rows in x can share the same ID.

time

Represents the start of each time interval.

time2

Represents the stop of each time interval.

event

Indicator of event. event = 1 when event occurs and event = 0 otherwise.

penalty

Character string, indicating whether "overlapping" or "nested" group lasso penalty is imposed.

lambda

Sequence of regularization coefficients \lambda's.

group

A G * G integer matrix required to describe the structure of the overlapping and nested groups. We recommend that the users generate it automatically using overlap_structure() and nested_structure(). See Examples and Details.

group_variable

A p * G integer matrix required to describe the structure of the overlapping groups. We recommend that the users generate it automatically using overlap_structure(). See Examples and Details.

own_variable

A non-decreasing integer vector of length G required to describe the structure of the nested groups. We recommend that the users generate it automatically using nested_structure(). See Examples and Details.

no_own_variable

An integer vector of length G required to describe the structure of the nested groups. We recommend that the users generate it automatically using nested_structure(). See Examples and Details.

penalty_weights

Optional, vector of length G specifying the group-specific penalty weights. We recommend that the users generate it automatically using overlap_structure() or nested_structure(). If not specified, \mathbf{1}_G is used.

par_init

Optional, vector of initial values of the optimization algorithm. Default initial value is zero for all p variables.

stepsize_init

Initial value of the stepsize of the optimization algorithm. Default is 1.0.

stepsize_shrink

Factor in (0,1) by which the stepsize shrinks in the backtracking linesearch. Default is 0.8.

tol

Convergence criterion. Algorithm stops when the l_2 norm of the difference between two consecutive updates is smaller than tol.

maxit

Maximum number of iterations allowed.

verbose

Logical, whether progress is printed.
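The roles of stepsize_init, stepsize_shrink, tol and maxit can be illustrated with a generic backtracking gradient descent on a toy quadratic objective. This is only a sketch under stated assumptions: it is not the solver used by sox(), and the objective f, its gradient grad, and the helper backtracking_descent are hypothetical names introduced here for illustration.

```r
# Toy smooth objective f(b) = 0.5 * ||A b - y||^2 (NOT the Cox partial likelihood).
A <- matrix(c(2, 0, 0, 1), nrow = 2)
y <- c(2, 3)
f <- function(b) 0.5 * sum((A %*% b - y)^2)
grad <- function(b) as.vector(t(A) %*% (A %*% b - y))

# Hypothetical helper: gradient descent with backtracking line search.
backtracking_descent <- function(par_init, stepsize_init = 1, stepsize_shrink = 0.8,
                                 tol = 1e-5, maxit = 1000L) {
  b <- par_init
  for (it in seq_len(maxit)) {
    g <- grad(b)
    step <- stepsize_init
    # Shrink the step by stepsize_shrink until sufficient decrease is achieved.
    repeat {
      b_new <- b - step * g
      if (f(b_new) <= f(b) - 0.5 * step * sum(g^2)) break
      step <- step * stepsize_shrink
    }
    # Stop when the l_2 norm of the change between consecutive updates is below tol.
    converged <- sqrt(sum((b_new - b)^2)) < tol
    b <- b_new
    if (converged) break
  }
  list(par = b, iterations = it)
}

res <- backtracking_descent(par_init = c(0, 0))
res$par         # close to the minimizer c(1, 3)
res$iterations  # number of iterations used
```

A smaller stepsize_shrink reduces the step faster within each line search but may end up with overly short steps; a smaller tol generally requires more iterations before the stopping rule is met.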

Details

The predictor matrix should be of dimension nm * p. Each row records the covariate values of one subject over one time interval, from time (Start) to time2 (Stop), for example one day. An example dataset sim is provided; it has the format produced by the R package PermAlgo. The specification of the arguments group, group_variable, own_variable and no_own_variable for the grouping structure can be found in https://thoth.inrialpes.fr/people/mairal/spams/doc-R/html/doc_spams006.html#sec26 and https://thoth.inrialpes.fr/people/mairal/spams/doc-R/html/doc_spams006.html#sec27.
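To make the row layout concrete, the following sketch builds a small hypothetical dataset in the same (Id, Start, Stop, Event) layout used by the Examples. The data frame toy and its values are made up for illustration and are not part of the package; with n = 2 subjects, m = 3 intervals and p = 2 predictors, the predictor matrix has 6 rows and 2 columns.

```r
# Hypothetical toy data: 2 subjects, 3 unit-length intervals each.
toy <- data.frame(
  Id    = c(1, 1, 1, 2, 2, 2),
  Start = c(0, 1, 2, 0, 1, 2),
  Stop  = c(1, 2, 3, 1, 2, 3),
  Event = c(0, 0, 1, 0, 0, 0),  # subject 1 has the event in (2, 3]; subject 2 is censored
  A1    = c(0, 1, 1, 0, 0, 0),  # time-dependent covariate
  B     = c(1, 1, 1, 0, 1, 1)
)

# One row per subject-interval, one column per predictor.
x_toy <- as.matrix(toy[, c("A1", "B")])
dim(x_toy)

# Sanity checks: within each subject, every interval starts where the
# previous one stops, and at most one event is recorded.
contiguous <- sapply(split(toy, toy$Id),
                     function(d) all(d$Start[-1] == d$Stop[-nrow(d)]))
events_per_id <- tapply(toy$Event, toy$Id, sum)
```

The columns of toy would then map to the arguments as ID = toy$Id, time = toy$Start, time2 = toy$Stop and event = toy$Event.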

In the Examples below, p=9,G=5, the group structure is:

g_1 = \{A_{1}, A_{2}, A_{1}B, A_{2}B\},

g_2 = \{B, A_{1}B, A_{2}B, C_{1}B, C_{2}B\},

g_3 = \{A_{1}B, A_{2}B\},

g_4 = \{C_1, C_2, C_{1}B, C_{2}B\},

g_5 = \{C_{1}B, C_{2}B\}.

Here, g_3 is a subset of both g_1 and g_2, and g_5 is a subset of both g_2 and g_4.
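The subset relations among the five groups can be checked directly in base R. The sketch below is illustrative only: the objects membership and is_subset are hypothetical helpers, and their encoding is not claimed to match the matrices returned by overlap_structure().

```r
vars <- c("A1", "A2", "C1", "C2", "B", "A1B", "A2B", "C1B", "C2B")
groups <- list(
  g1 = c("A1", "A2", "A1B", "A2B"),
  g2 = c("B", "A1B", "A2B", "C1B", "C2B"),
  g3 = c("A1B", "A2B"),
  g4 = c("C1", "C2", "C1B", "C2B"),
  g5 = c("C1B", "C2B")
)

# p x G indicator matrix: membership[i, j] is 1 when variable i belongs to group j.
membership <- sapply(groups, function(g) as.integer(vars %in% g))
rownames(membership) <- vars

# is_subset[i, j] is TRUE when group i is a strict subset of group j.
is_subset <- outer(seq_along(groups), seq_along(groups),
                   Vectorize(function(i, j) {
                     i != j && all(groups[[i]] %in% groups[[j]])
                   }))
dimnames(is_subset) <- list(names(groups), names(groups))

# Row g3 is TRUE in columns g1 and g2; row g5 is TRUE in columns g2 and g4.
is_subset
```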

Value

A list with the following three elements.

lambdas

The user-specified regularization coefficients lambda sorted in decreasing order.

estimates

A matrix, with each column corresponding to the coefficient estimates at each \lambda in lambdas.

iterations

A vector giving the number of iterations taken to converge at each \lambda in lambdas.
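As a sketch of how such a list might be inspected, the code below uses a mock object with the three documented elements. All values in fit_mock are made up, and the row names on estimates are an assumption added for readability; the object returned by sox() may not carry them.

```r
# Mock object mimicking the documented return value (values are made up).
fit_mock <- list(
  lambdas    = c(1, 0.1, 0.01),
  estimates  = matrix(c(0,    0,   0,
                        0.4,  0,   0,
                        0.6, -0.2, 0),
                      nrow = 3,
                      dimnames = list(c("A1", "A2", "B"), NULL)),
  iterations = c(3L, 25L, 60L)
)

# Column k of estimates holds the coefficients at lambdas[k].
n_selected <- colSums(fit_mock$estimates != 0)

# Names of the predictors with non-zero coefficients at each lambda.
selected <- lapply(seq_along(fit_mock$lambdas), function(k) {
  rownames(fit_mock$estimates)[fit_mock$estimates[, k] != 0]
})

data.frame(lambda = fit_mock$lambdas,
           n_selected = n_selected,
           iterations = fit_mock$iterations)
```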

Examples

x <- as.matrix(sim[, c("A1","A2","C1","C2","B","A1B","A2B","C1B","C2B")])
lam.seq <- exp(seq(log(1e0), log(1e-3), length.out = 20))

# Variables:
## 1: A1
## 2: A2
## 3: C1
## 4: C2
## 5: B
## 6: A1B
## 7: A2B
## 8: C1B
## 9: C2B

# Overlapping groups:
## g1: A1, A2, A1B, A2B
## g2: B, A1B, A2B, C1B, C2B
## g3: A1B, A2B
## g4: C1, C2, C1B, C2B
## g5: C1B, C2B

overlapping.groups <- list(c(1, 2, 6, 7),
                           c(5, 6, 7, 8, 9),
                           c(6, 7),
                           c(3, 4, 8, 9),
                           c(8, 9))
                           
pars.overlapping <- overlap_structure(overlapping.groups)

fit.overlapping <- sox(
  x = x,
  ID = sim$Id,
  time = sim$Start,
  time2 = sim$Stop,
  event = sim$Event,
  penalty = "overlapping",
  lambda = lam.seq,
  group = pars.overlapping$groups,
  group_variable = pars.overlapping$groups_var,
  penalty_weights = pars.overlapping$group_weights,
  tol = 1e-4,
  maxit = 1e3,
  verbose = FALSE
)

str(fit.overlapping)

# Nested groups (misspecified, for demonstration of the software only)
## g1: A1, A2, C1, C2, B, A1B, A2B, C1B, C2B
## g2: A1, A2, A1B, A2B
## g3: C1, C2, C1B, C2B
## g4: A1
## g5: A2
## ...
## g12: C2B

nested.groups <- list(1:9,
                      c(1, 2, 6, 7),
                      c(3, 4, 8, 9),
                      1, 2, 3, 4, 5, 6, 7, 8, 9)

pars.nested <- nested_structure(nested.groups)

fit.nested <- sox(
  x = x,
  ID = sim$Id,
  time = sim$Start,
  time2 = sim$Stop,
  event = sim$Event,
  penalty = "nested",
  lambda = lam.seq,
  group = pars.nested$groups,
  own_variable = pars.nested$own_variables,
  no_own_variable = pars.nested$N_own_variables,
  penalty_weights = pars.nested$group_weights,
  tol = 1e-4,
  maxit = 1e3,
  verbose = FALSE
)

str(fit.nested)


[Package sox version 1.2 Index]