R: cross-validation for 'netcox'

netcox_cv {netcox}

R Documentation

cross-validation for `netcox`

Description

Conduct cross-validation (cv) for netcox.

Usage

netcox_cv(
  x,
  ID,
  time,
  time2,
  event,
  lambda,
  group,
  group_variable,
  penalty_weights,
  par_init,
  nfolds = 10,
  stepsize_init = 1,
  stepsize_shrink = 0.8,
  tol = 1e-05,
  maxit = 1000L,
  verbose = FALSE
)

Arguments

`x`	Predictor matrix with dimension `nm * p`, where `n` is the number of subjects, `m` is the maximum observation time, and `p` is the number of predictors. See Details.
`ID`	The ID of each subject. Each subject has one ID, so multiple rows in `x` share the same `ID`.
`time`	Represents the start of each time interval.
`time2`	Represents the stop of each time interval.
`event`	Indicator of event. `event = 1` when event occurs and `event = 0` otherwise.
`lambda`	Sequence of regularization coefficients `\lambda`.
`group`	`G * G` matrix describing the relationship between the groups of variables, where `G` represents the number of groups. Denote the `i`-th group of variables by `g_i`. The `(i,j)` entry is `1` if and only if `i\neq j` and `g_i` is a child group (subset) of `g_j`, and is `0` otherwise. See Examples and Details.
`group_variable`	`p * G` matrix describing the relationship between the groups and the variables. The `(i,j)` entry is `1` if and only if variable `i` is in group `g_j`, but not in any child group of `g_j`, and is `0` otherwise. See Examples and Details.
`penalty_weights`	Optional, vector of length `G` specifying the group-specific penalty weights. If not specified, the default value is `\mathbf{1}_G`. Modify with caution.
`par_init`	Optional, vector of initial values of the optimization algorithm. Default initial value is zero for all `p` variables.
`nfolds`	Optional, the number of cross-validation folds. Default is 10.
`stepsize_init`	Initial value of the stepsize of the optimization algorithm. Default is 1.
`stepsize_shrink`	Factor in `(0,1)` by which the stepsize shrinks in the backtracking linesearch. Default is 0.8.
`tol`	Convergence criterion. Algorithm stops when the `l_2` norm of the difference between two consecutive updates is smaller than `tol`.
`maxit`	Maximum number of iterations allowed.
`verbose`	Logical, whether progress is printed.
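
As an illustration of how `group` and `group_variable` encode a hierarchy, the sketch below rebuilds the two matrices from the Examples section with row and column names, then recovers the full variable membership of each group. The names `vars`, `grps`, `full`, and `members` are hypothetical helpers, not part of the package, and the sketch assumes that `group` marks every child (subset) group of each parent.

```r
# Variable and group labels (labels are illustrative, taken from the Examples)
vars <- c("A1", "A2", "C1", "C2", "B", "A1B", "A2B", "C1B", "C2B")
grps <- paste0("g", 1:5)

# group[i, j] = 1 when g_i is a child (subset) of g_j
group <- matrix(0, 5, 5, dimnames = list(grps, grps))
group["g3", c("g1", "g2")] <- 1   # A-by-B interactions sit under A and B
group["g5", c("g2", "g4")] <- 1   # C-by-B interactions sit under B and C

# group_variable[v, j] = 1 when variable v is in g_j but in no child of g_j
group_variable <- matrix(0, 9, 5, dimnames = list(vars, grps))
group_variable[c("A1", "A2"), "g1"] <- 1
group_variable["B", "g2"] <- 1
group_variable[c("A1B", "A2B"), "g3"] <- 1
group_variable[c("C1", "C2"), "g4"] <- 1
group_variable[c("C1B", "C2B"), "g5"] <- 1

# Basic shape checks implied by the argument descriptions
stopifnot(all(diag(group) == 0),
          ncol(group_variable) == nrow(group),
          nrow(group_variable) == length(vars))

# Full membership: a group's own variables plus those of its child groups
full <- group_variable + group_variable %*% group
members <- lapply(grps, function(g) vars[full[, g] > 0])
names(members) <- grps
```

Under these assumptions `members$g2` lists `B` together with all four interaction terms, which shows why the interaction groups appear as children of the main-effect groups.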

Details

For each lambda, 10-fold cross-validation (by default) is performed. The cv error is defined as follows. Suppose we perform `K`-fold cross-validation, and denote by `\hat{\beta}^{-k}` the estimate obtained from the remaining `K-1` folds (training set). The error of the `k`-th fold (test set) is defined as `2(P-Q)` divided by `R`, where `P` is the log partial likelihood evaluated at `\hat{\beta}^{-k}` using the entire dataset, `Q` is the log partial likelihood evaluated at `\hat{\beta}^{-k}` using the training set, and `R` is the number of events in the test set. We do not use the negative log partial likelihood evaluated at `\hat{\beta}^{-k}` using the test set alone, because the definition above makes efficient use of the risk set and is therefore more stable when the number of events in each test set is small (think of leave-one-out).

The cv error is used in parameter tuning. To account for unequal numbers of events across the randomly formed test sets, we divide the deviance `2(P-Q)` by `R`.

To get the estimated coefficients that have the minimum cv error, use `netcox()$Estimates[netcox()$Lambdas==netcox_cv()$lambda.min]`. To apply the 1-se rule, use `netcox()$Estimates[netcox()$Lambdas==netcox_cv()$lambda.1se]`.
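
The fold-level error described above can be written compactly as follows. The subscript `k` on `P`, `Q`, and `R` and the names `err_k` and `cvm` are notation added here for illustration; the averaging step follows the description of `cvm` in the Value section.

```latex
\mathrm{err}_k(\lambda) = \frac{2\,(P_k - Q_k)}{R_k},
\qquad
\mathrm{cvm}(\lambda) = \frac{1}{K} \sum_{k=1}^{K} \mathrm{err}_k(\lambda)
```

Here `P_k` and `Q_k` are the log partial likelihoods evaluated at `\hat{\beta}^{-k}` on the entire dataset and on the training set of fold `k`, respectively, and `R_k` is the number of events in the test set of fold `k`.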

Value

A list.

`lambdas`	The vector of lambda values used in cross-validation.
`cvm`	The cv error averaged across all folds for each lambda.
`cvsd`	The standard error of the cv error for each lambda.
`cvup`	The cv error plus its standard error for each lambda.
`cvlo`	The cv error minus its standard error for each lambda.
`nzero`	The number of non-zero coefficients at each lambda.
`netcox.fit`	A netcox fit for the full data at all lambdas.
`lambda.min`	The lambda at which `cvm` reaches its minimum.
`lambda.1se`	The largest lambda such that `cvm` is less than the minimum of `cvup` (the minimum of `cvm` plus its standard error).
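
A minimal sketch of how `lambda.min` and `lambda.1se` relate to `cvm` and `cvsd`, using made-up numbers. It assumes the 1-se threshold is `cvup` at the lambda that minimizes `cvm`; the package's internal computation may differ in detail.

```r
# Hypothetical cv summaries for five lambdas (illustrative numbers only)
lambdas <- c(1, 0.5, 0.25, 0.1, 0.05)
cvm     <- c(3.2, 2.7, 2.6, 2.65, 2.8)
cvsd    <- c(0.2, 0.2, 0.15, 0.2, 0.2)

cvup <- cvm + cvsd   # upper band, as in the Value section
cvlo <- cvm - cvsd   # lower band

# lambda.min: the lambda with the smallest mean cv error
idx_min    <- which.min(cvm)
lambda_min <- lambdas[idx_min]

# 1-se rule (assumed form): largest lambda whose cvm stays below
# the minimum cvm plus its standard error
threshold  <- cvup[idx_min]
lambda_1se <- max(lambdas[cvm < threshold])
```

With these numbers the minimum sits at `0.25`, while the 1-se rule picks the larger, more heavily penalized `0.5`.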

Examples

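# Group hierarchy: entry (i, j) is 1 when g_i is a child of g_j.
# g3 (A1B, A2B) is a child of g1 (A1, A2) and g2 (B);
# g5 (C1B, C2B) is a child of g2 (B) and g4 (C1, C2).
grp <- matrix(c(0, 0, 0, 0, 0,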
                0, 0, 0, 0, 0,
                1, 1, 0, 0, 0,
                0, 0, 0, 0, 0,
                0, 1, 0, 1, 0),
              ncol = 5, byrow = TRUE)
grp.var <- matrix(c(1, 0, 0, 0, 0, #A1
                    1, 0, 0, 0, 0, #A2
                    0, 0, 0, 1, 0, #C1
                    0, 0, 0, 1, 0, #C2
                    0, 1, 0, 0, 0, #B
                    0, 0, 1, 0, 0, #A1B
                    0, 0, 1, 0, 0, #A2B
                    0, 0, 0, 0, 1, #C1B
                    0, 0, 0, 0, 1  #C2B
                   ), ncol = 5, byrow = TRUE)
eta_g <- rep(1, 5)
# `sim` is a data frame with columns Id, Start, Stop, Event and the predictors
x <- as.matrix(sim[, c("A1","A2","C1","C2","B",
                       "A1B","A2B","C1B","C2B")])
lam.seq <- 10^seq(0, -2, by = -0.2)

cv <- netcox_cv(x = x,
                ID = sim$Id,
                time = sim$Start,
                time2 = sim$Stop,
                event = sim$Event,
                lambda = lam.seq,
                group = grp,
                group_variable = grp.var,
                penalty_weights = eta_g,
                nfolds = 5,
                tol = 1e-4,
                maxit = 1e3,
                verbose = FALSE)

[Package netcox version 1.0.1 Index]
