sox_cv {sox} | R Documentation |
cross-validation for sox
Description
Conduct cross-validation (cv) for sox.
Usage
sox_cv(
x,
ID,
time,
time2,
event,
penalty,
lambda,
group,
group_variable,
own_variable,
no_own_variable,
penalty_weights,
par_init,
nfolds = 10,
foldid = NULL,
stepsize_init = 1,
stepsize_shrink = 0.8,
tol = 1e-05,
maxit = 1000L,
verbose = FALSE
)
Arguments
x |
Predictor matrix with dimension |
ID |
The ID of each subject; each subject has one ID (multiple rows in |
time |
Represents the start of each time interval. |
time2 |
Represents the stop of each time interval. |
event |
Indicator of event. |
penalty |
Character string, indicating whether " |
lambda |
Sequence of regularization coefficients |
group |
A |
group_variable |
A |
own_variable |
A non-decreasing integer vector of length |
no_own_variable |
An integer vector of length |
penalty_weights |
Optional, vector of length |
par_init |
Optional, vector of initial values of the optimization algorithm. Default initial value is zero for all |
nfolds |
Optional, the number of folds for cross-validation. Default is 10. |
foldid |
Optional, user-specified vector indicating the cross-validation fold in which each observation should be included. Values in this vector should range from 1 to |
stepsize_init |
Initial value of the stepsize of the optimization algorithm. Default is 1. |
stepsize_shrink |
Factor in |
tol |
Convergence criterion. Algorithm stops when the |
maxit |
Maximum number of iterations allowed. |
verbose |
Logical, whether progress is printed. |
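The foldid argument can be illustrated with a short sketch. The snippet below assigns folds at the subject level, so that all rows sharing an ID land in the same fold, and then expands the assignment to one label per row. The helper make_foldid and the toy ID vector are hypothetical and not part of sox; this page does not say whether sox_cv expects one value per row or one per subject, so that choice is an assumption to verify against the package.

```r
# Hypothetical helper (not part of sox): balanced fold labels per subject,
# expanded to one label per row of the data.
make_foldid <- function(ID, nfolds = 10, seed = 1) {
  set.seed(seed)
  subjects <- unique(ID)
  # Balanced labels 1..nfolds, shuffled across subjects
  subject_fold <- sample(rep(seq_len(nfolds), length.out = length(subjects)))
  names(subject_fold) <- as.character(subjects)
  # One label per row; rows of the same subject share a fold
  unname(subject_fold[as.character(ID)])
}

# Toy data: 6 subjects, some with several start-stop rows
ID <- c(1, 1, 2, 3, 3, 3, 4, 5, 5, 6)
foldid <- make_foldid(ID, nfolds = 3)
table(foldid)
```

Keeping a subject's rows together avoids the same subject contributing to both the training and the test set of a fold.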
Details
For each lambda, 10-fold cross-validation (by default) is performed. The cv error is defined as follows. Suppose we perform K-fold cross-validation, and let \hat{\beta}^{-k} denote the estimate obtained from the other K-1 folds (training set). The error of the k-th fold (test set) is defined as 2(P-Q) divided by R, where P is the log partial likelihood evaluated at \hat{\beta}^{-k} using the entire dataset, Q is the log partial likelihood evaluated at \hat{\beta}^{-k} using the training set, and R is the number of events in the test set. We do not use the negative log partial likelihood evaluated at \hat{\beta}^{-k} using the test set alone, because the definition above makes efficient use of the risk set and is therefore more stable when the number of events in each test set is small (think of leave-one-out). The cv error is used in parameter tuning. To account for the balance of outcomes among the randomly formed test sets, we divide the deviance 2(P-Q) by R.
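The per-fold error can be sketched in base R as follows. This is a teaching sketch, not the sox internals: log_partial_lik and fold_error are hypothetical helpers, ties are handled Breslow-style by assumption, and the toy data are made up. The code follows the definition exactly as written; because P cannot exceed Q for a fixed coefficient vector, the value is non-positive under that literal reading, and the sign convention used inside the package is not confirmed by this page.

```r
# Log partial likelihood for start-stop (counting-process) data,
# Breslow handling of ties. Base R only; hypothetical helper.
log_partial_lik <- function(beta, x, start, stop, event) {
  eta <- drop(x %*% beta)
  ll <- 0
  for (i in which(event == 1)) {
    # Rows at risk at the event time stop[i]
    at_risk <- start < stop[i] & stop >= stop[i]
    ll <- ll + eta[i] - log(sum(exp(eta[at_risk])))
  }
  ll
}

# Error of fold k as stated in the text: 2 * (P - Q) / R, with P on all rows,
# Q on the training rows, and R the number of events among the test rows.
fold_error <- function(beta_minus_k, x, start, stop, event, test) {
  P <- log_partial_lik(beta_minus_k, x, start, stop, event)
  Q <- log_partial_lik(beta_minus_k, x[!test, , drop = FALSE],
                       start[!test], stop[!test], event[!test])
  R <- sum(event[test])
  2 * (P - Q) / R
}

# Toy data (made up for illustration): one covariate, six rows
x     <- cbind(z = c(0.5, -1.0, 0.3, 1.2, -0.7, 0.1))
start <- c(0, 0, 0, 0, 0, 0)
stop  <- c(2, 3, 4, 5, 6, 7)
event <- c(1, 0, 1, 1, 0, 1)
test  <- c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE)

err <- fold_error(beta_minus_k = 0, x, start, stop, event, test)
err
```

Note that P is computed with the test rows included in every risk set, which is how this definition uses more information than a likelihood computed on the test rows alone.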
To get the estimated coefficients that have the minimum cv error, use sox_cv()$Estimates[, sox_cv()$index["min",]]. To apply the 1-se rule, use sox_cv()$Estimates[, sox_cv()$index["1se",]].
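The two selection rules can be illustrated on made-up numbers. The sketch below assumes the common glmnet-style convention for the 1-se rule (the largest lambda whose mean cv error is within one standard error of the minimum); this page does not spell out the exact rule used by sox_cv, and all values are invented for illustration.

```r
# Made-up cross-validation summary for five lambdas (illustration only)
lambdas <- c(1, 0.5, 0.1, 0.05, 0.01)
cvm     <- c(3.0, 2.15, 2.0, 2.1, 2.3)
cvsd    <- c(0.30, 0.25, 0.20, 0.20, 0.25)

# Index of the minimum mean cv error
i_min <- which.min(cvm)

# 1-se rule (assumed convention): largest lambda whose mean cv error
# is within one standard error of the minimum
threshold <- cvm[i_min] + cvsd[i_min]
i_1se <- which(lambdas == max(lambdas[cvm <= threshold]))

c(lambda.min = lambdas[i_min], lambda.1se = lambdas[i_1se])
```

The 1-se choice trades a small increase in cv error for a larger penalty, which typically gives a sparser model.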
Value
A list.
lambdas |
A vector of lambda used for each cross-validation. |
cvm |
The cv error averaged across all folds for each lambda. |
cvsd |
The standard error of the cv error for each lambda. |
cvup |
The cv error plus its standard error for each lambda. |
cvlo |
The cv error minus its standard error for each lambda. |
nzero |
The number of non-zero coefficients at each lambda. |
sox.fit |
A fitted model for the full data at all lambdas of class " |
lambda.min |
The lambda such that the |
lambda.1se |
The maximum of lambda such that the |
foldid |
The fold assignments used. |
index |
A one column matrix with the indices of |
iterations |
A vector of number of iterations it takes to converge at each |
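The per-lambda components listed above can be gathered into one table for inspection. The helper cv_table is hypothetical (not part of sox), and the mock list stands in for a real sox_cv() result with made-up numbers; only the component names lambdas, cvm, cvsd, cvup, cvlo and nzero are taken from this page.

```r
# Hypothetical helper (not part of sox): tabulate the per-lambda fields
# of a sox_cv-style result list.
cv_table <- function(cv) {
  data.frame(
    lambda = cv$lambdas,
    cvm    = cv$cvm,
    cvlo   = cv$cvlo,
    cvup   = cv$cvup,
    nzero  = cv$nzero
  )
}

# Mock result with made-up numbers, standing in for the output of sox_cv()
mock <- list(
  lambdas = c(1, 0.5, 0.1, 0.05, 0.01),
  cvm     = c(3.0, 2.15, 2.0, 2.1, 2.3),
  cvsd    = c(0.30, 0.25, 0.20, 0.20, 0.25),
  nzero   = c(0, 2, 4, 6, 9)
)
mock$cvup <- mock$cvm + mock$cvsd
mock$cvlo <- mock$cvm - mock$cvsd

tab <- cv_table(mock)
tab[order(tab$cvm), ]  # lambdas sorted from lowest to highest cv error
```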
See Also
Examples
x <- as.matrix(sim[, c("A1","A2","C1","C2","B","A1B","A2B","C1B","C2B")])
lam.seq <- exp(seq(log(1e0), log(1e-3), length.out = 20))
# Variables:
## 1: A1
## 2: A2
## 3: C1
## 4: C2
## 5: B
## 6: A1B
## 7: A2B
## 8: C1B
## 9: C2B
# Overlapping groups:
## g1: A1, A2, A1B, A2B
## g2: B, A1B, A2B, C1B, C2B
## g3: A1B, A2B
## g4: C1, C2, C1B, C2B
## g5: C1B, C2B
overlapping.groups <- list(c(1, 2, 6, 7),
                           c(5, 6, 7, 8, 9),
                           c(6, 7),
                           c(3, 4, 8, 9),
                           c(8, 9))
pars.overlapping <- overlap_structure(overlapping.groups)
cv.overlapping <- sox_cv(
  x = x,
  ID = sim$Id,
  time = sim$Start,
  time2 = sim$Stop,
  event = sim$Event,
  penalty = "overlapping",
  lambda = lam.seq,
  group = pars.overlapping$groups,
  group_variable = pars.overlapping$groups_var,
  penalty_weights = pars.overlapping$group_weights,
  nfolds = 5,
  tol = 1e-4,
  maxit = 1e3,
  verbose = FALSE
)
str(cv.overlapping)
# Nested groups (misspecified; for demonstration of the software only)
## g1: A1, A2, C1, C2, B, A1B, A2B, C1B, C2B
## g2: A1, A2, A1B, A2B
## g3: C1, C2, C1B, C2B
## g4: 1
## g5: 2
## ...
## g12: 9
nested.groups <- list(1:9,
                      c(1, 2, 6, 7),
                      c(3, 4, 8, 9),
                      1, 2, 3, 4, 5, 6, 7, 8, 9)
pars.nested <- nested_structure(nested.groups)
cv.nested <- sox_cv(
  x = x,
  ID = sim$Id,
  time = sim$Start,
  time2 = sim$Stop,
  event = sim$Event,
  penalty = "nested",
  lambda = lam.seq,
  group = pars.nested$groups,
  own_variable = pars.nested$own_variables,
  no_own_variable = pars.nested$N_own_variables,
  penalty_weights = pars.nested$group_weights,
  nfolds = 5,
  tol = 1e-4,
  maxit = 1e3,
  verbose = FALSE
)
str(cv.nested)