penAFT.cv {penAFT}    R Documentation
Cross-validation function for fitting a regularized semiparametric accelerated failure time model
Description
A function to perform cross-validation and compute the solution path for the regularized semiparametric accelerated failure time model estimator.
Usage
penAFT.cv(X, logY, delta, nlambda = 50,
          lambda.ratio.min = 0.1, lambda = NULL,
          penalty = NULL, alpha = 1, weight.set = NULL,
          groups = NULL, tol.abs = 1e-8, tol.rel = 2.5e-4,
          standardize = TRUE, nfolds = 5, cv.index = NULL,
          admm.max.iter = 1e4, quiet = TRUE)
Arguments
X: An n × p matrix of predictors, with one row per subject.

logY: An n-dimensional vector of log-survival or log-censoring times.

delta: An n-dimensional binary vector: delta[i] = 1 if logY[i] is an observed log-survival time and delta[i] = 0 if it is a log-censoring time.

nlambda: The number of candidate tuning parameters to consider.

lambda.ratio.min: The ratio of the minimum to the maximum candidate tuning parameter value. As a default, we suggest 0.1, but standard model selection procedures should be applied to select lambda.

lambda: An optional (not recommended) prespecified vector of candidate tuning parameters. Should be in descending order.

penalty: Either "EN" or "SG" for the elastic net or sparse group lasso penalty.

alpha: The tuning parameter \alpha \in [0, 1] balancing the weighted \ell_1 penalty against the second component of the penalty; see Details. The default, alpha = 1, uses the weighted \ell_1 penalty alone.

weight.set: A list of weights. For both penalties, a p-dimensional vector of non-negative weights w can be supplied; for the sparse group lasso, a G-dimensional vector of non-negative group weights v can also be supplied. See Details.

groups: When using penalty "SG", a p-dimensional vector of integers indicating to which of the G groups each predictor belongs.

tol.abs: Absolute convergence tolerance.

tol.rel: Relative convergence tolerance.

standardize: Should predictors be standardized (i.e., scaled to have unit variance) for model fitting?

nfolds: The number of folds to be used for cross-validation. Default is five; ten is recommended when the sample size is especially small.

cv.index: An optional list of length nfolds, each element containing the indices of the subjects belonging to that fold. If not supplied, folds are assigned at random.

admm.max.iter: Maximum number of ADMM iterations.

quiet: Logical; if TRUE (the default), suppress progress messages.
Details
Given (\log y_1, x_1, \delta_1), \dots, (\log y_n, x_n, \delta_n), where for subject i (i = 1, \dots, n), y_i is the minimum of the survival time and censoring time, x_i is a p-dimensional predictor, and \delta_i is the censoring indicator (\delta_i = 1 if y_i is an observed survival time, \delta_i = 0 if censored), penAFT.cv performs nfolds cross-validation for selecting the tuning parameter to be used in the argument minimizing

\frac{1}{n^2}\sum_{i=1}^n \sum_{j=1}^n \delta_i \{ \log y_i - \log y_j - (x_i - x_j)'\beta \}^{-} + \lambda g(\beta),

where \{a\}^{-} := \max(-a, 0), \lambda > 0, and g is either the weighted elastic net penalty (penalty = "EN") or the weighted sparse group lasso penalty (penalty = "SG").
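To make the loss concrete, the unpenalized Gehan loss above can be evaluated directly in base R. The following is a minimal sketch, not part of the penAFT package; gehan_loss and its arguments are illustrative names.

# Sketch (not package code): evaluate the Gehan loss
# (1/n^2) sum_i sum_j delta_i * max{ -(e_i - e_j), 0 },
# where e_i = log y_i - x_i' beta
gehan_loss <- function(beta, X, logY, delta) {
  e <- logY - as.numeric(X %*% beta)  # residuals e_i
  diffs <- outer(e, e, "-")           # [i, j] entry is e_i - e_j
  mean(delta * pmax(-diffs, 0))       # delta_i recycles over rows; mean divides by n^2
}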
The weighted elastic net penalty is defined as

\alpha \| w \circ \beta\|_1 + \frac{(1-\alpha)}{2}\|\beta\|_2^2,

where w is a set of non-negative weights (which can be specified in the weight.set argument). The weighted sparse group lasso penalty we consider is

\alpha \| w \circ \beta\|_1 + (1-\alpha)\sum_{l=1}^G v_l\|\beta_{\mathcal{G}_l}\|_2,

where again w is a set of non-negative weights and the v_l are weights applied to each of the G groups.
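Written out in R, the two penalty choices look as follows. This is an illustrative sketch (en_penalty and sg_penalty are not package functions), with w, v, and groups mirroring the weight.set and groups arguments.

# Sketch (not package code): the two penalty choices g(beta)
en_penalty <- function(beta, alpha, w = rep(1, length(beta))) {
  alpha * sum(w * abs(beta)) + (1 - alpha) / 2 * sum(beta^2)
}
sg_penalty <- function(beta, alpha, groups,
                       w = rep(1, length(beta)),
                       v = rep(1, length(unique(groups)))) {
  # group norms ||beta_{G_l}||_2, ordered by sorted group label;
  # v is assumed to be ordered the same way
  group_norms <- tapply(beta, groups, function(b) sqrt(sum(b^2)))
  alpha * sum(w * abs(beta)) + (1 - alpha) * sum(v * group_norms)
}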
Next, we define the cross-validation errors. Let \mathcal{V}_1, \dots, \mathcal{V}_K be a random nfolds = K element partition of [n] (the subjects), with the cardinality of each \mathcal{V}_k (the "kth fold") approximately equal for k = 1, \dots, K. Let \hat{\beta}_{\lambda(-\mathcal{V}_k)} be the solution with tuning parameter \lambda using only the data indexed by [n] \setminus \mathcal{V}_k (i.e., outside the kth fold). Then, defining e_i(\beta) := \log y_i - \beta'x_i for i = 1, \dots, n, we call

\frac{1}{|\mathcal{V}_k|^2} \sum_{i \in \mathcal{V}_k} \sum_{j \in \mathcal{V}_k} \delta_i \{e_i(\hat{\beta}_{\lambda(-\mathcal{V}_k)}) - e_{j}(\hat{\beta}_{\lambda(-\mathcal{V}_k)})\}^{-}

the cross-validated Gehan loss at \lambda in the kth fold, and refer to the sum of these quantities over all nfolds = K folds as the cross-validated Gehan loss.
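In terms of the gehan_loss sketch above, the kth-fold quantity is simply the Gehan loss evaluated on the held-out fold at the leave-fold-out estimate; bhat.k below is a hypothetical placeholder for \hat{\beta}_{\lambda(-\mathcal{V}_k)}.

# Sketch: kth-fold cross-validated Gehan loss, reusing gehan_loss() from above
# (bhat.k is a hypothetical leave-fold-k-out estimate at a fixed lambda)
fold.k <- cv.index[[k]]
gehan_loss(bhat.k, X[fold.k, , drop = FALSE], logY[fold.k], delta[fold.k])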
Similarly, letting

\tilde{e}_i(\hat{\beta}_\lambda) = \sum_{k = 1}^K (\log y_i - x_i'\hat{\beta}_{\lambda(-\mathcal{V}_k)}) \mathbf{1}(i \in \mathcal{V}_k)

for each i \in [n], we call

\sum_{i = 1}^n \sum_{j = 1}^n \delta_i \{\tilde{e}_i(\hat{\beta}_\lambda) - \tilde{e}_j(\hat{\beta}_\lambda)\}^{-}

the cross-validated linear predictor score at \lambda.
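Both criteria are returned by penAFT.cv, so a tuning parameter minimizing either can be recovered from the fitted object. The sketch below assumes fit is the result of a penAFT.cv call and that cv.err.obj holds one row per fold, as described in the Value section.

# Sketch: tuning parameter selection from a fit <- penAFT.cv(...) object
lam.linPred <- fit$full.fit$lambda[which.min(fit$cv.err.linPred)]      # linear predictor score
lam.gehan   <- fit$full.fit$lambda[which.min(colSums(fit$cv.err.obj))] # summed Gehan loss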
Value
full.fit: A model fit with the same output as a model fit using penAFT on the full data; see the penAFT documentation for details.

cv.err.linPred: An nlambda-dimensional vector of cross-validated linear predictor scores, one per candidate tuning parameter.

cv.err.obj: An nfolds × nlambda matrix of cross-validated Gehan losses, one row per fold and one column per candidate tuning parameter.

cv.index: A list of length nfolds; each element contains the indices of the subjects belonging to that fold.
Examples
# --------------------------------------
# Generate data
# --------------------------------------
library(penAFT)  # for penAFT.cv, penAFT.coef, penAFT.predict, and genSurvData
set.seed(1)
genData <- genSurvData(n = 50, p = 50, s = 10, mag = 2, cens.quant = 0.6)
X <- genData$X
logY <- genData$logY
delta <- genData$status
p <- dim(X)[2]
# -----------------------------------------------
# Fit elastic net penalized estimator
# -----------------------------------------------
fit.en <- penAFT.cv(X = X, logY = logY, delta = delta,
nlambda = 10, lambda.ratio.min = 0.1,
penalty = "EN", nfolds = 5,
alpha = 1)
# ---- coefficients at tuning parameter minimizing cross-validation error
coef.en <- penAFT.coef(fit.en)
# ---- predict at 8th tuning parameter from full fit
Xnew <- matrix(rnorm(10 * p), nrow = 10)
predict.en <- penAFT.predict(fit.en, Xnew = Xnew, lambda = fit.en$full.fit$lambda[8])
# -----------------------------------------------
# Fit sparse group penalized estimator
# -----------------------------------------------
groups <- rep(1:5, each = 10)
fit.sg <- penAFT.cv(X = X, logY = logY, delta = delta,
nlambda = 50, lambda.ratio.min = 0.01,
penalty = "SG", groups = groups, nfolds = 5,
alpha = 0.5)
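# To use predictor- or group-specific weights, a weight.set list can be
# supplied alongside groups; the element names w and v below follow the
# notation of the Details section and are an assumption to check against
# the package documentation, not verified API.
weight.set <- list(w = rep(1, p),  # l1 weights, one per predictor
                   v = rep(1, 5))  # group weights, one per group
fit.sg.w <- penAFT.cv(X = X, logY = logY, delta = delta,
                      nlambda = 50, lambda.ratio.min = 0.01,
                      penalty = "SG", groups = groups,
                      weight.set = weight.set, alpha = 0.5)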
# -----------------------------------------------
# Pass fold indices
# -----------------------------------------------
groups <- rep(1:5, each = 10)
cv.index <- list()
for (k in 1:5) {
  cv.index[[k]] <- which(rep(1:5, length = 50) == k)
}
fit.sg.cvIndex <- penAFT.cv(X = X, logY = logY, delta = delta,
nlambda = 50, lambda.ratio.min = 0.01,
penalty = "SG", groups = groups,
cv.index = cv.index,
alpha = 0.5)
# --- compare cv indices
## Not run: identical(fit.sg.cvIndex$cv.index, cv.index)