alacoxIC {ALassoSurvIC} | R Documentation |
Performing variable selection with an adaptive lasso penalty for interval censored and possibly left truncated data
Description
The alacoxIC function performs variable selection with an adaptive lasso penalty for interval censored and possibly left truncated data. It carries out penalized nonparametric maximum likelihood estimation through a penalized EM algorithm, following Li et al. (2019). The function searches for the optimal thresholding parameter automatically, based on BIC. The variable selection approach implemented by alacoxIC is proven to enjoy the oracle property introduced by Fan & Li (2001). Full details are available in Li et al. (2019).
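For orientation, the generic form of an adaptive lasso criterion can be written as below. This is a sketch only: the symbols are assumptions introduced here, not notation from the package, and the exact scaling used by Li et al. (2019) may differ. Here \ell_n denotes the log-likelihood, \tilde\beta_j an initial unpenalized estimate of the j-th coefficient, p the number of covariates, and \theta the thresholding parameter.

```latex
\hat{\beta}
  = \arg\max_{\beta}
    \left\{ \ell_n(\beta)
      - \theta \sum_{j=1}^{p} \frac{|\beta_j|}{|\tilde{\beta}_j|} \right\}
```

Coefficients whose initial estimates are close to zero receive a large weight 1 / |\tilde\beta_j| and are therefore penalized more heavily, which is the mechanism behind the variable selection.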
Usage
## Default S3 method:
alacoxIC(lowerIC, upperIC, X, trunc, theta,
normalize.X = TRUE, cl = NULL, max.theta = 1000, tol = 0.001,
niter = 1e+05, string.cen = Inf, string.missing = NA, ...)
Arguments
... |
for S4 method only. |
lowerIC |
A numeric vector for the lower limit of the censoring interval. |
upperIC |
A numeric vector for the upper limit of the censoring interval. |
X |
A numeric matrix for the covariates that will be used for variable selection. |
trunc |
A numeric vector for left truncated times. If supplied, the function performs the variable selection for interval censored and left truncated data. If |
theta |
A numeric value for the thresholding parameter. If |
normalize.X |
A logical value: if |
cl |
A cluster object created by |
max.theta |
A numeric value for the maximum value that the thresholding parameter can take when searching for the optimal one. The algorithm will look for an optimal tuning parameter below |
tol |
A numeric value for the absolute iteration convergence tolerance. |
niter |
A numeric value for the maximum number of iterations. |
string.cen |
A string indicating right censoring for |
string.missing |
A string indicating missing value. Default is |
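The censoring interval inputs can be illustrated with a small toy data set. This is a sketch under stated assumptions, not output from the package: the values are invented, and it assumes that the default string.cen value (Inf) appears in upperIC to mark a right censored subject and that the default string.missing value (NA) marks a missing limit.

```r
# Toy interval censored observations (invented values, for illustration only).
# Assumption: Inf in upperIC marks right censoring (default string.cen),
# and NA marks a missing limit (default string.missing).
lowerIC <- c(1.2, 0.4, 3.5, NA)
upperIC <- c(2.8, 1.5, Inf, 4.0)

# Label each observation by the kind of information it carries.
status <- ifelse(is.na(lowerIC) | is.na(upperIC), "missing",
          ifelse(is.infinite(upperIC), "right censored",
                 "interval censored"))

data.frame(lowerIC, upperIC, status)

# Basic sanity check: each observed lower limit must not exceed its upper limit.
stopifnot(all(lowerIC <= upperIC, na.rm = TRUE))
```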
Details
A grid search based on BIC is used to find the optimal thresholding parameter. Specifically, the alacoxIC function first searches, between 1 and max.theta, for the smallest integer thresholding parameter at which all coefficient estimates are zero, and then creates one hundred grid points by following the rule of Simon et al. (2011, Section 2.3). The candidate that minimizes BIC among the one hundred grid points is chosen as the optimal thresholding parameter in the adaptive lasso estimation.
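The grid construction and BIC selection described above can be sketched as follows. This is not the package's internal code: make_theta_grid and toy_bic are hypothetical helpers, the ratio eps between the smallest and largest grid value is an assumed illustrative value, and the toy BIC curve stands in for the BIC of an actual penalized fit. The grid is equally spaced on the log scale, in the spirit of the rule in Simon et al. (2011, Section 2.3).

```r
# Hypothetical helper: log-spaced grid from theta.max down to eps * theta.max.
# 'eps' is an assumed value; the ratio used inside alacoxIC is not stated here.
make_theta_grid <- function(theta.max, n.grid = 100, eps = 0.001) {
  theta.min <- eps * theta.max
  exp(seq(log(theta.max), log(theta.min), length.out = n.grid))
}

# Hypothetical stand-in for the BIC of a fit at a given theta (a smooth toy curve).
toy_bic <- function(theta) (log(theta) - log(5))^2 + 100

# Suppose 37 is the smallest integer theta at which all coefficients are zero.
theta.grid <- make_theta_grid(theta.max = 37)

# Evaluate the criterion at every candidate and keep the minimizer.
bic <- vapply(theta.grid, toy_bic, numeric(1))
theta.opt <- theta.grid[which.min(bic)]
theta.opt
```

A log-spaced grid places more candidates near small penalty values, where the set of selected variables tends to change most quickly.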
A cluster object created by makeCluster in the parallel package can be supplied through the cl argument to reduce computation time via parallel computing. Parallel computing is used when searching for the optimal thresholding parameter and when calculating the Hessian matrix of the log profile likelihood. The use of parallel computing is illustrated in one of the examples given below.
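The life cycle of a cluster object can be sketched with the parallel package alone. This example does not show how alacoxIC uses cl internally: toy_fit and run_search are hypothetical names, and toy_fit stands in for an expensive fit at one candidate thresholding parameter. The point is only that a cluster created with makeCluster should be released with stopCluster when it is no longer needed.

```r
library(parallel)

# Hypothetical stand-in for an expensive per-theta fit; it returns a toy score.
toy_fit <- function(theta) (theta - 3)^2

# Hypothetical wrapper: evaluate the candidates on a cluster, then release it.
run_search <- function(thetas, n.cores = 2L) {
  cl <- makeCluster(n.cores)
  # Inside a function, on.exit() releases the workers even if an error occurs.
  on.exit(stopCluster(cl), add = TRUE)
  unlist(parLapply(cl, thetas, toy_fit))
}

scores <- run_search(seq(1, 5, by = 0.5))
scores
```

When a cluster is created at the top level of a script rather than inside a function, call stopCluster(cl) explicitly once the work is done.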
Use the baseline function and the plot function to extract and plot, respectively, the estimate of the baseline cumulative hazard function from the object returned by alacoxIC. The plot function can also plot the estimated baseline survival function. See the examples given below for usage.
References
Fan, J., & Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96(456), 1348-1360.
Simon, N., Friedman, J., Hastie, T., & Tibshirani, R. (2011). Regularization paths for Cox’s proportional hazards model via coordinate descent. Journal of Statistical Software, 39(5), 1.
Li, C., Pak, D., & Todem, D. (2019). Adaptive lasso for the Cox regression with interval censored and possibly left truncated data. Statistical Methods in Medical Research. doi: 10.1177/0962280219856238
See Also
unpencoxIC
Examples
library(ALassoSurvIC)
### Variable selection for interval censored data
data(ex_IC) # the 'ex_IC' data having 100 subjects and 6 covariates
lowerIC <- ex_IC$lowerIC
upperIC <- ex_IC$upperIC
X <- ex_IC[, -c(1:2)]
## Performing the variable selection algorithm using a single core
system.time(result <- alacoxIC(lowerIC, upperIC, X))
## Use parallel computing to reduce the computation time
library(parallel)
cl <- makeCluster(2L) # making the cluster object 'cl' with two CPU cores
system.time(result <- alacoxIC(lowerIC, upperIC, X, cl = cl))
result # main result
baseline(result) # obtaining the baseline cumulative hazard estimate
plot(result) # plotting the estimated baseline cumulative hazard function (the default)
plot(result, what = "survival") # plotting the estimated baseline survival function
stopCluster(cl) # shutting down the cluster to release the worker processes
### Variable selection for interval censored and left truncated data
## Try the following code with the 'ex_ICLT' example data
data(ex_ICLT) # the 'ex_ICLT' data having 100 subjects and 6 covariates
lowerIC <- ex_ICLT$lowerIC
upperIC <- ex_ICLT$upperIC
trunc <- ex_ICLT$trunc
X <- ex_ICLT[, -c(1:3)]
result2 <- alacoxIC(lowerIC, upperIC, X, trunc)
result2
baseline(result2)
plot(result2)
plot(result2, what = "survival")