tfCox_choose_lambda {tfCox}    R Documentation

Choose the tuning parameter lambda using training and testing datasets

Description

Fits an additive trend filtering Cox model in which each component function is estimated as a piecewise constant or piecewise polynomial function. The tuning parameter is selected using separate training and testing datasets, as described in Wu and Witten (2019): the training data are used to fit the model, and the testing data are used to select the tuning parameter based on the log likelihood. This is a convenience function for replicating the simulation results in Wu and Witten (2019).
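To make the selection criterion concrete, the sketch below computes a negative log partial likelihood of the usual Cox form in base R. It is only an illustration: cox_negloglik is a hypothetical helper, not the package's negloglik, and it assumes no tied event times, status coded as 1 for an observed failure, and theta given as one linear predictor per observation. The scaling and tie handling used inside tfCox may differ.

```r
# Hypothetical helper (not part of tfCox): negative log partial likelihood
# for a Cox model, assuming no tied times and status == 1 for observed events.
cox_negloglik <- function(time, status, theta) {
  # Sort from largest to smallest time so that a running sum of exp(theta)
  # equals the sum over the risk set {k : time_k >= time_i}.
  idx <- order(time, decreasing = TRUE)
  theta <- theta[idx]
  status <- status[idx]
  log_risk <- log(cumsum(exp(theta)))
  # Only observed events contribute to the partial likelihood.
  -sum(status * (theta - log_risk))
}

# Toy check: three events, all linear predictors equal to zero.
cox_negloglik(time = c(1, 2, 3), status = c(1, 1, 1), theta = c(0, 0, 0))
```

A smaller value on the testing data indicates a better fit, which is the sense in which a lambda with minimum loss is preferred.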

Usage

tfCox_choose_lambda(dat, test_dat, ord = 0, alpha = 1, discrete = NULL, 
lam_seq = NULL, nlambda = 30, c = NULL, tol = 1e-06, niter=1000, 
stepSize=25, backtracking=0)

Arguments

dat

A list that contains time, status and X. time is the failure or censoring time, status is the censoring indicator, and X is an n x p covariate matrix, which may have p > n. This is the training data used to estimate the model for a given tuning parameter lambda.

test_dat

A list with the same structure as dat. This is the testing data used to select the tuning parameter based on the log likelihood fit.

ord

The polynomial order of the trend filtering fit; a non-negative integer (ord >= 3 is not recommended). For instance, ord=0 produces a piecewise constant fit, ord=1 a piecewise linear fit, and ord=2 a piecewise quadratic fit.

alpha

The trade-off between the trend filtering penalty and the group lasso penalty. It must be in [0,1]. alpha=1 applies only the trend filtering penalty, which produces piecewise polynomial fits, and alpha=0 applies only the group lasso penalty, which produces sparsity across the component functions. Values between 0 and 1 balance the strength of the two penalties. For p < n, we suggest using 1.

discrete

A vector of covariate/feature indices that are discrete. Discrete covariates are not penalized in the model. The default, NULL, means that none of the covariates are discrete, so all covariates are penalized in the model.

lam_seq

The sequence of positive lambda values to consider. The default is NULL, in which case lam_seq is calculated from c and nlambda. If lam_seq is provided, it overrides the default. lam_seq should be a decreasing sequence of positive values, since the fitting procedure relies on warm starts to speed up the computation.

nlambda

The number of lambda values to consider. Default is 30.

c

Smallest value for lam_seq, as a fraction of the maximum lambda value. The maximum lambda value is the smallest value such that the penalty term is zero. The default is NULL.

tol

Convergence criterion for estimates. Default is 1e-06.

niter

Maximum number of iterations. Default is 1000.

stepSize

Initial step size. Default is 25.

backtracking

Whether backtracking should be used: 1 (TRUE) or 0 (FALSE). Default is 0 (FALSE).
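The argument descriptions above can be sketched in base R as follows. This is a minimal illustration under stated assumptions: make_surv_list is a hypothetical helper (not part of tfCox), status is assumed to be coded 1 for an observed failure and 0 for censoring, and the log-spaced grid is one common way to build a decreasing lambda sequence, not necessarily the grid tfCox computes by default. The values of lam_max and min_ratio are made up.

```r
set.seed(1)

# Hypothetical helper (not part of tfCox): simulate a list with the
# time / status / X layout described for dat and test_dat.
make_surv_list <- function(n, p) {
  X <- matrix(rnorm(n * p), nrow = n, ncol = p)
  event_time  <- rexp(n, rate = exp(0.5 * X[, 1]))  # hazard depends on column 1
  censor_time <- rexp(n, rate = 0.2)                # independent censoring
  list(
    time   = pmin(event_time, censor_time),
    status = as.numeric(event_time <= censor_time), # assumed: 1 = failure observed
    X      = X
  )
}

dat      <- make_surv_list(n = 50, p = 3)
test_dat <- make_surv_list(n = 50, p = 3)

# A decreasing, strictly positive lambda sequence on the log scale.
lam_max   <- 10     # made-up largest lambda
min_ratio <- 0.01   # plays the role of the argument c
nlambda   <- 30
lam_seq <- exp(seq(log(lam_max), log(lam_max * min_ratio), length.out = nlambda))
```

Objects built this way match the documented argument layout and could be passed as tfCox_choose_lambda(dat, test_dat, lam_seq = lam_seq).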

Value

lam_seq

Lambda sequence considered.

loss

Loss based on the testing data, with the same length as lam_seq.

knots

Number of knots from the training data, with the same length as lam_seq.

paramfit

Mean squared error between the estimated and true theta for the testing data.

best_lambda

The lambda that achieves the minimum loss for testing data.
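The relationship between lam_seq, loss and best_lambda described above can be sketched as follows. The numbers are made up for illustration and are not output from tfCox; the selection rule shown is simply the minimum-loss rule stated in the description of best_lambda.

```r
# Made-up values standing in for the lam_seq and loss components.
lam_seq <- c(4, 2, 1, 0.5, 0.25)
loss    <- c(310.2, 301.7, 298.4, 299.9, 305.3)

# The lambda with the smallest testing loss.
best_lambda <- lam_seq[which.min(loss)]
best_lambda
```

With a returned object cv, the same lookup would read cv$lam_seq[which.min(cv$loss)].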

Author(s)

Jiacheng Wu

References

Jiacheng Wu & Daniela Witten (2019) Flexible and Interpretable Models for Survival Data, Journal of Computational and Graphical Statistics, DOI: 10.1080/10618600.2019.1592758

See Also

predict_best_lambda, negloglik

Examples

#generate training and testing data
dat = sim_dat(n=100, zerof=0, scenario=1)
test_dat = sim_dat(n=100, zerof=0, scenario=1)

#choose the optimal tuning parameter
cv = tfCox_choose_lambda(dat, test_dat, ord=0, alpha=1)
plot(cv$lam_seq, cv$loss)

#optimal tuning parameter
cv$best_lambda

#predict the coefficients of testing covariates from the optimal tuning parameter
#from tfCox_choose_lambda object. 
theta_hat = predict_best_lambda(cv, test_dat$X)

#calculate the loss in the testing data based on the estimated coefficients theta
negloglik(test_dat, theta_hat)

[Package tfCox version 0.1.0 Index]