R: Landmark prediction with multiple short-term events

multipredict {landmulti}

R Documentation

Landmark prediction with multiple short-term events

Description

Landmark prediction with multiple short-term events

Usage

multipredict(
  data,
  formula,
  t0,
  L,
  SE = FALSE,
  SE.gs = FALSE,
  s1_beta1 = NULL,
  s2_beta2 = NULL,
  s1s2_beta3 = NULL,
  grid1 = seq(0.01, 5, length.out = 20),
  grid2 = seq(0.01, 5, length.out = 20),
  grid3 = list(seq(0.01, 5, length.out = 20), seq(0.01, 5, length.out = 20)),
  folds.grid = 8,
  reps.grid = 3,
  c01 = 0.1,
  c02 = 0.1,
  c03 = 0.05,
  B = 500,
  gs.method = "loop",
  gs.cl = NULL,
  gs.seed = NULL
)

Arguments

`data`	Input dataset
`formula`	A `formula` object, with a `Surv()` object, such as `Surv(time, event)`, on the left of a `~` operator, and the terms on the right. On the right-hand side, the columns holding the time to the occurrence of short-term event 1 and short-term event 2 should be wrapped in `s1()` and `s2()`, respectively. The details of model specification are given under ‘Details’
`t0`	Landmark time
`L`	Length of time into the future (starting from the landmark time) for which we want to make a risk prediction. This is called the 'prediction horizon' in the dynamic prediction literature
`SE`	Logical. `TRUE` to estimate the standard errors (SE) of the coefficients using the perturbation-resampling method
`SE.gs`	Logical. `TRUE` to repeat the bandwidth grid search in each perturbation; this is expected to give more accurate results but takes longer. `FALSE` to reuse, for all perturbations, the bandwidth found in the point estimation
`s1_beta1`	A scalar or a vector. Time to the occurrence of short-term event 1 used to estimate the regression coefficient beta1 in group 2. If `NULL`, the coefficients for group 2 are not estimated
`s2_beta2`	A scalar or a vector. Time to the occurrence of short-term event 2 used to estimate the regression coefficient beta2 in group 3. If `NULL`, the coefficients for group 3 are not estimated
`s1s2_beta3`	A matrix or a data frame with two columns. The first column should be s1 and the second should be s2. Times to the occurrence of short-term events 1 and 2 used to estimate the regression coefficient beta3 in group 4. If `NULL`, the coefficients for group 4 are not estimated
`grid1`	A prespecified grid for the bandwidth search for group 2
`grid2`	A prespecified grid for the bandwidth search for group 3
`grid3`	A list of two prespecified grids for the bandwidth search for group 4
`folds.grid`	The number of folds in cross-validation
`reps.grid`	The number of repetitions of cross-validation
`c01`	A constant to shrink the bandwidth for group 2
`c02`	A constant to shrink the bandwidth for group 3
`c03`	A constant to shrink the bandwidth for group 4
`B`	Number of perturbations for estimating SE
`gs.method`	Method used by the grid search. Default is 'loop'. Setting 'snow' runs the grid search in parallel, which speeds up the calculation
`gs.cl`	Default is `NULL`. Number of clusters used for parallel computing in the grid search. Specify when gs.method = 'snow'
`gs.seed`	An integer seed for parallel computing, to make the outcome reproducible, or `NULL` to leave the seed unset
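
As a rough illustration of the bandwidth grids, the sketch below rebuilds the default `grid1` and `grid3` values from the Usage section and counts the candidates. Whether the group 4 search evaluates every pair drawn from the two `grid3` vectors is an assumption made here for illustration, and `bw_pairs` is a hypothetical name that is not part of the package.

```r
# Default univariate grid for group 2, as given under Usage
grid1 <- seq(0.01, 5, length.out = 20)

# Default bivariate grid for group 4: a list of two vectors
grid3 <- list(seq(0.01, 5, length.out = 20),
              seq(0.01, 5, length.out = 20))

# ASSUMPTION: the bivariate search considers all pairs (h1, h2)
bw_pairs <- expand.grid(h1 = grid3[[1]], h2 = grid3[[2]])

length(grid1)    # number of univariate candidates
nrow(bw_pairs)   # number of candidate pairs under the assumption above
```

Under this assumption the number of candidates for group 4 grows with the product of the two grid lengths, so coarser grids may shorten the search.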

Details

The multipredict function fits a time-fixed model and univariate/bivariate varying-coefficient models using data from subgroups formed from the information on the short-term outcomes (such as HF hospitalization and CHD hospitalization) before the landmark time t0, among those who have not experienced the long-term outcome (such as death) by t0. In this way the short-term outcome information is incorporated into the prediction of the long-term survival outcome, and the risk prediction can vary with the event times of the short-term outcomes.
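
The subgroup idea can be sketched in base R as follows. This is an illustration only: the toy values are made up, the coding of "event not observed" as a large time (30) is an assumption, and so are the `<=` comparison at t0 and the meaning of group 1 (no short-term event before t0), which is inferred from the argument descriptions for groups 2 to 4. The names `toy`, `at_risk`, `e1` and `e2` are hypothetical.

```r
# Toy data; column names follow the Examples section, values are made up
toy <- data.frame(
  time = c(12, 3, 9, 15, 7, 20),  # time of the long-term outcome
  st1  = c(2, 1, 30, 4, 30, 30),  # time to short-term event 1 (30 = not observed, an assumption)
  st2  = c(30, 2, 3, 1, 30, 30)   # time to short-term event 2 (30 = not observed, an assumption)
)
t0 <- 5  # landmark time

# Keep only subjects who have not had the long-term outcome by t0
at_risk <- toy[toy$time > t0, ]

# Did each short-term event occur before the landmark time?
e1 <- at_risk$st1 <= t0
e2 <- at_risk$st2 <= t0

# ASSUMED group coding: 1 = neither event, 2 = event 1 only,
# 3 = event 2 only, 4 = both events
at_risk$group <- ifelse(!e1 & !e2, 1,
                 ifelse(e1 & !e2, 2,
                 ifelse(!e1 & e2, 3, 4)))

table(at_risk$group)
```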

The + s1() term in the formula specifies the column that holds the occurrence time of the first short-term outcome. The + s2() term specifies the column that holds the occurrence time of the second short-term outcome.
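
To show how such wrapped terms can be recognised in a formula, the sketch below uses the `specials` mechanism of base R's `terms()`. Whether landmulti parses its formula this way is an assumption; the names `f`, `tt`, `sp`, `s1_col` and `s2_col` are hypothetical.

```r
# The formula from the Examples section; terms() does not evaluate it,
# so neither the data nor the survival package is needed here
f <- Surv(time, outcome) ~ age + s1(st1) + s2(st2)

# Mark s1() and s2() as special terms
tt <- terms(f, specials = c("s1", "s2"))
sp <- attr(tt, "specials")       # positions of s1()/s2() among the variables

# "variables" is the call list(Surv(time, outcome), age, s1(st1), s2(st2)),
# so element 1 is `list` and the variables start at element 2
vars <- attr(tt, "variables")
s1_col <- all.vars(vars[[1 + sp$s1]])  # column wrapped in s1()
s2_col <- all.vars(vars[[1 + sp$s2]])  # column wrapped in s2()

c(s1 = s1_col, s2 = s2_col)
```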

The user may set gs.method = 'snow' to run the grid search in parallel; see the `gs.method` and `gs.cl` arguments.

By default, the regression coefficients for group 1 are calculated in each run of this function.

Currently, parameter estimates from parallel computing differ slightly between runs because of the different (uncontrolled) random numbers used in the estimation. This will be solved in the near future.
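
For readers unfamiliar with the issue, the sketch below shows one general way random numbers on parallel workers can be controlled. It uses the base `parallel` package rather than `snow`, and it is not a description of how landmulti handles `gs.seed`; `draw_once` is a hypothetical helper.

```r
library(parallel)

# Draw one uniform number per task on a 2-worker cluster,
# after seeding the workers' random number streams
draw_once <- function(seed) {
  cl <- makeCluster(2)
  on.exit(stopCluster(cl))               # always release the workers
  clusterSetRNGStream(cl, iseed = seed)  # reproducible per-worker streams
  parSapply(cl, 1:4, function(i) runif(1))
}

run1 <- draw_once(100)
run2 <- draw_once(100)
identical(run1, run2)  # compare two runs that used the same seed
```

Without the call to `clusterSetRNGStream()`, each worker would start from its own uncontrolled random state, which is the kind of run-to-run variation described above.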

Value

Returns the estimated coefficients for each short-term outcome and the long-term outcome:

`coefficients`	A named vector of the estimated regression coefficients
`SE`	The standard errors of the coefficients, estimated by perturbation resampling
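
The idea behind perturbation resampling can be sketched on a much simpler estimator, a sample mean. This is a generic illustration and not the package's internal code: the use of independent standard exponential weights is an assumption, and `x`, `pert`, `se_pert` and `se_analytic` are hypothetical names.

```r
set.seed(1234)
n <- 200
x <- rnorm(n, mean = 1, sd = 2)  # made-up data
B <- 500                         # number of perturbations

# Re-estimate the mean B times, each time with fresh random weights
pert <- replicate(B, {
  w <- rexp(n)           # ASSUMPTION: iid Exp(1) perturbation weights
  sum(w * x) / sum(w)    # weighted version of the estimator
})

se_pert <- sd(pert)               # SE from the perturbed estimates
se_analytic <- sd(x) / sqrt(n)    # textbook SE of a mean, for comparison

c(perturbation = se_pert, analytic = se_analytic)
```

In this simple case the two numbers should be close; for estimators without a closed-form SE, only the perturbation version is available, at the cost of B refits.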

Author(s)

Wen Li, Qian Wang

References

Li, Wen. (2023), "Landmarking Using A Flexible Varying Coefficient Model to Improve Prediction Accuracy of Long-term Survival Following Multiple Short-term Events: An Application to the Atherosclerosis Risk in Communities (ARIC) Study", Statistics in Medicine 90(7) 1-29. doi: 10.18637/jss.v090.i07

Parast, Layla, Su-Chun Cheng, and Tianxi Cai. (2012), "Landmark Prediction of Long Term Survival Incorporating Short Term Event Time Information", Journal of the American Statistical Association 107(500) 1492-1501. doi: 10.1080/01621459.2012.721281

Parast, Layla, Su-Chun Cheng, and Tianxi Cai. (2011), "Incorporating short-term outcome information to predict long-term survival with discrete markers", Biometrical Journal 53(2) 294-307. doi: 10.1080/01621459.2012.721281

Examples

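library(survival)
library(emdbook)
library(NMOF)
library(landpred)
library(snow)
library(landmulti)  # provides multipredict(); the `simulation` data set is assumed to ship with it
set.seed(1234)
# Predict risk over the next 20 time units from landmark time 5, without SE estimation.
# Per the argument descriptions, gs.cl and gs.seed concern the parallel grid search
# and B concerns SE estimation.
res <- multipredict(
  data = simulation,
  formula = Surv(time, outcome) ~ age + s1(st1) + s2(st2),
  t0 = 5, L = 20, SE = FALSE,
  gs.method = "loop", gs.cl = 2, SE.gs = FALSE, B = 200, gs.seed = 100,
  s1_beta1 = 1.5, grid1 = seq(0.01, 5, length.out = 20),
  s2_beta2 = 1.5, grid2 = seq(0.01, 5, length.out = 20),
  # s1s2_beta3 = NULL skips the coefficients for group 4
  s1s2_beta3 = NULL,
  grid3 = list(seq(0.01, 5, length.out = 20), seq(0.01, 5, length.out = 20))
)
print(res)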

[Package landmulti version 0.5.0 Index]