multipredict {landmulti}    R Documentation

Landmark prediction with multiple short-term events

Description

Fits landmark prediction models for a long-term survival outcome, using the timing of up to two short-term events observed before the landmark time.

Usage

multipredict(
  data,
  formula,
  t0,
  L,
  SE = FALSE,
  SE.gs = FALSE,
  s1_beta1 = NULL,
  s2_beta2 = NULL,
  s1s2_beta3 = NULL,
  grid1 = seq(0.01, 5, length.out = 20),
  grid2 = seq(0.01, 5, length.out = 20),
  grid3 = list(seq(0.01, 5, length.out = 20), seq(0.01, 5, length.out = 20)),
  folds.grid = 8,
  reps.grid = 3,
  c01 = 0.1,
  c02 = 0.1,
  c03 = 0.05,
  B = 500,
  gs.method = "loop",
  gs.cl = NULL,
  gs.seed = NULL
)

Arguments

data

Input dataset

formula

A formula object with a Surv() object, such as Surv(time, event), on the left of the ~ operator and the model terms on the right. On the right-hand side, the times to short-term event 1 and short-term event 2 must be wrapped in s1() and s2(), respectively. See 'Details' for the model specification.

t0

Landmark time

L

Length of time into the future (starting from the landmark time) for which we want to make a risk prediction. This is called the 'prediction horizon' in the dynamic prediction literature

SE

Logical. If TRUE, the standard errors of the coefficients are estimated using the perturbation-resampling method.

SE.gs

Logical. If TRUE, a grid search for the bandwidth is run in each perturbation; this is expected to give more accurate results but takes longer. If FALSE, the bandwidth found in the point estimation is reused for all perturbations.

s1_beta1

A scalar or a vector. Time(s) to the occurrence of short-term event 1 at which the regression coefficient beta1 in group 2 is estimated. If NULL, the coefficients for group 2 are not estimated.

s2_beta2

A scalar or a vector. Time(s) to the occurrence of short-term event 2 at which the regression coefficient beta2 in group 3 is estimated. If NULL, the coefficients for group 3 are not estimated.

s1s2_beta3

A matrix or a data frame with two columns: the first column is s1 and the second is s2. These are the times to the occurrence of short-term events 1 and 2 at which the regression coefficient beta3 in group 4 is estimated. If NULL, the coefficients for group 4 are not estimated.

grid1

A prespecified grid for the bandwidth search in group 2

grid2

A prespecified grid for the bandwidth search in group 3

grid3

A list of two prespecified grids for the bandwidth search in group 4

folds.grid

The number of folds in the cross-validation used by the grid search

reps.grid

The number of repetitions of the cross-validation

c01

A constant used to shrink the bandwidth for group 2

c02

A constant used to shrink the bandwidth for group 3

c03

A constant used to shrink the bandwidth for group 4

B

Number of perturbations for estimating SE

gs.method

Method used by the grid search. The default is 'loop'. Setting it to 'snow' runs the grid search in parallel, which speeds up the calculation.

gs.cl

Number of clusters used for parallel computing in the grid search. The default is NULL. Specify it when gs.method = 'snow'.

gs.seed

An integer seed for parallel computing, used to make the outcome reproducible, or NULL to leave the seed unset.
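As a rough guide to the size of the default bandwidth search, the default grids above can be rebuilt in base R. This is only a sketch: the names h1 and h2 are illustrative, and it is an assumption here (not stated in this page) that the two grids in grid3 are searched as all pairs.

```r
# Default one-dimensional grid for groups 2 and 3 (values copied from 'Usage').
grid1 <- seq(0.01, 5, length.out = 20)

# Default grid3: a list of two one-dimensional grids for group 4.
grid3 <- list(seq(0.01, 5, length.out = 20),
              seq(0.01, 5, length.out = 20))

# ASSUMPTION: candidate bandwidth pairs are all combinations of the two grids.
# h1 and h2 are hypothetical column names used only for this illustration.
pairs <- expand.grid(h1 = grid3[[1]], h2 = grid3[[2]])

length(grid1)   # number of candidate bandwidths for group 2
nrow(pairs)     # number of candidate bandwidth pairs for group 4
```

Under that assumption a finer grid3 grows the search quadratically, so coarser grids may be a reasonable starting point when run time matters.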

Details

The multipredict function fits a time-fixed model and univariate/bivariate varying-coefficient models using data from subgroups. The subgroups are formed from the information on the short-term outcomes (such as HF hospitalization and CHD hospitalization) before the landmark time t0, among those who have not experienced the long-term outcome (such as death) by t0. In this way the short-term outcome information is incorporated into the prediction of long-term survival outcomes, and the risk prediction can vary with the event times of the short-term outcomes.
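The subgrouping described above can be sketched in base R. This is an illustration, not the package's internal code: the toy data, the column names and the group numbering (1 = neither short-term event before t0, 2 = event 1 only, 3 = event 2 only, 4 = both) are assumptions inferred from the argument descriptions, and censoring of the short-term event times is ignored.

```r
# Toy data; all values and column names are made up for illustration.
toy <- data.frame(
  time = c(3, 8, 12, 15, 20, 25),  # long-term event or censoring time
  st1  = c(2, 4, 30, 3, 30, 6),    # time of short-term event 1
  st2  = c(1, 30, 2, 4, 30, 7)     # time of short-term event 2
)
t0 <- 5

# Keep only subjects who have not had the long-term outcome by t0.
at_risk <- toy[toy$time > t0, ]

# Did each short-term event occur before the landmark time?
s1_before <- at_risk$st1 < t0
s2_before <- at_risk$st2 < t0

# ASSUMED numbering: 1 = neither, 2 = s1 only, 3 = s2 only, 4 = both.
at_risk$group <- ifelse(!s1_before & !s2_before, 1,
                 ifelse(s1_before & !s2_before, 2,
                 ifelse(!s1_before & s2_before, 3, 4)))
at_risk
```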

The +s1() term specifies the column that holds the occurrence time of the first short-term outcome. The +s2() term specifies the column that holds the occurrence time of the second short-term outcome.
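A formula of this shape can be built and inspected with base R alone, because a formula is not evaluated until a model is fitted. The column names (time, outcome, age, st1, st2) are the ones used in the 'Examples' section and are otherwise illustrative.

```r
# The formula is stored unevaluated, so neither Surv(), s1() nor s2()
# has to exist at this point.
f <- Surv(time, outcome) ~ age + s1(st1) + s2(st2)

# Columns the input dataset must contain.
all.vars(f)

# Right-hand-side terms, showing how st1 and st2 are wrapped.
attr(terms(f), "term.labels")
```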

The user may set gs.method = 'snow' to run the grid search in parallel.

By default, the regression coefficients for group 1 are calculated in each run of this function.

Currently, parameter estimates from parallel computing are slightly different in each run because of the different (uncontrolled) random numbers used in the estimation. This will be solved in the near future.

Value

Returns the estimated coefficients for each short-term outcome and the long-term outcome:

coefficients

A named vector of the estimated regression coefficients

SE

The standard errors of the coefficients, estimated by perturbation resampling
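The general idea of perturbation resampling can be sketched on a simple linear model. This is a generic illustration and not the estimator used by multipredict: the simulated data, the use of lm(), and the choice of standard exponential weights are all assumptions made for the example.

```r
set.seed(1)
n <- 200
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n)

B <- 500  # number of perturbations, playing the role of argument B

# Refit the model B times, each time with fresh i.i.d. positive weights
# that have mean 1 and variance 1 (here: standard exponential).
perturbed <- replicate(B, {
  w <- rexp(n)
  coef(lm(y ~ x, weights = w))
})

# The empirical standard deviation across perturbed fits estimates the SE.
se_perturb <- apply(perturbed, 1, sd)
se_perturb
```

Redoing a bandwidth search inside every perturbation, as SE.gs = TRUE does, multiplies the cost of each of the B refits, which is why that option is slower.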

Author(s)

Wen Li, Qian Wang

References

Li, Wen. (2023), "Landmarking Using A Flexible Varying Coefficient Model to Improve Prediction Accuracy of Long-term Survival Following Multiple Short-term Events: An Application to the Atherosclerosis Risk in Communities (ARIC) Study", Statistics in Medicine 90(7) 1-29. doi:10.18637/jss.v090.i07

Parast, Layla, Su-Chun Cheng, and Tianxi Cai. (2012), "Landmark Prediction of Long Term Survival Incorporating Short Term Event Time Information", J Am Stat Assoc 107(500) 1492-1501. doi: 10.1080/01621459.2012.721281

Parast, Layla, Su-Chun Cheng, and Tianxi Cai. (2011), "Incorporating Short-term Outcome Information to Predict Long-term Survival with Discrete Markers", Biometrical Journal 53(2) 294-307. doi: 10.1080/01621459.2012.721281

Examples

library(landmulti)
library(survival)
library(emdbook)
library(NMOF)
library(landpred)
library(snow)

set.seed(1234)

# Point estimates only (SE = FALSE), so B is not used in this call.
# gs.cl only takes effect when gs.method = "snow".
# s1s2_beta3 = NULL skips estimation for group 4.
res <- multipredict(
  data = simulation,
  formula = Surv(time, outcome) ~ age + s1(st1) + s2(st2),
  t0 = 5, L = 20, SE = FALSE,
  gs.method = "loop", gs.cl = 2, SE.gs = FALSE, B = 200, gs.seed = 100,
  s1_beta1 = 1.5, grid1 = seq(0.01, 5, length.out = 20),
  s2_beta2 = 1.5, grid2 = seq(0.01, 5, length.out = 20),
  s1s2_beta3 = NULL,
  grid3 = list(seq(0.01, 5, length.out = 20),
               seq(0.01, 5, length.out = 20))
)
print(res)



[Package landmulti version 0.5.0 Index]