stackG {survML} | R Documentation |
Estimate a conditional survival function using global survival stacking
Description
Estimate a conditional survival function using global survival stacking
Usage
stackG(
time,
event = rep(1, length(time)),
entry = NULL,
X,
newX = NULL,
newtimes = NULL,
direction = "prospective",
time_grid_fit = NULL,
bin_size = NULL,
time_basis,
time_grid_approx = sort(unique(time)),
surv_form = "PI",
learner = "SuperLearner",
SL_control = list(SL.library = c("SL.mean"), V = 10, method = "method.NNLS", stratifyCV = FALSE),
tau = NULL
)
Arguments
time |
|
event |
|
entry |
Study entry variable, if applicable. Defaults to |
X |
|
newX |
|
newtimes |
|
direction |
Whether the data come from a prospective or retrospective study.
This determines whether the data are treated as subject to left truncation and
right censoring ( |
time_grid_fit |
Named list of numeric vectors of times on which to discretize
for estimation of cumulative probability functions. This is an alternative to
|
bin_size |
Size of time bin on which to discretize for estimation
of cumulative probability functions. Can be a number between 0 and 1,
indicating the size of quantile grid (e.g. |
time_basis |
How to treat time for training the binary
classifier. Options are |
time_grid_approx |
Numeric vector of times at which to
approximate the product integral or cumulative hazard integral.
Defaults to |
surv_form |
Mapping from hazard estimate to survival estimate.
Can be either |
learner |
Which binary regression algorithm to use. Currently, only
|
SL_control |
Named list of parameters controlling the Super Learner fitting
process. These parameters are passed directly to the |
tau |
The maximum time of interest in a study, used for
retrospective conditional survival estimation. Rather than dealing
with right truncation separately from left truncation, it is simpler to
estimate the survival function of |
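As an illustration of the quantile-grid option described under bin_size, the base-R sketch below builds a grid of time quantiles and assigns each observation to a bin. It is a sketch under stated assumptions rather than stackG's internal procedure: the simulated `time` vector and the names `probs`, `time_grid`, and `bin_index` are hypothetical, and the grid is assumed to be formed from all observed times.

```r
# Illustrative only: a quantile-based time grid with bin_size = 0.1.
# The data and all object names here are hypothetical.
set.seed(42)
time <- rexp(200, rate = 0.1)      # simulated follow-up times

bin_size <- 0.1                    # spacing of the quantile grid
probs <- seq(0, 1, by = bin_size)  # 0, 0.1, ..., 1

# Grid points are the empirical quantiles of the observed times
time_grid <- sort(unique(quantile(time, probs = probs, names = FALSE)))

# Assign each time to a bin; the largest time goes in the last bin
bin_index <- findInterval(time, time_grid, rightmost.closed = TRUE)

table(bin_index)                   # roughly equal counts per bin
```

A smaller bin_size gives a finer grid with fewer observations in each bin.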
Value
A named list of class stackG, with the following components:
S_T_preds |
An |
S_C_preds |
An |
time_grid_approx |
The approximation grid for the product integral or cumulative hazard integral (user-specified). |
direction |
Whether the data come from a prospective or retrospective study (user-specified). |
tau |
The maximum time of interest in a study, used for retrospective conditional survival estimation (user-specified). |
surv_form |
Exponential or product-integral form (user-specified). |
time_basis |
Whether time is included in the regression as |
SL_control |
Named list of parameters controlling the Super Learner fitting process (user-specified). |
fits |
A named list of fitted regression objects corresponding to the constituent regressions needed for
global survival stacking. Includes |
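The product-integral and exponential forms recorded in surv_form are two standard ways of mapping a cumulative hazard to a survival function. The base-R sketch below is illustrative only and does not reproduce stackG's internals. It assumes a hypothetical constant hazard (`rate`) and a hypothetical evenly spaced grid (`grid`) standing in for time_grid_approx, and compares both forms with the exact exponential survival curve.

```r
# Illustrative only: hazard-to-survival mappings on a fixed grid.
# `rate` and `grid` are hypothetical stand-ins.
rate <- 0.2
grid <- seq(0.01, 10, by = 0.01)

H  <- rate * grid                  # cumulative hazard on the grid
dH <- diff(c(0, H))                # hazard increment in each interval

S_pi  <- cumprod(1 - dH)           # product-integral form
S_exp <- exp(-cumsum(dH))          # exponential form

S_true <- pexp(grid, rate = rate, lower.tail = FALSE)

max(abs(S_pi - S_true))            # small discretization error
max(abs(S_exp - S_true))           # essentially zero in this setting
```

On a fine grid the two forms nearly coincide. They can differ more when the grid is coarse.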
References
Wolock C.J., Gilbert P.B., Simon N., and Carone M. (2022). "A framework for leveraging machine learning tools to estimate personalized survival curves."
Examples
# This is a small simulation example
set.seed(123)
n <- 500
X <- data.frame(X1 = rnorm(n), X2 = rbinom(n, size = 1, prob = 0.5))
S0 <- function(t, x) {
  pexp(t, rate = exp(-2 + x[,1] - x[,2] + .5 * x[,1] * x[,2]), lower.tail = FALSE)
}
T <- rexp(n, rate = exp(-2 + X[,1] - X[,2] + .5 * X[,1] * X[,2]))
G0 <- function(t, x) {
  as.numeric(t < 15) * .9 * pexp(t,
    rate = exp(-2 - .5 * x[,1] - .25 * x[,2] + .5 * x[,1] * x[,2]),
    lower.tail = FALSE)
}
C <- rexp(n, exp(-2 -.5 * X[,1] - .25 * X[,2] + .5 * X[,1] * X[,2]))
C[C > 15] <- 15
entry <- runif(n, 0, 15)
time <- pmin(T, C)
event <- as.numeric(T <= C)
# Left truncation: keep only subjects whose observed time is at least their entry time
sampled <- which(time >= entry)
X <- X[sampled,]
time <- time[sampled]
event <- event[sampled]
entry <- entry[sampled]
# Note that this is a very small Super Learner library, for computational purposes.
SL.library <- c("SL.mean", "SL.glm")
fit <- stackG(time = time,
              event = event,
              entry = entry,
              X = X,
              newX = X,
              newtimes = seq(0, 15, .1),
              direction = "prospective",
              bin_size = 0.1,
              time_basis = "continuous",
              time_grid_approx = sort(unique(time)),
              surv_form = "exp",
              learner = "SuperLearner",
              SL_control = list(SL.library = SL.library,
                                V = 5))
plot(fit$S_T_preds[1,], S0(t = seq(0, 15, .1), X[1,]))
abline(0,1,col='red')
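To compare estimated and true curves for every observation rather than a single row, the two can be arranged as matrices. The sketch below assumes, as the plot call above suggests, that predictions have one row per observation and one column per time in newtimes. `S_hat` is a hypothetical stand-in built from the truth plus noise, not output from stackG, and the small covariate data frame is simulated for illustration.

```r
# Illustrative only: per-observation mean squared error of survival curves.
# S_hat is a hypothetical stand-in for a matrix such as fit$S_T_preds.
set.seed(1)
newtimes <- seq(0, 15, by = 0.1)
X_new <- data.frame(X1 = rnorm(5), X2 = rbinom(5, size = 1, prob = 0.5))

# Same exponential rate as in the simulation above
rate <- exp(-2 + X_new$X1 - X_new$X2 + 0.5 * X_new$X1 * X_new$X2)

# True survival: rows are observations, columns are times
S_true <- t(sapply(rate, function(r) pexp(newtimes, rate = r, lower.tail = FALSE)))

# Stand-in predictions: truth plus noise, clipped to [0, 1]
noise <- matrix(rnorm(length(S_true), sd = 0.02), nrow = nrow(S_true))
S_hat <- pmin(pmax(S_true + noise, 0), 1)

# Mean squared error over the time grid, one value per observation
mse_by_row <- rowMeans((S_hat - S_true)^2)
mse_by_row
```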