BayesSurv {BayesSurvival}R Documentation

Compute the posterior mean and a credible band for the survival and cumulative hazard, and the posterior mean for the hazard.

Description

This is the main function of this package, computing relevant quantities for a Bayesian survival analysis of (possibly right-censored) time-to-event-data. Starting with a piecewise exponential prior with dependent or independent Gamma heights (details below) on the hazard function, the function computes the posterior mean for the hazard, cumulative hazard and survival function, serving as an estimator for the true functions. In addition, for the cumulative hazard and survival function, the radius for a fixed-width credible band is computed. The interpretation of this credible band as a confidence band is justified in Castillo and Van der Pas (2020).

Usage

BayesSurv(
  df,
  time = "time",
  event = "event",
  prior = c("Dependent", "Independent"),
  K = ceiling((dim(df)[1]/log(dim(df)[1]))^(1/2)),
  time.max = max(df[[time]]),
  alpha = 0.05,
  N = 1000,
  alpha.dep = 1,
  alpha0.dep = 1.5,
  beta0.dep = 1,
  alpha.indep = 1.5,
  beta.indep = 1,
  surv.factor = 10,
  surv.epsilon = 1e-10
)

Arguments

df

A dataframe, containing at minimum a column with follow-up times and a column with a status indicator (event observed or censored).

time

The name of the column in the dataframe containing the (possibly right-censored) follow-up times, that is, the minimum of the time of the event and the time of censoring. Input the name as character/string.

event

The name of the column in the dataframe containing the status indicator, which must be coded as: 0 = censored, 1 = event observed. Input the name as character/string.

prior

Select either dependent or independent Gamma heights for the piecewise exponential prior on the hazard. Dependent heights (with the Markov structure described below) is default.

K

The number of intervals to be used in the piecewise exponential (histogram) prior. Default is set to K = (n / \log{n})^{1/2}, with n the number of observations, as recommended by Castillo and Van der Pas (2020).

time.max

The maximum follow-up time to consider, corresponding to the parameter \tau in Castillo and Van der Pas (2020).

alpha

The function will compute (1-alpha)100% credible bands for the cumulative hazard and survival function.

N

The number of samples to draw from the posterior.

alpha.dep

For the dependent Gamma prior only. The main parameter \alpha for the dependent Gamma prior, as described below. It is recommended to take alpha.dep smaller than alpha0.dep.

alpha0.dep

For the dependent Gamma prior only. The shape parameter for the Gamma prior on the histogram height for the first interval. It is recommended to take alpha.dep smaller than alpha0.dep.

beta0.dep

For the dependent Gamma prior only. The rate parameter for the Gamma prior on the histogram height for the first interval.

alpha.indep

For the independent Gamma prior only. The shape parameter for the Gamma prior on the histogram height for each interval.

beta.indep

For the independent Gamma prior only. The rate parameter for the Gamma prior on the histogram height for each interval.

surv.factor

The survival function is computed on an equispaced grid consisting of K x surv.factor (the number of intervals times this factor).

surv.epsilon

The survival function is computed on the interval [0, time.max - surv.epsilon].

Details

There are two options for the prior: a piecewise exponential (histogram) prior with dependent Gamma heights and a piecewise exponential (histogram) prior with independent Gamma heights. Both priors are described in detail in Castillo and Van der Pas (2020). The dependent prior has a Markov structure, where the height of each interval depends on the height of the previous interval. It implements the autoregressive idea of Arjas and Gasbarra (1994). With \lambda_k the histogram height on interval k and \alpha a user-selected parameter, the structure is such that, with K the number of intervals:

E[\lambda_k | \lambda_{k-1}, \ldots, \lambda_1] = \lambda_{k-1}, k = 2, \ldots, K.

Var(\lambda_k | \lambda_{k-1}, \ldots, \lambda_1) = (\lambda_{k-1})^2/\alpha, k = 2, \ldots, K.

In the independent Gamma prior, the prior draws for the \lambda_k's are independent of each other and are taken from a Gamma distribution with user-specified shape and rate parameters.

The guideline for the number of intervals K suggested by Castillo and Van der Pas (2020) is

K = (n / \log{n})^{1/(1 + 2\gamma)},

where n is the number of observations and \gamma is related to the smoothness of the true hazard function. In the absence of information about the smoothness, a default value of \gamma = 1/2 is recommended and this is implemented as the default in this package. If this choice leads to many intervals with zero events, it is recommended to decrease the number of intervals.

The samplers used for the dependent and independent Gamma priors are described in the Supplement to Castillo and Van der Pas (2020).

Value

haz.post.mean

The posterior mean for the hazard, given as the value on each of the K intervals.

cumhaz.post.mean

The posterior mean for the cumulative hazard, given as the value at the end of each of the K intervals. The cumulative hazard can be obtained from this by starting at 0 and linearly interpolating between each of the returned values.

cumhaz.radius

The radius for the credible set for the cumulative hazard.

surv.post.mean

The posterior mean for the survival, given at each value contained in the also returned surv.eval.grid.

surv.radius

The radius for the credible set for the survival.

surv.eval.grid

The grid on which the posterior mean for the survival has been computed. A finer grid can be obtained by increasing surv.factor in the function call.

time.max

The maximum follow-up time considered.

References

Castillo and Van der Pas (2020). Multiscale Bayesian survival analysis. <arXiv:2005.02889>.

Arjas and Gasbarra (1994). Nonparametric Bayesian inference from right censored survival data, using the Gibbs sampler. Statistica Sinica 4(2):505-524.

See Also

PlotBayesSurv for a function that takes the result from BayesSurv() and produces plots of the posterior mean of the hazard, the posterior mean and credible band for the cumulative hazard, and the posterior mean and credible band for the survival. To obtain direct samples from the posterior for the hazard, see SamplePosteriorDepGamma and SamplePosteriorIndepGamma.

Examples

#Demonstration on a simulated data set
library(simsurv)
library(ggplot2)
hazard.true <- function(t,x, betas, ...){1.2*(5*(t+0.05)^3 - 10*(t+0.05)^2 + 5*(t+0.05) ) + 0.7}
sim.df <- data.frame(id = 1:1000)
df <- simsurv(x = sim.df, maxt = 1, hazard = hazard.true)

bs <- BayesSurv(df, "eventtime", "status")
PlotBayesSurv(bs, object = "survival")
PlotBayesSurv(bs, object = "cumhaz")
PlotBayesSurv(bs, object = "hazard")


[Package BayesSurvival version 0.2.0 Index]