R: Compute the posterior mean and a credible band for the...

BayesSurv {BayesSurvival}

R Documentation

Compute the posterior mean and a credible band for the survival and cumulative hazard, and the posterior mean for the hazard.

Description

This is the main function of this package, computing relevant quantities for a Bayesian survival analysis of (possibly right-censored) time-to-event-data. Starting with a piecewise exponential prior with dependent or independent Gamma heights (details below) on the hazard function, the function computes the posterior mean for the hazard, cumulative hazard and survival function, serving as an estimator for the true functions. In addition, for the cumulative hazard and survival function, the radius for a fixed-width credible band is computed. The interpretation of this credible band as a confidence band is justified in Castillo and Van der Pas (2020).

Usage

BayesSurv(
  df,
  time = "time",
  event = "event",
  prior = c("Dependent", "Independent"),
  K = ceiling((dim(df)[1]/log(dim(df)[1]))^(1/2)),
  time.max = max(df[[time]]),
  alpha = 0.05,
  N = 1000,
  alpha.dep = 1,
  alpha0.dep = 1.5,
  beta0.dep = 1,
  alpha.indep = 1.5,
  beta.indep = 1,
  surv.factor = 10,
  surv.epsilon = 1e-10
)

Arguments

`df`	A dataframe, containing at minimum a column with follow-up times and a column with a status indicator (event observed or censored).
`time`	The name of the column in the dataframe containing the (possibly right-censored) follow-up times, that is, the minimum of the time of the event and the time of censoring. Input the name as character/string.
`event`	The name of the column in the dataframe containing the status indicator, which must be coded as: 0 = censored, 1 = event observed. Input the name as character/string.
`prior`	Select either dependent or independent Gamma heights for the piecewise exponential prior on the hazard. Dependent heights (with the Markov structure described below) is default.
`K`	The number of intervals to be used in the piecewise exponential (histogram) prior. Default is set to `K = (n / \log{n})^{1/2}`, with `n` the number of observations, as recommended by Castillo and Van der Pas (2020).
`time.max`	The maximum follow-up time to consider, corresponding to the parameter `\tau` in Castillo and Van der Pas (2020).
`alpha`	The function will compute (1-`alpha`)100% credible bands for the cumulative hazard and survival function.
`N`	The number of samples to draw from the posterior.
`alpha.dep`	For the dependent Gamma prior only. The main parameter `\alpha` for the dependent Gamma prior, as described below. It is recommended to take `alpha.dep` smaller than `alpha0.dep`.
`alpha0.dep`	For the dependent Gamma prior only. The shape parameter for the Gamma prior on the histogram height for the first interval. It is recommended to take `alpha.dep` smaller than `alpha0.dep`.
`beta0.dep`	For the dependent Gamma prior only. The rate parameter for the Gamma prior on the histogram height for the first interval.
`alpha.indep`	For the independent Gamma prior only. The shape parameter for the Gamma prior on the histogram height for each interval.
`beta.indep`	For the independent Gamma prior only. The rate parameter for the Gamma prior on the histogram height for each interval.
`surv.factor`	The survival function is computed on an equispaced grid consisting of `K x surv.factor` (the number of intervals times this factor).
`surv.epsilon`	The survival function is computed on the interval [0, `time.max` - `surv.epsilon`].

Details

There are two options for the prior: a piecewise exponential (histogram) prior with dependent Gamma heights and a piecewise exponential (histogram) prior with independent Gamma heights. Both priors are described in detail in Castillo and Van der Pas (2020). The dependent prior has a Markov structure, where the height of each interval depends on the height of the previous interval. It implements the autoregressive idea of Arjas and Gasbarra (1994). With \lambda_k the histogram height on interval k and \alpha a user-selected parameter, the structure is such that, with K the number of intervals:

E[\lambda_k | \lambda_{k-1}, \ldots, \lambda_1] = \lambda_{k-1}, k = 2, \ldots, K.

Var(\lambda_k | \lambda_{k-1}, \ldots, \lambda_1) = (\lambda_{k-1})^2/\alpha, k = 2, \ldots, K.

In the independent Gamma prior, the prior draws for the \lambda_k's are independent of each other and are taken from a Gamma distribution with user-specified shape and rate parameters.

The guideline for the number of intervals K suggested by Castillo and Van der Pas (2020) is

K = (n / \log{n})^{1/(1 + 2\gamma)},

where n is the number of observations and \gamma is related to the smoothness of the true hazard function. In the absence of information about the smoothness, a default value of \gamma = 1/2 is recommended and this is implemented as the default in this package. If this choice leads to many intervals with zero events, it is recommended to decrease the number of intervals.

The samplers used for the dependent and independent Gamma priors are described in the Supplement to Castillo and Van der Pas (2020).

Value

`haz.post.mean`	The posterior mean for the hazard, given as the value on each of the `K` intervals.
`cumhaz.post.mean`	The posterior mean for the cumulative hazard, given as the value at the end of each of the `K` intervals. The cumulative hazard can be obtained from this by starting at 0 and linearly interpolating between each of the returned values.
`cumhaz.radius`	The radius for the credible set for the cumulative hazard.
`surv.post.mean`	The posterior mean for the survival, given at each value contained in the also returned `surv.eval.grid`.
`surv.radius`	The radius for the credible set for the survival.
`surv.eval.grid`	The grid on which the posterior mean for the survival has been computed. A finer grid can be obtained by increasing `surv.factor` in the function call.
`time.max`	The maximum follow-up time considered.

References

Castillo and Van der Pas (2020). Multiscale Bayesian survival analysis. <arXiv:2005.02889>.

Arjas and Gasbarra (1994). Nonparametric Bayesian inference from right censored survival data, using the Gibbs sampler. Statistica Sinica 4(2):505-524.

Examples

#Demonstration on a simulated data set
library(simsurv)
library(ggplot2)
hazard.true <- function(t,x, betas, ...){1.2*(5*(t+0.05)^3 - 10*(t+0.05)^2 + 5*(t+0.05) ) + 0.7}
sim.df <- data.frame(id = 1:1000)
df <- simsurv(x = sim.df, maxt = 1, hazard = hazard.true)

bs <- BayesSurv(df, "eventtime", "status")
PlotBayesSurv(bs, object = "survival")
PlotBayesSurv(bs, object = "cumhaz")
PlotBayesSurv(bs, object = "hazard")

[Package BayesSurvival version 0.2.0 Index]