predict.bayesmixsurv {BayesMixSurv}    R Documentation

Predict method for bayesmixsurv model fits

Description

Calculates the log-likelihood and the hazard, cumulative hazard, and survival functions over a user-supplied vector of time values, based on a bayesmixsurv model object.

Usage

## S3 method for class 'bayesmixsurv'
predict(object, newdata=NULL, tvec=NULL, burnin=object$control$burnin, ...)
## S3 method for class 'predict.bayesmixsurv'
summary(object, idx=1:dim(object$smp$h)[3], burnin=object$burnin, pval=0.05
  , popmean=identical(idx,1:dim(object$smp$h)[3]), make.plot=TRUE, ...)

Arguments

object

For predict.bayesmixsurv, an object of class "bayesmixsurv", usually the result of a call to bayesmixsurv; for summary.predict.bayesmixsurv, an object of class "predict.bayesmixsurv", usually the result of a call to predict.bayesmixsurv.

newdata

An optional data frame in which to look for variables with which to predict. If omitted, the fitted values (training set) are used.

tvec

An optional vector of time values, along which time-dependent entities (hazard, cumulative hazard, survival) will be predicted. If omitted, only the time-independent entities (currently only log-likelihood) will be calculated. If a single integer is provided for tvec, it is interpreted as the number of time points, equally spaced from 0 to object$tmax: tvec <- seq(from=0.0, to=object$tmax, length.out=tvec).

burnin

Number of samples to discard from the beginning of each MCMC chain before calculating median value(s) for time-independent entities.

idx

Index of observations (rows of newdata or training data) for which to generate summary statistics. Default is the entire data.

pval

Desired p-value, based on which lower/upper bounds will be calculated. Default is 0.05.

popmean

Whether population averages must be calculated or not. By default, population averages are only calculated when the entire data is included in prediction.

make.plot

Whether population mean and other plots must be created or not.

...

Further arguments to be passed to/from other methods.
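The expansion of a single-integer tvec described above can be sketched in base R. This is only an illustration: the value of tmax below is a hypothetical stand-in for object$tmax, which a fitted model would supply.

```r
# Hypothetical stand-in for object$tmax (not taken from a real fit).
tmax <- 1200
n_points <- 50  # the single integer passed as tvec

# A single integer is expanded to equally spaced time points from 0 to tmax.
tvec <- seq(from = 0.0, to = tmax, length.out = n_points)

length(tvec)  # number of time points
range(tvec)   # first and last time values
```

Note that the grid starts at time 0, so the first column of any time-dependent prediction corresponds to t = 0.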

Details

The time-dependent predicted objects (that is, all except loglike) are three-dimensional arrays of size (nsmp x nt x nobs), where nsmp = number of MCMC samples, nt = number of time values in tvec, and nobs = number of rows in newdata. Therefore, even for modest data sizes, these objects can occupy large chunks of memory. For example, for nsmp=1000, nt=100, nobs=1000, the three objects h, H, S have a total size of 2.2GB.

Since applying quantile to these arrays is time-consuming (as needed for calculation of the median and lower/upper bounds), we have left such summaries out of the scope of the predict function. Users can instead apply summary to the prediction object to obtain summary statistics. The function summary.predict.bayesmixsurv allows the user to calculate summary statistics for a subset of (or all of) the data, if desired.

During cross-validation-based selection of the shrinkage parameter lambda, there is no need to supply tvec, since only the log-likelihood value is needed. This significantly speeds up the parameter-tuning process.

This approach is in line with the overall philosophy of delaying data summarization until necessary, to avoid unnecessary loss of accuracy due to premature blending of the information contained in individual samples.
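The memory figure quoted above can be reproduced with simple arithmetic. The sketch below assumes that each array element is stored as an 8-byte double and that only the three arrays h, H, and S are counted; array overhead is ignored.

```r
nsmp <- 1000  # number of MCMC samples
nt   <- 100   # number of time values in tvec
nobs <- 1000  # number of rows in newdata

bytes_per_double <- 8  # assumption: elements are stored as doubles
n_arrays <- 3          # h, H and S

total_bytes <- nsmp * nt * nobs * bytes_per_double * n_arrays
total_gib   <- total_bytes / 2^30

round(total_gib, 1)  # about 2.2
```

Halving nt or restricting newdata to the observations of interest reduces the footprint proportionally, since the size is a plain product of the three dimensions.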

Value

The function predict.bayesmixsurv returns an object of class "predict.bayesmixsurv" with the following fields:

tvec

Actual vector of time values (if any) used for prediction.

burnin

Same as input.

median

List of median values for predicted entities. Currently, only loglike is produced. See 'Details' for explanation.

smp

List of MCMC samples for predicted entities. Elements include h1,h2,h (hazard functions for components 1,2 and their sum), H1,H2,H (cumulative hazard functions for components 1,2 and their sum), S (survival function), and loglike (model log-likelihood). All functions are evaluated over time values specified in tvec.

km.fit

Kaplan-Meier fit of the data used for prediction (if data contains response fields).
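The relationship among the elements of smp can be illustrated for a single observation and a single parameter draw. The sketch below assumes Weibull-type component hazards with made-up shape and scale values (alpha1, lambda1, alpha2, lambda2 are hypothetical and not taken from any fit). It relies only on the facts that h and H are sums of their components and that a survival function equals exp(-H).

```r
tvec <- seq(from = 0.1, to = 10, length.out = 5)

# Hypothetical Weibull-type parameters for the two components.
alpha1 <- 0.8; lambda1 <- 0.05
alpha2 <- 1.5; lambda2 <- 0.01

# Component cumulative hazards and their derivatives (the hazards).
H1 <- lambda1 * tvec^alpha1
H2 <- lambda2 * tvec^alpha2
h1 <- lambda1 * alpha1 * tvec^(alpha1 - 1)
h2 <- lambda2 * alpha2 * tvec^(alpha2 - 1)

# Totals are sums of the components; survival follows from the total
# cumulative hazard.
h <- h1 + h2
H <- H1 + H2
S <- exp(-H)

cbind(tvec, h, H, S)
```

In an actual prediction object, each of these quantities would carry two more dimensions (MCMC sample and observation), as described in 'Details'.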

The function summary.predict.bayesmixsurv returns a list with the following fields:

lower

A list of lower-bound values for h, H, S, hr (hazard ratio of idx[2] to idx[1] observation), and S.diff (survival probability of idx[2] minus idx[1]). The last two are only included if length(idx)==2.

median

List of median values for same entities described in lower.

upper

List of upper-bound values for same entities described in lower.

popmean

Lower-bound/median/upper-bound values for population average of survival probability.

km.fit

Kaplan-Meier fit associated with the prediction object (if available).
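The kind of summaries listed above can be sketched on a simulated array. This is not the package's implementation: the array S below is random data standing in for a prediction object's smp$S, and the quantile levels pval/2, 0.5, and 1 - pval/2 are an assumption about how lower, median, and upper relate to pval.

```r
set.seed(1)
nsmp <- 200; nt <- 5; nobs <- 4

# Simulated samples standing in for smp$S (nsmp x nt x nobs); not real output.
S <- array(runif(nsmp * nt * nobs), dim = c(nsmp, nt, nobs))

pval  <- 0.05
idx   <- c(1, 3)                         # two observations to compare
probs <- c(pval / 2, 0.5, 1 - pval / 2)  # assumed quantile levels

# Quantiles across MCMC samples for each time point and selected observation.
q <- apply(S[, , idx, drop = FALSE], c(2, 3), quantile, probs = probs)
lower <- q[1, , ]; med <- q[2, , ]; upper <- q[3, , ]

# Survival difference is formed per sample first, then summarized.
S.diff   <- S[, , idx[2]] - S[, , idx[1]]              # nsmp x nt
S.diff.q <- apply(S.diff, 2, quantile, probs = probs)  # 3 x nt

# Population average: mean over observations within each sample, then quantiles.
S.pop   <- apply(S, c(1, 2), mean)                     # nsmp x nt
S.pop.q <- apply(S.pop, 2, quantile, probs = probs)    # 3 x nt
```

Computing differences and averages within each sample before taking quantiles keeps the between-sample uncertainty intact, which matches the philosophy of delaying summarization stated in 'Details'.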

Author(s)

Alireza S. Mahani, Mansour T.A. Sharabiani

Examples

est <- bayesmixsurv(Surv(futime, fustat) ~ ecog.ps + rx + age, ovarian
            , control=bayesmixsurv.control(iter=400, nskip=100))
pred <- predict(est, tvec=50)
predsumm <- summary(pred, idx=1:10)

[Package BayesMixSurv version 0.9.1 Index]