predict.bsgw {BSGW} | R Documentation |

Calculates the log-likelihood and the hazard, cumulative hazard, and survival functions over a user-supplied vector of time values, based on a BSGW model object.

```
## S3 method for class 'bsgw'
predict(object, newdata=NULL, tvec=NULL, burnin=object$control$burnin, ncores=1, ...)
## S3 method for class 'predict.bsgw'
summary(object, idx=1:length(object$median$survreg.scale), burnin=object$burnin, pval=0.05
, popmean=identical(idx,1:length(object$median$survreg.scale)), make.plot=TRUE, ...)
```

`object` |
For |

`newdata` |
An optional data frame in which to look for variables with which to predict. If omitted, the fitted values (training set) are used. |

`tvec` |
An optional vector of time values, along which time-dependent entities (hazard, cumulative hazard, survival) will be predicted. If omitted, only the time-independent entities (currently only log-likelihood) will be calculated. If a single integer is provided for |

`burnin` |
Number of samples to discard from the beginning of each MCMC chain before calculating median value(s) for time-independent entities. |

`ncores` |
Number of cores to use for parallel prediction. |

`...` |
Further arguments to be passed to/from other methods. |

`idx` |
Index of observations (rows of |

`pval` |
Desired p-value, based on which lower/upper bounds will be calculated. Default is |

`popmean` |
Whether population averages must be calculated or not. By default, population averages are only calculated when the entire data is included in prediction. |

`make.plot` |
Whether population mean and other plots must be created or not. |

The time-dependent predicted objects (except `loglike`) are three-dimensional arrays of size (`nsmp x nt x nobs`), where `nsmp` = number of MCMC samples, `nt` = number of time values in `tvec`, and `nobs` = number of rows in `newdata`. Therefore, even for modest data sizes, these objects can occupy large chunks of memory. For example, for `nsmp=1000, nt=100, nobs=1000`, the three objects `h, H, S` have a total size of 2.2 GB. Since applying `quantile` to these arrays is time-consuming (as needed for calculation of the median and lower/upper bounds), we have left such summaries out of the scope of the `predict` function. Users can instead apply `summary` to the prediction object to obtain summary statistics. During cross-validation-based selection of the shrinkage parameter `lambda`, there is no need to supply `tvec`, since only the log-likelihood value is needed. This significantly speeds up the parameter-tuning process. The function `summary.predict.bsgw` allows the user to calculate summary statistics for a subset (or all) of the data, if desired. This approach is in line with the overall philosophy of delaying data summarization until necessary, to avoid unnecessary loss of accuracy due to premature blending of the information contained in individual samples.
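The memory figure quoted above can be checked with a short back-of-the-envelope calculation. This is only a sketch in base R: it assumes the arrays hold 8-byte doubles (R's default numeric storage) and does not call the BSGW package.

```r
# Dimensions quoted in the text above.
nsmp <- 1000  # number of MCMC samples
nt   <- 100   # number of time values in tvec
nobs <- 1000  # number of rows in newdata

bytes_per_double <- 8  # assumption: arrays are stored as R doubles
n_arrays <- 3          # h, H and S

# Total payload of the three nsmp x nt x nobs arrays.
total_bytes <- n_arrays * nsmp * nt * nobs * bytes_per_double
total_gib <- total_bytes / 1024^3
round(total_gib, 1)  # about 2.2, in line with the figure in the text

# Cross-check the per-element cost on a small array that is cheap to allocate.
small <- array(0, dim = c(10, 20, 30))
small_bytes <- as.numeric(object.size(small))  # 10*20*30*8 bytes plus a small header
```

Dividing by `1024^3` reports the size in binary gigabytes; the same payload is 2.4 GB in decimal units.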

The function `predict.bsgw` returns an object of class "predict.bsgw" with the following fields:

`tvec` |
Actual vector of time values (if any) used for prediction. |

`burnin` |
Same as input. |

`median` |
List of median values for predicted entities. Currently, only |

`smp` |
List of MCMC samples for predicted entities. Elements include |

`km.fit` |
Kaplan-Meier fit of the data used for prediction (if data contains response fields). |

The function `summary.predict.bsgw` returns a list with the following fields:

`lower` |
A list of lower-bound values for |

`median` |
List of median values for same entities described in |

`upper` |
List of upper-bound values for same entities described in |

`popmean` |
Lower-bound/median/upper-bound values for population average of survival probability. |

`km.fit` |
Kaplan-Meier fit associated with the prediction object (if available). |
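To illustrate what lower-bound, median, and upper-bound summaries of an `nsmp x nt x nobs` sample array can look like, the following base-R sketch works on simulated values rather than on a real prediction object. It is not the package's implementation: the array `S`, the two-sided mapping of `pval` to the quantiles `pval/2` and `1 - pval/2`, and the way the population average is formed are all assumptions made for illustration.

```r
set.seed(1)
nsmp <- 200; nt <- 5; nobs <- 4  # small, hypothetical dimensions
burnin <- 50                     # samples to discard from the start of the chain
pval <- 0.05
probs <- c(pval / 2, 0.5, 1 - pval / 2)  # assumed two-sided bounds

# Simulated stand-in for survival samples (values in (0, 1)).
S <- array(runif(nsmp * nt * nobs), dim = c(nsmp, nt, nobs))

# Drop burn-in samples along the first (MCMC) dimension.
S_keep <- S[(burnin + 1):nsmp, , , drop = FALSE]

# Quantiles across samples for every (time, observation) pair: 3 x nt x nobs.
q <- apply(S_keep, c(2, 3), quantile, probs = probs)
lower <- q[1, , ]
med   <- q[2, , ]
upper <- q[3, , ]

# Population average: average over observations within each sample,
# then summarize across samples, giving a 3 x nt matrix.
S_pop <- apply(S_keep, c(1, 2), mean)
popmean <- apply(S_pop, 2, quantile, probs = probs)
```

Averaging over observations before taking quantiles keeps the per-sample information intact until the last step, which matches the stated philosophy of delaying summarization.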

Alireza S. Mahani, Mansour T.A. Sharabiani

```
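library("survival")
data(ovarian)
# Fit a BSGW model with a short MCMC run
est <- bsgw(Surv(futime, fustat) ~ ecog.ps + rx, ovarian
, control=bsgw.control(iter=400, nskip=100))
# Predict on the training data, supplying tvec so that time-dependent entities are returned
pred <- predict(est, tvec=100)
# Summarize predictions for the first 10 observations only
predsumm <- summary(pred, idx=1:10)
```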

[Package *BSGW* version 0.9.4 Index]