bayesmixsurv.crossval {BayesMixSurv}    R Documentation

Convenience functions for cross-validation-based selection of shrinkage parameter in the bayesmixsurv model.

Description

bayesmixsurv.crossval calculates the cross-validation-based, out-of-sample log-likelihood of a bayesmixsurv model for a data set, given the supplied folds. bayesmixsurv.crossval.wrapper applies bayesmixsurv.crossval to a set of combinations of shrinkage parameters (lambda1,lambda2) and produces the resulting vector of log-likelihood values, as well as the combination of shrinkage parameters associated with the maximum log-likelihood. bayesmixsurv.generate.folds generates random partitions, while bayesmixsurv.generate.folds.eventbalanced generates random partitions with events evenly distributed across partitions. The latter is useful for cross-validation of small data sets with low event rates, since it prevents events from accumulating in one or two partitions while other partitions receive none.
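The idea behind event-balanced partitioning can be sketched in base R. The helper balanced_folds_sketch and the status vector below are hypothetical and are not the package's implementation; the sketch only assumes that fold labels are assigned separately within events and non-events.

```r
# Hypothetical helper, NOT the package's implementation: assign fold labels
# separately within events and non-events, so that every fold receives a
# similar share of events.
balanced_folds_sketch <- function(status, nfold = 5) {
  folds <- integer(length(status))
  for (s in unique(status)) {
    idx <- which(status == s)
    # Cycle the labels 1..nfold over this stratum, then shuffle them
    folds[idx] <- sample(rep_len(seq_len(nfold), length(idx)))
  }
  folds
}

set.seed(1)
status <- c(rep(1, 10), rep(0, 40))  # made-up data: 10 events among 50 rows
folds <- balanced_folds_sketch(status, nfold = 5)
table(fold = folds, event = status)  # each fold holds exactly 2 events here
```

With purely random partitioning of the same 50 rows, a fold could end up with no events at all, which is the situation the event-balanced generator is meant to avoid.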

Usage

bayesmixsurv.generate.folds(ntot, nfold=5)
bayesmixsurv.generate.folds.eventbalanced(formula, data, nfold=5)
bayesmixsurv.crossval(data, folds, all=FALSE, print.level=1
  , control=bayesmixsurv.control(), ...)
bayesmixsurv.crossval.wrapper(data, folds, all=FALSE, print.level=1
  , control=bayesmixsurv.control(), lambda.min=0.01, lambda.max=100, nlambda=10
  , lambda1.vec=exp(seq(from=log(lambda.min), to=log(lambda.max), length.out = nlambda))
  , lambda2.vec=NULL
  , lambda12=if (is.null(lambda2.vec)) cbind(lambda1=lambda1.vec, lambda2=lambda1.vec)
    else as.matrix(expand.grid(lambda1=lambda1.vec, lambda2=lambda2.vec)), plot=TRUE, ...)

Arguments

ntot

Number of observations to create partitions for. It should typically be set to nrow(data).

nfold

Number of folds or partitions to generate.

formula

Formula specifying the covariates to be used in component 1, and the time/status response variable in the survival model.

data

Data frame containing the covariates and response, used in training and prediction.

folds

An integer vector of length nrow(data), defining fold/partition membership of each observation. For example, in 5-fold cross-validation for a data set of 200 observations, folds must be a 200-long vector with elements from the set {1,2,3,4,5}. Convenience functions bayesmixsurv.generate.folds and bayesmixsurv.generate.folds.eventbalanced can be used to generate the folds vector for a given survival data frame.
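As an illustration of the expected structure, a folds vector for the 200-observation example can be built by hand in base R. This is a minimal sketch that assumes equal-sized random folds; the name folds_sketch is hypothetical, and the package's own generators may assign folds differently.

```r
set.seed(42)
ntot <- 200    # number of observations, i.e. nrow(data)
nfold <- 5
# Assumption: equal-sized random folds; the package function may differ
folds_sketch <- sample(rep_len(seq_len(nfold), ntot))

# Properties that any valid folds vector should satisfy
stopifnot(length(folds_sketch) == ntot)
stopifnot(all(folds_sketch %in% seq_len(nfold)))
table(folds_sketch)  # 40 observations per fold in this sketch

# Rows held out in round 1, and rows used for estimation in that round
test_rows <- which(folds_sketch == 1)
train_rows <- which(folds_sketch != 1)
```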

all

If TRUE, estimation objects from each cross-validation task are collected and returned for diagnostic purposes.

print.level

Verbosity of progress report.

control

List of control parameters, usually the output of bayesmixsurv.control.

lambda.min

Minimum value used to generate the default lambda1.vec.

lambda.max

Maximum value used to generate the default lambda1.vec.

nlambda

Length of the default lambda1.vec vector.

lambda1.vec

Vector of shrinkage parameters to be tested for component-1 coefficients.

lambda2.vec

Vector of shrinkage parameters to be tested for component-2 coefficients.

lambda12

A two-column table (columns lambda1 and lambda2) that enumerates all combinations of lambda1 and lambda2 to be tested; the default shown in Usage is a matrix. By default, it is constructed by forming all combinations of lambda1.vec and lambda2.vec. If lambda2.vec=NULL, only combinations with equal values of the two parameters are tried.
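The default expressions for lambda1.vec and lambda12 given in Usage can be evaluated on their own to see the grids they produce. The sketch below simply re-runs those expressions in base R; the values chosen for lambda2.vec and the names lambda12.equal and lambda12.grid are illustrative only.

```r
lambda.min <- 0.01
lambda.max <- 100
nlambda <- 10

# Log-spaced grid: consecutive values differ by a constant ratio
lambda1.vec <- exp(seq(from = log(lambda.min), to = log(lambda.max),
                       length.out = nlambda))

# lambda2.vec = NULL: both components share each value, nlambda rows
lambda12.equal <- cbind(lambda1 = lambda1.vec, lambda2 = lambda1.vec)

# lambda2.vec supplied (illustrative values): full grid with
# length(lambda1.vec) * length(lambda2.vec) rows
lambda2.vec <- c(0.1, 1, 10)
lambda12.grid <- as.matrix(expand.grid(lambda1 = lambda1.vec,
                                       lambda2 = lambda2.vec))

dim(lambda12.equal)  # 10 2
dim(lambda12.grid)   # 30 2
```

Since each row of lambda12 triggers a full cross-validation run, the full grid costs length(lambda2.vec) times as many model fits as the equal-value default.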

plot

If TRUE, and if the lambda1 and lambda2 entries in lambda12 are identical, a plot of loglike as a function of either vector is produced.

...

Further arguments passed to bayesmixsurv.

Value

Functions bayesmixsurv.generate.folds and bayesmixsurv.generate.folds.eventbalanced produce integer vectors of length ntot or nrow(data), respectively. The output of these functions can be passed directly to bayesmixsurv.crossval or bayesmixsurv.crossval.wrapper as the folds argument.

Function bayesmixsurv.crossval returns the log-likelihood of data under the assumed bayesmixsurv model, calculated using a cross-validation scheme with the supplied folds argument. If all=TRUE, the estimation objects for each of the nfold estimation jobs are returned as the "estobjs" attribute of the returned value.

Function bayesmixsurv.crossval.wrapper returns a list with elements lambda1 and lambda2, the optimal shrinkage parameters for components 1 and 2, respectively. Additionally, the following attributes are attached:

loglike.vec

Vector of log-likelihood values, one for each tested combination of lambda1 and lambda2.

loglike.opt

The maximum log-likelihood value from the loglike.vec.

lambda12

Data frame with columns lambda1 and lambda2. Each row of this data frame contains one combination of shrinkage parameters that are tested in the wrapper function.

estobjs

If all=TRUE, a list of length nrow(lambda12) is returned, with each element being itself a list of nfold estimation objects, one for each call to the bayesmixsurv function. These objects can be examined for diagnostic purposes, e.g. by calling plot on each of them.
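The following sketch shows one way the returned list and its attributes might be inspected. To stay self-contained it does not call the package: ret is a mock object built to imitate the structure described above, and all numbers in it are made up for illustration.

```r
# Mock object imitating the documented return structure of the wrapper;
# every number here is made up and not the result of any model fit.
lambda12 <- cbind(lambda1 = c(0.1, 1, 10), lambda2 = c(0.1, 1, 10))
loglike.vec <- c(-105.2, -101.7, -103.9)
iopt <- which.max(loglike.vec)  # row with the maximum log-likelihood
ret <- structure(
  list(lambda1 = unname(lambda12[iopt, "lambda1"]),
       lambda2 = unname(lambda12[iopt, "lambda2"])),
  loglike.vec = loglike.vec,
  loglike.opt = loglike.vec[iopt],
  lambda12 = lambda12
)

ret$lambda1               # optimal shrinkage for component 1
attr(ret, "loglike.opt")  # maximum cross-validated log-likelihood
# Pair each tested combination with its log-likelihood for inspection
cbind(attr(ret, "lambda12"), loglike = attr(ret, "loglike.vec"))
```

With a real return value, the same attr() calls would apply to the object returned by bayesmixsurv.crossval.wrapper.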

Author(s)

Alireza S. Mahani, Mansour T.A. Sharabiani

Examples

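library(survival)      # provides Surv() and the ovarian data set
library(BayesMixSurv)

# NOTE: iter=30 keeps this example fast; to ensure convergence,
# typically more than 30 samples are needed
# event-balanced 5-fold partition of the ovarian data
folds <- bayesmixsurv.generate.folds.eventbalanced(Surv(futime, fustat) ~ 1, ovarian, 5)
# out-of-sample log-likelihood for one model specification
cv <- bayesmixsurv.crossval(ovarian, folds, formula1=Surv(futime, fustat) ~ ecog.ps + rx
  , control=bayesmixsurv.control(iter=30, nskip=10), print.level = 3)
# search over three shrinkage values (lambda2 is set equal to lambda1)
cv2 <- bayesmixsurv.crossval.wrapper(ovarian, folds, formula1=Surv(futime, fustat) ~ ecog.ps + rx
  , control=bayesmixsurv.control(iter=30, nskip=10)
  , lambda1.vec=exp(seq(from=log(0.1), to=log(1), length.out = 3)))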

[Package BayesMixSurv version 0.9.1 Index]