predBMA {BVSNLP}R Documentation

Predictive accuracy measurement using Bayesian Model Averaging

Description

This function is used for predictive accuracy measurement of the selected models using Bayesian Model Averaging methodology. The Occam's window with cut out threshold of thr is used. That means only models that have posterior probability of at least thr * posteior probability of the highest posterior probability model are considered in model averaging. For survival time response datasets, the predictive Area Under Curve (AUC) at each given time point is computed as the output. In this case, the predictive AUC is obtained using Uno's method for observations in the test set. For binary outcome data, only one AUC is reported which is from the ROC computed on the test set. The training set is used to find the selected model and relevant probabilities.

Usage

predBMA(
  bvsobj,
  X,
  resp,
  prep,
  logT,
  nlptype = "piMOM",
  train_idx,
  test_idx,
  thr = 0.05,
  times = NULL,
  family = c("logistic", "survival")
)

Arguments

bvsobj

An object that is generated by bvs function. It is the output of the Bayesian variable selection procedure.

X

The same n times p data frame initially used in bvs function. This is the full data frame containing all test and train data.

resp

For logistic regression models, this variable is the binary response vector. For the Cox proportional hazard models this is a two column matrix where the first column contains survival times and the second column is the censoring status for each observation. Note that for survival times, the time section of this variable should be in the same scale and unit (year, days, etc.) as times variable for which the AUC has to be computed.

prep

A boolean variable determining if the preprocessing step has been done on the original data frame in bvs function. This should have the same value as prep variable in the bvs function.

logT

A boolean variable determining if log transform should has been done on continuous columns of the data frame in bvs function. This should have the same value as logT variable in the bvs function.

nlptype

Determines the type of nonlocal prior that is used in the analyses. It can be "piMOM" for product inverse moment prior, or "pMOM" for product moment prior. The default is set to piMOM prior.

train_idx

An integer vector containing the indices of the training set.

test_idx

An integer vector containing the indices of the test set. The set of observations that prediction will be performed on.

thr

The threshold used for Occam's window as explained in the description. The default value for this variable is 0.05.

times

A vector of times at which predictive AUC is to be computed. This input is only used for prediction in survival data analysis.

family

Determines the type of data analysis. logistic is for binary outcome and logistic regression model whereas, survival represents survival outcomes and the Cox proportional hazard model.

Value

The output is different based on the family for the anlysis of data 1) family = logistic The output is a list with the two following objects:

auc

This is the area under the ROC curve after Bayesian model averaging is used to obtain ROC for the test data.

roc_curve

This is a two column matrix representing points on the ROC curve and can be used to plot the curve. The first column is FPR and the second column is TPR which represent x-axis and y-axis in the ROC curve, respectively.

2) family = survival

auc

A vector with the same length as times variable showing predictive area under the curve at each given time point using Bayesian Model averaging.

Author(s)

Amir Nikooienejad

References

Raftery, A. E., Madigan, D., & Hoeting, J. A. (1997). Bayesian model averaging for linear regression models. Journal of the American Statistical Association, 92(437), 179-191.

Nikooienejad, A., Wang, W., & Johnson, V. E. (2020). Bayesian variable selection for survival data using inverse moment priors. Annals of Applied Statistics, 14(2), 809-828.

Uno, H., Cai, T., Tian, L., & Wei, L. J. (2007). Evaluating prediction rules for t-year survivors with censored regression models. Journal of the American Statistical Association, 102(478), 527-537.

Examples

### Simulating Logistic Regression Data
n <- 200
p <- 40
set.seed(123)
Sigma <- diag(p)
full <- matrix(c(rep(0.5, p*p)), ncol=p)
Sigma <- full + 0.5*Sigma
cholS <- chol(Sigma)
Beta <- c(-1.7,1.8,2.5)
X <- matrix(rnorm(n*p), ncol=p)
X <- X%*%cholS
colnames(X) <- c(paste("gene_",c(1:p),sep=""))
beta <- numeric(p)
beta[c(1:length(Beta))] <- Beta
Xout <- PreProcess(X)
X <- Xout$X
XB <- X%*%beta
probs <- as.vector(exp(XB)/(1+exp(XB)))
y <- rbinom(n,1,probs)
X <- as.data.frame(X)
train_idx <- sample(1:n,0.8*n)
test_idx <- setdiff(1:n,train_idx)
X_train <- X[train_idx,]
y_train <- y[train_idx]
bout <- bvs(X_train, y_train, prep=FALSE, family = "logistic",
           mod_prior = "beta",niter = 50)
BMAout <- predBMA(bout, X, y, prep = FALSE, logT = FALSE, 
                 train_idx = train_idx, test_idx = test_idx,
                 family="logistic")
### AUC for the prediction:
BMAout$auc

### Plotting ROC Curve
roc <- BMAout$roc_curve
plot(roc,lwd=2,type='l',col='blue')

[Package BVSNLP version 1.1.9 Index]