predBMA {BVSNLP} | R Documentation |
Predictive accuracy measurement using Bayesian Model Averaging
Description
This function is used for predictive accuracy measurement of
the selected models using Bayesian Model Averaging methodology. The Occam's
window with cut out threshold of thr
is used. That means only models
that have posterior probability of at least thr
* posteior
probability of the highest posterior probability model are considered in
model averaging. For survival time response datasets, the predictive Area
Under Curve (AUC) at each given time point is computed as the output. In
this case, the predictive AUC is obtained using Uno's method for
observations in the test set. For binary outcome data, only one AUC is
reported which is from the ROC computed on the test set. The training set is
used to find the selected model and relevant probabilities.
Usage
predBMA(
bvsobj,
X,
resp,
prep,
logT,
nlptype = "piMOM",
train_idx,
test_idx,
thr = 0.05,
times = NULL,
family = c("logistic", "survival")
)
Arguments
bvsobj |
An object that is generated by |
X |
The same |
resp |
For logistic regression models, this variable is the binary
response vector. For the Cox proportional hazard models this is a two column
matrix where the first column contains survival times and the second column
is the censoring status for each observation. Note that for survival times,
the time section of this variable should be in the same scale and unit
(year, days, etc.) as |
prep |
A boolean variable determining if the preprocessing step has
been done on the original data frame in |
logT |
A boolean variable determining if log transform should has been
done on continuous columns of the data frame in |
nlptype |
Determines the type of nonlocal prior that is used in the analyses. It can be "piMOM" for product inverse moment prior, or "pMOM" for product moment prior. The default is set to piMOM prior. |
train_idx |
An integer vector containing the indices of the training set. |
test_idx |
An integer vector containing the indices of the test set. The set of observations that prediction will be performed on. |
thr |
The threshold used for Occam's window as explained in the description. The default value for this variable is 0.05. |
times |
A vector of times at which predictive AUC is to be computed. This input is only used for prediction in survival data analysis. |
family |
Determines the type of data analysis. |
Value
The output is different based on the family for the anlysis of data
1) family = logistic
The output is a list with the two following objects:
auc |
This is the area under the ROC curve after Bayesian model averaging is used to obtain ROC for the test data. |
roc_curve |
This is a two column matrix representing points on the ROC curve and can be used to plot the curve. The first column is FPR and the second column is TPR which represent x-axis and y-axis in the ROC curve, respectively. |
2) family = survival
auc |
A vector with the same length as |
Author(s)
Amir Nikooienejad
References
Raftery, A. E., Madigan, D., & Hoeting, J. A. (1997). Bayesian
model averaging for linear regression models. Journal of the American
Statistical Association, 92(437), 179-191.
Nikooienejad, A., Wang, W., & Johnson, V. E. (2020). Bayesian variable
selection for survival data using inverse moment priors. Annals of Applied
Statistics, 14(2), 809-828.
Uno, H., Cai, T., Tian, L., & Wei, L. J. (2007). Evaluating
prediction rules for t-year survivors with censored regression models.
Journal of the American Statistical Association, 102(478), 527-537.
Examples
### Simulating Logistic Regression Data
n <- 200
p <- 40
set.seed(123)
Sigma <- diag(p)
full <- matrix(c(rep(0.5, p*p)), ncol=p)
Sigma <- full + 0.5*Sigma
cholS <- chol(Sigma)
Beta <- c(-1.7,1.8,2.5)
X <- matrix(rnorm(n*p), ncol=p)
X <- X%*%cholS
colnames(X) <- c(paste("gene_",c(1:p),sep=""))
beta <- numeric(p)
beta[c(1:length(Beta))] <- Beta
Xout <- PreProcess(X)
X <- Xout$X
XB <- X%*%beta
probs <- as.vector(exp(XB)/(1+exp(XB)))
y <- rbinom(n,1,probs)
X <- as.data.frame(X)
train_idx <- sample(1:n,0.8*n)
test_idx <- setdiff(1:n,train_idx)
X_train <- X[train_idx,]
y_train <- y[train_idx]
bout <- bvs(X_train, y_train, prep=FALSE, family = "logistic",
mod_prior = "beta",niter = 50)
BMAout <- predBMA(bout, X, y, prep = FALSE, logT = FALSE,
train_idx = train_idx, test_idx = test_idx,
family="logistic")
### AUC for the prediction:
BMAout$auc
### Plotting ROC Curve
roc <- BMAout$roc_curve
plot(roc,lwd=2,type='l',col='blue')