tsvs2 {nftbart} | R Documentation |
Variable selection with NFT BART models.
Description
The tsvs2()/tsvs() function is for Thompson sampling variable selection with NFT BART.
Usage
tsvs2(
## data
xftrain, xstrain, times, delta=NULL,
rm.const=TRUE, rm.dupe=TRUE,
##tsvs args
K=20, a.=1, b.=0.5, C=0.5,
rds.file='tsvs2.rds', pdf.file='tsvs2.pdf',
## multi-threading
tc=getOption("mc.cores", 1), ##OpenMP thread count
##MCMC
nskip=1000, ndpost=2000,
nadapt=1000, adaptevery=100,
chvf=NULL, chvs=NULL,
method="spearman", use="pairwise.complete.obs",
pbd=c(0.7, 0.7), pb=c(0.5, 0.5),
stepwpert=c(0.1, 0.1), probchv=c(0.1, 0.1),
minnumbot=c(5, 5),
## BART and HBART prior parameters
ntree=c(10, 2), numcut=100,
xifcuts=NULL, xiscuts=NULL,
power=c(2, 2), base=c(0.95, 0.95),
## f function
fmu=NA, k=5, tau=NA, dist='weibull',
## s function
total.lambda=NA, total.nu=10, mask=0.95,
## survival analysis
##K=100, events=NULL,
## DPM LIO
drawDPM=1L,
alpha=1, alpha.a=1, alpha.b=0.1, alpha.draw=1,
neal.m=2, constrain=1,
m0=0, k0.a=1.5, k0.b=7.5, k0=1, k0.draw=1,
a0=3, b0.a=2, b0.b=1, b0=1, b0.draw=1,
## misc
na.rm=FALSE, probs=c(0.025, 0.975), printevery=100,
transposed=FALSE
)
tsvs(
## data
x.train, times, delta=NULL,
rm.const=TRUE, rm.dupe=TRUE,
##tsvs args
K=20, a.=1, b.=0.5, C=0.5,
rds.file='tsvs.rds', pdf.file='tsvs.pdf',
## multi-threading
tc=getOption("mc.cores", 1), ##OpenMP thread count
##MCMC
nskip=1000, ndpost=2000,
nadapt=1000, adaptevery=100,
chv=NULL,
method="spearman", use="pairwise.complete.obs",
pbd=c(0.7, 0.7), pb=c(0.5, 0.5),
stepwpert=c(0.1, 0.1), probchv=c(0.1, 0.1),
minnumbot=c(5, 5),
## BART and HBART prior parameters
ntree=c(10, 2), numcut=100, xicuts=NULL,
power=c(2, 2), base=c(0.95, 0.95),
## f function
fmu=NA, k=5, tau=NA, dist='weibull',
## s function
total.lambda=NA, total.nu=10, mask=0.95,
## survival analysis
##K=100, events=NULL,
## DPM LIO
drawDPM=1L,
alpha=1, alpha.a=1, alpha.b=0.1, alpha.draw=1,
neal.m=2, constrain=1,
m0=0, k0.a=1.5, k0.b=7.5, k0=1, k0.draw=1,
a0=3, b0.a=2, b0.b=1, b0=1, b0.draw=1,
## misc
na.rm=FALSE, probs=c(0.025, 0.975), printevery=100,
transposed=FALSE
)
Arguments
xftrain |
n x pf matrix of predictor variables for the training data. |
xstrain |
n x ps matrix of predictor variables for the training data. |
x.train |
n x p matrix of predictor variables for the training data. |
times |
nx1 vector of the observed times for the training data. |
delta |
nx1 vector of the time type for the training data: 0 for right-censoring, 1 for an event, and 2 for left-censoring. |
rm.const |
To remove constant variables or not. |
rm.dupe |
To remove duplicate variables or not. |
K |
The number of Thompson sampling steps to take. Not to be confused with the size of the time grid for survival distribution estimation. |
a. |
The prior parameter for successes of a Beta distribution. |
b. |
The prior parameter for failures of a Beta distribution. |
C |
The probability cut-off for variable selection. |
rds.file |
File name to store RDS object containing Thompson sampling parameters. |
pdf.file |
File name to store PDF graphic of variables selected. |
tc |
Number of OpenMP threads to use. |
nskip |
Number of MCMC iterations to burn-in and discard. |
ndpost |
Number of MCMC iterations kept after burn-in. |
nadapt |
Number of MCMC iterations for adaptation prior to burn-in. |
adaptevery |
Adapt MCMC proposal distributions every |
chvf, chvs, chv |
Predictor correlation matrix used as a pre-conditioner for MCMC change-of-variable proposals. |
method, use |
Correlation options for change-of-variable proposal pre-conditioner. |
pbd |
Probability of performing a birth/death proposal, otherwise perform a rotate proposal. |
pb |
Probability of performing a birth proposal given that we choose to perform a birth/death proposal. |
stepwpert |
Initial width of proposal distribution for perturbing cut-points. |
probchv |
Probability of performing a change-of-variable proposal. Otherwise, only do a perturb proposal. |
minnumbot |
Minimum number of observations required in leaf (terminal) nodes. |
ntree |
Vector of length two for the number of trees used for the mean model and the number of trees used for the variance model. |
numcut |
Number of cutpoints to use for each predictor variable. |
xifcuts, xiscuts, xicuts |
More detailed construction of cut-points can be specified
by the |
power |
Power parameter in the tree depth penalizing prior. |
base |
Base parameter in the tree depth penalizing prior. |
fmu |
Prior parameter for the center of the mean model. |
k |
Prior parameter for the mean model. |
tau |
Desired |
dist |
Distribution to be passed to intercept-only AFT model to center |
total.lambda |
A rudimentary estimate of the process standard deviation. Used in calibrating the variance prior. |
total.nu |
Shape parameter for the variance prior. |
mask |
If a proportion is provided, then said quantile
of |
drawDPM |
Whether to utilize DPM or not. |
alpha |
Initial value of DPM concentration parameter. |
alpha.a |
Gamma prior parameter setting for DPM concentration parameter
where E[ |
alpha.b |
See |
alpha.draw |
Whether to draw |
neal.m |
The number of additional atoms for Neal 2000 DPM algorithm 8. |
constrain |
Whether to perform constrained DPM or unconstrained. |
m0 |
Center of the error distribution: defaults to zero. |
k0.a |
First Gamma prior argument for |
k0.b |
Second Gamma prior argument for |
k0 |
Initial value of |
k0.draw |
Whether to fix k0 or draw it from the DPM LIO prior
hierarchy: |
a0 |
First Gamma prior argument for |
b0.a |
First Gamma prior argument for |
b0.b |
Second Gamma prior argument for |
b0 |
Initial value of |
b0.draw |
Whether to fix b0 or draw it from the DPM LIO prior
hierarchy: |
na.rm |
Value to be passed to the |
probs |
Value to be passed to the |
printevery |
Outputs MCMC algorithm status every printevery iterations. |
transposed |
|
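As a rough illustration of the chvf/chvs/chv arguments, a predictor correlation matrix of the kind described above can be computed with base R's cor() using the method and use settings shown in Usage. The simulated matrix x is hypothetical, and whether tsvs2()/tsvs() builds its default pre-conditioner in exactly this way is an assumption rather than something stated on this page.

```r
## Sketch: Spearman correlation matrix with pairwise-complete observations.
## Assumption: x is a made-up predictor matrix, not package data.
set.seed(2)
x <- matrix(rnorm(200), nrow = 50, ncol = 4,
            dimnames = list(NULL, paste0("x", 1:4)))
x[sample(length(x), 10)] <- NA   ## introduce a few missing values

## same options as the method= and use= defaults in Usage
chv <- cor(x, method = "spearman", use = "pairwise.complete.obs")
print(round(chv, 2))
```

A matrix built this way could then be supplied as chv (or as chvf/chvs for tsvs2()) in place of the NULL default.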
Details
tsvs2()/tsvs() is the function to perform variable selection. The tsvs2()/tsvs() function returns a fit object of S3 class type list as well as storing it in rds.file for sampling in progress.
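To make the roles of K, a., b. and C more concrete, the following is a minimal sketch of Beta-Bernoulli Thompson sampling for variable selection in the spirit of Liu and Rockova (2021). It is not the code used by tsvs2()/tsvs(): the reward comes from a hypothetical toy_reward() function with a made-up signal vector, where the package would instead rely on an NFT BART fit.

```r
## Toy Thompson sampling for variable selection (illustrative only).
## Assumptions: P, signal and toy_reward() are hypothetical stand-ins
## for the information an NFT BART fit would provide.
set.seed(1)
P  <- 6                 ## number of candidate variables
K  <- 20                ## Thompson sampling steps
a. <- 1; b. <- 0.5      ## Beta prior: successes, failures
C  <- 0.5               ## probability cut-off for selection

a <- rep(a., P)
b <- rep(b., P)
signal <- c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE)

toy_reward <- function(S) {
  ## signal variables are rewarded w.p. 0.9, noise variables w.p. 0.1
  rbinom(length(S), 1, ifelse(signal[S], 0.9, 0.1))
}

for (k in seq_len(K)) {
  theta <- rbeta(P, a, b)      ## draw inclusion probabilities
  S <- which(theta > C)        ## variables tried at this step
  if (length(S) == 0) next
  gamma <- toy_reward(S)       ## 1 = success, 0 = failure
  a[S] <- a[S] + gamma         ## update Beta parameters
  b[S] <- b[S] + 1 - gamma
}

prob <- a / (a + b)            ## posterior mean inclusion probability
selected <- which(prob > C)
print(round(prob, 2))
```

In this sketch a variable counts as selected when its posterior mean inclusion probability exceeds C; the exact selection rule used by the package may differ.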
Author(s)
Rodney Sparapani: rsparapa@mcw.edu
References
Sparapani R., Logan B., Maiers M., Laud P., McCulloch R. (2023) Nonparametric Failure Time: Time-to-event Machine Learning with Heteroskedastic Bayesian Additive Regression Trees and Low Information Omnibus Dirichlet Process Mixtures. Biometrics (ahead of print) <doi:10.1111/biom.13857>.
Liu Y., Rockova V. (2021) Variable selection via Thompson sampling. Journal of the American Statistical Association. Jun 29:1-8.
See Also
Examples
##library(nftbart)
data(lung)
N=length(lung$status)
##lung$status: 1=censored, 2=dead
##delta: 0=censored, 1=dead
delta=lung$status-1
## this study reports time in days rather than weeks or months
times=lung$time
times=times/7 ## weeks
## matrix of covariates
x.train=cbind(lung[ , -(1:3)])
## lung$sex: Male=1 Female=2
##vars=tsvs2(x.train, x.train, times, delta)
vars=tsvs2(x.train, x.train, times, delta, K=0) ## K=0 just returns 0