R: Variable selection with NFT BART models.

tsvs2 {nftbart}

R Documentation

Variable selection with NFT BART models.

Description

The tsvs2() and tsvs() functions perform Thompson sampling variable selection with NFT BART.

Usage

tsvs2(
               ## data
               xftrain, xstrain, times, delta=NULL, 
               rm.const=TRUE, rm.dupe=TRUE,
               ##tsvs args
               K=20, a.=1, b.=0.5, C=0.5,
               rds.file='tsvs2.rds', pdf.file='tsvs2.pdf',
               ## multi-threading
               tc=getOption("mc.cores", 1), ##OpenMP thread count
               ##MCMC
               nskip=1000, ndpost=2000, 
               nadapt=1000, adaptevery=100, 
               chvf=NULL, chvs=NULL,
               method="spearman", use="pairwise.complete.obs",
               pbd=c(0.7, 0.7), pb=c(0.5, 0.5),
               stepwpert=c(0.1, 0.1), probchv=c(0.1, 0.1),
               minnumbot=c(5, 5),
               ## BART and HBART prior parameters
               ntree=c(10, 2), numcut=100,
               xifcuts=NULL, xiscuts=NULL,
               power=c(2, 2), base=c(0.95, 0.95),
               ## f function
               fmu=NA, k=5, tau=NA, dist='weibull', 
               ## s function
               total.lambda=NA, total.nu=10, mask=0.95,
               ## survival analysis 
               ##K=100, events=NULL, 
               ## DPM LIO
               drawDPM=1L, 
               alpha=1, alpha.a=1, alpha.b=0.1, alpha.draw=1,
               neal.m=2, constrain=1, 
               m0=0, k0.a=1.5, k0.b=7.5, k0=1, k0.draw=1,
               a0=3, b0.a=2, b0.b=1, b0=1, b0.draw=1,
               ## misc
               na.rm=FALSE, probs=c(0.025, 0.975), printevery=100,
               transposed=FALSE
)

tsvs(
               ## data
               x.train, times, delta=NULL, 
               rm.const=TRUE, rm.dupe=TRUE,
               ##tsvs args
               K=20, a.=1, b.=0.5, C=0.5,
               rds.file='tsvs.rds', pdf.file='tsvs.pdf',
               ## multi-threading
               tc=getOption("mc.cores", 1), ##OpenMP thread count
               ##MCMC
               nskip=1000, ndpost=2000, 
               nadapt=1000, adaptevery=100, 
               chv=NULL,
               method="spearman", use="pairwise.complete.obs",
               pbd=c(0.7, 0.7), pb=c(0.5, 0.5),
               stepwpert=c(0.1, 0.1), probchv=c(0.1, 0.1),
               minnumbot=c(5, 5),
               ## BART and HBART prior parameters
               ntree=c(10, 2), numcut=100, xicuts=NULL,
               power=c(2, 2), base=c(0.95, 0.95),
               ## f function
               fmu=NA, k=5, tau=NA, dist='weibull', 
               ## s function
               total.lambda=NA, total.nu=10, mask=0.95,
               ## survival analysis 
               ##K=100, events=NULL, 
               ## DPM LIO
               drawDPM=1L, 
               alpha=1, alpha.a=1, alpha.b=0.1, alpha.draw=1,
               neal.m=2, constrain=1, 
               m0=0, k0.a=1.5, k0.b=7.5, k0=1, k0.draw=1,
               a0=3, b0.a=2, b0.b=1, b0=1, b0.draw=1,
               ## misc
               na.rm=FALSE, probs=c(0.025, 0.975), printevery=100,
               transposed=FALSE
)

Arguments

`xftrain`	n x pf matrix of predictor variables for the training data, used by the f function.
`xstrain`	n x ps matrix of predictor variables for the training data, used by the s function.
`x.train`	n x p matrix of predictor variables for the training data (`tsvs` only).
`times`	nx1 vector of the observed times for the training data.
`delta`	nx1 vector of the time type for the training data: 0 for right-censoring, 1 for an event, and 2 for left-censoring.
`rm.const`	Whether to remove constant variables.
`rm.dupe`	Whether to remove duplicate variables.
`K`	The number of Thompson sampling steps to take. Not to be confused with the size of the time grid for survival distribution estimation.
`a.`	The prior parameter for successes of a Beta distribution.
`b.`	The prior parameter for failures of a Beta distribution.
`C`	The probability cut-off for variable selection.
`rds.file`	File name to store RDS object containing Thompson sampling parameters.
`pdf.file`	File name to store PDF graphic of variables selected.
`tc`	Number of OpenMP threads to use.
`nskip`	Number of MCMC iterations to burn-in and discard.
`ndpost`	Number of MCMC iterations kept after burn-in.
`nadapt`	Number of MCMC iterations for adaptation prior to burn-in.
`adaptevery`	Adapt MCMC proposal distributions every `adaptevery` iterations.
`chvf`, `chvs`, `chv`	Predictor correlation matrix used as a pre-conditioner for MCMC change-of-variable proposals.
`method`, `use`	Correlation options for change-of-variable proposal pre-conditioner.
`pbd`	Probability of performing a birth/death proposal, otherwise perform a rotate proposal.
`pb`	Probability of performing a birth proposal given that we choose to perform a birth/death proposal.
`stepwpert`	Initial width of proposal distribution for perturbing cut-points.
`probchv`	Probability of performing a change-of-variable proposal. Otherwise, only do a perturb proposal.
`minnumbot`	Minimum number of observations required in leaf (terminal) nodes.
`ntree`	Vector of length two for the number of trees used for the mean model and the number of trees used for the variance model.
`numcut`	Number of cutpoints to use for each predictor variable.
`xifcuts`, `xiscuts`, `xicuts`	More detailed construction of cut-points can be specified by the `xicuts` function and provided here.
`power`	Power parameter in the tree depth penalizing prior.
`base`	Base parameter in the tree depth penalizing prior.
`fmu`	Prior parameter for the center of the mean model.
`k`	Prior parameter for the mean model.
`tau`	Desired `SD/ntree` for f function leaf prior if known.
`dist`	Distribution to be passed to intercept-only AFT model to center `y.train`.
`total.lambda`	A rudimentary estimate of the process standard deviation. Used in calibrating the variance prior.
`total.nu`	Shape parameter for the variance prior.
`mask`	If a proportion is provided, then said quantile of `max.i sd(x.i)` is used to mask non-stationary departures (with respect to convergence) above this threshold.
`drawDPM`	Whether to utilize DPM or not.
`alpha`	Initial value of DPM concentration parameter.
`alpha.a`	Gamma prior parameter setting for DPM concentration parameter where E[`alpha`]=`alpha.a`/`alpha.b`.
`alpha.b`	See `alpha.a` above.
`alpha.draw`	Whether to draw `alpha` or it is fixed at the initial value.
`neal.m`	The number of additional atoms for Neal 2000 DPM algorithm 8.
`constrain`	Whether to perform constrained DPM or unconstrained.
`m0`	Center of the error distribution: defaults to zero.
`k0.a`	First Gamma prior argument for `k0`.
`k0.b`	Second Gamma prior argument for `k0`.
`k0`	Initial value of `k0`.
`k0.draw`	Whether to fix `k0` or draw it from the DPM LIO prior hierarchy: `k0~Gamma(k0.a, k0.b)`, i.e., `E[k0]=k0.a/k0.b`.
`a0`	First Gamma prior argument for `tau`.
`b0.a`	First Gamma prior argument for `b0`.
`b0.b`	Second Gamma prior argument for `b0`.
`b0`	Initial value of `b0`.
`b0.draw`	Whether to fix b0 or draw it from the DPM LIO prior hierarchy: `b0~Gamma(b0.a, b0.b)`, i.e., `E[b0]=b0.a/b0.b`.
`na.rm`	Value to be passed to the `predict` function.
`probs`	Value to be passed to the `predict` function.
`printevery`	Outputs MCMC algorithm status every `printevery` iterations.
`transposed`	`tsvs` handles all of the pre-processing for `x.train`/`x.test` (including transposing) for computational efficiency.
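
The prior-related defaults above can be made concrete with a little arithmetic. The sketch below uses base R only. It assumes, without confirmation from this page, that `power` and `base` enter through the standard BART depth prior, in which a node at depth d splits with probability `base/(1+d)^power`. It also evaluates the Gamma prior means using the `E[.]=a/b` convention stated for `alpha.a`/`alpha.b`, `k0.a`/`k0.b` and `b0.a`/`b0.b`, which corresponds to the shape/rate parameterization of `rgamma`. The helper names `split_prob` and `prior_mean` are hypothetical.

```r
## Sketch only: standard BART depth prior, ASSUMED (not confirmed) for nftbart.
## P(node at depth d splits) = base / (1 + d)^power
split_prob <- function(d, base = 0.95, power = 2) base / (1 + d)^power
depth <- 0:3
p_split <- split_prob(depth)
print(round(p_split, 4))

## Gamma(shape, rate) prior means implied by the documented defaults
prior_mean <- function(shape, rate) shape / rate
means <- c(alpha = prior_mean(1, 0.1),    # alpha.a / alpha.b
           k0    = prior_mean(1.5, 7.5),  # k0.a / k0.b
           b0    = prior_mean(2, 1))      # b0.a / b0.b
print(means)

## Monte Carlo check of E[alpha] with base R's shape/rate parameterization
set.seed(1)
draws <- rgamma(1e5, shape = 1, rate = 0.1)
print(mean(draws))
```

Under these assumptions the default prior means are 10 for `alpha`, 0.2 for `k0` and 2 for `b0`, and the split probability falls from 0.95 at the root to 0.2375 one level down, which favors shallow trees.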

Details

tsvs2() and tsvs() perform variable selection. Each returns a fit object (an S3 object of type list) and also stores it in rds.file while sampling is in progress.
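
As a rough illustration of how `K`, `a.`, `b.` and `C` interact, here is a toy Thompson sampling loop in base R. It is not the nftbart implementation: a linear-model t-test stands in for the NFT BART fit, and the reward rule (p-value below 0.05) is a hypothetical substitute for whatever reward the package derives from its fitted trees. The simulated data and all variable names are assumptions made for the sketch.

```r
## Toy Thompson sampling variable selection (illustrative; NOT the nftbart code).
## A linear-model t-test stands in for the BART fit used by tsvs()/tsvs2().
set.seed(42)
n <- 200; p <- 6
x <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("x", 1:p)))
y <- 2 * x[, 1] - 1.5 * x[, 2] + rnorm(n)   # only x1 and x2 matter

K <- 20; C <- 0.5                 # steps and cut-off, mirroring the defaults
a <- rep(1, p); b <- rep(0.5, p)  # Beta prior counts, mirroring a.=1, b.=0.5

for (step in seq_len(K)) {
  theta <- rbeta(p, a, b)          # sample an inclusion probability per variable
  S <- which(theta > C)            # variables entering this step's model
  if (length(S) == 0) next
  fit <- lm(y ~ x[, S, drop = FALSE])
  pval <- summary(fit)$coefficients[-1, 4]   # drop the intercept row
  reward <- as.numeric(pval < 0.05)          # hypothetical reward rule
  a[S] <- a[S] + reward            # count successes
  b[S] <- b[S] + (1 - reward)      # count failures
}

prob <- setNames(a / (a + b), colnames(x))  # posterior mean inclusion probability
selected <- names(prob)[prob > C]
print(round(prob, 2))
print(selected)
```

Variables that keep earning rewards accumulate successes and stay above the cut-off `C`, while variables that fail are sampled less and less often. The two signal variables end up selected here; a noise variable can occasionally survive by chance.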

Author(s)

Rodney Sparapani: rsparapa@mcw.edu

References

Sparapani R., Logan B., Maiers M., Laud P., McCulloch R. (2023) Nonparametric Failure Time: Time-to-event Machine Learning with Heteroskedastic Bayesian Additive Regression Trees and Low Information Omnibus Dirichlet Process Mixtures. Biometrics (ahead of print) <doi:10.1111/biom.13857>.

Liu Y., Rockova V. (2021) Variable selection via Thompson sampling. Journal of the American Statistical Association. Jun 29:1-8.

Examples


##library(nftbart)
data(lung)
N=length(lung$status)

##lung$status: 1=censored, 2=dead
##delta: 0=censored, 1=dead
delta=lung$status-1

## this study reports time in days rather than weeks or months
times=lung$time
times=times/7  ## weeks

## matrix of covariates
x.train=cbind(lung[ , -(1:3)])
## lung$sex:        Male=1 Female=2


##vars=tsvs2(x.train, x.train, times, delta)
vars=tsvs2(x.train, x.train, times, delta, K=0) ## K=0 just returns 0
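
The `method` and `use` defaults match arguments of base R's `cor`, so a correlation matrix of the kind described for `chvf`/`chvs`/`chv` can be sketched as below. The toy data are invented for illustration, and the assumption that the package builds its default pre-conditioner in this way is not confirmed by this page.

```r
## Sketch: a Spearman correlation matrix tolerant of missing values (base R only).
## The toy data are ASSUMPTIONS; they are not taken from the lung example.
set.seed(7)
x <- cbind(age = rnorm(50, 60, 10), wt = rnorm(50, 70, 12), score = rnorm(50))
x[sample(length(x), 10)] <- NA   # scatter some missing values
chv <- cor(x, method = "spearman", use = "pairwise.complete.obs")
print(round(chv, 2))
```

A matrix computed this way from the actual predictors could then be supplied through `chvf` and `chvs` (or `chv` for tsvs) instead of leaving them `NULL`.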

[Package nftbart version 2.1 Index]