tsvs2 {nftbart}    R Documentation

Variable selection with NFT BART models.

Description

The tsvs2() and tsvs() functions perform Thompson sampling variable selection with NFT BART.

Usage

tsvs2(
               ## data
               xftrain, xstrain, times, delta=NULL, 
               rm.const=TRUE, rm.dupe=TRUE,
               ##tsvs args
               K=20, a.=1, b.=0.5, C=0.5,
               rds.file='tsvs2.rds', pdf.file='tsvs2.pdf',
               ## multi-threading
               tc=getOption("mc.cores", 1), ##OpenMP thread count
               ##MCMC
               nskip=1000, ndpost=2000, 
               nadapt=1000, adaptevery=100, 
               chvf=NULL, chvs=NULL,
               method="spearman", use="pairwise.complete.obs",
               pbd=c(0.7, 0.7), pb=c(0.5, 0.5),
               stepwpert=c(0.1, 0.1), probchv=c(0.1, 0.1),
               minnumbot=c(5, 5),
               ## BART and HBART prior parameters
               ntree=c(10, 2), numcut=100,
               xifcuts=NULL, xiscuts=NULL,
               power=c(2, 2), base=c(0.95, 0.95),
               ## f function
               fmu=NA, k=5, tau=NA, dist='weibull', 
               ## s function
               total.lambda=NA, total.nu=10, mask=0.95,
               ## survival analysis 
               ##K=100, events=NULL, 
               ## DPM LIO
               drawDPM=1L, 
               alpha=1, alpha.a=1, alpha.b=0.1, alpha.draw=1,
               neal.m=2, constrain=1, 
               m0=0, k0.a=1.5, k0.b=7.5, k0=1, k0.draw=1,
               a0=3, b0.a=2, b0.b=1, b0=1, b0.draw=1,
               ## misc
               na.rm=FALSE, probs=c(0.025, 0.975), printevery=100,
               transposed=FALSE
)

tsvs(
               ## data
               x.train, times, delta=NULL, 
               rm.const=TRUE, rm.dupe=TRUE,
               ##tsvs args
               K=20, a.=1, b.=0.5, C=0.5,
               rds.file='tsvs.rds', pdf.file='tsvs.pdf',
               ## multi-threading
               tc=getOption("mc.cores", 1), ##OpenMP thread count
               ##MCMC
               nskip=1000, ndpost=2000, 
               nadapt=1000, adaptevery=100, 
               chv=NULL,
               method="spearman", use="pairwise.complete.obs",
               pbd=c(0.7, 0.7), pb=c(0.5, 0.5),
               stepwpert=c(0.1, 0.1), probchv=c(0.1, 0.1),
               minnumbot=c(5, 5),
               ## BART and HBART prior parameters
               ntree=c(10, 2), numcut=100, xicuts=NULL,
               power=c(2, 2), base=c(0.95, 0.95),
               ## f function
               fmu=NA, k=5, tau=NA, dist='weibull', 
               ## s function
               total.lambda=NA, total.nu=10, mask=0.95,
               ## survival analysis 
               ##K=100, events=NULL, 
               ## DPM LIO
               drawDPM=1L, 
               alpha=1, alpha.a=1, alpha.b=0.1, alpha.draw=1,
               neal.m=2, constrain=1, 
               m0=0, k0.a=1.5, k0.b=7.5, k0=1, k0.draw=1,
               a0=3, b0.a=2, b0.b=1, b0=1, b0.draw=1,
               ## misc
               na.rm=FALSE, probs=c(0.025, 0.975), printevery=100,
               transposed=FALSE
)

Arguments

xftrain

n x pf matrix of predictor variables for the training data, used by the f function (mean model).

xstrain

n x ps matrix of predictor variables for the training data, used by the s function (variance model).

x.train

n x p matrix of predictor variables for the training data (tsvs() only).

times

nx1 vector of the observed times for the training data.

delta

nx1 vector of the time type for the training data: 0, for right-censoring; 1, for an event; and, 2, for left-censoring.

rm.const

To remove constant variables or not.

rm.dupe

To remove duplicate variables or not.

K

The number of Thompson sampling steps to take. Not to be confused with the size of the time grid for survival distribution estimation.

a.

The prior parameter for successes of a Beta distribution.

b.

The prior parameter for failures of a Beta distribution.

C

The probability cut-off for variable selection.
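
The roles of K, a., b. and C can be illustrated with a generic Beta-Bernoulli Thompson sampling loop. This is a minimal sketch in base R, not the nftbart implementation: the reward below is a simulated coin flip driven by the hypothetical vector true.prob, standing in for whatever reward a fitted NFT BART model would supply, and the update rule is an assumption.

```r
set.seed(1)
P <- 5                      ## number of candidate variables
K <- 20                     ## Thompson sampling steps
C <- 0.5                    ## probability cut-off
a <- rep(1, P)              ## Beta "success" parameters, starting at a.=1
b <- rep(0.5, P)            ## Beta "failure" parameters, starting at b.=0.5
## hypothetical chance that each variable earns a reward when selected
true.prob <- c(0.9, 0.8, 0.1, 0.1, 0.1)

for (k in seq_len(K)) {
    theta <- rbeta(P, a, b)             ## one draw per variable
    picked <- which(theta > C)          ## variables selected at this step
    if (length(picked) == 0) next
    reward <- rbinom(length(picked), 1, true.prob[picked])
    a[picked] <- a[picked] + reward     ## count successes
    b[picked] <- b[picked] + 1 - reward ## count failures
}

post.mean <- a / (a + b)    ## mean of each variable's Beta distribution
selected <- which(post.mean > C)
print(round(post.mean, 2))
```

Variables whose Beta mean ends above C are the ones such a loop would report as selected.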

rds.file

File name to store RDS object containing Thompson sampling parameters.

pdf.file

File name to store PDF graphic of variables selected.

tc

Number of OpenMP threads to use.
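
Because the default is tc=getOption("mc.cores", 1), the thread count can be set once per session through the mc.cores option. A small sketch follows; the value 2 is arbitrary, and whether extra threads are actually used depends on the package having been built with OpenMP support.

```r
old <- options(mc.cores = 2)      ## set a session-wide default
tc <- getOption("mc.cores", 1)    ## the value the tc argument would pick up
print(tc)
options(old)                      ## restore the previous setting
```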

nskip

Number of MCMC iterations to burn-in and discard.

ndpost

Number of MCMC iterations kept after burn-in.

nadapt

Number of MCMC iterations for adaptation prior to burn-in.

adaptevery

Adapt MCMC proposal distributions every adaptevery iterations.

chvf, chvs, chv

Predictor correlation matrix used as a pre-conditioner for MCMC change-of-variable proposals.

method, use

Correlation options for change-of-variable proposal pre-conditioner.
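
The default values of method and use match argument values accepted by stats::cor(). As a sketch of what a user-supplied pre-conditioner might look like, the following builds a Spearman correlation matrix from simulated predictors with some missing values. That the functions compute their default matrix this way when chvf/chvs/chv is NULL is an assumption; the data and names are illustrative only.

```r
set.seed(2)
x <- cbind(x1 = rnorm(50), x2 = rnorm(50), x3 = rnorm(50))
x[sample(50, 5), "x2"] <- NA      ## introduce some missing values
## pairwise deletion keeps every pair of columns that is jointly observed
chv <- cor(x, method = "spearman", use = "pairwise.complete.obs")
print(round(chv, 2))
```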

pbd

Probability of performing a birth/death proposal, otherwise perform a rotate proposal.

pb

Probability of performing a birth proposal given that we choose to perform a birth/death proposal.
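
Taken together, pbd and pb imply the following marginal proposal probabilities. The arithmetic is a sketch under the descriptions above; reading the two elements of each vector as the f and s models (as with ntree) is an assumption.

```r
pbd <- c(0.7, 0.7)                ## P(birth/death proposal)
pb  <- c(0.5, 0.5)                ## P(birth | birth/death proposal)
p.birth  <- pbd * pb
p.death  <- pbd * (1 - pb)
p.rotate <- 1 - pbd
print(rbind(birth = p.birth, death = p.death, rotate = p.rotate))
```

With the defaults, each of the three proposal types sums to one across rows: 0.35 birth, 0.35 death and 0.30 rotate.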

stepwpert

Initial width of proposal distribution for perturbing cut-points.

probchv

Probability of performing a change-of-variable proposal. Otherwise, only do a perturb proposal.

minnumbot

Minimum number of observations required in leaf (terminal) nodes.

ntree

Vector of length two for the number of trees used for the mean model and the number of trees used for the variance model.

numcut

Number of cutpoints to use for each predictor variable.

xifcuts, xiscuts, xicuts

More detailed construction of cut-points can be specified by the xicuts function and provided here.

power

Power parameter in the tree depth penalizing prior.

base

Base parameter in the tree depth penalizing prior.
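
To see how power and base penalize depth, the sketch below evaluates the usual BART branching prior, in which a node at depth d splits with probability base/(1+d)^power. That nftbart uses exactly this form is an assumption based on the standard BART literature.

```r
base  <- 0.95
power <- 2
depth <- 0:4
p.split <- base / (1 + depth)^power   ## prior probability a node at this depth splits
print(round(cbind(depth, p.split), 4))
```

Under the defaults the split probability drops from 0.95 at the root to 0.2375 at depth 1, which favors shallow trees.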

fmu

Prior parameter for the center of the mean model.

k

Prior parameter for the mean model.

tau

Desired SD/ntree for f function leaf prior if known.

dist

Distribution to be passed to intercept-only AFT model to center y.train.

total.lambda

A rudimentary estimate of the process standard deviation. Used in calibrating the variance prior.

total.nu

Shape parameter for the variance prior.

mask

If a proportion is provided, then said quantile of max.i sd(x.i) is used to mask non-stationary departures (with respect to convergence) above this threshold.

drawDPM

Whether to utilize DPM or not.

alpha

Initial value of DPM concentration parameter.

alpha.a

Gamma prior parameter setting for DPM concentration parameter where E[alpha]=alpha.a/alpha.b.

alpha.b

See alpha.a above.

alpha.draw

Whether to draw alpha or fix it at the initial value.

neal.m

The number of additional atoms for Neal 2000 DPM algorithm 8.

constrain

Whether to perform constrained DPM or unconstrained.

m0

Center of the error distribution: defaults to zero.

k0.a

First Gamma prior argument for k0.

k0.b

Second Gamma prior argument for k0.

k0

Initial value of k0.

k0.draw

Whether to fix k0 or draw it from the DPM LIO prior hierarchy: k0~Gamma(k0.a, k0.b), i.e., E[k0]=k0.a/k0.b.

a0

First Gamma prior argument for tau.

b0.a

First Gamma prior argument for b0.

b0.b

Second Gamma prior argument for b0.

b0

Initial value of b0.

b0.draw

Whether to fix b0 or draw it from the DPM LIO prior hierarchy: b0~Gamma(b0.a, b0.b), i.e., E[b0]=b0.a/b0.b.
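
The stated means E[alpha]=alpha.a/alpha.b, E[k0]=k0.a/k0.b and E[b0]=b0.a/b0.b correspond to a Gamma distribution written with shape and rate. As a quick sketch, the prior means implied by the default arguments can be computed directly, with a Monte Carlo check for k0 that assumes the shape/rate reading.

```r
## prior means implied by the default arguments
prior.mean <- c(alpha = 1 / 0.1,      ## alpha.a / alpha.b
                k0    = 1.5 / 7.5,    ## k0.a / k0.b
                b0    = 2 / 1)        ## b0.a / b0.b
print(prior.mean)

## Monte Carlo check of E[k0] under the shape/rate parameterization
set.seed(3)
k0.draws <- rgamma(1e5, shape = 1.5, rate = 7.5)
print(mean(k0.draws))
```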

na.rm

Value to be passed to the predict function.

probs

Value to be passed to the predict function.

printevery

Outputs MCMC algorithm status every printevery iterations.

transposed

tsvs handles all of the pre-processing for x.train/x.test (including transposing) for computational efficiency.

Details

tsvs2()/tsvs() performs variable selection. It returns a fit object, an S3 object of type list, and also stores that object in rds.file so that it is available while sampling is in progress.
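
Since the object is written to rds.file, it can be inspected from another R session with readRDS(). The sketch below only demonstrates the save/read round trip: the list written here is a hypothetical stand-in, because the components of the real fit object are not described on this page.

```r
rds.file <- tempfile(fileext = ".rds")
## hypothetical stand-in for the object that tsvs2() would store
fit <- list(note = "stand-in for a tsvs2() result")
saveRDS(fit, rds.file)

saved <- readRDS(rds.file)        ## what a second session would do
str(saved)
```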

Author(s)

Rodney Sparapani: rsparapa@mcw.edu

References

Sparapani R., Logan B., Maiers M., Laud P., McCulloch R. (2023) Nonparametric Failure Time: Time-to-event Machine Learning with Heteroskedastic Bayesian Additive Regression Trees and Low Information Omnibus Dirichlet Process Mixtures. Biometrics (ahead of print) <doi:10.1111/biom.13857>.

Liu Y., Rockova V. (2021) Variable selection via Thompson sampling. Journal of the American Statistical Association. Jun 29:1-8.

See Also

tsvs

Examples


##library(nftbart)
data(lung)
N=length(lung$status)

##lung$status: 1=censored, 2=dead
##delta: 0=censored, 1=dead
delta=lung$status-1

## this study reports time in days rather than weeks or months
times=lung$time
times=times/7  ## weeks

## matrix of covariates
x.train=cbind(lung[ , -(1:3)])
## lung$sex:        Male=1 Female=2


##vars=tsvs2(x.train, x.train, times, delta)
vars=tsvs2(x.train, x.train, times, delta, K=0) ## K=0 just returns 0


[Package nftbart version 2.1 Index]