model.hsstan {nestedcv}
hsstan model for cross-validation
Description
This function applies a cross-validation (CV) procedure for training Bayesian models with hierarchical shrinkage priors using the hsstan package. The function allows optional embedded filtering of predictors for feature selection within the CV loop: within each training fold, an optional filtering of predictors is performed, followed by fitting of an hsstan model. Predictions on the testing folds are then pooled and error/accuracy estimates determined. The default is 10-fold CV. The function is implemented within the nestedcv package. Because hsstan models do not require tuning of meta-parameters, only a single CV procedure is needed to evaluate performance; this is implemented using the outer CV procedure in the nestedcv package. Binary outcomes (logistic regression) and continuous outcomes are supported. Multinomial models are currently not supported.
Usage
model.hsstan(y, x, unpenalized = NULL, ...)
Arguments
y: Response vector. For classification this should be a factor.
x: Matrix of predictors.
unpenalized: Vector of column names of x for covariates that are always retained in the model, i.e. not penalised.
...: Optional arguments passed to hsstan.
Details
Caution should be used when setting the number of cores available for parallelisation. The default setting in hsstan is to use 4 cores to parallelise the Markov chains of the Bayesian inference procedure. This can be switched off either by adding the argument cores = 1 (passed on to rstan) or by setting options(mc.cores = 1).
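For example, a minimal sketch of switching off chain parallelisation before fitting, and restoring the option afterwards:

oldopt <- options(mc.cores = 1)  # run MCMC chains sequentially
# ... fit models with outercv() here ...
options(oldopt)                  # restore the previous setting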
The argument cv.cores in outercv() controls parallelisation over the outer CV folds. On unix/mac, setting cv.cores to >1 will induce nested parallelisation, which will generate an error unless parallelisation of the chains is disabled using cores = 1 or by setting options(mc.cores = 1).
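As an illustrative sketch (y and x stand for a response vector and predictor matrix, as in the Examples below), parallelising over the outer folds while keeping the chains single-core could look like:

res <- outercv(y = y, x = x, model = "model.hsstan",
               n_outer_folds = 10,
               cv.cores = 10,  # one forked worker per outer fold (unix/mac)
               cores = 1)      # passed on to rstan: chains run sequentially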
Nested parallelisation is feasible if cv.cores is >1 and multicore_fork = FALSE is set, as this uses cluster-based parallelisation instead. Beware that large numbers of processes will be spawned: if we perform 10-fold cross-validation with 4 chains and set cv.cores = 10, then 40 processes will be invoked simultaneously.
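An illustrative sketch of this cluster-based approach (the argument values are examples only; here 10 workers x 4 chains would spawn 40 processes):

res <- outercv(y = y, x = x, model = "model.hsstan",
               n_outer_folds = 10,
               cv.cores = 10,            # 10 cluster workers for the outer folds
               multicore_fork = FALSE,   # cluster-based rather than fork-based
               chains = 4)               # 4 MCMC chains within each worker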
Value
An object of class hsstan
Author(s)
Athina Spiliopoulou
See Also
outercv, hsstan
Examples
# Cross-validation is used to apply univariate filtering of predictors.
# Only one CV split is needed (outercv) as the Bayesian model does not
# require learning of meta-parameters.
# control number of cores used for parallelisation over chains
oldopt <- options(mc.cores = 2)
# load iris dataset and simulate a continuous outcome
data(iris)
dt <- iris[, 1:4]
colnames(dt) <- c("marker1", "marker2", "marker3", "marker4")
dt <- as.data.frame(apply(dt, 2, scale))
dt$outcome.cont <- -3 + 0.5 * dt$marker1 + 2 * dt$marker2 + rnorm(nrow(dt), 0, 2)
library(hsstan)
# unpenalised covariates: always retain in the prediction model
uvars <- "marker1"
# penalised covariates: coefficients are drawn from hierarchical shrinkage
# prior
pvars <- c("marker2", "marker3", "marker4") # penalised covariates
# run cross-validation with univariate filter and hsstan
# dummy sampling for fast execution of example
# recommend 4 chains, warmup 1000, iter 2000 in practice
res.cv.hsstan <- outercv(y = dt$outcome.cont, x = dt[, c(uvars, pvars)],
                         model = "model.hsstan",
                         filterFUN = lm_filter,
                         filter_options = list(force_vars = uvars,
                                               nfilter = 2,
                                               p_cutoff = NULL,
                                               rsq_cutoff = 0.9),
                         n_outer_folds = 3,
                         chains = 2,
                         cv.cores = 1,
                         unpenalized = uvars, warmup = 100, iter = 200)
# view prediction performance based on testing folds
res.cv.hsstan$summary
# view coefficients for the final model
res.cv.hsstan$final_fit
# view covariates selected by the univariate filter
res.cv.hsstan$final_vars
# use hsstan package to examine the Bayesian model
sampler.stats(res.cv.hsstan$final_fit)
print(projsel(res.cv.hsstan$final_fit), digits = 4) # adding marker2
options(oldopt) # reset configuration
# Here adding `marker2` improves the model fit: substantial decrease of
# KL-divergence from the full model to the submodel. Adding `marker3` does
# not improve the model fit: no decrease of KL-divergence from the full model
# to the submodel.