kfold.brmsfit {brms}  R Documentation 
K-Fold Cross-Validation
Description
Perform exact K-fold cross-validation by refitting the model K
times, each time leaving out one K-th of the original data.
Folds can be run in parallel using the future package.
Usage
## S3 method for class 'brmsfit'
kfold(
x,
...,
K = 10,
Ksub = NULL,
folds = NULL,
group = NULL,
joint = FALSE,
compare = TRUE,
resp = NULL,
model_names = NULL,
save_fits = FALSE,
recompile = NULL,
future_args = list()
)
Arguments
x 
A 
... 
Further arguments passed to 
K 
The number of subsets of equal (if possible) size
into which the data will be partitioned for performing
K-fold cross-validation.
Ksub 
Optional number of subsets (of those subsets defined by 
folds 
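Determines how the subsets are constructed.
Possible values are NULL (the default), "stratified", "grouped", "loo",
or a numeric vector of fold indices with one element per observation;
see 'Details'.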
group 
Optional name of a grouping variable or factor in the model.
What exactly is done with this variable depends on argument folds; see 'Details'.
joint 
Indicates which observations' log likelihoods shall be
considered jointly in the ELPD computation. If 
compare 
A flag indicating if the information criteria
of the models should be compared to each other
via 
resp 
Optional names of response variables. If specified, predictions are performed only for the specified response variables. 
model_names 
If 
save_fits 
If 
recompile 
Logical, indicating whether the Stan model should be
recompiled. This may be necessary if you are running 
future_args 
A list of further arguments passed to

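The joint argument controls how held-out log likelihoods are aggregated into the ELPD. The base R sketch below is an illustration only, not brms code. The helper names log_mean_exp, elpd_pointwise and elpd_joint are hypothetical, and a simulated matrix of log-likelihood draws stands in for real refits. It contrasts treating each observation separately (the default, joint = FALSE) with treating all observations of a fold jointly, in the spirit of joint = "fold".

```r
# Illustrative sketch: 'll' is an S x N matrix of held-out log likelihoods
# (S posterior draws, N observations). In K-fold cross-validation, columns
# that belong to the same fold come from the same refitted model, so their
# draws can be summed row-wise.

# Numerically stable log(mean(exp(x))).
log_mean_exp <- function(x) {
  m <- max(x)
  m + log(mean(exp(x - m)))
}

# Each observation considered separately: sum of pointwise ELPD values.
elpd_pointwise <- function(ll) {
  sum(apply(ll, 2, log_mean_exp))
}

# Observations of a fold considered jointly: sum their log likelihoods per
# draw, then average over draws on the likelihood scale.
elpd_joint <- function(ll, folds) {
  sum(vapply(unique(folds), function(k) {
    log_mean_exp(rowSums(ll[, folds == k, drop = FALSE]))
  }, numeric(1)))
}

set.seed(1)
ll <- matrix(rnorm(4000 * 6, mean = -1, sd = 0.3), nrow = 4000)
folds <- c(1, 1, 2, 2, 3, 3)
elpd_pointwise(ll)
elpd_joint(ll, folds)
# With exactly one observation per fold, both aggregations coincide.
all.equal(elpd_pointwise(ll), elpd_joint(ll, seq_len(ncol(ll))))
```

The joint version scores each fold's observations as a single prediction. This is why it is not directly comparable with the per-observation values reported by loo.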
Details
The kfold function performs exact K-fold cross-validation. By default,
the data are first partitioned into K folds (i.e., subsets) of equal
(or as close to equal as possible) size. The model is then refit K
times, each time leaving out one of the K subsets. If K is equal to the
total number of observations in the data, K-fold cross-validation is
equivalent to exact leave-one-out cross-validation (to which loo is an
efficient approximation). The compare_ic function is also compatible
with the objects returned by kfold.
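The partition-and-refit procedure described above can be sketched in base R. In this sketch, lm() stands in for brm(), and the held-out log density uses a plug-in normal predictive rather than posterior draws. It therefore shows only the structure of exact K-fold cross-validation, not how brms computes the ELPD. The helpers make_random_folds and kfold_lm are hypothetical.

```r
# Split N observations into K folds whose sizes differ by at most one,
# with fold membership assigned at random.
make_random_folds <- function(N, K) {
  stopifnot(K >= 1, K <= N)
  sample(rep_len(seq_len(K), N))
}

# Refit the model K times, each time leaving out one fold, and compute the
# log predictive density of every held-out observation.
kfold_lm <- function(formula, data, K) {
  folds <- make_random_folds(nrow(data), K)
  lpd <- numeric(nrow(data))
  for (k in seq_len(K)) {
    test <- folds == k
    fit <- lm(formula, data = data[!test, , drop = FALSE])
    held_out <- data[test, , drop = FALSE]
    mu <- predict(fit, newdata = held_out)
    sigma <- summary(fit)$sigma  # plug-in residual SD, not a posterior
    y <- model.response(model.frame(formula, held_out))
    lpd[test] <- dnorm(y, mean = mu, sd = sigma, log = TRUE)
  }
  list(folds = folds, lpd = lpd, elpd = sum(lpd))
}

set.seed(7)
res <- kfold_lm(rate ~ conc + state, data = Puromycin, K = 5)
table(res$folds)  # five folds of (nearly) equal size
res$elpd
```

Calling make_random_folds(N, K = N) gives every fold exactly one observation. This is the exact leave-one-out case mentioned above.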
The subsets can be constructed in multiple different ways:

- If both folds and group are NULL, the subsets are randomly chosen so
  that they have equal (or as close to equal as possible) size.
- If folds is NULL but group is specified, the data are split into
  subsets, each time omitting all observations of one of the factor
  levels, while ignoring argument K.
- If folds = "stratified", the subsets are stratified by group using
  loo::kfold_split_stratified.
- If folds = "grouped", the subsets are split by group using
  loo::kfold_split_grouped.
- If folds = "loo", exact leave-one-out cross-validation will be
  performed and K will be ignored. Further, if group is specified, all
  observations corresponding to the factor level of the currently
  predicted single value are omitted. Thus, in this case, the predicted
  values are only a subset of the omitted ones.
- If folds is a numeric vector, it must contain one element per
  observation in the data. Each element of the vector is an integer in
  1:K indicating to which of the K folds the corresponding observation
  belongs. There are some convenience functions available in the loo
  package that create integer vectors to use for this purpose (see the
  Examples section below and also the kfold-helpers page).
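The grouped option ensures that all observations sharing a group level fall into the same fold, so each refit omits whole groups. The base R sketch below illustrates that idea and produces a vector in the format a numeric folds argument expects. The helper make_grouped_folds is hypothetical and only roughly analogous to loo::kfold_split_grouped; it is not that function's implementation.

```r
# Assign each distinct group level to one of K folds (as evenly as possible,
# at random), then give every observation the fold of its group level.
make_grouped_folds <- function(group, K) {
  lev <- unique(group)
  stopifnot(K >= 2, K <= length(lev))
  lev_fold <- sample(rep_len(seq_len(K), length(lev)))
  lev_fold[match(group, lev)]
}

set.seed(7)
g <- rep(c("a", "b", "c", "d", "e"), times = c(3, 2, 4, 1, 2))
f <- make_grouped_folds(g, K = 2)
f  # integer vector in 1:K, one element per observation
# Every group level maps to exactly one fold.
tapply(f, g, function(x) length(unique(x)))
```

Because groups differ in size here, the folds are balanced in number of groups, not in number of observations.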
When running kfold on a brmsfit created with the cmdstanr backend in a
different R session, several recompilations will be triggered because,
by default, cmdstanr writes the model executable to a temporary
directory. To avoid that, set option "cmdstanr_write_stan_file_dir" to
a non-temporary path of your choice before creating the original
brmsfit (see section 'Examples' below).
Value
kfold returns an object with a structure similar to that of the
objects returned by the loo and waic methods, and it can be used with
the same post-processing functions.
See Also
Examples
## Not run:
fit1 <- brm(count ~ zAge + zBase * Trt + (1|patient) + (1|obs),
            data = epilepsy, family = poisson())
# throws warning about some Pareto k estimates being too high
(loo1 <- loo(fit1))
# perform 10-fold cross-validation
(kfold1 <- kfold(fit1, chains = 1))
# use joint likelihoods per fold for ELPD evaluation
kfold(fit1, chains = 1, joint = "fold")
# use the future package to parallelize the refits,
# i.e., to fit the models belonging to different folds in parallel
library(future)
plan(multisession, workers = 4)
kfold(fit1, chains = 1)
plan(sequential)
## to avoid recompilations when running kfold() on a 'cmdstanr' backend fit
## in a fresh R session, set option 'cmdstanr_write_stan_file_dir' before
## creating the initial 'brmsfit'
## CAUTION: the following code creates some files in the current working
## directory: two 'model_<hash>.stan' files, one 'model_<hash>(.exe)'
## executable, and one 'fit_cmdstanr_<some_number>.rds' file
set.seed(7)
fname <- paste0("fit_cmdstanr_", sample.int(.Machine$integer.max, 1))
options(cmdstanr_write_stan_file_dir = getwd())
fit_cmdstanr <- brm(rate ~ conc + state, data = Puromycin,
backend = "cmdstanr", file = fname)
# now restart the R session and run the following (after attaching 'brms')
set.seed(7)
fname <- paste0("fit_cmdstanr_", sample.int(.Machine$integer.max, 1))
fit_cmdstanr <- brm(rate ~ conc + state,
                    data = Puromycin,
                    backend = "cmdstanr",
                    file = fname)
kfold_cmdstanr <- kfold(fit_cmdstanr, K = 2)
## End(Not run)