kfold.stanreg {rstanarm} | R Documentation |
K-fold cross-validation
Description
The kfold
method performs exact -fold cross-validation. First
the data are randomly partitioned into
subsets of equal size (or as close
to equal as possible), or the user can specify the
folds
argument
to determine the partitioning. Then the model is refit times, each time
leaving out one of the
subsets. If
is equal to the total
number of observations in the data then
-fold cross-validation is
equivalent to exact leave-one-out cross-validation (to which
loo
is an efficient approximation).
Usage
## S3 method for class 'stanreg'
kfold(
x,
K = 10,
...,
folds = NULL,
save_fits = FALSE,
cores = getOption("mc.cores", 1)
)
Arguments
x |
A fitted model object returned by one of the rstanarm modeling functions. See stanreg-objects. |
K |
For |
... |
Currently ignored. |
folds |
For |
save_fits |
For |
cores |
The number of cores to use for parallelization. Instead fitting
separate Markov chains for the same model on different cores, by default
|
Value
An object with classes 'kfold' and 'loo' that has a similar structure
as the objects returned by the loo
and waic
methods and is compatible with the loo_compare
function for
comparing models.
References
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing. 27(5), 1413–1432. doi:10.1007/s11222-016-9696-4. arXiv preprint: https://arxiv.org/abs/1507.04544
Yao, Y., Vehtari, A., Simpson, D., and Gelman, A. (2018) Using stacking to average Bayesian predictive distributions. Bayesian Analysis, advance publication, doi:10.1214/17-BA1091.
Examples
if (.Platform$OS.type != "windows" || .Platform$r_arch != "i386") {
fit1 <- stan_glm(mpg ~ wt, data = mtcars, refresh = 0)
fit2 <- stan_glm(mpg ~ wt + cyl, data = mtcars, refresh = 0)
fit3 <- stan_glm(mpg ~ disp * as.factor(cyl), data = mtcars, refresh = 0)
# 10-fold cross-validation
# (if possible also specify the 'cores' argument to use multiple cores)
(kfold1 <- kfold(fit1, K = 10))
kfold2 <- kfold(fit2, K = 10)
kfold3 <- kfold(fit3, K = 10)
loo_compare(kfold1, kfold2, kfold3)
# stratifying by a grouping variable
# (note: might get some divergences warnings with this model but
# this is just intended as a quick example of how to code this)
fit4 <- stan_lmer(mpg ~ disp + (1|cyl), data = mtcars, refresh = 0)
table(mtcars$cyl)
folds_cyl <- loo::kfold_split_stratified(K = 3, x = mtcars$cyl)
table(cyl = mtcars$cyl, fold = folds_cyl)
kfold4 <- kfold(fit4, folds = folds_cyl, cores = 2)
print(kfold4)
}
# Example code demonstrating the different ways to specify the number
# of cores and how the cores are used
#
# options(mc.cores = NULL)
#
# # spread the K models over N_CORES cores (method 1)
# kfold(fit, K, cores = N_CORES)
#
# # spread the K models over N_CORES cores (method 2)
# options(mc.cores = N_CORES)
# kfold(fit, K)
#
# # fit K models sequentially using N_CORES cores for the Markov chains each time
# options(mc.cores = N_CORES)
# kfold(fit, K, cores = 1)