nested_cv {rsample} | R Documentation
Nested or Double Resampling
Description
nested_cv can be used to take the results of one resampling procedure
and conduct further resamples within each split. Any type of resampling
used in rsample can be used.
Usage
nested_cv(data, outside, inside)
Arguments
data |
A data frame. |
outside |
The initial resampling specification. This can be an already
created object or an expression of a new object (see the examples below).
If the latter is used, the |
inside |
An expression for the type of resampling to be conducted within the initial procedure. |
Details
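It is a bad idea to use bootstrapping as the outer resampling procedure: a
bootstrap sample contains replicated rows, so the inner resampling can place
copies of the same data point in both the analysis and assessment sets (see
the example below).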
Value
A tibble with class nested_cv and any other classes that the
outer resampling process normally contains. The results include a
column for the outer data split objects, one or more id columns,
and a column of nested tibbles called inner_resamples that holds the
additional resamples.
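As a rough illustration of the structure described above, the following sketch builds a small nested resample and inspects the returned object. The object names `res` and `inner_1` are hypothetical, and the seed is arbitrary; only functions that appear elsewhere on this page are used.

```r
library(rsample)

set.seed(1)  # arbitrary seed, only for reproducibility of the sketch
res <- nested_cv(mtcars,
                 outside = vfold_cv(v = 3),
                 inside = bootstraps(times = 5))

# The result carries the nested_cv class on top of the outer classes
class(res)

# Columns: the outer splits, the id column(s), and inner_resamples
names(res)

# Each element of inner_resamples is itself a tibble of resamples,
# with one row per inner resample (here, one per bootstrap)
inner_1 <- res$inner_resamples[[1]]
nrow(inner_1)
```

Each row of `res` corresponds to one outer split, so `res$inner_resamples[[i]]` holds the inner resamples that belong to `res$splits[[i]]`.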
Examples
library(rsample)

## Using expressions for the resampling procedures:
nested_cv(mtcars, outside = vfold_cv(v = 3), inside = bootstraps(times = 5))
## Using an existing object:
folds <- vfold_cv(mtcars)
nested_cv(mtcars, folds, inside = bootstraps(times = 5))
## The dangers of outer bootstraps:
set.seed(2222)
bad_idea <- nested_cv(mtcars,
  outside = bootstraps(times = 5),
  inside = vfold_cv(v = 3)
)
first_outer_split <- bad_idea$splits[[1]]
outer_analysis <- as.data.frame(first_outer_split)
sum(grepl("Volvo 142E", rownames(outer_analysis)))
## For the 3-fold CV used inside of each bootstrap, how are the replicated
## `Volvo 142E` data partitioned?
first_inner_split <- bad_idea$inner_resamples[[1]]$splits[[1]]
inner_analysis <- as.data.frame(first_inner_split)
inner_assess <- as.data.frame(first_inner_split, data = "assessment")
sum(grepl("Volvo 142E", rownames(inner_analysis)))
sum(grepl("Volvo 142E", rownames(inner_assess)))
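The check above looks at a single car in a single inner split. A minimal sketch of a more general check is given below, assuming an explicit identifier column is added to the data so that replicated rows can be matched without relying on row names. The names `cars`, `car`, `count_leaks`, `boot_outer`, and `cv_outer` are hypothetical and not part of rsample. For the first outer split, the helper counts how many distinct cars appear in both the analysis and assessment sets of each inner split.

```r
library(rsample)

# Hypothetical copy of mtcars with an explicit identifier column
cars <- mtcars
cars$car <- rownames(mtcars)

# For the first outer split, count the distinct cars found in both the
# analysis and assessment sets of every inner split
count_leaks <- function(nested) {
  inner <- nested$inner_resamples[[1]]
  vapply(inner$splits, function(s) {
    length(intersect(analysis(s)$car, assessment(s)$car))
  }, integer(1))
}

set.seed(2222)
# Outer bootstrap: replicated rows can end up on both sides of an inner split
boot_outer <- nested_cv(cars,
                        outside = bootstraps(times = 5),
                        inside = vfold_cv(v = 3))

# Outer V-fold CV: every row is unique, so inner splits cannot share a car
cv_outer <- nested_cv(cars,
                      outside = vfold_cv(v = 3),
                      inside = vfold_cv(v = 3))

count_leaks(boot_outer)  # counts depend on the random draw
count_leaks(cv_outer)    # zero for every inner split
```

With an outer V-fold procedure the inner analysis and assessment sets are disjoint by construction, which is why the second call returns only zeros. Any nonzero count in the first call marks data shared between the two sets of an inner split.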