folds.svydesign {surveyCV} | R Documentation |
Creating CV folds based on the svydesign object
Description
Wrapper function which takes a svydesign object
and desired number of CV folds,
and passes them into folds.svy.
Returns a vector of fold IDs, which in most cases you will want to append
to your svydesign object using update.svydesign
(see Examples below).
These fold IDs respect any stratification or clustering in the survey design.
You can then carry out K-fold CV as usual,
taking care to also use the survey design features and survey weights
when fitting models in each training set
and also when evaluating models against each test set.
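To illustrate what it means for fold IDs to respect stratification and clustering, here is a minimal base-R sketch. It is not the surveyCV implementation: the helper make_design_folds and the toy data are hypothetical, and the sketch assumes cluster IDs are unique across strata. Within each stratum it shuffles the clusters and deals them out to the folds in turn, so all rows of a cluster land in the same fold and every stratum is spread across the folds.

```r
# Hypothetical helper for illustration only (not part of surveyCV).
# Assumes each cluster ID appears in exactly one stratum.
make_design_folds <- function(strata, cluster, nfolds) {
  fold_id <- integer(length(cluster))
  for (s in unique(strata)) {
    in_stratum <- strata == s
    clusters_s <- unique(cluster[in_stratum])
    # Shuffle this stratum's clusters, then deal them to folds 1..nfolds in turn
    shuffled <- clusters_s[sample.int(length(clusters_s))]
    cluster_fold <- rep_len(seq_len(nfolds), length(shuffled))
    # Every row inherits the fold of its cluster
    fold_id[in_stratum] <- cluster_fold[match(cluster[in_stratum], shuffled)]
  }
  fold_id
}

# Toy data: 2 strata, 4 clusters per stratum, 3 rows per cluster
set.seed(42)
toy <- data.frame(
  stratum = rep(c("A", "B"), each = 12),
  cluster = rep(1:8, each = 3)
)
toy$fold <- make_design_folds(toy$stratum, toy$cluster, nfolds = 2)

# Each cluster sits in exactly one fold; each stratum appears in every fold
table(toy$cluster, toy$fold)
table(toy$stratum, toy$fold)
```

For a design with no strata, pass a constant vector as strata; for a design with no clusters, pass the row index as cluster.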
Usage
folds.svydesign(design_object, nfolds)
Arguments
design_object | Name of a svydesign object |
nfolds | Number of folds to be used during cross validation |
Details
For the special cases of linear or logistic GLMs, use instead
cv.svydesign or cv.svyglm,
which will automate the whole CV process for you.
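As a rough sketch of the automated route, the call below compares two linear models on the stratified api sample from the survey package. It assumes the argument names formulae, design_object and nfolds as documented for cv.svydesign, and relies on the default method being linear regression; the two model formulas are illustrative choices, not recommendations.

```r
library(survey)
library(surveyCV)
data("api", package = "survey")

# Stratified sample design, as in the Examples section
dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw, data = apistrat,
                    fpc = ~fpc)

# Fold assignment is random, so fix the seed for reproducibility
set.seed(2022)
cv_results <- cv.svydesign(formulae = c("api00 ~ ell", "api00 ~ ell + meals"),
                           design_object = dstrat, nfolds = 5)
cv_results
```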
Value
Integer vector of fold IDs with length nrow(Data),
where Data is the dataset underlying design_object.
Most likely you will want to append the returned vector
to the svydesign object,
for instance with update.svydesign
(see Examples below).
See Also
cv.svy, cv.svydesign, or cv.svyglm
to carry out the whole CV process (not just forming folds but also training
and testing your models) for linear or logistic regression models.
Examples
# Set up CV folds for a stratified sample and a one-stage cluster sample,
# using data from the `survey` package
library(survey)
data("api", package = "survey")
# stratified sample
dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw, data = apistrat,
fpc = ~fpc)
dstrat <- update(dstrat, .foldID = folds.svydesign(dstrat, nfolds = 5))
# Each fold will have observations from every stratum
with(dstrat$variables, table(stype, .foldID))
# Fold sizes should be roughly equal
table(dstrat$variables$.foldID)
#
# one-stage cluster sample
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)
dclus1 <- update(dclus1, .foldID = folds.svydesign(dclus1, nfolds = 5))
# For any given cluster, all its observations will be in the same fold;
# and each fold should contain roughly the same number of clusters
with(dclus1$variables, table(dnum, .foldID))
# But if cluster sizes are unequal,
# the number of individuals per fold will also vary
table(dclus1$variables$.foldID)
# See the end of the `intro` vignette for an example of using such folds
# as part of a custom loop over CV folds
# to tune parameters in a design-consistent random forest model
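A custom loop over the folds could look roughly like the sketch below. It is not the vignette's random forest example: it fits a linear model with svyglm on each training set, and the model formula and the variable name cv_sq_err are illustrative assumptions. The design is subset for each training fit, and the held-out squared errors are averaged with svymean so that the test loss also uses the weights and design.

```r
library(survey)
library(surveyCV)
data("api", package = "survey")

set.seed(1)
nfolds <- 5
dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw, data = apistrat,
                    fpc = ~fpc)
dstrat <- update(dstrat, .foldID = folds.svydesign(dstrat, nfolds = nfolds))

# Squared prediction error for each row, filled in while that row is held out
sq_err <- rep(NA_real_, nrow(apistrat))
for (k in seq_len(nfolds)) {
  is_test <- dstrat$variables$.foldID == k
  # Train on all other folds, keeping the design information and weights
  train_design <- subset(dstrat, .foldID != k)
  fit <- svyglm(api00 ~ ell + meals, design = train_design)
  # Predict the held-out rows and record their squared errors
  pred <- as.numeric(predict(fit, newdata = apistrat[is_test, ],
                             type = "response"))
  sq_err[is_test] <- (apistrat$api00[is_test] - pred)^2
}

# Design-based estimate of the mean squared test error, with its standard error
dstrat <- update(dstrat, cv_sq_err = sq_err)
cv_mse <- svymean(~cv_sq_err, dstrat)
cv_mse
```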