R: Cross-Validation Data Splitter

lol.xval.split {lolR}

R Documentation

Cross-Validation Data Splitter

Description

A function to split a dataset into training and testing sets for cross-validation. The procedure is to split the data into k folds. Each fold is held out in turn as the testing set on which the model is validated, and the remaining (k-1) folds are used to train the model. This function also supports low-rank cross-validation. In that case, instead of using all (k-1) folds for training, we subset min((k-1)/k*n, d) samples so that the resulting training sets are all low-rank. The held-out fold is still rotated as usual, so no two testing sets share examples; shared examples would introduce a complicated dependence structure into any inference performed on the testing sets.
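The fold rotation described above can be sketched in base R. This is a minimal illustration, not the lolR implementation: `kfold_split` is a hypothetical helper, and the random, near-even assignment of samples to folds is an assumption.

```r
# Hypothetical helper, not part of lolR: a plain k-fold splitter.
kfold_split <- function(n, k) {
  # randomly assign each of the n samples to one of k folds, as evenly as possible
  fold_id <- sample(rep(seq_len(k), length.out = n))
  lapply(seq_len(k), function(i) {
    list(train = which(fold_id != i),  # the remaining k-1 folds
         test  = which(fold_id == i))  # the single held-out fold
  })
}

set.seed(1)
folds <- kfold_split(n = 20, k = 5)

# every sample is held out exactly once across the k folds,
# so the testing sets share no examples
all_test <- unlist(lapply(folds, function(f) f$test))
stopifnot(identical(sort(all_test), 1:20))
```

Because each sample belongs to exactly one fold, the testing sets are disjoint by construction, which is the property the description relies on.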

Usage

lol.xval.split(X, Y, k = "loo", rank.low = FALSE, ...)

Arguments

`X`	`[n, d]` the data with `n` samples in `d` dimensions.
`Y`	`[n]` the labels of the samples with `K` unique labels.
`k`	the cross-validation method to perform. Defaults to `'loo'`. If `k == round(k)`, performs `k`-fold cross-validation. If `k == 'loo'`, performs leave-one-out cross-validation.
`rank.low`	whether to force the training set to be low-rank. Defaults to `FALSE`. If `rank.low == FALSE`, uses standard `k`-fold cross-validation: training sets are `k-1` folds and testing sets are `1` fold, where the held-out fold is rotated so that the testing sets are disjoint and downstream inference on the cross-validated misclassification rates has no fold-wise dependence. If `rank.low == TRUE`, uses training sets of `ntrain = min((k-1)/k*n, d)` samples, where `d` is the number of dimensions in `X`. This ensures that the training data is always low-rank, `ntrain < d + 1`. Note that the resulting training sets may have `ntrain < (k-1)/k*n`, but the testing sets are still rotated properly with `ntest = n/k` to ensure no dependencies in fold-wise testing.
`...`	optional args.
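As a worked example of the `rank.low` arithmetic, the sketch below uses `n = 200`, `d = 30`, and `k = 10`, giving `(k-1)/k*n = 180`, `ntrain = min(180, 30) = 30`, and `ntest = 20`. It uses base R only; the variable names are illustrative, and drawing the training subset uniformly at random from the `k-1` folds is an assumption, since the documentation does not say how the subset is chosen.

```r
n <- 200; d <- 30; k <- 10

ntrain_full <- (k - 1) / k * n       # size of the k-1 training folds
ntrain_low  <- min(ntrain_full, d)   # low-rank cap on the training set size
ntest       <- n / k                 # the held-out fold size is unchanged

# one fold, written out by hand: hold out the first n/k samples
test_idx  <- seq_len(ntest)
train_idx <- setdiff(seq_len(n), test_idx)

# assumption: the low-rank subset is a uniform random draw from the k-1 folds
set.seed(1)
train_low <- sample(train_idx, ntrain_low)

# the training set satisfies ntrain < d + 1 and never overlaps the test fold
stopifnot(length(train_low) < d + 1)
stopifnot(length(intersect(train_low, test_idx)) == 0)
```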

Value

Returns the cross-validation sets as an object of class "XV" containing the following:

`train`	length `[ntrain]` vector indicating the indices of the training examples.
`test`	length `[ntest]` vector indicating the indices of the testing examples.
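The `train` and `test` index vectors are typically used to subset `X` and `Y` once per fold. The sketch below shows one way to do this in base R. It assumes the splits are available as a list with one `train`/`test` pair per fold (a hand-built stand-in here, not the output of `lol.xval.split`), and it uses a nearest-centroid classifier purely as a placeholder model.

```r
set.seed(1)
n <- 40; d <- 3
X <- matrix(rnorm(n * d), nrow = n, ncol = d)
Y <- rep(c(1, 2), each = n / 2)
X[Y == 2, ] <- X[Y == 2, ] + 3   # shift class 2 so the classes separate

# stand-in for the splitter's output: a list of folds with $train and $test
# (this structure is an assumption made for illustration)
fold_id <- rep(seq_len(4), length.out = n)
sets <- lapply(seq_len(4), function(i) {
  list(train = which(fold_id != i), test = which(fold_id == i))
})

# per-fold misclassification rate of a nearest-centroid placeholder model
fold_error <- sapply(sets, function(s) {
  classes <- sort(unique(Y[s$train]))
  # d x K matrix whose columns are the class means of the training samples
  centroids <- sapply(classes, function(cl) {
    colMeans(X[s$train[Y[s$train] == cl], , drop = FALSE])
  })
  # assign each test sample to the class with the closest centroid
  pred <- apply(X[s$test, , drop = FALSE], 1, function(x) {
    classes[which.min(colSums((centroids - x)^2))]
  })
  mean(pred != Y[s$test])
})

mean(fold_error)  # cross-validated misclassification rate
```

Only the training indices are used to fit the class means, so each fold's error is measured on samples the model has not seen.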

Author(s)

Eric Bridgeford

Examples

# prepare data for 10-fold validation
library(lolR)
data <- lol.sims.rtrunk(n=200, d=30)  # 200 examples of 30 dimensions
X <- data$X; Y <- data$Y
sets.xval.10fold <- lol.xval.split(X, Y, k=10)

# prepare data for leave-one-out (loo) validation
sets.xval.loo <- lol.xval.split(X, Y, k='loo')

[Package lolR version 2.1 Index]