lol.xval.split {lolR}    R Documentation

Cross-Validation Data Splitter

Description

Splits a dataset into training and testing sets for cross-validation. The data are partitioned into k folds. Each fold is held out in turn as the testing set on which the model is validated, and the remaining (k-1) folds are used to train the model.

This function also supports low-rank cross-validation. In that case, instead of training on the full (k-1) folds, each training set is subset to min((k-1)/k*n, d) samples so that every training set is low-rank. The held-out fold is still rotated as usual, so the testing sets share no examples; shared examples would introduce a complicated dependence structure into any inference performed on the testing sets.
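The fold rotation described above can be sketched in base R. This is only an illustration under stated assumptions, not the lolR implementation: the helper name kfold.sketch is hypothetical, and assigning shuffled indices to folds in round-robin order is one of several reasonable choices.

```r
# Hypothetical helper, not part of lolR: rotate k folds over n sample indices.
kfold.sketch <- function(n, k) {
  ids <- sample(n)                             # random permutation of 1..n
  fold.id <- rep(seq_len(k), length.out = n)   # fold label for each shuffled index
  folds <- split(ids, fold.id)                 # list of k index vectors
  lapply(folds, function(test) {
    # the held-out fold is the testing set; the other k-1 folds form the training set
    list(train = setdiff(ids, test), test = test)
  })
}

set.seed(1)
sets <- kfold.sketch(n = 20, k = 5)

# across the k rotations, every sample is held out exactly once
all.test <- unlist(lapply(sets, function(s) s$test))
stopifnot(setequal(all.test, 1:20), !any(duplicated(all.test)))
```

Because each index lands in exactly one fold, no two testing sets share a sample, which is the property the description relies on.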

Usage

lol.xval.split(X, Y, k = "loo", rank.low = FALSE, ...)

Arguments

X

[n, d] the data with n samples in d dimensions.

Y

[n] the labels of the samples with K unique labels.

k

the cross-validation method to perform. Defaults to 'loo'.

  • if k == round(k), performs k-fold cross-validation.

  • if k == 'loo', performs leave-one-out cross-validation.

rank.low

whether to force the training set to be low-rank. Defaults to FALSE.

  • if rank.low == FALSE, uses the standard k-fold cross-validation method. Training sets are k-1 folds and testing sets are 1 fold; the held-out fold is rotated so that the testing sets share no samples, which avoids dependence in downstream inference on the cross-validated misclassification rates.

  • if rank.low == TRUE, uses a cross-validation method with training sets of ntrain = min((k-1)/k*n, d) samples, where d is the number of dimensions in X. This ensures that the training data is always low-rank, ntrain < d + 1. Note that the resulting training sets may have ntrain < (k-1)/k*n, but the testing sets are still properly rotated with ntest = n/k samples each, to ensure no dependencies in fold-wise testing.
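As a rough illustration of the rank.low = TRUE case, the sketch below caps a training set at ntrain = min((k-1)/k*n, d) samples. The helper name lowrank.train.sketch is hypothetical, and both the use of floor() for non-integer sizes and the choice of a uniform random subset are assumptions; the documentation above does not say how lolR selects the subset.

```r
# Hypothetical helper, not part of lolR: cap a training set at ntrain samples.
lowrank.train.sketch <- function(train, n, d, k) {
  ntrain <- min(floor((k - 1) / k * n), d)   # floor() is an assumption
  if (length(train) > ntrain) {
    # assumed strategy: uniform random subset without replacement
    train <- train[sample.int(length(train), ntrain)]
  }
  train
}

set.seed(1)
n <- 200; d <- 30; k <- 10
test <- 1:20                          # one held-out fold of n/k samples
train <- setdiff(seq_len(n), test)    # the other k-1 folds: 180 samples
train.low <- lowrank.train.sketch(train, n, d, k)

length(train.low)                     # 30, i.e. min(180, 30)
stopifnot(length(intersect(train.low, test)) == 0)
```

The testing fold is untouched by the subsetting, so the rotation over held-out folds is unaffected; only the training side shrinks.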

...

optional arguments.

Value

the cross-validation sets, as an object of class "XV" containing the following:

train

length [ntrain] vector indicating the indices of the training examples.

test

length [ntest] vector indicating the indices of the testing examples.
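A pair of train and test index vectors like these is typically used to slice the data and labels by row. The sketch below uses a hand-built stand-in named fold, which is hypothetical and not produced by lol.xval.split, so it makes no claim about the exact layout of the returned object.

```r
set.seed(1)
X <- matrix(rnorm(12 * 3), nrow = 12, ncol = 3)   # toy [n, d] data, n = 12, d = 3
Y <- rep(c(1, 2), each = 6)                       # toy labels
fold <- list(train = 1:9, test = 10:12)           # hypothetical train/test index pair

# drop = FALSE keeps the result a matrix even if a set has a single row
X.train <- X[fold$train, , drop = FALSE]
Y.train <- Y[fold$train]
X.test  <- X[fold$test, , drop = FALSE]
Y.test  <- Y[fold$test]

dim(X.train)   # 9 3
dim(X.test)    # 3 3
```

Keeping drop = FALSE matters for leave-one-out splits, where each testing set holds a single sample and would otherwise collapse to a plain vector.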

Author(s)

Eric Bridgeford

Examples

# prepare data for 10-fold validation
library(lolR)
data <- lol.sims.rtrunk(n=200, d=30)  # 200 examples of 30 dimensions
X <- data$X; Y <- data$Y
sets.xval.10fold <- lol.xval.split(X, Y, k=10)

# prepare data for loo validation
sets.xval.loo <- lol.xval.split(X, Y, k='loo')


[Package lolR version 2.1 Index]