R: Random Split Sampling with Stratification

rSplit {Qindex}

R Documentation

Random Split Sampling with Stratification

Description

Random split sampling into training and test sets, optionally stratified according to the type of the response.

Usage

rSplit(y, nsplit, stratified = TRUE, trainFrac = 0.8, ...)

Arguments

`y`	the response: a double vector, a logical vector, a factor, or a Surv object
`nsplit`	positive integer scalar, number of replicate random splits to perform
`stratified`	logical scalar, whether to stratify the split on the response `y`, default `TRUE`
`trainFrac`	double scalar between 0 and 1, fraction of subjects assigned to the training set, default `.8`
`...`	additional parameters, currently not in use

Details

Function rSplit() performs random split sampling, with or without stratification. Specifically,

If stratified = FALSE, or if the response y is a double vector, the sample is split into a training set and a test set with training set fraction trainFrac, without stratification.
Otherwise, if y is a Surv object, the split is stratified by censoring status: the subjects with an observed event and the censored subjects are each split into a training set and a test set with training set fraction trainFrac. The two training sets are then combined, as are the two test sets.
Otherwise, if y is a logical vector, the split is stratified by y itself: the subjects with a TRUE response and the subjects with a FALSE response are each split into a training set and a test set with training set fraction trainFrac. The training sets and the test sets are then combined in the same fashion as for a Surv response.
Otherwise, if y is a factor, the split is stratified by its levels: the subjects in each level of y are split into a training set and a test set with training set fraction trainFrac. The training sets and the test sets from all levels of y are then combined.
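The stratified cases above can be illustrated with a minimal base-R sketch. This is not the Qindex implementation: the helper name stratified_split_sketch is hypothetical, and rounding each stratum's training size with round() is an assumption, since this page does not state how rSplit() rounds. For a Surv response, the strata would be the censoring status rather than y itself.

```r
# Hypothetical helper, NOT part of Qindex: one stratified random split in base R.
stratified_split_sketch <- function(strata, trainFrac = 0.8) {
  strata <- as.factor(strata)
  is_train <- logical(length(strata))  # every subject starts as FALSE (test)
  for (lev in levels(strata)) {
    idx <- which(strata == lev)  # subjects in this stratum
    # Assumption: training size per stratum is rounded to the nearest integer
    n_train <- round(length(idx) * trainFrac)
    # Sample positions within idx; avoids the sample(x) pitfall when length(idx) == 1
    picked <- idx[sample.int(length(idx), size = n_train)]
    is_train[picked] <- TRUE
  }
  is_train  # TRUE = training subject, FALSE = test subject
}

set.seed(1)
y <- rep(c(TRUE, FALSE), times = c(20, 30))
train <- stratified_split_sketch(y)
table(y, train)  # cross-tabulate response by training membership
```

Under the rounding assumption, each stratum contributes its own share of training subjects, so the class balance of y is preserved in both the training and the test set.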

Value

Function rSplit() returns a length-nsplit list of logical vectors. In each logical vector, the TRUE elements indicate training subjects and the FALSE elements indicate test subjects.
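A short sketch of how a return value of this shape might be consumed. The list splits below is a hand-made stand-in with the documented structure (a list of logical vectors), so that the code runs without Qindex; the data frame dat and the name sizes are illustrative assumptions.

```r
# Toy data; 'dat' is a hypothetical example data set
set.seed(42)
dat <- data.frame(x = rnorm(10), y = rep(c(TRUE, FALSE), each = 5))

# Stand-in for the documented return value: a list of 3 logical vectors,
# each with 8 TRUE (training) and 2 FALSE (test) elements in random order.
# With Qindex installed, one would presumably use something like
#   splits <- rSplit(y = dat$y, nsplit = 3L)
splits <- replicate(3, sample(rep(c(TRUE, FALSE), times = c(8, 2))),
                    simplify = FALSE)

# Each logical vector indexes rows directly: TRUE rows train, FALSE rows test
sizes <- vapply(splits, function(id) {
  train_set <- dat[id, , drop = FALSE]
  test_set <- dat[!id, , drop = FALSE]
  c(train = nrow(train_set), test = nrow(test_set))
}, FUN.VALUE = c(train = 0L, test = 0L))
sizes  # one column per split
```

Because each element is a logical vector, negating it with ! selects the complementary test subjects with no further bookkeeping.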

Note

caTools::sample.split() is not what we need.

Examples

rSplit(y = rep(c(TRUE, FALSE), times = c(20, 30)), nsplit = 3L)

[Package Qindex version 0.1.5 Index]