| rSplit {Qindex} | R Documentation |
Random Split Sampling with Stratification
Description
Random split sampling, stratified based on the type of the response.
Usage
rSplit(y, nsplit, stratified = TRUE, trainFrac = 0.8, ...)
Arguments
y |
a double vector, a logical vector, a factor, or a Surv object; the response |
nsplit |
positive integer scalar, number of replicates of random splits to be performed |
stratified |
logical scalar, whether to stratify the split based on the response, default TRUE |
trainFrac |
double scalar between 0 and 1, fraction of subjects assigned to the training set, default 0.8 |
... |
additional parameters, currently not in use |
Details
Function rSplit() performs random split sampling,
with or without stratification. Specifically,
If stratified = FALSE, or if we have a double response y, then split the sample into a training and a test set by ratio trainFrac, without stratification.

Otherwise, split a Surv response y, stratified by its censoring status. Specifically, split the subjects with an observed event into a training and a test set with training set fraction trainFrac, and split the censored subjects into a training and a test set with training set fraction trainFrac. Then combine the training sets from the subjects with observed events and the censored subjects, and combine the test sets from the subjects with observed events and the censored subjects.

Otherwise, split a logical response y, stratified by itself. Specifically, split the subjects with TRUE response into a training and a test set with training set fraction trainFrac, and split the subjects with FALSE response into a training and a test set with training set fraction trainFrac. Then combine the training sets, and the test sets, in a similar fashion as described above.

Otherwise, split a factor response y, stratified by its levels. Specifically, split the subjects in each level of y into a training and a test set by ratio trainFrac. Then combine the training sets, and the test sets, from all levels of y.
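The stratified cases above all follow the same pattern: split within each stratum, then combine. A minimal base-R sketch of one such split is given below. It is not the package's implementation; the helper name split_once() and the rounding rule for the per-stratum training size are assumptions made for illustration.

```r
# Hypothetical helper, NOT the rSplit() source: one stratified random split.
# `strata` holds one group label per subject (for example a censoring
# indicator, a logical response, or a factor response).
split_once <- function(strata, trainFrac = 0.8) {
  train <- logical(length(strata))  # all FALSE, i.e. all test subjects
  # split() groups the subject indices by stratum
  for (idx in split(seq_along(strata), strata)) {
    # Assumption: the training size in each stratum is rounded to the
    # nearest integer; the package may use a different rule.
    n_train <- round(length(idx) * trainFrac)
    # Draw the training subjects of this stratum without replacement
    train[idx[sample.int(length(idx), size = n_train)]] <- TRUE
  }
  train
}

set.seed(1)
y <- rep(c(TRUE, FALSE), times = c(20, 30))
train <- split_once(y)
table(response = y, train = train)
```

Passing a constant vector as strata, for example rep(1L, length(y)), reduces this sketch to the unstratified case.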
Value
Function rSplit() returns a length-nsplit list of
logical vectors.
In each logical vector,
the TRUE elements indicate training subjects and
the FALSE elements indicate test subjects.
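As a rough illustration of how such a return value might be consumed, the sketch below subsets a data frame with each logical vector. The list splits is built by hand as a stand-in for the output of rSplit() (it is unstratified), and the data frame dat and its columns are hypothetical.

```r
set.seed(2)
# Hypothetical data: 50 subjects with a logical response
dat <- data.frame(
  x = rnorm(50),
  y = rep(c(TRUE, FALSE), times = c(20, 30))
)

# Stand-in for the value of rSplit(): a list with one logical vector per
# replicate, each as long as the response (TRUE = training, FALSE = test).
splits <- replicate(
  3L,
  sample(rep(c(TRUE, FALSE), times = c(40, 10))),
  simplify = FALSE
)

# Use each logical vector to index the rows of the data
sizes <- lapply(splits, function(id) {
  train <- dat[id, , drop = FALSE]
  test <- dat[!id, , drop = FALSE]
  c(n_train = nrow(train), n_test = nrow(test))
})
do.call(rbind, sizes)
```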
Note
caTools::sample.split() is not what we need.
See Also
Examples
rSplit(y = rep(c(TRUE, FALSE), times = c(20, 30)), nsplit = 3L)